[jira] [Created] (CASSANDRA-14882) Exception in Cassandra during Compaction

2018-11-08 Thread Alex C Punnen (JIRA)
Alex C Punnen created CASSANDRA-14882:
-

 Summary: Exception in Cassandra during Compaction
 Key: CASSANDRA-14882
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14882
 Project: Cassandra
  Issue Type: Bug
 Environment: Single-node Cassandra 3.11.2 running on Linux (CentOS, I 
think)
Reporter: Alex C Punnen


The exception below appears in the debug logs every minute. Additional context: the 
entries of the affected table had been deleted (not sure whether by program error or 
human error) in a testbed.

---

DEBUG [CompactionExecutor:750] 2018-10-26 21:59:00,512 CompactionTask.java:155 
- Compacting (583f8a00-d96a-11e8-9c62-3944322fd453) 
[/var/lib/cassandra/data/xx/xxt_uuid-35106240c6d511e897e559c5a71f3c3c/mc-4-big-Data.db:level=0,
 ]
ERROR [CompactionExecutor:750] 2018-10-26 21:59:00,514 CassandraDaemon.java:228 
- Exception in thread Thread[CompactionExecutor:750,1,main]
java.lang.NullPointerException: null
at 
org.apache.cassandra.config.CFMetaData.enforceStrictLiveness(CFMetaData.java:1266)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionIterator$Purger.<init>(CompactionIterator.java:280)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionIterator$Purger.<init>(CompactionIterator.java:264)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:108)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:80)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:275)
 ~[apache-cassandra-3.11.2.jar:3.11.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_161]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.2.jar:3.11.2]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161]
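
For context, a hedged illustration only (not the project's actual code or a proposed fix): the purger being constructed in the trace above asks the table's metadata whether strict liveness must be enforced, and an NPE there usually means a schema object being dereferenced has disappeared, e.g. because a table or materialized view was dropped while the compaction task was already queued. The helper name resolveStrictLiveness below is hypothetical; Schema and CFMetaData are the 3.11 classes that appear in the trace.

{code:java}
import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.config.Schema;

public final class StrictLivenessCheck
{
    // Hypothetical helper, not part of Cassandra: answers false instead of
    // throwing when the table (or a schema object it references) has been
    // dropped after the compaction task was scheduled.
    public static boolean resolveStrictLiveness(String keyspace, String table)
    {
        CFMetaData metadata = Schema.instance.getCFMetaData(keyspace, table); // null after DROP
        if (metadata == null)
            return false;

        try
        {
            return metadata.enforceStrictLiveness();
        }
        catch (NullPointerException e)
        {
            // schema changed underneath us, e.g. a view definition vanished
            return false;
        }
    }
}
{code}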



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14842) SSL connection problems when upgrading from 3.0.x to 4.0

2018-11-08 Thread Tommy Stendahl (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl reassigned CASSANDRA-14842:
--

Assignee: Tommy Stendahl

> SSL connection problems when upgrading from 3.0.x to 4.0
> ---
>
> Key: CASSANDRA-14842
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14842
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Major
>
> While testing an upgrade from 3.0.15 to 4.0, the old nodes fail to connect to 
> the 4.0 node; I get this exception on the 4.0 node:
>  
> {noformat}
> 2018-10-22T11:57:44.366+0200 ERROR [MessagingService-NettyInbound-Thread-3-8] 
> InboundHandshakeHandler.java:300 Failed to properly handshake with peer 
> /10.216.193.246:58296. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: 
> SSLv2Hello is disabled
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> at 
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
> at sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
> at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
> at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
> at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> ... 14 common frames omitted{noformat}
> In the server encryption options on the 4.0 node I have both "enabled" and 
> "enable_legacy_ssl_storage_port" set to true, so it should accept incoming 
> connections on the "ssl_storage_port".
>  
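
For what it's worth, the nested exception is thrown by the JDK's own SSLEngine, which rejects a ClientHello framed as SSLv2Hello unless that pseudo-protocol is explicitly enabled on the accepting side. A minimal plain-JSSE sketch (not Cassandra's internode networking code) of an engine that would accept such a hello:

{code:java}
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public final class LegacyHelloDemo
{
    public static void main(String[] args) throws Exception
    {
        // Minimal JSSE sketch (not Cassandra's internode code): a server-side
        // engine that also accepts the SSLv2Hello-framed ClientHello that
        // older peers may send during the handshake.
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);
        engine.setEnabledProtocols(new String[] { "SSLv2Hello", "TLSv1.2" });
        System.out.println(String.join(", ", engine.getEnabledProtocols()));
    }
}
{code}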



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14881) Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 3.11.2

2018-11-08 Thread Laxmikant Upadhyay (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxmikant Upadhyay updated CASSANDRA-14881:
---
Description: 
I have a 3-node (A, B, C) Cassandra 2.1.16 cluster which I am upgrading to 
Cassandra 3.11.2.
 My current cluster status: node A has been upgraded to 3.11.2, B is down, and 
C is still on Cassandra 2.1.16.

When I run a counter update using cqlsh it behaves in a strangely inconsistent 
way: sometimes the update is actually applied and sometimes it is not applied at all.

See the example below from cqlsh logged in to node A:
 === Incorrect update: update not applied
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}
 === Correct update: update applied successfully
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count |  -100
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}
I have attached the trace-enabled output (with actual timestamps) for both the 
correct and the incorrect counter update.

What is the reason for this weird behavior?

Below is my table details:
{code:java}
CREATE TABLE ks."counterTable" (
    key text,
    column1 text,
    value counter,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';{code}
*Note:* When all nodes are successfully upgraded, OR when 2 nodes have been upgraded 
to 3.11.2 and the 3rd (non-upgraded) node is down, we don't see this issue; the 
counter update works as expected.

  was:
I have 3 node (A,B,C) 2.1.16 cassandra cluster which i am upgrading to 
cassandra 3.11.2.
My current cluster status is node a has been upgrade to 3.11.2, B is down, and 
C is on cassandra 2.1.16

when i run counter update using cqlsh it is behaving strange inconsistent way , 
sometimes the update actually applied and sometimes it does not apply at all.


See the below example of cqlsh logged in to node A:
===Incorrect update : update not applied 
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}



 ===Correct update : update applied successfully 
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count |  -100
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}



I have attached the output with trace enabled (with actual timestamp) for both 
correct and incorrect counter update . 

What is the reason of this weird behavior ?



*Note :* When all nodes are successfully upgraded OR when 2 nodes gets upgraded 
to 3.11.2 and 3rd non-upgraded node is down then we don't see such issue . 
couner update works as expected


> Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 
> 3.11.2
> --
>
> Key: CASSANDRA-14881
> URL: 

[jira] [Created] (CASSANDRA-14881) Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 3.11.2

2018-11-08 Thread Laxmikant Upadhyay (JIRA)
Laxmikant Upadhyay created CASSANDRA-14881:
--

 Summary: Inconsistent behaviour of counter update during cassandra 
upgrade 2.1.16 -> 3.11.2
 Key: CASSANDRA-14881
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14881
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: {code:java}
// code placeholder
{code}
Reporter: Laxmikant Upadhyay
 Attachments: CorrectCounterUpdateTrace.txt, 
IncorrectCounterUpdateTrace.txt

I have a 3-node (A, B, C) Cassandra 2.1.16 cluster which I am upgrading to 
Cassandra 3.11.2.
My current cluster status: node A has been upgraded to 3.11.2, B is down, and 
C is still on Cassandra 2.1.16.

When I run a counter update using cqlsh it behaves in a strangely inconsistent 
way: sometimes the update is actually applied and sometimes it is not applied at all.


See the example below from cqlsh logged in to node A:
=== Incorrect update: update not applied
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}



 === Correct update: update applied successfully
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count |  -100
 
user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' 
AND column1 = 'count';

user@cqlsh:ks> select * from "counterTable";

 key  | column1    | value
--++---
 key1 | count | 0{code}



I have attached the trace-enabled output (with actual timestamps) for both the 
correct and the incorrect counter update.

What is the reason for this weird behavior?



*Note:* When all nodes are successfully upgraded, OR when 2 nodes have been upgraded 
to 3.11.2 and the 3rd (non-upgraded) node is down, we don't see this issue; the 
counter update works as expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context

2018-11-08 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680787#comment-16680787
 ] 

Stefania commented on CASSANDRA-14554:
--

bq.  The other methods on a LifecycleTransaction could simply be marked 
synchronised as well.

If we are synchronizing the LifecycleTransaction methods anyway, I'm not sure I 
understand why we need child transactions. Even in 4.0, where Netty threads 
call {{trackNew}}, I don't think we're adding sstables frequently enough to 
introduce contention on a shared, synchronized txn. Considering that {{trackNew}} 
performs a file sync, as you correctly reminded me, that surely blocks Netty 
threads more than a synchronized {{trackNew}} would. Maybe if many sstables are 
created concurrently during streaming, child transactions would make sense; I'm 
still not totally sure.

It's fine with me if we prefer to try a different alternative; the patch is 
available at any time. This code is not changing much, so there is little risk 
of the patch getting stale. For info, internally [~snazy] already reviewed the 
patch.
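
For readers following along, a toy JDK-only sketch (not Cassandra code, and not the patch under review) of the trade-off being discussed: making the shared transaction's methods synchronized removes the ConcurrentModificationException at the cost of serializing callers.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the "just synchronize the methods" alternative: the
// mutable state is only touched through synchronized methods, so mutation and
// iteration by concurrent streaming threads can never interleave.
final class SharedTxn
{
    private final List<String> newFiles = new ArrayList<>();

    synchronized void trackNew(String sstable)
    {
        newFiles.add(sstable);
    }

    synchronized List<String> snapshot()
    {
        return new ArrayList<>(newFiles);
    }
}
{code}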


> LifecycleTransaction encounters ConcurrentModificationException when used in 
> multi-threaded context
> ---
>
> Key: CASSANDRA-14554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter 
> this exception -
> {quote}java.util.ConcurrentModificationException: null
>  at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at 
> org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at 
> org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and 
> share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}} 
> which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context 
> and this leads to streaming failures due to object sharing. On trunk, this 
> object is shared across all threads that transfer sstables in parallel for 
> the given {{TableId}} in a {{StreamSession}}. There are two options to solve 
> this: make {{LifecycleTransaction}} and the associated objects thread safe, or 
> scope the transaction to a single {{CassandraIncomingFile}}. The consequence 
> of the latter option is that if we experience a streaming failure we may have 
> redundant SSTables on disk. This is OK, as compaction should clean them up. A 
> third option is to synchronize access in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zourun updated CASSANDRA-14880:
---
Description: 
When I create a table and a materialized view and then drop them, and I do this 
frequently, I get the error "no response received from cassandra within 
timeout period". For example:

 for i=0; i<100; i++ {
          create table;
          create materialized view;
          drop materialized view;
          drop table;
 }

How can I solve it?
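
One thing worth ruling out, shown as a hedged sketch with the DataStax Java driver 3.x (the report does not name the client, so the driver and keyspace/table names below are assumptions, not a confirmed fix): each CREATE/DROP is a schema change, and issuing them back to back without waiting for schema agreement is a common source of client-side timeouts. Waiting between DDL statements looks roughly like:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public final class DdlLoop
{
    public static void main(String[] args) throws Exception
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))
        {
            for (int i = 0; i < 100; i++)
            {
                session.execute("CREATE TABLE t (k int PRIMARY KEY, v int)");
                session.execute("CREATE MATERIALIZED VIEW mv AS SELECT * FROM t "
                              + "WHERE v IS NOT NULL AND k IS NOT NULL PRIMARY KEY (v, k)");
                waitForAgreement(cluster);
                session.execute("DROP MATERIALIZED VIEW mv");
                session.execute("DROP TABLE t");
                waitForAgreement(cluster);
            }
        }
    }

    // Poll until all live nodes report the same schema version before issuing
    // the next schema change.
    private static void waitForAgreement(Cluster cluster) throws InterruptedException
    {
        while (!cluster.getMetadata().checkSchemaAgreement())
            Thread.sleep(200);
    }
}
{code}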

 

  was:
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

{{   for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}}}

how can i solve it ? 

 


> drop table and materialized view frequently get error over time
> ---
>
> Key: CASSANDRA-14880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
> Project: Cassandra
>  Issue Type: Bug
>Reporter: zourun
>Priority: Major
>
>     when i create table and materialized view,then i drop it, if i drop it 
> frequently it will got error: "no response received from cassandra within 
> timeout period",for example:
>  for i=0; i<100; i++ {            
>           create table;            
>           create materialized view;          
>           drop materialized view;          
>           drop table;
> how can i solve it ? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zourun updated CASSANDRA-14880:
---
Description: 
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

{{   for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}}}

how can i solve it ? 

 

  was:
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

  

 for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}

 

 


> drop table and materialized view frequently get error over time
> ---
>
> Key: CASSANDRA-14880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
> Project: Cassandra
>  Issue Type: Bug
>Reporter: zourun
>Priority: Major
>
> when i create table and materialized view,then i drop it, if i drop it 
> frequently it will got error: "no response received from cassandra within 
> timeout period",for example:
> {{   for i=0; i<100; i++ {            
>           create table;            
>           create materialized view;          
>           drop materialized view;          
>           drop table;
> }}}
> how can i solve it ? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zourun updated CASSANDRA-14880:
---
Description: 
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

  

 for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}

 

 

  was:
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

 
{quote} for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}
{quote}
 


> drop table and materialized view frequently get error over time
> ---
>
> Key: CASSANDRA-14880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
> Project: Cassandra
>  Issue Type: Bug
>Reporter: zourun
>Priority: Major
>
> when i create table and materialized view,then i drop it, if i drop it 
> frequently it will got error: "no response received from cassandra within 
> timeout period",for example:
>   
>  for i=0; i<100; i++ {            
>           create table;            
>           create materialized view;          
>           drop materialized view;          
>           drop table;
> }
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zourun updated CASSANDRA-14880:
---
Description: 
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

 
{quote} for i=0; i<100; i++ {            

          create table;            

          create materialized view;          

          drop materialized view;          

          drop table;

}
{quote}
 

  was:
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

 for i=0; i<100; i++ {      

     create table;      

     create materialized view;      

    drop materialized view;      

    drop table;

}

 

 


> drop table and materialized view frequently get error over time
> ---
>
> Key: CASSANDRA-14880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
> Project: Cassandra
>  Issue Type: Bug
>Reporter: zourun
>Priority: Major
>
> when i create table and materialized view,then i drop it, if i drop it 
> frequently it will got error: "no response received from cassandra within 
> timeout period",for example:
>  
> {quote} for i=0; i<100; i++ {            
>           create table;            
>           create materialized view;          
>           drop materialized view;          
>           drop table;
> }
> {quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zourun updated CASSANDRA-14880:
---
Description: 
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

 for i=0; i<100; i++ {      

     create table;      

     create materialized view;      

    drop materialized view;      

    drop table;

}

 

 

  was:
when i create table and materialized view,then i drop it, if i drop it 
frequently it will got error: "no response received from cassandra within 
timeout period",for example:

 for i=0; i<100; i++{

      create table;

      create materialized view;

      sleep(1);

      drop materialized view;

      drop table;

}

 

 


> drop table and materialized view frequently get error over time
> ---
>
> Key: CASSANDRA-14880
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
> Project: Cassandra
>  Issue Type: Bug
>Reporter: zourun
>Priority: Major
>
> when i create table and materialized view,then i drop it, if i drop it 
> frequently it will got error: "no response received from cassandra within 
> timeout period",for example:
>  for i=0; i<100; i++ {      
>      create table;      
>      create materialized view;      
>     drop materialized view;      
>     drop table;
> }
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14880) drop table and materialized view frequently get error over time

2018-11-08 Thread zourun (JIRA)
zourun created CASSANDRA-14880:
--

 Summary: drop table and materialized view frequently get error 
over time
 Key: CASSANDRA-14880
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
 Project: Cassandra
  Issue Type: Bug
Reporter: zourun


When I create a table and a materialized view and then drop them, and I do this 
frequently, I get the error "no response received from cassandra within 
timeout period". For example:

 for i=0; i<100; i++ {
      create table;
      create materialized view;
      sleep(1);
      drop materialized view;
      drop table;
 }

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14875) Explicitly initialize StageManager early in startup

2018-11-08 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680569#comment-16680569
 ] 

Jeff Jirsa commented on CASSANDRA-14875:


Is this 4.0 only? 


> Explicitly initialize StageManager early in startup
> ---
>
> Key: CASSANDRA-14875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
> Fix For: 4.x
>
>
> {{StageManager}} initializes itself through a static block and sets up every 
> {{Stage}}, including pre-starting their threads. This initialization can 
> take a few hundred milliseconds. This timing impact is unpredictable and hard 
> to reason about; it looks like it usually gets hit when creating new 
> Keyspaces on start-up and announcing them through migrations.
> While these processes are resilient to these delays, it is dramatically 
> easier to reason about over time if this initialization happens explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14877) StreamCoordinator "leaks" threads

2018-11-08 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-14877:
---
Fix Version/s: (was: 3.11.4)
   (was: 3.0.18)
   (was: 2.2.14)
   (was: 2.1.21)
   (was: 4.0)
   4.x
   3.11.x
   3.0.x
   2.2.x
   2.1.x

> StreamCoordinator "leaks" threads
> -
>
> Key: CASSANDRA-14877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14877
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Massimiliano Tomassi
>Assignee: Massimiliano Tomassi
>Priority: Minor
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Since Cassandra 2.1, streaming sessions are started by running a 
> StreamSessionConnector task for each session in a dedicated executor (a 
> static field of StreamCoordinator).
> That executor is initialized with 
> DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once 
> created (up to the given limit of the number of logical cores), its threads 
> are kept alive for Integer.MAX_VALUE seconds.
> This practically means that once a node needs to establish streaming sessions 
> to n other nodes, it will create Math.min(n, numLogicalCores) 
> StreamConnectionEstablisher threads that will stay parked forever after 
> initializing (not completing) the session.
> It seems preferable to replace 
> DebuggableThreadPoolExecutor.createWithFixedPoolSize with 
> DebuggableThreadPoolExecutor.createWithMaximumPoolSize which allows providing 
> a saner keep-alive period (e.g. a minute).
> That's also what createWithFixedPoolSize's Javadoc recommends: If (most) 
> threads are expected to be idle most of the time, prefer createWithMaxSize() 
> instead.
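
A plain-JDK sketch of the difference being described (java.util.concurrent only, not the DebuggableThreadPoolExecutor API itself): a fixed-size pool keeps idle workers parked indefinitely, whereas a pool with core-thread timeout and a short keep-alive reclaims them once the sessions have been set up.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class KeepAliveDemo
{
    public static void main(String[] args)
    {
        // With allowCoreThreadTimeOut and a one-minute keep-alive, idle worker
        // threads are reclaimed instead of staying parked for
        // Integer.MAX_VALUE seconds as with a plain fixed-size pool.
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores, cores, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        pool.execute(() -> System.out.println("stream session task"));
        pool.shutdown();
    }
}
{code}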



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14875) Explicitly initialize StageManager early in startup

2018-11-08 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-14875:
---
Fix Version/s: 4.x

> Explicitly initialize StageManager early in startup
> ---
>
> Key: CASSANDRA-14875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
> Fix For: 4.x
>
>
> {{StageManager}} initializes itself through a static block and sets up every 
> {{Stage}}, including pre-starting their threads. This initialization can 
> take a few hundred milliseconds. This timing impact is unpredictable and hard 
> to reason about; it looks like it usually gets hit when creating new 
> Keyspaces on start-up and announcing them through migrations.
> While these processes are resilient to these delays, it is dramatically 
> easier to reason about over time if this initialization happens explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk

2018-11-08 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679212#comment-16679212
 ] 

Aleksey Yeschenko commented on CASSANDRA-14862:
---

Committed to trunk as 
[507e4a46a166cab5322a50fbe40c80cb0d16c290|https://github.com/apache/cassandra/commit/507e4a46a166cab5322a50fbe40c80cb0d16c290],
 thanks.

> TestTopology.test_size_estimates_multidc fails on trunk
> ---
>
> Key: CASSANDRA-14862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14862
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Minor
>  Labels: 4.0-QA
> Fix For: 4.0
>
>
> The sorting of natural replicas in 
> {{SimpleStrategy.calculateNaturalReplicas}} committed as part of 
> [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68]
>  for CASSANDRA-14726 has broken the 
> {{TestTopology.test_size_estimates_multidc}} dtest ([example 
> run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]) as 
> the "primary" ranges have now changed. I'm actually surprised only a single 
> dtest fails as I believe we've broken multi-dc {{SimpleStrategy}} reasonably 
> badly.
> In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot 
> sort the endpoints by datacenter first. It has to leave them in the order 
> in which it found them, or else it changes which replicas are considered 
> "primary" replicas (which mostly impacts repair, size estimates, and the like).
> I have written a regression unit test for the SimpleStrategy and am running 
> it through circleci now. Will post the patch shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations

2018-11-08 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679554#comment-16679554
 ] 

Sylvain Lebresne commented on CASSANDRA-14874:
--

I'll note that while I have barely spent any time thinking about it, I have a 
hunch that getting this perfectly right might not be easy since truncate is not 
timestamp based (and not coordinated with reads), but at least having 
coordinators check, before sending read-repairs, if they have seen a truncation 
since the beginning of the repair would be dead simple and improve this 
drastically.
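
To make the suggested stopgap concrete, a toy sketch (hypothetical, not a patch): the coordinator remembers when it last observed a truncation of each table and skips the read-repair mutation if that truncation happened after the read started.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical coordinator-side guard: record truncations as they are seen and
// drop read-repair writes for reads that began before the truncation.
final class TruncationGuard
{
    private final Map<String, Long> lastTruncationMillis = new ConcurrentHashMap<>();

    void onTruncate(String table)
    {
        lastTruncationMillis.put(table, System.currentTimeMillis());
    }

    boolean safeToRepair(String table, long readStartMillis)
    {
        Long truncatedAt = lastTruncationMillis.get(table);
        return truncatedAt == null || truncatedAt < readStartMillis;
    }
}
{code}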

> Read repair can race with truncations
> -
>
> Key: CASSANDRA-14874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14874
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Priority: Minor
>
> While hint and commit log replay handle truncation alright, we don't have 
> anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other 
> words, you can have a read reading some pre-truncation data, some truncation 
> running and removing that data, and then some read-repair mutation from that 
> previous read resurrecting data that should have been truncated.
> Probably not that common in practice, but it can lead to seemingly random data 
> surviving truncate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-11-08 Thread Li (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679078#comment-16679078
 ] 

Li commented on CASSANDRA-14495:


What if the memory usage already hits over 95% and still keeps growing? No 
latency or throughput impact yet.

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Abdul Patel
>Priority: Major
> Attachments: cas_heap.txt
>
>
> Hi All,
>  
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 
> 3.10 to 3.11.2.
> No issues were reported, apart from nodetool info showing 80% memory usage.
> I initially had 16GB of memory on each node; later I bumped it up to 20GB and 
> rebooted all nodes.
> I waited for a week, and now I again see memory usage of more than 80% 
> (16GB+).
> This means some memory leak is happening over time.
> Has anyone faced such an issue, or do we have any workaround? My 3.11.2 
> upgrade rollout has been halted because of this bug.
> ===
> ID : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 985.24 MiB
> Generation No  : 1526923117
> Uptime (seconds)   : 1097684
> Heap Memory (MB)   : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center    : DC7
> Rack   : rac1
> Exceptions : 0
> Key Cache  : entries 3569, size 421.44 KiB, capacity 100 MiB, 
> 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in 
> seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache    : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 
> 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds 
> miss latency
> Percent Repaired   : 99.88086234106282%
> Token  : (invoke with -T/--tokens to see all 256 tokens)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14876:
-
Fix Version/s: 3.0.x
   Status: Patch Available  (was: Open)

Patch: [3.0 | https://github.com/Ge/cassandra/commits/14876-3.0]

> Snapshot name merges with keyspace name shown by nodetool listsnapshots for 
> snapshots with long names
> -
>
> Key: CASSANDRA-14876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14876
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Major
> Fix For: 3.0.x
>
>
> If the snapshot name is long enough, it will merge with the keyspace name and the 
> command output will be inconvenient to read for a {{nodetool}} user, e.g.
> {noformat}
> bin/nodetool listsnapshots
> Snapshot Details:
> Snapshot name   Keyspace nameColumn family name   
> True size  Size on disk
> 1541670390886   system_distributed   parent_repair_history
> 0 bytes13 bytes
> 1541670390886   system_distributed   repair_history   
> 0 bytes13 bytes
> 1541670390886   system_auth  roles
> 0 bytes4.98 KB
> 1541670390886   system_auth  role_members 
> 0 bytes13 bytes
> 1541670390886   system_auth  
> resource_role_permissons_index0 bytes13 bytes
> 1541670390886   system_auth  role_permissions 
> 0 bytes13 bytes
> 1541670390886   system_tracessessions 
> 0 bytes13 bytes
> 1541670390886   system_tracesevents   
> 0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed   
> parent_repair_history0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed   
> repair_history   0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth  roles 
>0 bytes4.98 KB
> 39_characters_long_name_2017-09-05-11-Usystem_auth  
> role_members 0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth  
> resource_role_permissons_index0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth  
> role_permissions 0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_tracessessions  
>0 bytes13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_tracesevents
>0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed   
> parent_repair_history0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed   
> repair_history   0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth  roles   
>  0 bytes4.98 KB
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth  
> role_members 0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth  
> resource_role_permissons_index0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth  
> role_permissions 0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_traces
> sessions 0 bytes13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_tracesevents  
>  0 bytes13 bytes
> {noformat}
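
The underlying formatting issue is generic: the columns are padded to an assumed fixed width, so any value longer than that width runs into the next column. A plain-Java illustration of width-aware padding (illustrative only, not the nodetool patch); the sample rows are taken from the output above:

{code:java}
import java.util.Arrays;
import java.util.List;

public final class PadDemo
{
    public static void main(String[] args)
    {
        // Compute each column's width from its longest value instead of
        // assuming a fixed width, so a long snapshot name can no longer run
        // into the keyspace column.
        List<String[]> rows = Arrays.asList(
                new String[] { "1541670390886", "system_auth", "roles" },
                new String[] { "41_characters_long_name_2017-09-05-11-UTC", "system_traces", "events" });

        int w0 = rows.stream().mapToInt(r -> r[0].length()).max().orElse(0);
        int w1 = rows.stream().mapToInt(r -> r[1].length()).max().orElse(0);
        for (String[] r : rows)
            System.out.printf("%-" + (w0 + 2) + "s%-" + (w1 + 2) + "s%s%n", r[0], r[1], r[2]);
    }
}
{code}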



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14875) Explicitly initialize StageManager early in startup

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14875:
-
Status: Patch Available  (was: Open)

Patch: [4.0 | https://github.com/Ge/cassandra/tree/14875-4.0]

> Explicitly initialize StageManager early in startup
> ---
>
> Key: CASSANDRA-14875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
>
> {{StageManager}} initializes itself through a static block and sets up every 
> {{Stage}}, including pre-starting their threads. This initialization can 
> take a few hundred milliseconds. This timing impact is unpredictable and hard 
> to reason about; it looks like it usually gets hit when creating new 
> Keyspaces on start-up and announcing them through migrations.
> While these processes are resilient to these delays, it is dramatically 
> easier to reason about over time if this initialization happens explicitly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations

2018-11-08 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680455#comment-16680455
 ] 

Ariel Weisberg commented on CASSANDRA-14874:


bq. I'll still suggest that maybe the trivial solution I've hinted in my 
previous comment might be a good stopgap option to make things behave 
predictably most of the time.
+1

> Read repair can race with truncations
> -
>
> Key: CASSANDRA-14874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14874
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Priority: Minor
>
> While hint and commit log replay handle truncation alright, we don't have 
> anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other 
> words, you can have a read reading some pre-truncation data, some truncation 
> running and removing that data, and then some read-repair mutation from that 
> previous read resurrecting data that should have been truncated.
> Probably not that common in practice, but it can lead to seemingly random data 
> surviving truncate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14874) Read repair can race with truncations

2018-11-08 Thread Sylvain Lebresne (JIRA)
Sylvain Lebresne created CASSANDRA-14874:


 Summary: Read repair can race with truncations
 Key: CASSANDRA-14874
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14874
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
Reporter: Sylvain Lebresne


While hint and commit log replay handle truncation alright, we don't have 
anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other 
words, you can have a read reading some pre-truncation data, some truncation 
running and removing that data, and then some read-repair mutation from that 
previous read resurrecting data that should have been truncated.

Probably not that common in practice, but it can lead to seemingly random data 
surviving truncate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14867) Histogram overflows potentially leading to writes failing

2018-11-08 Thread Alex Petrov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-14867:

Description: 
I observed the following in cassandra logs on 1 host of a 6-node cluster:

{code}
ERROR [ScheduledTasks:1] 2018-11-01 17:26:41,277 CassandraDaemon.java:228 - 
Exception in thread Thread[ScheduledTasks:1,5,main]
java.lang.IllegalStateException: Unable to compute when histogram overflowed
 at 
org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.getMean(DecayingEstimatedHistogramReservoir.java:472)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.getDroppedMessagesLogs(MessagingService.java:1263)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.logDroppedMessages(MessagingService.java:1236)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.access$200(MessagingService.java:87) 
~[apache-cassandra-3.11.1.jar:3.11.1]
 at org.apache.cassandra.net.MessagingService$4.run(MessagingService.java:507) 
~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_172]
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_172]
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_172]
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_172]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_172]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]
 at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.1.jar:3.11.1]
 at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172]
{code}

At the same time, this node was failing all writes issued to it. Restarting 
cassandra on the node brought the cluster into a good state and we stopped 
seeing the histogram overflow errors.

Has this issue been observed before? Could the histogram overflows cause writes 
to fail?

  was:
I observed the following in cassandra logs on 1 host of a 6-node cluster:

ERROR [ScheduledTasks:1] 2018-11-01 17:26:41,277 CassandraDaemon.java:228 - 
Exception in thread Thread[ScheduledTasks:1,5,main]
java.lang.IllegalStateException: Unable to compute when histogram overflowed
 at 
org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.getMean(DecayingEstimatedHistogramReservoir.java:472)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.getDroppedMessagesLogs(MessagingService.java:1263)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.logDroppedMessages(MessagingService.java:1236)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.net.MessagingService.access$200(MessagingService.java:87) 
~[apache-cassandra-3.11.1.jar:3.11.1]
 at org.apache.cassandra.net.MessagingService$4.run(MessagingService.java:507) 
~[apache-cassandra-3.11.1.jar:3.11.1]
 at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_172]
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_172]
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_172]
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_172]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_172]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]
 at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.1.jar:3.11.1]
 at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172]

 

At the same time, this node was failing all writes issued to it. Restarting 
cassandra on the node brought the cluster into a good state and we stopped 
seeing the histogram overflow errors.

Has this issue been observed before? Could the histogram overflows cause writes 
to fail?


> Histogram overflows potentially leading to writes failing
> 

[jira] [Resolved] (CASSANDRA-14868) Installation error (安装出错)

2018-11-08 Thread Yuki Morishita (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita resolved CASSANDRA-14868.

Resolution: Invalid

The issue tracker here is for developing Apache Cassandra.

You should report your issue to 
[https://github.com/Symantec/ambari-cassandra-service] (guessing from the stack 
trace).

> Installation error (安装出错)
> 
>
> Key: CASSANDRA-14868
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14868
> Project: Cassandra
>  Issue Type: Bug
> Environment: HDP 2.5 
>Reporter: znn
>Priority: Major
>
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/cassandra_master.py",
>  line 60, in <module>
> Cassandra_Master().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 280, in execute
> method(env)
>   File 
> "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/cassandra_master.py",
>  line 27, in install
> import params
>   File 
> "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/params.py",
>  line 16, in <module>
> from resource_management.libraries.functions.version import  
> format_hdp_stack_version, compare_versions
> ImportError: cannot import name format_hdp_stack_version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14877) StreamCoordinator "leaks" threads

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov reassigned CASSANDRA-14877:


Assignee: Massimiliano Tomassi

> StreamCoordinator "leaks" threads
> -
>
> Key: CASSANDRA-14877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14877
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Massimiliano Tomassi
>Assignee: Massimiliano Tomassi
>Priority: Minor
> Fix For: 2.1.21, 2.2.14, 3.0.18, 3.11.4, 4.0
>
>
> Since Cassandra 2.1, streaming sessions are started by running a 
> StreamSessionConnector task for each session in a dedicated executor (a 
> static field of StreamCoordinator).
> That executor is initialized with 
> DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once 
> created (up to the given limit of the number of logical cores), its threads 
> are kept alive for Integer.MAX_VALUE seconds.
> This practically means that once a node needs to establish streaming sessions 
> to n other nodes, it will create Math.min(n, numLogicalCores) 
> StreamConnectionEstablisher threads that will stay parked forever after 
> initializing (not completing) the session.
> It seems preferable to replace 
> DebuggableThreadPoolExecutor.createWithFixedPoolSize with 
> DebuggableThreadPoolExecutor.createWithMaximumPoolSize which allows providing 
> a saner keep-alive period (e.g. a minute).
> That's also what createWithFixedPoolSize's Javadoc recommends: If (most) 
> threads are expected to be idle most of the time, prefer createWithMaxSize() 
> instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14875) Explicitly initialize StageManager early in startup

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)
Aleksandr Sorokoumov created CASSANDRA-14875:


 Summary: Explicitly initialize StageManager early in startup
 Key: CASSANDRA-14875
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksandr Sorokoumov
Assignee: Aleksandr Sorokoumov


{{StageManager}} initializes itself through a static block and sets up every 
{{Stage}}, including pre-starting their threads. This initialization can take 
a few hundred milliseconds. This timing impact is unpredictable and hard to 
reason about; it looks like it usually gets hit when creating new Keyspaces on 
start-up and announcing them through migrations.

While these processes are resilient to these delays, it is dramatically easier 
to reason about over time if this initialization happens explicitly.
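
A generic sketch of the pattern being proposed, under the assumption that it boils down to replacing init-on-class-load with an explicit call (this is not the actual patch, and the names below are hypothetical):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Instead of paying the pool set-up cost whenever the class happens to be
// loaded (static initializer), expose an explicit initialize() that startup
// code calls at a predictable point.
final class Stages
{
    private static final AtomicBoolean initialized = new AtomicBoolean();
    static ExecutorService mutationStage;

    static void initialize()
    {
        if (!initialized.compareAndSet(false, true))
            return;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                32, 32, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads(); // the few-hundred-millisecond cost now happens here, explicitly
        mutationStage = pool;
    }
}
{code}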



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk

2018-11-08 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679212#comment-16679212
 ] 

Aleksey Yeschenko edited comment on CASSANDRA-14862 at 11/8/18 3:05 AM:


Committed to trunk as 
[2adfa92044381aa9093104f3a105f3dbd7dda94c|https://github.com/apache/cassandra/commit/2adfa92044381aa9093104f3a105f3dbd7dda94c],
 thanks.


was (Author: iamaleksey):
Committed to trunk as 
[507e4a46a166cab5322a50fbe40c80cb0d16c290|https://github.com/apache/cassandra/commit/507e4a46a166cab5322a50fbe40c80cb0d16c290],
 thanks.

> TestTopology.test_size_estimates_multidc fails on trunk
> ---
>
> Key: CASSANDRA-14862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14862
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Minor
>  Labels: 4.0-QA
> Fix For: 4.0
>
>
> The sorting of natural replicas in 
> {{SimpleStrategy.calculateNaturalReplicas}} committed as part of 
> [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68]
>  for CASSANDRA-14726 has broken the 
> {{TestTopology.test_size_estimates_multidc}} dtest ([example 
> run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]) as 
> the "primary" ranges have now changed. I'm actually surprised only a single 
> dtest fails as I believe we've broken multi-dc {{SimpleStrategy}} reasonably 
> badly.
> In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot 
> sort the endpoints by datacenter first. It has to leave them in the order in 
> which it found them, or else it changes which replicas are considered "primary" 
> replicas (which mostly impacts repair, size estimates, and the like).
> I have written a regression unit test for the SimpleStrategy and am running 
> it through circleci now. Will post the patch shortly.
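
For illustration only (hypothetical endpoint names, not the actual replication code): the first 
endpoint in the natural-replica list is the one treated as "primary" for a range, so re-sorting 
the list by datacenter silently changes it.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not the real SimpleStrategy: the first endpoint
// in the natural-replica list is conventionally the "primary" for a token
// range, so re-sorting the list by datacenter silently changes it.
public class PrimaryReplicaSketch
{
    public static void main(String[] args)
    {
        // Replicas in ring-walk order: the node owning the token comes first.
        List<String> ringOrder = new ArrayList<>(List.of("dc2:10.0.2.1", "dc1:10.0.1.1", "dc1:10.0.1.2"));

        String primaryBefore = ringOrder.get(0);               // dc2:10.0.2.1

        List<String> dcSorted = new ArrayList<>(ringOrder);
        dcSorted.sort(Comparator.comparing((String e) -> e.split(":")[0]));
        String primaryAfter = dcSorted.get(0);                 // dc1:10.0.1.1

        System.out.println("primary before sort: " + primaryBefore);
        System.out.println("primary after sort:  " + primaryAfter);
    }
}
{code}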



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14870) The order of application of nodetool garbagecollect is broken

2018-11-08 Thread Branimir Lambov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679435#comment-16679435
 ] 

Branimir Lambov commented on CASSANDRA-14870:
-

testall and dtests are clean on DataStax CI servers.

> The order of application of nodetool garbagecollect is broken
> -
>
> Key: CASSANDRA-14870
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14870
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>Priority: Major
>
> {{nodetool garbagecollect}} was intended to work from oldest sstable to 
> newest, so that the collection in newer tables can purge tombstones over data 
> that has been deleted.
> However, {{SSTableReader.maxTimestampComparator}} currently sorts in the 
> opposite order (the order changed in CASSANDRA-13776 and then back in 
> CASSANDRA-14010), which makes the garbage collection unable to purge any 
> tombstones.
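
As a sketch of the intended ordering (illustrative types, not Cassandra's internals): sstables 
should be visited in ascending order of their maximum timestamp, so that the newer sstables 
holding the tombstones are processed last, by which point the data they shadow has already 
been rewritten and the tombstones can be purged.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the intended ordering, not Cassandra's comparator:
// oldest max timestamp first, newest (tombstone-bearing) sstables last.
public class GcOrderSketch
{
    record SSTable(String name, long maxTimestamp) {}

    public static void main(String[] args)
    {
        List<SSTable> tables = new ArrayList<>(List.of(
                new SSTable("mc-1-big", 1_000L),
                new SSTable("mc-3-big", 3_000L),
                new SSTable("mc-2-big", 2_000L)));

        // Ascending by max timestamp: garbage-collect the oldest data first.
        tables.sort(Comparator.comparingLong(SSTable::maxTimestamp));

        tables.forEach(t -> System.out.println(t.name() + " @ " + t.maxTimestamp()));
    }
}
{code}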



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)
Aleksandr Sorokoumov created CASSANDRA-14876:


 Summary: Snapshot name merges with keyspace name shown by nodetool 
listsnapshots for snapshots with long names
 Key: CASSANDRA-14876
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14876
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksandr Sorokoumov
Assignee: Aleksandr Sorokoumov


If a snapshot name is long enough, it merges into the keyspace name column and the 
command output becomes inconvenient to read for a {{nodetool}} user, e.g.

{noformat}
bin/nodetool listsnapshots
Snapshot Details:
Snapshot name                              Keyspace name        Column family name               True size  Size on disk
1541670390886                              system_distributed   parent_repair_history            0 bytes    13 bytes
1541670390886                              system_distributed   repair_history                   0 bytes    13 bytes
1541670390886                              system_auth          roles                            0 bytes    4.98 KB
1541670390886                              system_auth          role_members                     0 bytes    13 bytes
1541670390886                              system_auth          resource_role_permissons_index   0 bytes    13 bytes
1541670390886                              system_auth          role_permissions                 0 bytes    13 bytes
1541670390886                              system_traces        sessions                         0 bytes    13 bytes
1541670390886                              system_traces        events                           0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_distributed       parent_repair_history            0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_distributed       repair_history                   0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth              roles                            0 bytes    4.98 KB
39_characters_long_name_2017-09-05-11-Usystem_auth              role_members                     0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth              resource_role_permissons_index   0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth              role_permissions                 0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_traces            sessions                         0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_traces            events                           0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_distributed     parent_repair_history            0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_distributed     repair_history                   0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth            roles                            0 bytes    4.98 KB
41_characters_long_name_2017-09-05-11-UTCsystem_auth            role_members                     0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth            resource_role_permissons_index   0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth            role_permissions                 0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_traces          sessions                         0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_traces          events                           0 bytes    13 bytes
{noformat}
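
One possible direction for a fix, sketched here in plain Java (this is not the actual 
{{nodetool}} code): size each column to its widest value instead of assuming a fixed width, 
so even long snapshot names stay separated from the keyspace column.

{code:java}
import java.util.List;

// Hypothetical illustration of dynamic column sizing; not nodetool's implementation.
public class SnapshotTableSketch
{
    public static void main(String[] args)
    {
        List<String[]> rows = List.of(
                new String[]{"Snapshot name", "Keyspace name", "Column family name", "True size", "Size on disk"},
                new String[]{"1541670390886", "system_auth", "roles", "0 bytes", "4.98 KB"},
                new String[]{"41_characters_long_name_2017-09-05-11-UTC", "system_traces", "events", "0 bytes", "13 bytes"});

        int columns = rows.get(0).length;
        int[] widths = new int[columns];
        for (String[] row : rows)
            for (int c = 0; c < columns; c++)
                widths[c] = Math.max(widths[c], row[c].length());

        // Pad every cell to the widest entry of its column, plus two spaces of separation.
        for (String[] row : rows)
        {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c < columns; c++)
                line.append(String.format("%-" + (widths[c] + 2) + "s", row[c]));
            System.out.println(line.toString().stripTrailing());
        }
    }
}
{code}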



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14878) Race condition when setting bootstrap flags

2018-11-08 Thread Sergio Bossa (JIRA)
Sergio Bossa created CASSANDRA-14878:


 Summary: Race condition when setting bootstrap flags
 Key: CASSANDRA-14878
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14878
 Project: Cassandra
  Issue Type: Bug
Reporter: Sergio Bossa
Assignee: Sergio Bossa
 Fix For: 3.0.x, 3.11.x, 4.x


{{StorageService#bootstrap()}} is supposed to wait for bootstrap to finish, but 
Guava calls the future listeners 
[after|https://github.com/google/guava/blob/ec2dedebfa359991cbcc8750dc62003be63ec6d3/guava/src/com/google/common/util/concurrent/AbstractFuture.java#L890]
 unparking its waiters, which causes a race on when {{bootstrapFinished()}} will 
be executed, making it non-deterministic.
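
A minimal plain-JDK sketch that makes the problematic ordering explicit (illustrative names, 
not Guava's {{AbstractFuture}}): the thread blocked on bootstrap completion is released before 
the completion "listener" runs, so it can observe state before the {{bootstrapFinished()}}-style 
work has happened.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical reproduction of the race: waiters are unparked before the
// completion listener executes, so the waiter may read the flag too early.
public class BootstrapRaceSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        AtomicBoolean bootstrapFinishedFlag = new AtomicBoolean(false);
        CountDownLatch completed = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            try
            {
                completed.await(); // analogous to blocking on the bootstrap future
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
            // Almost certainly prints false here: the "listener" below has not run yet.
            System.out.println("waiter sees bootstrapFinished = " + bootstrapFinishedFlag.get());
        });
        waiter.start();

        // Completion path: waiters are released first...
        completed.countDown();
        // ...and only afterwards does the "listener" update the flag.
        Thread.sleep(10); // widen the window to make the race easy to observe
        bootstrapFinishedFlag.set(true);

        waiter.join();
    }
}
{code}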



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14879) Log DDL statements on coordinator

2018-11-08 Thread Sylvain Lebresne (JIRA)
Sylvain Lebresne created CASSANDRA-14879:


 Summary: Log DDL statements on coordinator
 Key: CASSANDRA-14879
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14879
 Project: Cassandra
  Issue Type: Improvement
  Components: CQL
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne


People sometimes run into schema issues, often because they perform concurrent 
schema changes, which are simply not supported (and which we should fix someday). 
In the meantime, it is not even easy to check whether concurrent schema changes 
may have happened.

A very trivial way to make that easier would be to simply log DDL statements on 
the coordinator before they are executed. This is likely useful information for 
operators in the first place, and in most cases it would make it possible to tell 
whether concurrent schema changes were the likely cause of a particular issue.
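
As a sketch of the idea (class and method names are purely illustrative, and {{java.util.logging}} 
stands in for Cassandra's actual logging setup): log the statement text on the coordinator before 
handing it to the normal execution path.

{code:java}
import java.util.logging.Logger;

// Hypothetical sketch only: log DDL statements before execution so operators
// can later check whether schema changes overlapped in time.
public class DdlLoggingSketch
{
    private static final Logger logger = Logger.getLogger(DdlLoggingSketch.class.getName());

    void execute(String cql, String clientAddress)
    {
        if (isSchemaAltering(cql))
            logger.info(() -> "Executing DDL statement from " + clientAddress + ": " + cql);
        // ... hand the statement off to the normal execution path ...
    }

    // Deliberately simplistic classification, for illustration only.
    private boolean isSchemaAltering(String cql)
    {
        String upper = cql.trim().toUpperCase();
        return upper.startsWith("CREATE") || upper.startsWith("ALTER") || upper.startsWith("DROP");
    }

    public static void main(String[] args)
    {
        new DdlLoggingSketch().execute("ALTER TABLE ks.t ADD new_col text", "10.0.0.5");
    }
}
{code}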



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations

2018-11-08 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680360#comment-16680360
 ] 

Sylvain Lebresne commented on CASSANDRA-14874:
--

bq. Seems like there is a case for an über tombstone with no partition or 
clustering.

Well, that's kind of what I'd qualify as making truncate timestamp based, and 
that's a much bigger change to truncate than just this problem. It does likely 
make this problem much easier to solve, or at least easier to solve the right 
way, but it still qualifies as a rewrite of truncate, and one that would likely 
not fully preserve existing behavior (typically, you wouldn't be able to add 
data with a pre-truncate timestamp after a truncate (unless we do something 
rather funky), while this is possible today; and to be extra clear, I am not 
arguing that the current behavior is better, I'm just pointing out that this 
wouldn't be a fully backward-compatible change, and that's to be taken into 
account).

In any case, I'm not opposed at all to considering that option in principle, 
because I do think a timestamp-based truncate is likely an overall better fit 
for C* and is probably what we should have done to start with. But as this is 
probably a bit longer term (and probably deserves its own ticket), I'll still 
suggest that the trivial solution I hinted at in my previous comment might be a 
good stopgap option to make things behave predictably most of the time.

> Read repair can race with truncations
> -
>
> Key: CASSANDRA-14874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14874
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Priority: Minor
>
> While hint and commit log replay handle truncation alright, we don't have 
> anything to prevent a read/read-repair to race with {{TRUNCATE}}. In other 
> words, you can have a read reading some pre-truncation data, some truncation 
> running and removing that data, and then some read-repair mutation from that 
> previous read that resurrects some data that should have been truncated.
> Probably not that common in practice, but can lead to seemingly random data 
> surviving truncate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context

2018-11-08 Thread Stefania (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679214#comment-16679214
 ] 

Stefania commented on CASSANDRA-14554:
--

You're welcome [~benedict] !

bq.  I wonder if you had considered (and potentially discarded) what might be a 
slightly simpler approach of allocating a separate LifecycleTransaction for 
each operation, and atomically transferring their contents as they "complete" 
to the shared LifecycleTransaction?

No I hadn't considered it. It sounds elegant in principle but in order to 
atomically transfer child transactions to their parent, we'd have to add some 
complexity to transactions that I'm not sure we need. Obviously, the state of 
the parent transaction could change at any time (due to an abort), including 
whilst a child transaction is trying to transfer its state. So this would 
require some form of synchronization or CAS. The same is true for two child 
transactions transferring their state simultaneously. The state on disk should 
be fine as long as child transactions are never committed but only transferred. 
Child transaction should be allowed to abort independently though. So different 
rules for child and parent transactions would apply. 

I'm not sure we need this additional complexity because the txn state only 
changes rarely. {{LifecycleTransaction}} exposes a large API, but many methods 
are probably only used during compaction. Extracting a more comprehensive 
interface that can be implemented with a synchronized wrapper may be an easier 
approach.

I submitted a safe patch that fixes a known problem with streaming and that is 
safe for branches that will not undergo a major release testing cycle. 
Unfortunately, I do not have the time to work on a more comprehensive solution, 
at least not right now. I could however review whichever approach we choose.

> LifecycleTransaction encounters ConcurrentModificationException when used in 
> multi-threaded context
> ---
>
> Key: CASSANDRA-14554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter 
> this exception -
> {quote}java.util.ConcurrentModificationException: null
>  at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at 
> org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at 
> org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and 
> share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}} 
> which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context 
> and this leads to streaming failures due to object sharing. On trunk, this 
> object is shared across all threads that transfer sstables in parallel for 
> the given {{TableId}} in a {{StreamSession}}. There are two options to solve 
> this - make {{LifecycleTransaction}} and the associated objects thread safe, 
> scope the transaction to a single {{CassandraIncomingFile}}. The consequences 
> of the latter option is that if we experience streaming failure we may have 
> redundant SSTables on disk. This is ok as compaction should clean this up. A 
> third option is we synchronize access in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk

2018-11-08 Thread Aleksey Yeschenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-14862:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

> TestTopology.test_size_estimates_multidc fails on trunk
> ---
>
> Key: CASSANDRA-14862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14862
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Minor
>  Labels: 4.0-QA
> Fix For: 4.0
>
>
> The sorting of natural replicas in 
> {{SimpleStrategy.calculateNaturalReplicas}} committed as part of 
> [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68]
>  for CASSANDRA-14726 has broken the 
> {{TestTopology.test_size_estimates_multidc}} dtest ([example 
> run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]) as 
> the "primary" ranges have now changed. I'm actually surprised only a single 
> dtest fails as I believe we've broken multi-dc {{SimpleStrategy}} reasonably 
> badly.
> In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot 
> sort the endpoints by datacenter first. It has to leave them in the order in 
> which it found them, or else it changes which replicas are considered "primary" 
> replicas (which mostly impacts repair, size estimates, and the like).
> I have written a regression unit test for the SimpleStrategy and am running 
> it through circleci now. Will post the patch shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names

2018-11-08 Thread Aleksandr Sorokoumov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679835#comment-16679835
 ] 

Aleksandr Sorokoumov edited comment on CASSANDRA-14876 at 11/8/18 2:34 PM:
---

Patch: [3.0|https://github.com/Ge/cassandra/tree/14876-3.0]




was (Author: ge):
Patch: [3.0 | https://github.com/Ge/cassandra/commits/14876-3.0]

> Snapshot name merges with keyspace name shown by nodetool listsnapshots for 
> snapshots with long names
> -
>
> Key: CASSANDRA-14876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14876
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Major
> Fix For: 3.0.x
>
>
> If a snapshot name is long enough, it merges into the keyspace name column and the 
> command output becomes inconvenient to read for a {{nodetool}} user, e.g.
> {noformat}
> bin/nodetool listsnapshots
> Snapshot Details:
> Snapshot name                              Keyspace name        Column family name               True size  Size on disk
> 1541670390886                              system_distributed   parent_repair_history            0 bytes    13 bytes
> 1541670390886                              system_distributed   repair_history                   0 bytes    13 bytes
> 1541670390886                              system_auth          roles                            0 bytes    4.98 KB
> 1541670390886                              system_auth          role_members                     0 bytes    13 bytes
> 1541670390886                              system_auth          resource_role_permissons_index   0 bytes    13 bytes
> 1541670390886                              system_auth          role_permissions                 0 bytes    13 bytes
> 1541670390886                              system_traces        sessions                         0 bytes    13 bytes
> 1541670390886                              system_traces        events                           0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed       parent_repair_history            0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed       repair_history                   0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth              roles                            0 bytes    4.98 KB
> 39_characters_long_name_2017-09-05-11-Usystem_auth              role_members                     0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth              resource_role_permissons_index   0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth              role_permissions                 0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_traces            sessions                         0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_traces            events                           0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed     parent_repair_history            0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed     repair_history                   0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth            roles                            0 bytes    4.98 KB
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth            role_members                     0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth            resource_role_permissons_index   0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth            role_permissions                 0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_traces          sessions                         0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_traces          events                           0 bytes    13 bytes
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context

2018-11-08 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680015#comment-16680015
 ] 

Benedict commented on CASSANDRA-14554:
--

I was actually thinking of something very simple.  Child transactions would not 
have any direct relationship to parents, there would just be a method to 
transfer their contents, and this method would be synchronised.  The other 
methods on a {{LifecycleTransaction}} could simply be marked synchronised as 
well.  I don't think there would be any major problem with this?  It's not a 
high-traffic object, so the cost would be low even without extracting a 
synchronised interface, particularly as this object requires regular fsyncs.

I completely understand that you may be too busy to try this alternative 
approach.  I think it would be _preferable_ for somebody to have a brief try at 
the alternative, just to see if we can isolate the complexity, but if we find 
we don't have time I think your patch looks good too (modulo a proper review).  

Perhaps we should wait and see how things pan out with finding time for review, 
as I know [~djoshi3] had been planning to take a crack at this too.
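
A rough sketch of that shape (illustrative names only, nothing like the real 
{{LifecycleTransaction}} API): each streaming task tracks new files in its own child object 
and transfers them to the shared parent through a synchronized method, so the shared state is 
never mutated concurrently.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of the suggested approach, not Cassandra's classes:
// per-operation children, with synchronized transfer into the shared parent.
public class TxnTransferSketch
{
    static class SharedTxn
    {
        private final List<String> trackedFiles = new ArrayList<>();

        // All mutations of the shared transaction go through synchronized
        // methods; this is a low-traffic object, so the cost is negligible.
        synchronized void absorb(Collection<String> files)
        {
            trackedFiles.addAll(files);
        }

        synchronized List<String> snapshot()
        {
            return new ArrayList<>(trackedFiles);
        }
    }

    static class ChildTxn
    {
        private final List<String> files = new ArrayList<>();

        void trackNew(String sstable)
        {
            files.add(sstable); // single-threaded within one streaming task
        }

        void transferTo(SharedTxn parent)
        {
            parent.absorb(files);
            files.clear();
        }
    }

    public static void main(String[] args)
    {
        SharedTxn parent = new SharedTxn();
        ChildTxn child = new ChildTxn();
        child.trackNew("mc-10-big-Data.db");
        child.transferTo(parent);
        System.out.println(parent.snapshot());
    }
}
{code}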

> LifecycleTransaction encounters ConcurrentModificationException when used in 
> multi-threaded context
> ---
>
> Key: CASSANDRA-14554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter 
> this exception -
> {quote}java.util.ConcurrentModificationException: null
>  at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at 
> org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at 
> org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and 
> share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}} 
> which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context 
> and this leads to streaming failures due to object sharing. On trunk, this 
> object is shared across all threads that transfer sstables in parallel for 
> the given {{TableId}} in a {{StreamSession}}. There are two options to solve 
> this - make {{LifecycleTransaction}} and the associated objects thread safe, 
> scope the transaction to a single {{CassandraIncomingFile}}. The consequences 
> of the latter option is that if we experience streaming failure we may have 
> redundant SSTables on disk. This is ok as compaction should clean this up. A 
> third option is we synchronize access in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14877) StreamCoordinator "leaks" threads

2018-11-08 Thread Massimiliano Tomassi (JIRA)
Massimiliano Tomassi created CASSANDRA-14877:


 Summary: StreamCoordinator "leaks" threads
 Key: CASSANDRA-14877
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14877
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
Reporter: Massimiliano Tomassi
 Fix For: 2.1.21, 2.2.14, 3.0.18, 3.11.4, 4.0


Since Cassandra 2.1, streaming sessions are started by running a 
StreamSessionConnector task for each session in a dedicated executor (a static 
field of StreamCoordinator).
That executor is initialized with 
DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once 
created (up to the given limit of the number of logical cores), its threads are 
kept alive for Integer.MAX_VALUE seconds.

This practically means that once a node needs to establish streaming sessions 
to n other nodes, it will create Math.min(n, numLogicalCores) 
StreamConnectionEstablisher threads that will stay parked forever after 
initializing (not completing) the session.

It seems preferable to replace 
DebuggableThreadPoolExecutor.createWithFixedPoolSize with 
DebuggableThreadPoolExecutor.createWithMaximumPoolSize which allows providing a 
saner keep-alive period (e.g. a minute).

That's also what createWithFixedPoolSize's Javadoc recommends: If (most) 
threads are expected to be idle most of the time, prefer createWithMaxSize() 
instead.
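
For illustration, a plain-JDK sketch of the two behaviours (this is not 
DebuggableThreadPoolExecutor itself, and a short keep-alive is used so the effect is visible 
quickly; the ticket suggests something like a minute in practice):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical comparison: a fixed-size pool parks idle threads essentially
// forever, while a pool allowing core-thread timeout reclaims them.
public class StreamPoolSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        int cores = Runtime.getRuntime().availableProcessors();

        // Roughly what createWithFixedPoolSize amounts to: idle threads live
        // for Integer.MAX_VALUE seconds, i.e. they are never reclaimed.
        ThreadPoolExecutor fixed =
                new ThreadPoolExecutor(cores, cores, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // The behaviour the ticket asks for: bounded keep-alive plus core-thread timeout.
        ThreadPoolExecutor bounded =
                new ThreadPoolExecutor(cores, cores, 2, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        bounded.allowCoreThreadTimeOut(true);

        Runnable connect = () -> { /* establish one streaming session */ };
        fixed.execute(connect);
        bounded.execute(connect);

        Thread.sleep(5_000); // wait past the bounded pool's keep-alive
        System.out.println("fixed pool threads after idle:   " + fixed.getPoolSize());   // 1
        System.out.println("bounded pool threads after idle: " + bounded.getPoolSize()); // 0

        fixed.shutdown();
        bounded.shutdown();
    }
}
{code}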



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations

2018-11-08 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680066#comment-16680066
 ] 

Ariel Weisberg commented on CASSANDRA-14874:


I was just talking about something similar with [~bdeggleston] last night. 
Seems like there is a case for an über tombstone with no partition or 
clustering. Truncate should also require CL responses from all ranges on the 
ring. This tombstone could shadow any data from before the truncation. Removing 
all the pre-truncate data could then be a background deletion process.
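
Purely as an illustration of that idea (hypothetical types, not an actual design): a table-level 
"truncated at" timestamp would act as a tombstone shadowing every write older than the truncation, 
with physical removal deferred to a background process.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: a truncation timestamp shadows older writes,
// so a racing read repair cannot resurrect pre-truncation data.
public class TruncationShadowSketch
{
    record Cell(String value, long writeTimestamp) {}

    static List<Cell> visible(List<Cell> cells, long truncatedAtTimestamp)
    {
        // Anything written before the truncation point is considered deleted;
        // physically removing the shadowed data can happen in the background.
        return cells.stream()
                    .filter(c -> c.writeTimestamp() > truncatedAtTimestamp)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<Cell> cells = List.of(new Cell("old", 1_000L), new Cell("new", 3_000L));
        System.out.println(visible(cells, 2_000L)); // only the post-truncation cell survives
    }
}
{code}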

> Read repair can race with truncations
> -
>
> Key: CASSANDRA-14874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14874
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Priority: Minor
>
> While hint and commit log replay handle truncation alright, we don't have 
> anything to prevent a read/read-repair to race with {{TRUNCATE}}. In other 
> words, you can have a read reading some pre-truncation data, some truncation 
> running and removing that data, and then some read-repair mutation from that 
> previous read that resurrects some data that should have been truncated.
> Probably not that common in practice, but can lead to seemingly random data 
> surviving truncate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org