[jira] [Created] (CASSANDRA-14882) Exception in Cassandra during Compaction
Alex C Punnen created CASSANDRA-14882:
--------------------------------------

                 Summary: Exception in Cassandra during Compaction
                     Key: CASSANDRA-14882
                     URL: https://issues.apache.org/jira/browse/CASSANDRA-14882
                 Project: Cassandra
              Issue Type: Bug
             Environment: Cassandra single node, 3.11.2, running on Linux (CentOS, I think)
                Reporter: Alex C Punnen

This exception appears in the debug logs every minute. The other relevant information is that the entries of the affected table were deleted in a testbed (unclear whether by program error or human error).

DEBUG [CompactionExecutor:750] 2018-10-26 21:59:00,512 CompactionTask.java:155 - Compacting (583f8a00-d96a-11e8-9c62-3944322fd453) [/var/lib/cassandra/data/xx/xxt_uuid-35106240c6d511e897e559c5a71f3c3c/mc-4-big-Data.db:level=0, ]
ERROR [CompactionExecutor:750] 2018-10-26 21:59:00,514 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:750,1,main]
java.lang.NullPointerException: null
	at org.apache.cassandra.config.CFMetaData.enforceStrictLiveness(CFMetaData.java:1266) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionIterator$Purger.<init>(CompactionIterator.java:280) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionIterator$Purger.<init>(CompactionIterator.java:264) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:108) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionIterator.<init>(CompactionIterator.java:80) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:275) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_161]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.2.jar:3.11.2]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_161]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14842) SSL connection problems when upgrading to 4.0 when upgrading from 3.0.x
[ https://issues.apache.org/jira/browse/CASSANDRA-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Stendahl reassigned CASSANDRA-14842:
------------------------------------------

    Assignee: Tommy Stendahl

> SSL connection problems when upgrading to 4.0 when upgrading from 3.0.x
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14842
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14842
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Major
>
> While testing an upgrade from 3.0.15 to 4.0, the old nodes fail to connect to the 4.0 node; I get this exception on the 4.0 node:
>
> {noformat}
> 2018-10-22T11:57:44.366+0200 ERROR [MessagingService-NettyInbound-Thread-3-8] InboundHandshakeHandler.java:300 Failed to properly handshake with peer /10.216.193.246:58296. Closing the channel.
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
> at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
> at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
> at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808)
> at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317)
> at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
> at sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382)
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962)
> at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
> at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
> at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
> at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> ... 14 common frames omitted{noformat}
> In the server encryption options on the 4.0 node I have both "enabled" and "enable_legacy_ssl_storage_port" set to true, so it should accept incoming connections on the "ssl_storage_port".
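For reference, the reporter's description corresponds to server encryption options along these lines in cassandra.yaml on the 4.0 node; this is a sketch, and the keystore paths and passwords are placeholders, not values taken from the ticket:

```yaml
server_encryption_options:
    # Accept encrypted internode connections (option the reporter set to true)
    enabled: true
    # Keep listening on ssl_storage_port for not-yet-upgraded 3.0.x peers
    enable_legacy_ssl_storage_port: true
    internode_encryption: all
    # Placeholder paths and passwords, not from the ticket
    keystore: /path/to/keystore.jks
    keystore_password: changeit
    truststore: /path/to/truststore.jks
    truststore_password: changeit
```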
[jira] [Updated] (CASSANDRA-14881) Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 3.11.2
[ https://issues.apache.org/jira/browse/CASSANDRA-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxmikant Upadhyay updated CASSANDRA-14881:
-------------------------------------------
    Description: 
I have a 3-node (A, B, C) Cassandra 2.1.16 cluster which I am upgrading to Cassandra 3.11.2. The current cluster status: node A has been upgraded to 3.11.2, B is down, and C is on Cassandra 2.1.16.

When I run a counter update using cqlsh, it behaves in a strangely inconsistent way: sometimes the update is actually applied and sometimes it is not applied at all. See the example below from cqlsh logged in to node A:

=== Incorrect update: update not applied
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

=== Correct update: update applied successfully
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |  -100

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

I have attached the trace-enabled output (with actual timestamps) for both the correct and the incorrect counter update. What is the reason for this weird behavior?

Below are my table details:
{code:java}
CREATE TABLE ks."counterTable" (
    key text,
    column1 text,
    value counter,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
{code}

*Note:* We do not see this issue when all nodes are successfully upgraded, or when two nodes are upgraded to 3.11.2 and the third, non-upgraded node is down; the counter update then works as expected.

  was:
I have a 3-node (A, B, C) Cassandra 2.1.16 cluster which I am upgrading to Cassandra 3.11.2. The current cluster status: node A has been upgraded to 3.11.2, B is down, and C is on Cassandra 2.1.16.

When I run a counter update using cqlsh, it behaves in a strangely inconsistent way: sometimes the update is actually applied and sometimes it is not applied at all. See the example below from cqlsh logged in to node A:

=== Incorrect update: update not applied
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

=== Correct update: update applied successfully
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |  -100

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

I have attached the trace-enabled output (with actual timestamps) for both the correct and the incorrect counter update. What is the reason for this weird behavior?

*Note:* We do not see this issue when all nodes are successfully upgraded, or when two nodes are upgraded to 3.11.2 and the third, non-upgraded node is down; the counter update then works as expected.

> Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 ->
> 3.11.2
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14881
>                 URL:
[jira] [Created] (CASSANDRA-14881) Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 3.11.2
Laxmikant Upadhyay created CASSANDRA-14881:
-------------------------------------------

             Summary: Inconsistent behaviour of counter update during cassandra upgrade 2.1.16 -> 3.11.2
                 Key: CASSANDRA-14881
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14881
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: {code:java}
// code placeholder
{code}
            Reporter: Laxmikant Upadhyay
         Attachments: CorrectCounterUpdateTrace.txt, IncorrectCounterUpdateTrace.txt

I have a 3-node (A, B, C) Cassandra 2.1.16 cluster which I am upgrading to Cassandra 3.11.2. The current cluster status: node A has been upgraded to 3.11.2, B is down, and C is on Cassandra 2.1.16.

When I run a counter update using cqlsh, it behaves in a strangely inconsistent way: sometimes the update is actually applied and sometimes it is not applied at all. See the example below from cqlsh logged in to node A:

=== Incorrect update: update not applied
{code:java}
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

=== Correct update: update applied successfully
{code:java}
user@cqlsh> USE ks ;
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |  -100

user@cqlsh:ks> UPDATE "counterTable" SET value = value + 100 WHERE key = 'key1' AND column1 = 'count';
user@cqlsh:ks> select * from "counterTable";

 key  | column1 | value
------+---------+-------
 key1 |   count |     0
{code}

I have attached the trace-enabled output (with actual timestamps) for both the correct and the incorrect counter update. What is the reason for this weird behavior?

*Note:* We do not see this issue when all nodes are successfully upgraded, or when two nodes are upgraded to 3.11.2 and the third, non-upgraded node is down; the counter update then works as expected.
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680787#comment-16680787 ]

Stefania commented on CASSANDRA-14554:
--------------------------------------

bq. The other methods on a LifecycleTransaction could simply be marked synchronised as well.

If we are synchronizing the LifecycleTransaction methods anyway, I'm not sure I understand why we need child transactions. Even in 4.0, where Netty threads call {{trackNew}}, I don't think we're adding sstables frequently enough to introduce contention on a shared, synchronized txn. Considering that {{trackNew}} performs a file sync, as you correctly reminded me, it surely blocks Netty threads more than a synchronized {{trackNew}} would. Maybe child transactions would make sense if many sstables are created concurrently during streaming; I'm still not totally sure.

It's fine with me if we prefer to try a different alternative; the patch is available at any time. This code is not changing much, so there is little risk of the patch getting stale. For info, internally [~snazy] already reviewed the patch.
> LifecycleTransaction encounters ConcurrentModificationException when used in
> multi-threaded context
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dinesh Joshi
>            Assignee: Dinesh Joshi
>            Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter this exception:
> {quote}java.util.ConcurrentModificationException: null
> at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
> at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
> at java.lang.Iterable.forEach(Iterable.java:74)
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
> at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
> at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
> at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
> at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifecycleTransaction}} and share it between threads:
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}}, which is an {{IncomingStreamMessage}} and is deserialized in parallel. {{LifecycleTransaction}} is not meant to be used in a multi-threaded context, and this leads to streaming failures due to object sharing. On trunk, this object is shared across all threads that transfer sstables in parallel for the given {{TableId}} in a {{StreamSession}}. There are two options to solve this: make {{LifecycleTransaction}} and the associated objects thread safe, or scope the transaction to a single {{CassandraIncomingFile}}.
The consequence
> of the latter option is that if we experience a streaming failure we may have
> redundant SSTables on disk. This is OK, as compaction should clean them up. A
> third option is to synchronize access in the streaming infrastructure.
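As a rough illustration of the "synchronize the transaction methods" alternative discussed above (this is not the actual Cassandra patch; the class and method names below are invented for the sketch), a shared tracker whose mutating and iterating methods are both synchronized avoids the ConcurrentModificationException that arises when multiple streaming threads mutate and iterate an unsynchronized LinkedHashMap-backed structure concurrently:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a LifecycleTransaction-like tracker; not Cassandra code.
public class SharedTxnDemo {
    public static void main(String[] args) throws InterruptedException {
        SharedTxn txn = new SharedTxn();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    txn.trackNew("sstable-" + id + "-" + i);
                    txn.snapshot(); // concurrent iteration, safe under the lock
                }
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        System.out.println(txn.snapshot().size());
    }
}

class SharedTxn {
    // LinkedHashSet mirrors the LinkedHashMap-backed structure in the stack trace:
    // unsynchronized concurrent add + iterate can throw ConcurrentModificationException.
    private final Set<String> tracked = new LinkedHashSet<>();

    // Synchronizing both the mutator and the iterator removes the race.
    public synchronized void trackNew(String sstable) {
        tracked.add(sstable);
    }

    public synchronized List<String> snapshot() {
        return new ArrayList<>(tracked);
    }
}
```

The trade-off the comment weighs is exactly this: the lock serializes callers, but since trackNew already performs a file sync, the extra contention is likely negligible.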
[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time
[ https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zourun updated CASSANDRA-14880:
-------------------------------
    Description: 
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}

How can I solve this?

  was:
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

{{ for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
} }}

How can I solve this?

> drop table and materialized view frequently get error over time
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: zourun
>            Priority: Major
>
> When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:
> for i=0; i<100; i++ {
>     create table;
>     create materialized view;
>     drop materialized view;
>     drop table;
> }
> How can I solve this?
[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time
[ https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zourun updated CASSANDRA-14880:
-------------------------------
    Description: 
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

{{ for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
} }}

How can I solve this?

  was:
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}

> drop table and materialized view frequently get error over time
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: zourun
>            Priority: Major
>
> When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:
> {{ for i=0; i<100; i++ {
>     create table;
>     create materialized view;
>     drop materialized view;
>     drop table;
> } }}
> How can I solve this?
[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time
[ https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zourun updated CASSANDRA-14880:
-------------------------------
    Description: 
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}

  was:
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

{quote}
for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}
{quote}

> drop table and materialized view frequently get error over time
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: zourun
>            Priority: Major
>
> When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:
>
> for i=0; i<100; i++ {
>     create table;
>     create materialized view;
>     drop materialized view;
>     drop table;
> }
>
[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time
[ https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zourun updated CASSANDRA-14880:
-------------------------------
    Description: 
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

{quote}
for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}
{quote}

  was:
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}

> drop table and materialized view frequently get error over time
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: zourun
>            Priority: Major
>
> When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:
>
> {quote} for i=0; i<100; i++ {
>     create table;
>     create materialized view;
>     drop materialized view;
>     drop table;
> }
> {quote}
[jira] [Updated] (CASSANDRA-14880) drop table and materialized view frequently get error over time
[ https://issues.apache.org/jira/browse/CASSANDRA-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zourun updated CASSANDRA-14880:
-------------------------------
    Description: 
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    drop materialized view;
    drop table;
}

  was:
When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    sleep(1);
    drop materialized view;
    drop table;
}

> drop table and materialized view frequently get error over time
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-14880
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: zourun
>            Priority: Major
>
> When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:
> for i=0; i<100; i++ {
>     create table;
>     create materialized view;
>     drop materialized view;
>     drop table;
> }
>
[jira] [Created] (CASSANDRA-14880) drop table and materialized view frequently get error over time
zourun created CASSANDRA-14880:
-------------------------------

             Summary: drop table and materialized view frequently get error over time
                 Key: CASSANDRA-14880
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14880
             Project: Cassandra
          Issue Type: Bug
            Reporter: zourun

When I create a table and a materialized view and then drop them, and I do this frequently, I get the error "no response received from cassandra within timeout period". For example:

for i=0; i<100; i++ {
    create table;
    create materialized view;
    sleep(1);
    drop materialized view;
    drop table;
}
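A common mitigation for this kind of back-to-back DDL timeout (offered here as a hedged sketch, not as an official answer from the ticket) is to wait for schema propagation between statements rather than issuing the next CREATE/DROP immediately. The `schemaAgreed` supplier below is a hypothetical stand-in for whatever schema-agreement check your driver exposes; it is not an API named in the ticket:

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a schema-agreement check with a timeout before issuing the next DDL.
// `schemaAgreed` is a hypothetical stand-in for a driver-provided check.
public class AwaitSchema {
    static boolean awaitSchemaAgreement(BooleanSupplier schemaAgreed,
                                        long timeoutMs, long pollMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (schemaAgreed.getAsBoolean()) {
                return true; // safe to issue the next CREATE/DROP
            }
            Thread.sleep(pollMs);
        }
        return false; // give up and surface an error instead of piling on more DDL
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated check that reports agreement on the third poll.
        final int[] calls = {0};
        boolean ok = awaitSchemaAgreement(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println(ok + " after " + calls[0] + " checks");
    }
}
```

Throttling the loop this way addresses the symptom (client timeout) but not the underlying cost of rapid schema churn, which is worth limiting regardless.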
[jira] [Commented] (CASSANDRA-14875) Explicitly initialize StageManager early in startup
[ https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680569#comment-16680569 ]

Jeff Jirsa commented on CASSANDRA-14875:
----------------------------------------

Is this 4.0 only?

> Explicitly initialize StageManager early in startup
> ---------------------------------------------------
>
>                 Key: CASSANDRA-14875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksandr Sorokoumov
>            Assignee: Aleksandr Sorokoumov
>            Priority: Minor
>             Fix For: 4.x
>
> {{StageManager}} initializes itself through a static block and sets up every {{Stage}}, including pre-starting their threads. This initialization can take a few hundred milliseconds. The timing impact is unpredictable and hard to reason about; it looks like it usually gets hit when creating new keyspaces on start-up and announcing them through migrations.
> While these processes are resilient to such delays, the code is dramatically easier to reason about over time if this initialization happens explicitly.
[jira] [Updated] (CASSANDRA-14877) StreamCoordinator "leaks" threads
[ https://issues.apache.org/jira/browse/CASSANDRA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-14877:
-----------------------------------
    Fix Version/s:     (was: 3.11.4)
                       (was: 3.0.18)
                       (was: 2.2.14)
                       (was: 2.1.21)
                       (was: 4.0)
                       4.x
                       3.11.x
                       3.0.x
                       2.2.x
                       2.1.x

> StreamCoordinator "leaks" threads
> ---------------------------------
>
>                 Key: CASSANDRA-14877
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14877
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Massimiliano Tomassi
>            Assignee: Massimiliano Tomassi
>            Priority: Minor
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Since Cassandra 2.1, streaming sessions are started by running a StreamSessionConnector task for each session in a dedicated executor (a static field of StreamCoordinator). That executor is initialized with DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once its threads are created (up to the given limit, the number of logical cores), they are kept alive for Integer.MAX_VALUE seconds.
> In practice this means that once a node needs to establish streaming sessions to n other nodes, it will create Math.min(n, numLogicalCores) StreamConnectionEstablisher threads that stay parked forever after initializing (not completing) the session.
> It seems preferable to replace DebuggableThreadPoolExecutor.createWithFixedPoolSize with DebuggableThreadPoolExecutor.createWithMaximumPoolSize, which allows providing a saner keep-alive period (e.g. a minute). That's also what createWithFixedPoolSize's Javadoc recommends: "If (most) threads are expected to be idle most of the time, prefer createWithMaxSize() instead."
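The behavior the reporter proposes maps onto plain java.util.concurrent semantics. As a sketch (using the JDK's ThreadPoolExecutor directly, not Cassandra's DebuggableThreadPoolExecutor), a pool with a finite keep-alive and core-thread timeout lets idle connection-establishing threads exit instead of parking forever:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        // Same pool size, but with a 60s keep-alive instead of Integer.MAX_VALUE seconds,
        // and core threads allowed to time out -- analogous to preferring a
        // createWithMaximumPoolSize-style factory over createWithFixedPoolSize.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores, cores,
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // idle "session" threads can exit

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 8; i++) {
            pool.execute(done::incrementAndGet); // stand-in for a StreamSessionConnector task
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(done.get() + " tasks, keepAlive=" +
                pool.getKeepAliveTime(TimeUnit.SECONDS) + "s, coreTimeout=" +
                pool.allowsCoreThreadTimeOut());
    }
}
```

Without `allowCoreThreadTimeOut(true)`, core threads ignore the keep-alive entirely, which is why a fixed-size pool with a huge keep-alive parks its threads indefinitely.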
[jira] [Updated] (CASSANDRA-14875) Explicitly initialize StageManager early in startup
[ https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-14875:
-----------------------------------
    Fix Version/s: 4.x

> Explicitly initialize StageManager early in startup
> ---------------------------------------------------
>
>                 Key: CASSANDRA-14875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14875
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksandr Sorokoumov
>            Assignee: Aleksandr Sorokoumov
>            Priority: Minor
>             Fix For: 4.x
>
> {{StageManager}} initializes itself through a static block and sets up every {{Stage}}, including pre-starting their threads. This initialization can take a few hundred milliseconds. The timing impact is unpredictable and hard to reason about; it looks like it usually gets hit when creating new keyspaces on start-up and announcing them through migrations.
> While these processes are resilient to such delays, the code is dramatically easier to reason about over time if this initialization happens explicitly.
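The distinction the ticket draws can be sketched with invented class names (this is not the actual StageManager code): initialization via a static block runs implicitly whenever the class is first touched, while an explicit initialize() call puts the cost at a known point during startup:

```java
public class InitDemo {
    public static void main(String[] args) {
        ExplicitStages.initialize(); // explicit: the cost lands here, predictably
        ExplicitStages.use();
        ImplicitStages.use();        // implicit: static block only runs at first use
        System.out.println("done");
    }
}

// Invented example class; not Cassandra's StageManager.
class ImplicitStages {
    static { // runs whenever the class is first actively used -- hard to predict when
        System.out.println("implicit init");
    }
    static void use() {}
}

class ExplicitStages {
    private static boolean initialized = false;

    // Called deliberately, early in startup, so the timing is explicit.
    static synchronized void initialize() {
        if (initialized) return;
        initialized = true;
        System.out.println("explicit init");
    }

    static void use() {
        if (!initialized) throw new IllegalStateException("initialize() not called");
    }
}
```

The output order shows the point: the explicit path initializes where the caller chose, while the implicit path initializes only when some unrelated code path first touches the class.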
[jira] [Commented] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679212#comment-16679212 ]

Aleksey Yeschenko commented on CASSANDRA-14862:
-----------------------------------------------

Committed to trunk as [507e4a46a166cab5322a50fbe40c80cb0d16c290|https://github.com/apache/cassandra/commit/507e4a46a166cab5322a50fbe40c80cb0d16c290], thanks.

> TestTopology.test_size_estimates_multidc fails on trunk
> -------------------------------------------------------
>
>                 Key: CASSANDRA-14862
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14862
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>              Labels: 4.0-QA
>             Fix For: 4.0
>
> The sorting of natural replicas in {{SimpleStrategy.calculateNaturalReplicas}}, committed as part of [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68] for CASSANDRA-14726, has broken the {{TestTopology.test_size_estimates_multidc}} dtest ([example run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]), as the "primary" ranges have now changed. I'm actually surprised only a single dtest fails, as I believe we've broken multi-dc {{SimpleStrategy}} reasonably badly.
> In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot sort the endpoints by datacenter first. It has to leave them in the order it found them, or else it changes which replicas are considered "primary" replicas (which mostly impacts repair, size estimates, and the like).
> I have written a regression unit test for SimpleStrategy and am running it through circleci now. Will post the patch shortly.
[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations
[ https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679554#comment-16679554 ] Sylvain Lebresne commented on CASSANDRA-14874: -- I'll note that while I have barely spent any time thinking about it, I have a hunch that getting this perfectly right might not be easy since truncate is not timestamp based (and not coordinated with reads), but at least having coordinators check, before sending read-repairs, whether they have seen a truncation since the beginning of the repair would be dead simple and improve this drastically. > Read repair can race with truncations > - > > Key: CASSANDRA-14874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14874 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Sylvain Lebresne >Priority: Minor > > While hint and commit log replay handle truncation alright, we don't have > anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other > words, you can have a read reading some pre-truncation data, some truncation > running and removing that data, and then some read-repair mutation from that > previous read that resurrects some data that should have been truncated. > Probably not that common in practice, but can lead to seemingly random data > surviving truncate.
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679078#comment-16679078 ] Li commented on CASSANDRA-14495: What if the memory usage already hits over 95% and still keeps growing? No latency or throughput impact yet. > Memory Leak /High Memory usage post 3.11.2 upgrade > -- > > Key: CASSANDRA-14495 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14495 > Project: Cassandra > Issue Type: Bug > Components: Metrics >Reporter: Abdul Patel >Priority: Major > Attachments: cas_heap.txt > > > Hi All, > > I recently upgraded my non-prod cassandra cluster (4 nodes, single DC) from > 3.10 to 3.11.2. > No issues were reported apart from nodetool info showing 80% memory usage. > I initially had 16GB memory on each node; later I bumped it up to 20GB and > rebooted all nodes. > I waited for a week, and now I have again seen memory usage of more than 80% (16GB+). > This means some memory is leaking over time. > Has anyone faced such an issue, or do we have any workaround? My 3.11.2 > upgrade rollout has been halted because of this bug. 
> ===
> ID                     : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active          : true
> Thrift active          : true
> Native Transport active: true
> Load                   : 985.24 MiB
> Generation No          : 1526923117
> Uptime (seconds)       : 1097684
> Heap Memory (MB)       : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center            : DC7
> Rack                   : rac1
> Exceptions             : 0
> Key Cache              : entries 3569, size 421.44 KiB, capacity 100 MiB, 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in seconds
> Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache            : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds miss latency
> Percent Repaired       : 99.88086234106282%
> Token                  : (invoke with -T/--tokens to see all 256 tokens)
[jira] [Updated] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names
[ https://issues.apache.org/jira/browse/CASSANDRA-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-14876: - Fix Version/s: 3.0.x Status: Patch Available (was: Open) Patch: [3.0 | https://github.com/Ge/cassandra/commits/14876-3.0] > Snapshot name merges with keyspace name shown by nodetool listsnapshots for > snapshots with long names > - > > Key: CASSANDRA-14876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14876 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Major > Fix For: 3.0.x > > > If the snapshot name is long enough, it will merge with the keyspace name and the > command output will be inconvenient to read for a {{nodetool}} user, e.g.
> {noformat}
> bin/nodetool listsnapshots
> Snapshot Details:
> Snapshot name                            Keyspace name       Column family name              True size  Size on disk
> 1541670390886                            system_distributed  parent_repair_history           0 bytes    13 bytes
> 1541670390886                            system_distributed  repair_history                  0 bytes    13 bytes
> 1541670390886                            system_auth         roles                           0 bytes    4.98 KB
> 1541670390886                            system_auth         role_members                    0 bytes    13 bytes
> 1541670390886                            system_auth         resource_role_permissons_index  0 bytes    13 bytes
> 1541670390886                            system_auth         role_permissions                0 bytes    13 bytes
> 1541670390886                            system_traces       sessions                        0 bytes    13 bytes
> 1541670390886                            system_traces       events                          0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed    parent_repair_history           0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_distributed    repair_history                  0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth           roles                           0 bytes    4.98 KB
> 39_characters_long_name_2017-09-05-11-Usystem_auth           role_members                    0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth           resource_role_permissons_index  0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_auth           role_permissions                0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_traces         sessions                        0 bytes    13 bytes
> 39_characters_long_name_2017-09-05-11-Usystem_traces         events                          0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed  parent_repair_history           0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_distributed  repair_history                  0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth         roles                           0 bytes    4.98 KB
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth         role_members                    0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth         resource_role_permissons_index  0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_auth         role_permissions                0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_traces       sessions                        0 bytes    13 bytes
> 41_characters_long_name_2017-09-05-11-UTCsystem_traces       events                          0 bytes    13 bytes
> {noformat}
[jira] [Updated] (CASSANDRA-14875) Explicitly initialize StageManager early in startup
[ https://issues.apache.org/jira/browse/CASSANDRA-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov updated CASSANDRA-14875: - Status: Patch Available (was: Open) Patch: [4.0 | https://github.com/Ge/cassandra/tree/14875-4.0] > Explicitly initialize StageManager early in startup > --- > > Key: CASSANDRA-14875 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14875 > Project: Cassandra > Issue Type: Improvement >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Minor > > {{StageManager}} initializes itself through a static block and sets up every > {{Stage}}, including pre-starting their threads. This initialization can > take a few hundred milliseconds. This timing impact is unpredictable and hard > to reason about; it looks like it usually gets hit when creating new > Keyspaces on start-up and announcing them through migrations. > While these processes are resilient to these delays, it is dramatically > easier to reason about over time if this initialization happens explicitly.
[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations
[ https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680455#comment-16680455 ] Ariel Weisberg commented on CASSANDRA-14874: bq. I'll still suggest that maybe the trivial solution I've hinted in my previous comment might be a good stopgap option to make things behave predictably most of the time. +1 > Read repair can race with truncations > - > > Key: CASSANDRA-14874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14874 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Sylvain Lebresne >Priority: Minor > > While hint and commit log replay handle truncation alright, we don't have > anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other > words, you can have a read reading some pre-truncation data, some truncation > running and removing that data, and then some read-repair mutation from that > previous read that resurrects some data that should have been truncated. > Probably not that common in practice, but can lead to seemingly random data > surviving truncate.
[jira] [Created] (CASSANDRA-14874) Read repair can race with truncations
Sylvain Lebresne created CASSANDRA-14874: Summary: Read repair can race with truncations Key: CASSANDRA-14874 URL: https://issues.apache.org/jira/browse/CASSANDRA-14874 Project: Cassandra Issue Type: Bug Components: Local Write-Read Paths Reporter: Sylvain Lebresne While hint and commit log replay handle truncation alright, we don't have anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other words, you can have a read reading some pre-truncation data, some truncation running and removing that data, and then some read-repair mutation from that previous read that resurrects some data that should have been truncated. Probably not that common in practice, but can lead to seemingly random data surviving truncate.
[jira] [Updated] (CASSANDRA-14867) Histogram overflows potentially leading to writes failing
[ https://issues.apache.org/jira/browse/CASSANDRA-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14867: Description: I observed the following in cassandra logs on 1 host of a 6-node cluster: {code} ERROR [ScheduledTasks:1] 2018-11-01 17:26:41,277 CassandraDaemon.java:228 - Exception in thread Thread[ScheduledTasks:1,5,main] java.lang.IllegalStateException: Unable to compute when histogram overflowed at org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.getMean(DecayingEstimatedHistogramReservoir.java:472) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.getDroppedMessagesLogs(MessagingService.java:1263) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.logDroppedMessages(MessagingService.java:1236) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.access$200(MessagingService.java:87) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService$4.run(MessagingService.java:507) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.11.1.jar:3.11.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_172] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_172] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_172] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172] at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172] {code} At the same time, this node was failing all writes issued to it. Restarting cassandra on the node brought the cluster into a good state and we stopped seeing the histogram overflow errors. Has this issue been observed before? Could the histogram overflows cause writes to fail? was: I observed the following in cassandra logs on 1 host of a 6-node cluster: ERROR [ScheduledTasks:1] 2018-11-01 17:26:41,277 CassandraDaemon.java:228 - Exception in thread Thread[ScheduledTasks:1,5,main] java.lang.IllegalStateException: Unable to compute when histogram overflowed at org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot.getMean(DecayingEstimatedHistogramReservoir.java:472) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.getDroppedMessagesLogs(MessagingService.java:1263) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.logDroppedMessages(MessagingService.java:1236) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService.access$200(MessagingService.java:87) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.net.MessagingService$4.run(MessagingService.java:507) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.11.1.jar:3.11.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_172] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_172] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_172] at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172] At the same time, this node was failing all writes issued to it. Restarting cassandra on the node brought the cluster into a good state and we stopped seeing the histogram overflow errors. Has this issue been observed before? Could the histogram overflows cause writes to fail? > Histogram overflows potentially leading to writes failing >
[jira] [Resolved] (CASSANDRA-14868) Installation error
[ https://issues.apache.org/jira/browse/CASSANDRA-14868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita resolved CASSANDRA-14868. Resolution: Invalid The issue tracker here is for developing Apache Cassandra. You should report your issue to [https://github.com/Symantec/ambari-cassandra-service] (guessing from the stack trace). > Installation error (安装出错) > > > Key: CASSANDRA-14868 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14868 > Project: Cassandra > Issue Type: Bug > Environment: HDP 2.5 >Reporter: znn >Priority: Major > > Traceback (most recent call last): > File > "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/cassandra_master.py", > line 60, in > Cassandra_Master().execute() > File > "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", > line 280, in execute > method(env) > File > "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/cassandra_master.py", > line 27, in install > import params > File > "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/CASSANDRA/package/scripts/params.py", > line 16, in > from resource_management.libraries.functions.version import > format_hdp_stack_version, compare_versions > ImportError: cannot import name format_hdp_stack_version
[jira] [Assigned] (CASSANDRA-14877) StreamCoordinator "leaks" threads
[ https://issues.apache.org/jira/browse/CASSANDRA-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Sorokoumov reassigned CASSANDRA-14877: Assignee: Massimiliano Tomassi > StreamCoordinator "leaks" threads > - > > Key: CASSANDRA-14877 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14877 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Massimiliano Tomassi >Assignee: Massimiliano Tomassi >Priority: Minor > Fix For: 2.1.21, 2.2.14, 3.0.18, 3.11.4, 4.0 > > > Since Cassandra 2.1, streaming sessions are started by running a > StreamSessionConnector task for each session in a dedicated executor (a > static field of StreamCoordinator). > That executor is initialized with > DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once > created (up to the given limit of the number of logical cores), its threads > are kept alive for Integer.MAX_VALUE seconds. > This practically means that once a node needs to establish streaming sessions > to n other nodes, it will create Math.min(n, numLogicalCores) > StreamConnectionEstablisher threads that will stay parked forever after > initializing (not completing) the session. > It seems preferable to replace > DebuggableThreadPoolExecutor.createWithFixedPoolSize with > DebuggableThreadPoolExecutor.createWithMaximumPoolSize, which allows providing > a saner keep-alive period (e.g. a minute). > That's also what createWithFixedPoolSize's Javadoc recommends: "If (most) > threads are expected to be idle most of the time, prefer createWithMaxSize() > instead."
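The keep-alive behaviour described above can be sketched with plain {{java.util.concurrent}}. This is an illustrative sketch only, not Cassandra's actual DebuggableThreadPoolExecutor; the class and method names here are invented for the example:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveSketch {
    // Fixed-size pool: core == max, and core threads never time out, so every
    // thread that gets created stays parked until the executor is shut down.
    public static ThreadPoolExecutor fixedPool(int size) {
        return new ThreadPoolExecutor(size, size,
                Integer.MAX_VALUE, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    // Same size, but with a one-minute keep-alive and core-thread timeout
    // enabled, so idle threads are reclaimed instead of parking forever.
    public static ThreadPoolExecutor reclaimingPool(int size) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(size, size,
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        // The reclaiming pool reports a finite keep-alive and core timeout.
        ThreadPoolExecutor pool = reclaimingPool(4);
        System.out.println(pool.getKeepAliveTime(TimeUnit.SECONDS));
        System.out.println(pool.allowsCoreThreadTimeOut());
    }
}
```

With the first configuration, each idle thread lingers for Integer.MAX_VALUE seconds; with the second, idle threads die after a minute, which is the behaviour the ticket asks for.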
[jira] [Created] (CASSANDRA-14875) Explicitly initialize StageManager early in startup
Aleksandr Sorokoumov created CASSANDRA-14875: Summary: Explicitly initialize StageManager early in startup Key: CASSANDRA-14875 URL: https://issues.apache.org/jira/browse/CASSANDRA-14875 Project: Cassandra Issue Type: Improvement Reporter: Aleksandr Sorokoumov Assignee: Aleksandr Sorokoumov {{StageManager}} initializes itself through a static block and sets up every {{Stage}}, including pre-starting their threads. This initialization can take a few hundred milliseconds. This timing impact is unpredictable and hard to reason about; it looks like it usually gets hit when creating new Keyspaces on start-up and announcing them through migrations. While these processes are resilient to these delays, it is dramatically easier to reason about over time if this initialization happens explicitly.
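The proposed pattern can be illustrated with a minimal sketch: instead of paying the cost inside a static block whenever the class happens to be touched first, startup code calls an idempotent init() at a point of its choosing. The names below are illustrative, not Cassandra's actual StageManager API:

```java
public class StageManagerSketch {
    private static volatile boolean initialized = false;

    // Explicit, idempotent initialization. Startup code decides exactly when
    // the (potentially slow) setup happens, rather than a static block firing
    // on whichever code path touches the class first.
    public static synchronized void init() {
        if (initialized) {
            return;
        }
        // ... create stages and pre-start their threads here ...
        initialized = true;
    }

    public static boolean isInitialized() {
        return initialized;
    }

    public static void main(String[] args) {
        init(); // called early in startup, before keyspace creation/migrations
        System.out.println(isInitialized());
    }
}
```

The `synchronized` + `volatile` pair keeps the explicit call safe even if two startup paths race to initialize.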
[jira] [Comment Edited] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679212#comment-16679212 ] Aleksey Yeschenko edited comment on CASSANDRA-14862 at 11/8/18 3:05 AM: Committed to trunk as [2adfa92044381aa9093104f3a105f3dbd7dda94c|https://github.com/apache/cassandra/commit/2adfa92044381aa9093104f3a105f3dbd7dda94c], thanks. was (Author: iamaleksey): Committed to trunk as [507e4a46a166cab5322a50fbe40c80cb0d16c290|https://github.com/apache/cassandra/commit/507e4a46a166cab5322a50fbe40c80cb0d16c290], thanks. > TestTopology.test_size_estimates_multidc fails on trunk > --- > > Key: CASSANDRA-14862 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14862 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Labels: 4.0-QA > Fix For: 4.0 > > > The sorting of natural replicas in > {{SimpleStrategy.calculateNaturalReplicas}} committed as part of > [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68] > for CASSANDRA-14726 has broken the > {{TestTopology.test_size_estimates_multidc}} dtest ([example > run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]) as > the "primary" ranges have now changed. I'm actually surprised only a single > dtest fails as I believe we've broken multi-dc {{SimpleStrategy}} reasonably > badly. > In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot > sort the endpoints by datacenter first. It has to leave them in the order in which > it found them, or else it changes which replicas are considered "primary" > replicas (which mostly impacts repair, size estimates, and such). > I have written a regression unit test for the SimpleStrategy and am running > it through circleci now. Will post the patch shortly. 
[jira] [Commented] (CASSANDRA-14870) The order of application of nodetool garbagecollect is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679435#comment-16679435 ] Branimir Lambov commented on CASSANDRA-14870: - testall and dtests are clean on DataStax CI servers. > The order of application of nodetool garbagecollect is broken > - > > Key: CASSANDRA-14870 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14870 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Branimir Lambov >Assignee: Branimir Lambov >Priority: Major > > {{nodetool garbagecollect}} was intended to work from oldest sstable to > newest, so that the collection in newer tables can purge tombstones over data > that has been deleted. > However, {{SSTableReader.maxTimestampComparator}} currently sorts in the > opposite order (the order changed in CASSANDRA-13776 and then back in > CASSANDRA-14010), which makes the garbage collection unable to purge any > tombstones.
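The intended oldest-to-newest ordering can be shown with a small sketch. The stand-in Table class and its fields are illustrative, not Cassandra's SSTableReader; only the ascending-by-max-timestamp ordering is the point:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OldestFirstSketch {
    // Stand-in for an sstable tagged with its maximum cell timestamp.
    static class Table {
        final String name;
        final long maxTimestamp;

        Table(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Oldest-first order: ascending max timestamp, so that tombstones in
    // newer tables can purge data already processed in older ones.
    static final Comparator<Table> OLDEST_FIRST =
            Comparator.comparingLong((Table t) -> t.maxTimestamp);

    static List<String> order(List<Table> tables) {
        List<Table> sorted = new ArrayList<>(tables);
        sorted.sort(OLDEST_FIRST);
        List<String> names = new ArrayList<>();
        for (Table t : sorted) {
            names.add(t.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(order(List.of(
                new Table("mc-3", 300L),
                new Table("mc-1", 100L),
                new Table("mc-2", 200L)))); // prints [mc-1, mc-2, mc-3]
    }
}
```

A comparator sorting in the opposite (descending) direction, as the ticket describes, would visit the newest sstable first and leave every tombstone unpurgeable.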
[jira] [Created] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names
Aleksandr Sorokoumov created CASSANDRA-14876: Summary: Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names Key: CASSANDRA-14876 URL: https://issues.apache.org/jira/browse/CASSANDRA-14876 Project: Cassandra Issue Type: Bug Reporter: Aleksandr Sorokoumov Assignee: Aleksandr Sorokoumov If the snapshot name is long enough, it will merge with the keyspace name and the command output will be inconvenient to read for a {{nodetool}} user, e.g.
{noformat}
bin/nodetool listsnapshots
Snapshot Details:
Snapshot name                            Keyspace name       Column family name              True size  Size on disk
1541670390886                            system_distributed  parent_repair_history           0 bytes    13 bytes
1541670390886                            system_distributed  repair_history                  0 bytes    13 bytes
1541670390886                            system_auth         roles                           0 bytes    4.98 KB
1541670390886                            system_auth         role_members                    0 bytes    13 bytes
1541670390886                            system_auth         resource_role_permissons_index  0 bytes    13 bytes
1541670390886                            system_auth         role_permissions                0 bytes    13 bytes
1541670390886                            system_traces       sessions                        0 bytes    13 bytes
1541670390886                            system_traces       events                          0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_distributed    parent_repair_history           0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_distributed    repair_history                  0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth           roles                           0 bytes    4.98 KB
39_characters_long_name_2017-09-05-11-Usystem_auth           role_members                    0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth           resource_role_permissons_index  0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_auth           role_permissions                0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_traces         sessions                        0 bytes    13 bytes
39_characters_long_name_2017-09-05-11-Usystem_traces         events                          0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_distributed  parent_repair_history           0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_distributed  repair_history                  0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth         roles                           0 bytes    4.98 KB
41_characters_long_name_2017-09-05-11-UTCsystem_auth         role_members                    0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth         resource_role_permissons_index  0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_auth         role_permissions                0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_traces       sessions                        0 bytes    13 bytes
41_characters_long_name_2017-09-05-11-UTCsystem_traces       events                          0 bytes    13 bytes
{noformat}
[jira] [Created] (CASSANDRA-14878) Race condition when setting bootstrap flags
Sergio Bossa created CASSANDRA-14878: Summary: Race condition when setting bootstrap flags Key: CASSANDRA-14878 URL: https://issues.apache.org/jira/browse/CASSANDRA-14878 Project: Cassandra Issue Type: Bug Reporter: Sergio Bossa Assignee: Sergio Bossa Fix For: 3.0.x, 3.11.x, 4.x {{StorageService#bootstrap()}} is supposed to wait for bootstrap to finish, but Guava invokes the future listeners [after|https://github.com/google/guava/blob/ec2dedebfa359991cbcc8750dc62003be63ec6d3/guava/src/com/google/common/util/concurrent/AbstractFuture.java#L890] unparking its waiters, which makes it non-deterministic when {{bootstrapFinished()}} will be executed relative to the waiting thread.
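One way to make this kind of completion handling deterministic is to wait on a latch that the listener itself releases, rather than on the future. The sketch below uses CompletableFuture from the JDK instead of Guava, and the names are illustrative, not the actual StorageService code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class BootstrapWaitSketch {
    // Waiting on the future alone may resume the waiter before completion
    // listeners have run. Waiting on a latch released *inside* the listener
    // guarantees the post-bootstrap flag is set before the waiter proceeds.
    static boolean waitForBootstrap() throws InterruptedException {
        AtomicBoolean finished = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);

        CompletableFuture<Void> bootstrap = CompletableFuture.runAsync(() -> {
            // ... actual bootstrap work would go here ...
        });
        bootstrap.whenComplete((result, error) -> {
            finished.set(true); // the listener's post-bootstrap work
            done.countDown();   // release the waiter only afterwards
        });

        done.await(); // deterministic: the listener has already run
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitForBootstrap()); // prints true
    }
}
```

Had the waiter blocked on `bootstrap.get()` instead, it could observe `finished == false`, since nothing orders the listener before the unparked waiter.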
[jira] [Created] (CASSANDRA-14879) Log DDL statements on coordinator
Sylvain Lebresne created CASSANDRA-14879: Summary: Log DDL statements on coordinator Key: CASSANDRA-14879 URL: https://issues.apache.org/jira/browse/CASSANDRA-14879 Project: Cassandra Issue Type: Improvement Components: CQL Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne People sometimes run into issues with schema, often because they make concurrent schema changes, which are simply not supported (we should fix that someday). In the meantime, it's not always easy to even check whether you may indeed have had concurrent schema changes. A very trivial way to make that easier would be to simply log DDL statements on the coordinator before they are executed. This is likely useful information for operators in the first place, and in most cases it would allow tracking whether concurrent schema changes were the likely cause of a particular issue.
[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations
[ https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680360#comment-16680360 ] Sylvain Lebresne commented on CASSANDRA-14874: -- bq. Seems like there is a case for an über tombstone with no partition or clustering. Well, that's kind of what I'd qualify as making truncate timestamp based, and that's a much bigger change to Truncate than just this problem. It does likely make this problem much easier to solve, or at least easier to solve the right way, but it still qualifies as a rewrite of truncate, and one that would likely not fully preserve existing behavior (typically, you wouldn't be able to add data with a pre-truncate timestamp after a truncate (unless we do something rather funky), while this is possible today; and to be extra clear, I am not arguing the current behavior is better, I'm just pointing out that this wouldn't be a fully backward compatible change, and that's to be taken into account). In any case, I'm not opposed at all to considering that option in principle, because I do think a timestamp-based truncate is likely an overall better fit for C* and is probably what we should have done to start with, but as this is probably a bit longer term (and probably deserves its own ticket), I'll still suggest that maybe the trivial solution I've hinted at in my previous comment might be a good stopgap option to make things behave predictably most of the time. > Read repair can race with truncations > - > > Key: CASSANDRA-14874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14874 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Sylvain Lebresne >Priority: Minor > > While hint and commit log replay handle truncation alright, we don't have > anything to prevent a read/read-repair from racing with {{TRUNCATE}}. 
In other > words, you can have a read reading some pre-truncation data, some truncation > running and removing that data, and then some read-repair mutation from that > previous read that resurrects some data that should have been truncated. > Probably not that common in practice, but can lead to seemingly random data > surviving truncate.
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679214#comment-16679214 ] Stefania commented on CASSANDRA-14554: -- You're welcome [~benedict] ! bq. I wonder if you had considered (and potentially discarded) what might be a slightly simpler approach of allocating a separate LifecycleTransaction for each operation, and atomically transferring their contents as they "complete" to the shared LivecycleTransaction? No, I hadn't considered it. It sounds elegant in principle, but in order to atomically transfer child transactions to their parent, we'd have to add some complexity to transactions that I'm not sure we need. Obviously, the state of the parent transaction could change at any time (due to an abort), including whilst a child transaction is trying to transfer its state. So this would require some form of synchronization or CAS. The same is true for two child transactions transferring their state simultaneously. The state on disk should be fine as long as child transactions are never committed but only transferred. Child transactions should be allowed to abort independently, though. So different rules for child and parent transactions would apply. I'm not sure we need this additional complexity because the txn state only changes rarely. {{LifecycleTransaction}} exposes a large API, but many methods are probably only used during compaction. Extracting a more comprehensive interface that can be implemented with a synchronized wrapper may be an easier approach. I submitted a conservative patch that fixes a known problem with streaming and is safe for branches that will not undergo a major release testing cycle. Unfortunately, I do not have the time to work on a more comprehensive solution, at least not right now. I could however review whichever approach we choose. 
> LifecycleTransaction encounters ConcurrentModificationException when used in > multi-threaded context > --- > > Key: CASSANDRA-14554 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14554 > Project: Cassandra > Issue Type: Bug >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > > When LifecycleTransaction is used in a multi-threaded context, we encounter > this exception - > {quote}java.util.ConcurrentModificationException: null > at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) > at java.lang.Iterable.forEach(Iterable.java:74) > at > org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78) > at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320) > at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285) > at > org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136) > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529) > {quote} > During streaming we create a reference to a {{LifeCycleTransaction}} and > share it between threads - > [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156] > This is used in a multi-threaded context inside {{CassandraIncomingFile}}, > which is an {{IncomingStreamMessage}} that is deserialized in parallel. > {{LifecycleTransaction}} is not meant to be used in a multi-threaded context, > and this leads to streaming failures due to object sharing. On trunk, this > object is shared across all threads that transfer sstables in parallel for > the given {{TableId}} in a {{StreamSession}}. There are two options to solve > this: make {{LifecycleTransaction}} and the associated objects thread safe, or > scope the transaction to a single {{CassandraIncomingFile}}.
The consequence > of the latter option is that if we experience a streaming failure we may have > redundant SSTables on disk. This is OK, as compaction should clean them up. A > third option is to synchronize access in the streaming infrastructure. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14862) TestTopology.test_size_estimates_multidc fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-14862: -- Resolution: Fixed Status: Resolved (was: Ready to Commit) > TestTopology.test_size_estimates_multidc fails on trunk > --- > > Key: CASSANDRA-14862 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14862 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Minor > Labels: 4.0-QA > Fix For: 4.0 > > > The sorting of natural replicas in > {{SimpleStrategy.calculateNaturalReplicas}} committed as part of > [e645b917|https://github.com/apache/cassandra/commit/e645b9172c5d50fc2af407de724e46121edfe109#diff-0e1563a70b49cd81e9e11b4ddad15cf2L68] > for CASSANDRA-14726 has broken the > {{TestTopology.test_size_estimates_multidc}} dtest ([example > run|https://circleci.com/gh/jolynch/cassandra/245#tests/containers/48]) as > the "primary" ranges have now changed. I'm actually surprised only a single > dtest fails, as I believe we've broken multi-dc {{SimpleStrategy}} reasonably > badly. > In particular, the {{SimpleStrategy.calculateNaturalReplicas}} method cannot > sort the endpoints by datacenter first. It has to leave them in the order > in which it found them; otherwise it changes which replicas are considered "primary" > replicas (which mostly impacts repair, size estimates, and the like). > I have written a regression unit test for the SimpleStrategy and am running > it through circleci now. Will post the patch shortly.
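The "primary replica" problem described above can be shown with a toy model of replica selection: walk the ring clockwise from a token and take the first {{rf}} endpoints. The tokens, endpoint names, and the walk itself are made up for illustration and are much simpler than what {{SimpleStrategy.calculateNaturalReplicas}} actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrimaryReplicaDemo {
    // Toy replica selection: first `rf` endpoints at or after the token,
    // in ring order. The first element is the "primary" replica.
    static List<String> replicasForToken(int token, LinkedHashMap<Integer, String> ring, int rf) {
        List<String> replicas = new ArrayList<>();
        for (Map.Entry<Integer, String> e : ring.entrySet())
            if (e.getKey() >= token && replicas.size() < rf)
                replicas.add(e.getValue());
        return replicas;
    }

    public static void main(String[] args) {
        // token -> endpoint, stored in ring order (names are invented).
        LinkedHashMap<Integer, String> ring = new LinkedHashMap<>();
        ring.put(10, "dc2-node1");
        ring.put(20, "dc1-node1");
        ring.put(30, "dc2-node2");

        List<String> ringOrder = replicasForToken(5, ring, 2);
        System.out.println(ringOrder.get(0)); // ring order: primary is dc2-node1

        // Re-sorting the result (as the regressed code effectively did)
        // silently changes which replica is considered primary.
        List<String> sorted = new ArrayList<>(ringOrder);
        Collections.sort(sorted);
        System.out.println(sorted.get(0)); // now dc1-node1 looks "primary"
    }
}
```

Anything keyed off the first replica, such as primary-range repair or size estimates, sees different answers before and after the sort, which is why the dtest's expected ranges no longer matched.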
[jira] [Comment Edited] (CASSANDRA-14876) Snapshot name merges with keyspace name shown by nodetool listsnapshots for snapshots with long names
[ https://issues.apache.org/jira/browse/CASSANDRA-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679835#comment-16679835 ] Aleksandr Sorokoumov edited comment on CASSANDRA-14876 at 11/8/18 2:34 PM: --- Patch: [3.0|https://github.com/Ge/cassandra/tree/14876-3.0] was (Author: ge): Patch: [3.0 | https://github.com/Ge/cassandra/commits/14876-3.0] > Snapshot name merges with keyspace name shown by nodetool listsnapshots for > snapshots with long names > - > > Key: CASSANDRA-14876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14876 > Project: Cassandra > Issue Type: Bug >Reporter: Aleksandr Sorokoumov >Assignee: Aleksandr Sorokoumov >Priority: Major > Fix For: 3.0.x > > > If a snapshot name is long enough, it merges with the keyspace name and the command > output becomes inconvenient to read for a {{nodetool}} user, e.g. > {noformat} > bin/nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace nameColumn family name > True size Size on disk > 1541670390886 system_distributed parent_repair_history > 0 bytes13 bytes > 1541670390886 system_distributed repair_history > 0 bytes13 bytes > 1541670390886 system_auth roles > 0 bytes4.98 KB > 1541670390886 system_auth role_members > 0 bytes13 bytes > 1541670390886 system_auth > resource_role_permissons_index0 bytes13 bytes > 1541670390886 system_auth role_permissions > 0 bytes13 bytes > 1541670390886 system_tracessessions > 0 bytes13 bytes > 1541670390886 system_tracesevents > 0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_distributed > parent_repair_history0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_distributed > repair_history 0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_auth roles >0 bytes4.98 KB > 39_characters_long_name_2017-09-05-11-Usystem_auth > role_members 0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_auth > resource_role_permissons_index0 bytes13 bytes > 
39_characters_long_name_2017-09-05-11-Usystem_auth > role_permissions 0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_tracessessions >0 bytes13 bytes > 39_characters_long_name_2017-09-05-11-Usystem_tracesevents >0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_distributed > parent_repair_history0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_distributed > repair_history 0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_auth roles > 0 bytes4.98 KB > 41_characters_long_name_2017-09-05-11-UTCsystem_auth > role_members 0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_auth > resource_role_permissons_index0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_auth > role_permissions 0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_traces > sessions 0 bytes13 bytes > 41_characters_long_name_2017-09-05-11-UTCsystem_tracesevents > 0 bytes13 bytes > {noformat}
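The direction of a fix can be sketched by sizing the snapshot-name column from the longest name instead of using a fixed width, so a long name can never run into the keyspace column. The column names mirror the nodetool output above, but the width calculation and data here are illustrative, not the actual patch.

```java
public class SnapshotTableDemo {
    public static void main(String[] args) {
        // Sample rows: one short snapshot name, one 41-character name that
        // overflows a fixed-width column in the buggy output.
        String[][] rows = {
            {"1541670390886", "system_auth", "roles"},
            {"41_characters_long_name_2017-09-05-11-UTC", "system_auth", "roles"},
        };

        // Compute the name column width from the data, plus one space of padding.
        int nameWidth = "Snapshot name".length();
        for (String[] r : rows) nameWidth = Math.max(nameWidth, r[0].length());
        String fmt = "%-" + (nameWidth + 1) + "s%-20s%s%n";

        StringBuilder out = new StringBuilder();
        out.append(String.format(fmt, "Snapshot name", "Keyspace name", "Column family name"));
        for (String[] r : rows) out.append(String.format(fmt, r[0], r[1], r[2]));
        System.out.print(out);
    }
}
```

With a data-driven width, even the 41-character name is followed by at least one space before "system_auth", instead of the merged "…-UTCsystem_auth" shown in the report.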
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680015#comment-16680015 ] Benedict commented on CASSANDRA-14554:
--

I was actually thinking of something very simple. Child transactions would not have any direct relationship to parents; there would just be a method to transfer their contents, and this method would be synchronised. The other methods on a {{LifecycleTransaction}} could simply be marked synchronised as well. I don't think there would be any major problem with this? It's not a high-traffic object, so the cost would be low even without extracting a synchronised interface, particularly as this object requires regular fsyncs.

I completely understand that you may be too busy to try this alternative approach. I think it would be _preferable_ for somebody to have a brief try at the alternative, just to see if we can isolate the complexity, but if we find we don't have time I think your patch looks good too (modulo a proper review). Perhaps we should wait and see how things pan out with finding time for review, as I know [~djoshi3] had been planning to take a crack at this too.
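The child-transaction variant described in this comment can be sketched as below: each thread accumulates into its own child, and only the handoff into the shared parent is synchronised. The class and method names ({{ParentTxn}}, {{absorb}}, {{drain}}) are invented for illustration; Cassandra's real transaction API differs.

```java
import java.util.ArrayList;
import java.util.List;

// Shared parent: the only synchronised touch points between threads.
class ParentTxn {
    private final List<String> tracked = new ArrayList<>();
    synchronized void absorb(ChildTxn child) { tracked.addAll(child.drain()); }
    synchronized int size() { return tracked.size(); }
}

// Per-thread child: no locking needed while accumulating.
class ChildTxn {
    private final List<String> local = new ArrayList<>();
    void trackNew(String sstable) { local.add(sstable); }
    List<String> drain() {
        List<String> copy = new ArrayList<>(local);
        local.clear();
        return copy;
    }
}

public class ChildTxnDemo {
    public static void main(String[] args) throws InterruptedException {
        ParentTxn parent = new ParentTxn();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                ChildTxn child = new ChildTxn();          // thread-local, lock-free
                for (int i = 0; i < 1000; i++) child.trackNew("sstable-" + id + "-" + i);
                parent.absorb(child);                     // one synchronised handoff
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println(parent.size());
    }
}
```

Contention is limited to one {{absorb}} call per stream rather than one lock acquisition per tracked file, which is why the cost stays low even with every parent method synchronised.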
[jira] [Created] (CASSANDRA-14877) StreamCoordinator "leaks" threads
Massimiliano Tomassi created CASSANDRA-14877:

Summary: StreamCoordinator "leaks" threads
Key: CASSANDRA-14877
URL: https://issues.apache.org/jira/browse/CASSANDRA-14877
Project: Cassandra
Issue Type: Bug
Components: Streaming and Messaging
Reporter: Massimiliano Tomassi
Fix For: 2.1.21, 2.2.14, 3.0.18, 3.11.4, 4.0

Since Cassandra 2.1, streaming sessions are started by running a StreamSessionConnector task for each session in a dedicated executor (a static field of StreamCoordinator). That executor is initialized with DebuggableThreadPoolExecutor.createWithFixedPoolSize, which means that once its threads are created (up to the given limit of the number of logical cores), they are kept alive for Integer.MAX_VALUE seconds.

In practice, once a node needs to establish streaming sessions to n other nodes, it creates Math.min(n, numLogicalCores) StreamConnectionEstablisher threads that stay parked forever after initializing (not completing) the session.

It seems preferable to replace DebuggableThreadPoolExecutor.createWithFixedPoolSize with DebuggableThreadPoolExecutor.createWithMaximumPoolSize, which allows providing a saner keep-alive period (e.g. a minute). That is also what createWithFixedPoolSize's Javadoc recommends: "If (most) threads are expected to be idle most of the time, prefer createWithMaxSize() instead."
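The difference can be demonstrated with plain java.util.concurrent: a fixed pool keeps idle core threads alive indefinitely, while enabling core-thread timeout (the effect the report wants from a keep-alive period) lets idle workers exit. The pool size and timeouts below are chosen for the demo, not taken from Cassandra.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // Fixed-size pool of 4 with a short keep-alive for the demo.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // Without this, the keep-alive does not apply to core threads and
        // the 4 workers would park forever, which is the reported leak.
        pool.allowCoreThreadTimeOut(true);

        for (int i = 0; i < 4; i++)
            pool.submit(() -> { /* stand-in for establishing a streaming session */ });

        Thread.sleep(1000); // give idle workers time to expire
        System.out.println(pool.getPoolSize()); // idle workers have been reclaimed
        pool.shutdown();
    }
}
```

Whether via a larger keep-alive with a maximum-size pool or via core-thread timeout, the point is the same: short-lived bursts of streaming work should not pin one thread per logical core for the life of the process.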
[jira] [Commented] (CASSANDRA-14874) Read repair can race with truncations
[ https://issues.apache.org/jira/browse/CASSANDRA-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680066#comment-16680066 ] Ariel Weisberg commented on CASSANDRA-14874:

I was just talking about something similar with [~bdeggleston] last night. It seems there is a case for an über tombstone with no partition or clustering. Truncate should also require CL responses from all ranges on the ring. This tombstone could shadow any data from before the truncation; removing all the pre-truncate data could then be a background deletion process.

> Read repair can race with truncations > - > > Key: CASSANDRA-14874 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14874 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Sylvain Lebresne >Priority: Minor > > While hint and commit log replay handle truncation alright, we don't have > anything to prevent a read/read-repair from racing with {{TRUNCATE}}. In other > words, you can have a read reading some pre-truncation data, some truncation > running and removing that data, and then some read-repair mutation from that > previous read that resurrects some data that should have been truncated. > Probably not that common in practice, but it can lead to seemingly random data > surviving truncate.
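The shadowing rule behind the über-tombstone idea reduces to a timestamp comparison: a repair mutation whose data predates the truncation point must be dropped, not applied. The method and variable names below are hypothetical; this is a sketch of the invariant, not Cassandra's actual read-repair code path.

```java
public class TruncationGuardDemo {
    // Data written at or before the truncation point must stay dead;
    // applying it would resurrect pre-truncate rows via read repair.
    static boolean shouldApply(long mutationTimestampMicros, long truncatedAtMicros) {
        return mutationTimestampMicros > truncatedAtMicros;
    }

    public static void main(String[] args) {
        long truncatedAt = 1_541_670_000_000_000L; // illustrative truncation time

        System.out.println(shouldApply(truncatedAt - 1, truncatedAt)); // pre-truncate: drop
        System.out.println(shouldApply(truncatedAt + 1, truncatedAt)); // post-truncate: apply
    }
}
```

Requiring CL responses from all ranges before the truncation is considered successful is what makes the recorded truncation time trustworthy enough for every replica to apply this check.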