[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516682#comment-16516682 ]

Dinesh Joshi commented on CASSANDRA-10735:
------------------------------------------

`user@ ML` is the users mailing list. See: http://cassandra.apache.org/community/

> Support netty openssl (netty-tcnative) for client encryption
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-10735
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10735
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Andy Tolbert
>            Assignee: Jason Brown
>            Priority: Major
>             Fix For: 4.0
>
>         Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, nettysslbench.png, nettysslbench_small.png, sslbench12-03.png
>
> The java-driver recently added support for using netty openssl via [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841]; this shows a very measurable improvement (numbers incoming on that ticket). It seems likely that this can offer an improvement if implemented on the C* side as well.
> Since netty-tcnative has platform-specific requirements, this should not be made the default, but rather be an option that one can use.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
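The "opt-in, never default" point in the description can be sketched as a guard on both an explicit flag and native-library availability. This is a minimal, self-contained illustration; the enum and method names are hypothetical, not Netty's or Cassandra's actual API:

```java
public class SslProviderSelection {
    enum SslProvider { JDK, OPENSSL }

    // Pick the native provider only when explicitly enabled AND the
    // platform-specific netty-tcnative library actually loaded; otherwise
    // fall back to the portable JDK implementation.
    static SslProvider choose(boolean openSslRequested, boolean nativeAvailable) {
        return (openSslRequested && nativeAvailable) ? SslProvider.OPENSSL : SslProvider.JDK;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));   // OPENSSL
        System.out.println(choose(true, false));  // JDK: native lib missing
        System.out.println(choose(false, true));  // JDK: not requested
    }
}
```

Keeping the JDK provider as the unconditional fallback is what makes a platform-specific dependency safe to ship as an option.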
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516669#comment-16516669 ]

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

Thanks [~chovatia.jayd...@gmail.com], I agree. No fault on your part; it's more a problem with the consistent lack of reviewers we have who can prioritise review work. It's just unfortunate that it has wasted more time than necessary for everyone. I think [~VincentWhite] would appreciate the acknowledgement (especially after such a long time), so FCFS makes sense to me, but there's no use doing the work twice; just take the two patches' slight discrepancies into account when reviewing, I guess.

> streaming failure during bootstrap makes new node into inconsistent state
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
> If bootstrap fails for a newly joining node (the most common reason being a streaming failure), then Cassandra remains in the {{joining}} state, which is fine, but Cassandra also enables the native transport, which makes the overall state inconsistent. This further causes a NullPointerException if auth is enabled on the new node; please find reproducible steps here.
> For example, if bootstrap fails due to streaming errors like:
> {quote}
> java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
>     at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
>     at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>     at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
>     at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
>     at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then the variable [StorageService.java::dataAvailable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
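The inconsistency the report describes — a node left in {{joining}} yet serving client traffic — amounts to a missing guard on native-transport startup. A minimal sketch of that guard, with hypothetical names (this is not the actual StorageService API):

```java
public class NativeTransportGate {
    enum OperationMode { JOINING, NORMAL }

    // Only open the client (native transport) port once bootstrap completed;
    // a node stuck in JOINING after a streaming failure should stay
    // unreachable to clients instead of answering with partial data or NPEs.
    static boolean shouldStartNativeTransport(OperationMode mode, boolean dataAvailable) {
        return mode == OperationMode.NORMAL && dataAvailable;
    }

    public static void main(String[] args) {
        System.out.println(shouldStartNativeTransport(OperationMode.JOINING, false)); // false
        System.out.println(shouldStartNativeTransport(OperationMode.NORMAL, true));   // true
    }
}
```

Gating on both the mode and a data-availability flag mirrors the `dataAvailable` variable the report points at.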
[jira] [Updated] (CASSANDRA-14515) Short read protection in presence of almost-purgeable range tombstones may cause permanent data loss
[ https://issues.apache.org/jira/browse/CASSANDRA-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mck updated CASSANDRA-14515:
----------------------------
    Priority: Blocker  (was: Major)

> Short read protection in presence of almost-purgeable range tombstones may cause permanent data loss
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14515
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>            Priority: Blocker
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Because read responses don't necessarily close their open RT bounds, it's possible to lose data during short read protection, if a closing bound is compacted away between two adjacent reads from a node.
[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516631#comment-16516631 ]

jahar commented on CASSANDRA-10735:
-----------------------------------

Thanks Jason for your response. Can you please elaborate on what _*user@ ML*_ is?
[jira] [Updated] (CASSANDRA-14444) Got NPE when querying Cassandra 3.11.2
[ https://issues.apache.org/jira/browse/CASSANDRA-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mck updated CASSANDRA-14444:
----------------------------
    Reproduced In: 3.11.2
    Since Version: 3.11.2

> Got NPE when querying Cassandra 3.11.2
> --------------------------------------
>
>                 Key: CASSANDRA-14444
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14444
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Ubuntu 14.04, JDK 1.8.0_171, Cassandra 3.11.2
>            Reporter: Xiaodong Xie
>            Priority: Blocker
>
> We just upgraded our Cassandra cluster from 2.2.6 to 3.11.2.
> After upgrading, we immediately got exceptions in Cassandra like this one:
> {code}
> ERROR [Native-Transport-Requests-1] 2018-05-11 17:10:21,994 QueryMessage.java:129 - Unexpected error during query
> java.lang.NullPointerException: null
>     at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:248) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:666) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.service.pager.PartitionRangeQueryPager.<init>(PartitionRangeQueryPager.java:44) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:255) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) ~[apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) [apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) [apache-cassandra-3.11.2.jar:3.11.2]
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348) [netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_171]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) [apache-cassandra-3.11.2.jar:3.11.2]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.2.jar:3.11.2]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> {code}
>
> The table schema is like:
> {code}
> CREATE TABLE example.example_table (
>     id bigint,
>     hash text,
>     json text,
>     PRIMARY KEY (id, hash)
> ) WITH COMPACT STORAGE
> {code}
>
> The query is something like:
> {code}
> "select * from example.example_table;" // (We do know this is bad practise, and we are trying to fix that right now)
> {code}
> with a fetch-size of 200, using the DataStax Java driver. This table contains about 20k rows.
>
> Actually, the fix is quite simple:
> {code}
> --- a/src/java/org/apache/cassandra/service/pager/PagingState.java
> +++ b/src/java/org/apache/cassandra/service/pager/PagingState.java
> @@ -46,7 +46,7 @@ public class PagingState
>      public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, int remainingInPartition)
>      {
> -        this.partitionKey = partitionKey;
> +        this.partitionKey = partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : partitionKey;
>          this.rowMark = rowMark;
>          this.remaining = remaining;
>          this.remainingInPartition = remainingInPartition;
> {code}
>
> "partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : partit
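The null-coalescing idea in the proposed patch can be shown in isolation. This sketch uses plain `java.nio.ByteBuffer` (the class and field names mimic, but are not, Cassandra's `PagingState`):

```java
import java.nio.ByteBuffer;

public class PagingStateSketch {
    // Stand-in for ByteBufferUtil.EMPTY_BYTE_BUFFER.
    static final ByteBuffer EMPTY_BYTE_BUFFER = ByteBuffer.allocate(0);

    final ByteBuffer partitionKey;

    // Store an empty buffer instead of null, so later calls such as
    // partitioner.decorateKey(key) never dereference a null key.
    PagingStateSketch(ByteBuffer partitionKey) {
        this.partitionKey = partitionKey == null ? EMPTY_BYTE_BUFFER : partitionKey;
    }

    public static void main(String[] args) {
        System.out.println(new PagingStateSketch(null).partitionKey.remaining()); // 0
    }
}
```

Normalizing the null at construction time keeps the invariant in one place rather than null-checking at every consumer.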
[jira] [Updated] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker
[ https://issues.apache.org/jira/browse/CASSANDRA-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan West updated CASSANDRA-14529:
------------------------------------
    Status: Patch Available  (was: Open)

Made the cache invalidation run after the files are added to the tracker. This is similar to [streaming|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraStreamReceiver.java#L207-L210]. There is still a race condition, but the worst case is only invalidation of a cached copy of the newly added data.

Branch: [https://github.com/jrwest/cassandra/commits/14529-trunk]
Tests: [https://circleci.com/gh/jrwest/cassandra/tree/14529-trunk]

> nodetool import row cache invalidation races with adding sstables to tracker
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14529
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14529
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jordan West
>            Assignee: Jordan West
>            Priority: Major
>
> CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, which [occurs before adding new sstables to the tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178]. Stale reads will result after a read is interleaved with the read row's invalidation and adding the containing file to the tracker.
[jira] [Created] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker
Jordan West created CASSANDRA-14529:
---------------------------------------

             Summary: nodetool import row cache invalidation races with adding sstables to tracker
                 Key: CASSANDRA-14529
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14529
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jordan West
            Assignee: Jordan West

CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, which [occurs before adding new sstables to the tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178]. Stale reads will result after a read is interleaved with the read row's invalidation and adding the containing file to the tracker.
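The ordering at the heart of this race can be reduced to a toy model: a map standing in for the row cache and a list standing in for the tracker (both hypothetical stand-ins, not Cassandra's actual types):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ImportOrderingSketch {
    final Map<String, String> rowCache = new ConcurrentHashMap<>();
    final List<String> tracker = new CopyOnWriteArrayList<>();

    // Buggy order: invalidate first, then add. A concurrent read landing
    // between the two steps re-populates the cache from the OLD data, so the
    // stale row survives. Adding to the tracker first (as streaming does)
    // bounds the worst case to invalidating a fresh copy of the NEW data.
    void importSstable(String sstable, String invalidatedKey) {
        tracker.add(sstable);            // 1. make new data visible to reads
        rowCache.remove(invalidatedKey); // 2. then drop any stale cached row
    }

    public static void main(String[] args) {
        ImportOrderingSketch s = new ImportOrderingSketch();
        s.rowCache.put("k1", "old-value");
        s.importSstable("sstable-1", "k1");
        System.out.println(s.tracker.contains("sstable-1")); // true
        System.out.println(s.rowCache.containsKey("k1"));    // false
    }
}
```

The fix does not eliminate the race; it changes which side of it is harmless.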
[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516182#comment-16516182 ]

Christian Spriegel commented on CASSANDRA-14480:
------------------------------------------------

The fact that it will be 4.0 only is indeed a hard pill to swallow. :(

> Digest mismatch requires all replicas to be responsive
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14480
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14480
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Christian Spriegel
>            Priority: Major
>         Attachments: Reader.java, Writer.java, schema_14480.cql
>
> I ran across a scenario where a digest mismatch causes a read-repair that requires all up nodes to be able to respond. If one of these nodes is not responding, then the read-repair is reported to the client as a ReadTimeoutException.
>
> My expectation would be that CL=QUORUM will always succeed as long as 2 nodes are responding. But unfortunately the third node being "up" in the ring, yet not able to respond, does lead to an RTE.
>
> I came up with a scenario that reproduces the issue:
> # set up a 3 node cluster using ccm
> # increase the phi_convict_threshold to 16, so that nodes are permanently reported as up
> # create the attached schema
> # run the attached reader & writer (which only connect to node1 & node2). This should already produce digest mismatches
> # do a "ccm node3 pause"
> # The reader will report a read timeout with consistency QUORUM (2 responses were required but only 1 replica responded). Within the DigestMismatchException catch block it can be seen that the repairHandler is waiting for 3 responses, even though the exception says that 2 responses are required.
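The discrepancy the repro surfaces is between two response counts. A sketch of the consistency-level arithmetic (the class and method names here are illustrative, not Cassandra's):

```java
public class ReadRepairResponses {
    // Standard quorum math: QUORUM over RF replicas needs floor(RF/2) + 1.
    static int quorumBlockFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 3;
        // What the ReadTimeoutException reports as required:
        System.out.println(quorumBlockFor(rf)); // 2
        // What the repro shows the repair handler actually waiting for:
        // every replica that was "up" and therefore sent a repair mutation.
        int upReplicas = 3;
        System.out.println(upReplicas); // 3
    }
}
```

With RF=3 a client reasonably expects 2 responses to suffice at QUORUM, while the repair handler in the repro blocks on all 3 contacted replicas, hence the timeout when node3 is paused but still marked up.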
[jira] [Resolved] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Spriegel resolved CASSANDRA-14480.
--------------------------------------------
    Resolution: Duplicate
[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516134#comment-16516134 ]

Jeff Jirsa commented on CASSANDRA-14480:
----------------------------------------

If it's a dupe (and it looks like it may be), then you have good news and bad news. The good news is that 10726 is patch-available. The bad news is that it's a major refactor that won't land until 4.0.

If you're satisfied it's a dupe, please feel free to relate+close it.
[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516124#comment-16516124 ]

Christian Spriegel commented on CASSANDRA-14480:
------------------------------------------------

[~jjirsa]: It sounds like my ticket is a duplicate.
[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516015#comment-16516015 ]

Jeff Jirsa commented on CASSANDRA-14480:
----------------------------------------

Is this different from CASSANDRA-10726?
[jira] [Updated] (CASSANDRA-14504) fqltool should open chronicle queue read only and a GC bug
[ https://issues.apache.org/jira/browse/CASSANDRA-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-14504:
---------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Ready to Commit)

Committed as [c6570fac180b6f816efb47cbd9b7fe30c771835d|https://github.com/apache/cassandra/commit/c6570fac180b6f816efb47cbd9b7fe30c771835d]. Thanks.

> fqltool should open chronicle queue read only and a GC bug
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-14504
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14504
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.0
>
> There are two issues with fqltool.
> The first is that it doesn't open the chronicle queue read only, so it won't work if it doesn't have write permissions, and it's not clear whether it's safe to open the queue for writing while the server is still appending.
> The next issue is that NativeBytesStore.toTemporaryDirectByteBuffer() returns a ByteBuffer that doesn't strongly reference the memory it refers to, resulting in it sometimes being reclaimed and containing the wrong data when we go to read from it. At least that is the theory. The simple solution is to use toByteArray(), and that seems to make it work consistently.
cassandra git commit: fqltool should open chronicle queue read only and a GC bug
Repository: cassandra
Updated Branches:
  refs/heads/trunk 717c10837 -> c6570fac1

fqltool should open chronicle queue read only and a GC bug

Patch by Ariel Weisberg; Reviewed by Sam Tunnicliffe for CASSANDRA-14504

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c6570fac
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c6570fac
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c6570fac

Branch: refs/heads/trunk
Commit: c6570fac180b6f816efb47cbd9b7fe30c771835d
Parents: 717c108
Author: Ariel Weisberg
Authored: Thu Jun 7 14:22:43 2018 -0400
Committer: Ariel Weisberg
Committed: Mon Jun 18 12:35:06 2018 -0400

----------------------------------------------------------------------
 src/java/org/apache/cassandra/tools/fqltool/Dump.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

diff --git a/src/java/org/apache/cassandra/tools/fqltool/Dump.java b/src/java/org/apache/cassandra/tools/fqltool/Dump.java
index 6a748bc..52eadb5 100644
--- a/src/java/org/apache/cassandra/tools/fqltool/Dump.java
+++ b/src/java/org/apache/cassandra/tools/fqltool/Dump.java
@@ -81,7 +81,7 @@ public class Dump implements Runnable
 {
     int protocolVersion = wireIn.read("protocol-version").int32();
     sb.append("Protocol version: ").append(protocolVersion).append(System.lineSeparator());
-    QueryOptions options = QueryOptions.codec.decode(Unpooled.wrappedBuffer(wireIn.read("query-options").bytesStore().toTemporaryDirectByteBuffer()), ProtocolVersion.decode(protocolVersion));
+    QueryOptions options = QueryOptions.codec.decode(Unpooled.wrappedBuffer(wireIn.read("query-options").bytes()), ProtocolVersion.decode(protocolVersion));
     sb.append("Query time: ").append(wireIn.read("query-time").int64()).append(System.lineSeparator());
     if (type.equals("single"))
@@ -126,7 +126,7 @@
     // Backoff strategy for spinning on the queue, not aggressive at all as this doesn't need to be low latency
     Pauser pauser = Pauser.millis(100);
-    List queues = arguments.stream().distinct().map(path -> ChronicleQueueBuilder.single(new File(path)).rollCycle(RollCycles.valueOf(rollCycle)).build()).collect(Collectors.toList());
+    List queues = arguments.stream().distinct().map(path -> ChronicleQueueBuilder.single(new File(path)).readOnly(true).rollCycle(RollCycles.valueOf(rollCycle)).build()).collect(Collectors.toList());
     List tailers = queues.stream().map(ChronicleQueue::createTailer).collect(Collectors.toList());
     boolean hadWork = true;
     while (hadWork)
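The GC half of this fix, switching from `toTemporaryDirectByteBuffer()` to `bytes()`, boils down to making an owned copy instead of holding a "temporary" view over memory the buffer does not keep alive. A generic illustration with plain `java.nio` (this is not Chronicle's API):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DefensiveCopySketch {
    // A temporary direct-buffer view can be reclaimed or reused underneath
    // us. Copying the bytes into a heap array we own (the effect of using
    // bytes()/toByteArray()) keeps a strong reference to stable data.
    static byte[] ownedCopy(ByteBuffer view) {
        byte[] out = new byte[view.remaining()];
        view.duplicate().get(out); // duplicate() leaves the caller's position intact
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer view = ByteBuffer.wrap(new byte[] { 1, 2, 3 });
        System.out.println(Arrays.toString(ownedCopy(view))); // [1, 2, 3]
        System.out.println(view.remaining());                 // 3, position untouched
    }
}
```

The cost is one array allocation per read, which a dump tool can easily afford in exchange for data that cannot change under it.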
[jira] [Commented] (CASSANDRA-14527) Real time Bad query logging framework
[ https://issues.apache.org/jira/browse/CASSANDRA-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516003#comment-16516003 ] Jaydeepkumar Chovatia commented on CASSANDRA-14527: --- {quote}I like the idea of making operational issues more visible to the user. But the intention to create a common framework for logging certain messages, doesn't sound very convincing to me. At such point, I'm always asking myself, what kind of problem we precisely want to solve. What's the value in adding abstraction layers around producing log messages for very specific, predefined conditions? {quote} Thanks for the review [~spo...@gmail.com]. Here are some of the reason behind this intention: 1. As described in the doc, CASSANDRA-12403 [large partition warning |https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L208] [tombstone warning |https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ReadCommand.java#L490] are already having some sort of this logic to inform user about some uncommon behavior but they are very specific for certain type of problems and with many limitations like changing thresholds require restart, user cannot consume in different ways, etc. Also adding yet another type of problem will require duplication of work. 2. Improve C* operational aspect 3. Having a common way of detecting/reporting often encourages people to add more anti-patterns and reduces duplicate code {quote} I'd also recommend to first take any such ideas to the dev mailing list before spending time implementing such changes. The Google docs hosted proposal gives a nice overview on the ideas behind this and would have been a good way to start a discussion and get some early feedback. {quote} I agree, will send this to dev mailing list. 
> Real time Bad query logging framework > - > > Key: CASSANDRA-14527 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14527 > Project: Cassandra > Issue Type: New Feature > Components: Observability >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Fix For: 4.x > > > If Cassandra is not used in right way then it can create adverse effect to > the application. There are lots of bad queries when using Cassandra, but > major problem is end user don’t know where and what exactly is the problem. > Most of the times end user is ready to take actions on bad queries provided > Cassandra gives all detailed which could have potential impact on cluster > performance. > There has been already lots of work done as part of CASSANDRA-12403, proposal > as part of this JIRA is let’s have some common way of detecting and logging > different problems in Cassandra cluster which could have potential impact on > Cassandra cluster performance. > Please visit this document which has details like what is currently > available, motivation behind developing this common framework, architecture, > samples, etc. > [https://docs.google.com/document/d/1D0HNjC3a7gnuKnR_iDXLI5mvn1zQxtV7tloMaLYIENE/edit?usp=sharing] > Here is the patch with this feature: > ||trunk|| > |[!https://circleci.com/gh/jaydeepkumar1984/cassandra/tree/bqr.svg?style=svg! > |https://circleci.com/gh/jaydeepkumar1984/cassandra/82]| > |[patch > |https://github.com/apache/cassandra/compare/trunk...jaydeepkumar1984:bqr]| > Please review this doc and the patch, and provide your opinion and feedback > about this effort. > Thank you! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
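The "common framework" argument in the comment above can be made concrete with a small sketch. This is purely illustrative: none of these names exist in Cassandra, and the real proposal lives in the linked design doc. The sketch only shows the shape of a shared detector whose threshold is re-read on every check, addressing the restart limitation mentioned in point 1.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch only: these names do not exist in Cassandra. One
// threshold check that any anti-pattern detector could share, with the
// threshold supplied dynamically so changing it needs no restart.
public class BadQueryDetector {
    private final String category;
    private final LongSupplier threshold; // re-read each time -> hot-reloadable

    public BadQueryDetector(String category, LongSupplier threshold) {
        this.category = category;
        this.threshold = threshold;
    }

    /** Returns a report line if the observed value crosses the threshold, else null. */
    public String check(long observed, String context) {
        long limit = threshold.getAsLong();
        if (observed > limit)
            return String.format("[%s] %s: %d exceeds threshold %d", category, context, observed, limit);
        return null;
    }
}
```

A large-partition detector and a tombstone detector would then differ only in the category name and the value they feed into {{check}}, rather than each carrying its own logging and threshold plumbing.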
[jira] [Commented] (CASSANDRA-14504) fqltool should open chronicle queue read only and a GC bug
[ https://issues.apache.org/jira/browse/CASSANDRA-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515955#comment-16515955 ] Sam Tunnicliffe commented on CASSANDRA-14504: - LGTM (minus the circle yaml change ofc) > fqltool should open chronicle queue read only and a GC bug > -- > > Key: CASSANDRA-14504 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14504 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Major > Fix For: 4.0 > > > There are two issues with fqltool. > The first is that it doesn't open the chronicle queue read only so it won't > work if it doesn't have write permissions and it's not clear if it's safe to > open the queue to write if the server is also still appending. > The next issue is that NativeBytesStore.toTemporaryDirectByteBuffer() returns > a ByteBuffer that doesn't strongly reference the memory it refers to > resulting it in sometimes being reclaimed and containing the wrong data when > we go to read from it. At least that is the theory. Simple solution is to use > toByteArray() and that seems to make it work consistently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
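The second bug has a plain-JDK analogue worth spelling out (this sketch deliberately avoids Chronicle's API): a ByteBuffer that merely views a region of memory changes silently when that memory is rewritten or reclaimed, whereas copying the bytes out — what the toByteArray() fix amounts to — yields an array whose contents the GC keeps stable.

```java
import java.nio.ByteBuffer;

// Plain-JDK analogue of the GC bug described above (not Chronicle's API):
// a duplicated ByteBuffer is only a *view* of the same memory, so it changes
// when the backing memory changes; snapshot() copies the bytes into an array,
// which the GC then keeps alive independently.
public class BufferCopySketch {
    static byte[] snapshot(ByteBuffer src) {
        byte[] out = new byte[src.remaining()];
        src.duplicate().get(out); // duplicate so the caller's position is untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.allocateDirect(4);
        backing.put(new byte[] {1, 2, 3, 4}).flip();

        ByteBuffer view = backing.duplicate(); // shares the memory
        byte[] copy = snapshot(backing);       // independent of the memory

        backing.put(0, (byte) 9); // simulate the memory being rewritten under us

        System.out.println(view.get(0)); // 9 - the view saw the rewrite
        System.out.println(copy[0]);     // 1 - the copy did not
    }
}
```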
[jira] [Updated] (CASSANDRA-14504) fqltool should open chronicle queue read only and a GC bug
[ https://issues.apache.org/jira/browse/CASSANDRA-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-14504: Status: Ready to Commit (was: Patch Available)
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515904#comment-16515904 ] Jaydeepkumar Chovatia commented on CASSANDRA-14525: --- I see [~KurtG], sorry I missed it. In my opinion this is a bug and needs to be fixed (CASSANDRA-14063 should be considered as per FCFS priority). We also need to fix dtest along with this so CASSANDRA-14526 also needs to be landed. > streaming failure during bootstrap makes new node into inconsistent state > - > > Key: CASSANDRA-14525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14525 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Fix For: 4.0, 2.2.x, 3.0.x > > > If bootstrap fails for newly joining node (most common reason is due to > streaming failure) then Cassandra state remains in {{joining}} state which is > fine but Cassandra also enables Native transport which makes overall state > inconsistent. 
This further creates NullPointer exception if auth is enabled > on the new node, please find reproducible steps here: > For example if bootstrap fails due to streaming errors like > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) > [apache-cassandra-3.0.16.jar:3.0.16] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > ~[guava-18.0.jar:na] > at > 
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] > {quote} > then variable [StorageService.java::dataAvailable > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] > will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not > call [StorageService.java::finishJoiningRing > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.j
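The fix direction implied by the report can be sketched as a simple gate (all names here are hypothetical, not Cassandra's actual StorageService code): the native transport should only start once bootstrap streaming actually delivered the data, so a node stuck in the joining state stays invisible to clients.

```java
// Hypothetical gate, not Cassandra's actual code: only expose the node to
// clients once bootstrap streaming succeeded.
public class BootstrapGate {
    private boolean nativeTransportRunning = false;

    /** dataAvailable mirrors the flag the report points at: did streaming finish? */
    public void completeJoin(boolean dataAvailable) {
        if (dataAvailable) {
            nativeTransportRunning = true;  // finishJoiningRing() would run here
        } else {
            nativeTransportRunning = false; // keep clients away from an inconsistent node
        }
    }

    public boolean isNativeTransportRunning() {
        return nativeTransportRunning;
    }
}
```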
[jira] [Commented] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive
[ https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515889#comment-16515889 ] Christian Spriegel commented on CASSANDRA-14480: I just saw this happening in a production system: {noformat} Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ALL (8 responses were required but only 7 replica responded){noformat} Our queries use LOCAL_QUORUM, but we have RTEs happening due to read-repair. read_repair_chance = 0.1 is set, so it's going cross-DC :( > Digest mismatch requires all replicas to be responsive > -- > > Key: CASSANDRA-14480 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14480 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Christian Spriegel >Priority: Major > Attachments: Reader.java, Writer.java, schema_14480.cql > > > I ran across a scenario where a digest mismatch causes a read-repair that > requires all up nodes to be able to respond. If one of these nodes is not > responding, then the read-repair is being reported to the client as > ReadTimeoutException. > > My expectation would be that a CL=QUORUM will always succeed as long as 2 nodes > are responding. But unfortunately the third node being "up" in the ring, but > not being able to respond does lead to an RTE. > > > I came up with a scenario that reproduces the issue: > # set up a 3 node cluster using ccm > # increase the phi_convict_threshold to 16, so that nodes are permanently > reported as up > # create attached schema > # run attached reader&writer (which only connects to node1&2). This should > already produce digest mismatches > # do a "ccm node3 pause" > # The reader will report a read-timeout with consistency QUORUM (2 responses > were required but only 1 replica responded). 
Within the > DigestMismatchException catch-block it can be seen that the repairHandler is > waiting for 3 responses, even though the exception says that 2 responses are > required.
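The arithmetic behind the report can be modeled in a few lines (assumed semantics, not Cassandra's actual read path): QUORUM on RF=3 should block for 2 responses, but after a digest mismatch the repair handler was observed awaiting all 3 contacted replicas, so a single paused node is enough to time the read out.

```java
// Toy model of the consistency arithmetic in this report (assumed semantics,
// not Cassandra's ReadCallback). blockFor() is the number of responses a
// quorum read should wait for; the bug is that the repair path awaits the
// number of *contacted* replicas instead.
public class QuorumMath {
    static int blockFor(int replicationFactor) {
        return replicationFactor / 2 + 1; // simple majority
    }

    static boolean satisfied(int responses, int awaited) {
        return responses >= awaited;
    }
}
```

With RF=3 and one paused node, 2 responses satisfy {{blockFor(3)}} but not an awaited count of 3 — which matches the discrepancy between the exception text and the repairHandler's wait described above.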
[jira] [Commented] (CASSANDRA-14470) Repair validation failed/unable to create merkle tree
[ https://issues.apache.org/jira/browse/CASSANDRA-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515810#comment-16515810 ] Harry Hough commented on CASSANDRA-14470: - Sounds good, worse case I'll open a new one after 3.11.3 is released and I have tested it. Thank you for your help. > Repair validation failed/unable to create merkle tree > - > > Key: CASSANDRA-14470 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14470 > Project: Cassandra > Issue Type: Bug >Reporter: Harry Hough >Priority: Major > > I had trouble repairing with a full repair across all nodes and keyspaces so > I swapped to doing table by table. This table will not repair even after > scrub/restart of all nodes. I am using command: > {code:java} > nodetool repair -full -seq keyspace table > {code} > {code:java} > [2018-05-25 19:26:36,525] Repair session 0198ee50-6050-11e8-a3b7-9d0793eab507 > for range [(165598500763544933,166800441975877433], > (-5455068259072262254,-5445777107512274819], > (-4614366950466274594,-4609359222424798148], > (3417371506258365094,3421921915575816226], > (5221788898381458942,5222846663270250559], > (3421921915575816226,3429175540277204991], > (3276484330153091115,3282213186258578546], > (-3306169730424140596,-3303439264231406101], > (5228704360821395206,5242415853745535023], > (5808045095951939338,5808562658315740708], > (-3303439264231406101,-3302592736123212969]] finished (progress: 1%) > [2018-05-25 19:27:23,848] Repair session 0180f980-6050-11e8-a3b7-9d0793eab507 > for range [(-8495158945319933291,-8482949618583319581], > (1803296697741516342,1805330812863783941], > (8633191319643427141,8637771071728131257], > (2214097236323810344,2218253238829661319], > (8637771071728131257,8639627594735133685], > (2195525904029414718,2214097236323810344], > (-8500127431270773970,-8495158945319933291], > (7151693083782264341,7152162989417914407], > (-8482949618583319581,-8481973749935314249]] finished (progress: 1%) > [2018-05-25 
19:30:32,590] Repair session 01ac9d62-6050-11e8-a3b7-9d0793eab507 > for range [(7887346492105510731,7893062759268864220], > (-153277717939330979,-151986584968539220], > (-6351665356961460262,-6336288442758847669], > (7881942012672602731,7887346492105510731], > (-5884528383037906783,-5878097817437987368], > (6054625594262089428,6060773114960761336], > (-6354401100436622515,-6351665356961460262], > (3358411934943460772,336336663817876], > (6255644242745576360,6278718135193665575], > (-6321106762570843270,-6316788220143151823], > (1754319239259058661,1759314644652031521], > (7893062759268864220,7894890594190784729], > (-8012293411840276426,-8011781808288431224]] failed with error [repair > #01ac9d62-6050-11e8-a3b7-9d0793eab507 on keyspace/table, > [(7887346492105510731,7893062759268864220], > (-153277717939330979,-151986584968539220], > (-6351665356961460262,-6336288442758847669], > (7881942012672602731,7887346492105510731], > (-5884528383037906783,-5878097817437987368], > (6054625594262089428,6060773114960761336], > (-6354401100436622515,-6351665356961460262], > (3358411934943460772,336336663817876], > (6255644242745576360,6278718135193665575], > (-6321106762570843270,-6316788220143151823], > (1754319239259058661,1759314644652031521], > (7893062759268864220,7894890594190784729], > (-8012293411840276426,-8011781808288431224]]] Validation failed in > /192.168.8.64 (progress: 1%) > [2018-05-25 19:30:38,744] Repair session 01ab16c1-6050-11e8-a3b7-9d0793eab507 > for range [(4474598255414218354,4477186372547790770], > (-8368931070988054567,-8367389908801757978], > (4445104759712094068,4445123832517144036], > (6749641233379918040,6749879473217708908], > (717627050679001698,729408043324000761], > (8984622403893999385,8990662643404904110], > (4457612694557846994,4474598255414218354], > (5589049422573545528,5593079877787783784], > (3609693317839644945,3613727999875360405], > (8499016262183246473,8504603366117127178], > (-5421277973540712245,-5417725796037372830], > 
(5586405751301680690,5589049422573545528], > (-2611069890590917549,-2603911539353128123], > (2424772330724108233,2427564448454334730], > (3172651438220766183,3175226710613527829], > (4445123832517144036,4457612694557846994], > (-6827531712183440570,-6800863837312326365], > (5593079877787783784,5596020904874304252], > (716705770783505310,717627050679001698], > (115377252345874298,119626359210683992], > (239394377432130766,240250561347730054]] failed with error [repair > #01ab16c1-6050-11e8-a3b7-9d0793eab507 on keyspace/table, > [(4474598255414218354,4477186372547790770], > (-8368931070988054567,-8367389908801757978], > (4445104759712094068,4445123832517144036], > (6749641233379918040,6749879473217708908], > (717627050679001698,7294080433
[jira] [Commented] (CASSANDRA-14527) Real time Bad query logging framework
[ https://issues.apache.org/jira/browse/CASSANDRA-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515781#comment-16515781 ] Stefan Podkowinski commented on CASSANDRA-14527: I like the idea of making operational issues more visible to the user. But the intention to create a common framework for logging certain messages, doesn't sound very convincing to me. At such point, I'm always asking myself, what kind of problem we precisely want to solve. What's the value in adding abstraction layers around producing log messages for very specific, predefined conditions? I'd also recommend to first take any such ideas to the dev mailing list before spending time implementing such changes. The Google docs hosted proposal gives a nice overview on the ideas behind this and would have been a good way to start a discussion and get some early feedback.
[jira] [Updated] (CASSANDRA-14527) Real time Bad query logging framework
[ https://issues.apache.org/jira/browse/CASSANDRA-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-14527: --- Fix Version/s: (was: 4.0) 4.x
[jira] [Commented] (CASSANDRA-14471) Manage audit whitelists with CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-14471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515729#comment-16515729 ] Per Otterström commented on CASSANDRA-14471: Thanks for your feedback. bq. Trying to treat audit logging in a similar way to the roles/permissions seems wrong to me. The 2 have completely different logic and should be clearly separated at the code level. To me there is a clear connection since Auditing (or Accounting) is often related to Authentication and Authorization - ([AAA|https://en.wikipedia.org/wiki/AAA_(computer_security)]). However, to make sure it is working as a stand-alone feature for other use cases we could persist things in a separate keyspace, and keep it separate at the code level. bq. The MUTE/UNMUTE syntax is in my opinion confusing. I'm not really happy with {{MUTE/UNMUTE}} either, but it's what I've come up with so far. I'm open to better suggestions. bq. It is also limited because it only allow you to do a white list approach. Actually, that was intentional. IMO it is hard to get an overview of the result when setting up filters using the current mix of includes/excludes on keyspaces/categories/users. Having everything included by default, and then whitelisting selected parts, seems much more straightforward. Also this fits the primary purpose of audits well. IMO everything should be accounted for, unless explicitly excluded. bq. The approach that you would like to use for persisting the whitelists is risky. If one of your node cannot get the data from the configuration table it might be unable to start (this problem is an existing one with the current auth framework). There is no reason to prevent it from starting up just because data is not accessible at startup. At worst the filter component would not be able to load whitelists from the database, which would lead to unwanted audit records. From an AAA perspective this is a reasonable fallback solution. 
Considering the caching solution we have in place for the auth framework, is this a problem for users in production? bq. I would rather prefer a syntax similar to what Transact-SQL is doing. Interesting reference! So from what I can understand the administrator can do {{CREATE SERVER AUDIT}}, which roughly corresponds to the settings we have for {{logger}} and {{audit_logs_dir}} in the yaml file, and then {{CREATE DATABASE AUDIT SPECIFICATION}}, which corresponds to the filtering part that we're addressing in this ticket. Here is a simple but expressive example I've found (slightly modified):
{{CREATE DATABASE AUDIT SPECIFICATION Audit_Pay_Tables}}
{{FOR SERVER AUDIT Payrole_Security_Audit}}
{{ ADD (SELECT , INSERT ON HumanResources.EmployeePayHistory BY dbo )}}
{{ ,ADD (SELECT , INSERT ON HumanResources.EmployeeSalary BY dbo )}}
{{WITH (STATE = ON) ;}}
Basically, this syntax makes it possible to group conditions together under a label and associate that with a "backend". One thing I don't like with this approach is the fact that everything is exempt from audit logging until the administrator explicitly creates an audit specification for the operation/resource/role combination. > Manage audit whitelists with CQL > > > Key: CASSANDRA-14471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14471 > Project: Cassandra > Issue Type: Improvement >Reporter: Per Otterström >Priority: Major > Labels: audit, security > Fix For: 4.0 > > > Since CASSANDRA-12151 is merged we have support for audit logs in Cassandra. > With this ticket I want to explore the idea of managing audit whitelists > using CQL. > I can think of a few different benefits compared to current yaml-based > whitelist/blacklist approach. > * Nodes would always be aligned - no risk that node configuration go out of > sync as tables are added and whitelists updated. > * Easier to manage whitelists in large clusters - change in one place and > apply cluster wide. 
> * Changes to the whitelists would be in the audit log itself.
[jira] [Commented] (CASSANDRA-14423) SSTables stop being compacted
[ https://issues.apache.org/jira/browse/CASSANDRA-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515644#comment-16515644 ] Stefan Podkowinski commented on CASSANDRA-14423: I'd like to add for interested readers that full repairs on subranges (e.g. using reaper) will not be affected by this issue. In this case, "Not a global repair, will not do anticompaction" will occur in your logs. > SSTables stop being compacted > - > > Key: CASSANDRA-14423 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14423 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Fix For: 2.2.13, 3.0.17, 3.11.3 > > > So seeing a problem in 3.11.0 where SSTables are being lost from the view and > not being included in compactions/as candidates for compaction. It seems to > get progressively worse until there's only 1-2 SSTables in the view which > happen to be the most recent SSTables and thus compactions completely stop > for that table. > The SSTables seem to still be included in reads, just not compactions. > The issue can be fixed by restarting C*, as it will reload all SSTables into > the view, but this is only a temporary fix. User defined/major compactions > still work - not clear if they include the result back in the view but is not > a good work around. > This also results in a discrepancy between SSTable count and SSTables in > levels for any table using LCS. > {code:java} > Keyspace : xxx > Read Count: 57761088 > Read Latency: 0.10527088681224288 ms. > Write Count: 2513164 > Write Latency: 0.018211106398149903 ms. 
> Pending Flushes: 0 > Table: xxx > SSTable count: 10 > SSTables in each level: [2, 0, 0, 0, 0, 0, 0, 0, 0] > Space used (live): 894498746 > Space used (total): 894498746 > Space used by snapshots (total): 0 > Off heap memory used (total): 11576197 > SSTable Compression Ratio: 0.6956629530569777 > Number of keys (estimate): 3562207 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 87 > Local read count: 57761088 > Local read latency: 0.108 ms > Local write count: 2513164 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 86.33 > Bloom filter false positives: 43 > Bloom filter false ratio: 0.0 > Bloom filter space used: 8046104 > Bloom filter off heap memory used: 8046024 > Index summary off heap memory used: 3449005 > Compression metadata off heap memory used: 81168 > Compacted partition minimum bytes: 104 > Compacted partition maximum bytes: 5722 > Compacted partition mean bytes: 175 > Average live cells per slice (last five minutes): 1.0 > Maximum live cells per slice (last five minutes): 1 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Dropped Mutations: 0 > {code} > Also for STCS we've confirmed that SSTable count will be different to the > number of SSTables reported in the Compaction Bucket's. In the below example > there's only 3 SSTables in a single bucket - no more are listed for this > table. Compaction thresholds haven't been modified for this table and it's a > very basic KV schema. > {code:java} > Keyspace : yyy > Read Count: 30485 > Read Latency: 0.06708991307200263 ms. > Write Count: 57044 > Write Latency: 0.02204061776873992 ms. 
> Pending Flushes: 0 > Table: yyy > SSTable count: 19 > Space used (live): 18195482 > Space used (total): 18195482 > Space used by snapshots (total): 0 > Off heap memory used (total): 747376 > SSTable Compression Ratio: 0.7607394576769735 > Number of keys (estimate): 116074 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 39 > Local read count: 30485 > Local read latency: NaN ms > Local write count: 57044 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 79.76 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 690912 > Bloom filter off heap memory used: 690760 > Index summary off heap memory used: 54736 > Compression metadata off heap memory used: 1880 > Compacted partition minimum bytes: 73 > Compacted partition maximum bytes: 124 > Compacted partition mean bytes: 96 > Average live cells per slice (last five minutes): NaN > Maximum live cells per slice (last five minutes): 0 > Average tombstones per slice (last five minutes): NaN > Maximum tombstones per slice (last five minutes): 0 > Dropped Mut
[jira] [Commented] (CASSANDRA-14423) SSTables stop being compacted
[ https://issues.apache.org/jira/browse/CASSANDRA-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515628#comment-16515628 ] Stefan Podkowinski commented on CASSANDRA-14423: Can we move the status check into {{performAnticompaction}} by adding already repaired sstables to {{nonAnticompacting}}? I think filtering there would be more coherent, given we also create a corresponding log message and use the same code path for canceling/releasing such sstables. We also keep updating repairedAt this way, in case of fully contained sstables (and the triggered event notification related to that). > SSTables stop being compacted > - > > Key: CASSANDRA-14423 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14423 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Fix For: 2.2.13, 3.0.17, 3.11.3 > > > So seeing a problem in 3.11.0 where SSTables are being lost from the view and > not being included in compactions/as candidates for compaction. It seems to > get progressively worse until there's only 1-2 SSTables in the view which > happen to be the most recent SSTables and thus compactions completely stop > for that table. > The SSTables seem to still be included in reads, just not compactions. > The issue can be fixed by restarting C*, as it will reload all SSTables into > the view, but this is only a temporary fix. User defined/major compactions > still work - not clear if they include the result back in the view but is not > a good work around. > This also results in a discrepancy between SSTable count and SSTables in > levels for any table using LCS. > {code:java} > Keyspace : xxx > Read Count: 57761088 > Read Latency: 0.10527088681224288 ms. > Write Count: 2513164 > Write Latency: 0.018211106398149903 ms. 
> Pending Flushes: 0 > Table: xxx > SSTable count: 10 > SSTables in each level: [2, 0, 0, 0, 0, 0, 0, 0, 0] > Space used (live): 894498746 > Space used (total): 894498746 > Space used by snapshots (total): 0 > Off heap memory used (total): 11576197 > SSTable Compression Ratio: 0.6956629530569777 > Number of keys (estimate): 3562207 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 87 > Local read count: 57761088 > Local read latency: 0.108 ms > Local write count: 2513164 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 86.33 > Bloom filter false positives: 43 > Bloom filter false ratio: 0.0 > Bloom filter space used: 8046104 > Bloom filter off heap memory used: 8046024 > Index summary off heap memory used: 3449005 > Compression metadata off heap memory used: 81168 > Compacted partition minimum bytes: 104 > Compacted partition maximum bytes: 5722 > Compacted partition mean bytes: 175 > Average live cells per slice (last five minutes): 1.0 > Maximum live cells per slice (last five minutes): 1 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Dropped Mutations: 0 > {code} > Also for STCS we've confirmed that SSTable count will be different to the > number of SSTables reported in the Compaction Bucket's. In the below example > there's only 3 SSTables in a single bucket - no more are listed for this > table. Compaction thresholds haven't been modified for this table and it's a > very basic KV schema. > {code:java} > Keyspace : yyy > Read Count: 30485 > Read Latency: 0.06708991307200263 ms. > Write Count: 57044 > Write Latency: 0.02204061776873992 ms. 
> Pending Flushes: 0 > Table: yyy > SSTable count: 19 > Space used (live): 18195482 > Space used (total): 18195482 > Space used by snapshots (total): 0 > Off heap memory used (total): 747376 > SSTable Compression Ratio: 0.7607394576769735 > Number of keys (estimate): 116074 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 39 > Local read count: 30485 > Local read latency: NaN ms > Local write count: 57044 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 79.76 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 690912 > Bloom filter off heap memory used: 690760 > Index summary off heap memory used: 54736 > Compression metadata off heap memory used: 1880 > Compacted partition minimum bytes: 73 > Compacted partition maximum bytes: 124 > Compacted partition mean bytes: 96 > Average live cells per slice (last five minutes): NaN >
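The filtering Stefan proposes above — diverting already-repaired sstables into {{nonAnticompacting}} inside {{performAnticompaction}} — can be sketched roughly like this. Note this is a deliberately simplified, hypothetical model for illustration only: the `SSTable` class, its `repairedAt` field, and the method names below are stand-ins, not Cassandra's actual compaction API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an sstable carrying repair metadata.
class SSTable {
    final String name;
    final long repairedAt; // 0 means unrepaired

    SSTable(String name, long repairedAt) {
        this.name = name;
        this.repairedAt = repairedAt;
    }

    boolean isRepaired() {
        return repairedAt > 0;
    }
}

public class AnticompactionFilter {
    /**
     * Partition candidates the way the comment suggests performAnticompaction
     * could: sstables already marked repaired are diverted into
     * nonAnticompacting (to be logged and released), and only the rest
     * proceed to anticompaction.
     */
    public static List<SSTable> filterAlreadyRepaired(List<SSTable> candidates,
                                                      List<SSTable> nonAnticompacting) {
        List<SSTable> toAnticompact = new ArrayList<>();
        for (SSTable s : candidates) {
            if (s.isRepaired())
                nonAnticompacting.add(s); // skip: already repaired
            else
                toAnticompact.add(s);
        }
        return toAnticompact;
    }

    public static void main(String[] args) {
        List<SSTable> nonAnticompacting = new ArrayList<>();
        List<SSTable> candidates = List.of(new SSTable("a", 0), new SSTable("b", 12345));
        List<SSTable> result = filterAlreadyRepaired(candidates, nonAnticompacting);
        System.out.println(result.size() + " to anticompact, "
                + nonAnticompacting.size() + " skipped as already repaired");
    }
}
```

Filtering at this single point would, as the comment argues, keep the log message, the release of the sstables, and the repairedAt bookkeeping on one code path.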
[jira] [Comment Edited] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515592#comment-16515592 ] Kurt Greaves edited comment on CASSANDRA-14525 at 6/18/18 11:15 AM: We've already had a ticket (and a _very_ similar patch) for this since November last year... CASSANDRA-14063 was (Author: kurtg): We've already had a ticket (and a _very_ similar patch( for this since November last year... CASSANDRA-14063 > streaming failure during bootstrap makes new node into inconsistent state > - > > Key: CASSANDRA-14525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14525 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Fix For: 4.0, 2.2.x, 3.0.x > > > If bootstrap fails for newly joining node (most common reason is due to > streaming failure) then Cassandra state remains in {{joining}} state which is > fine but Cassandra also enables Native transport which makes overall state > inconsistent. 
This further creates NullPointer exception if auth is enabled > on the new node, please find reproducible steps here: > For example if bootstrap fails due to streaming errors like > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) > [apache-cassandra-3.0.16.jar:3.0.16] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > ~[guava-18.0.jar:na] > at > 
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] > {quote} > then variable [StorageService.java::dataAvailable > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] > will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not > call [StorageService.java::finishJoiningRing > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/o
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515592#comment-16515592 ] Kurt Greaves commented on CASSANDRA-14525: -- We've already had a ticket (and a _very_ similar patch) for this since November last year... CASSANDRA-14063 > streaming failure during bootstrap makes new node into inconsistent state > - > > Key: CASSANDRA-14525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14525 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Fix For: 4.0, 2.2.x, 3.0.x > > > If bootstrap fails for newly joining node (most common reason is due to > streaming failure) then Cassandra state remains in {{joining}} state which is > fine but Cassandra also enables Native transport which makes overall state > inconsistent. This further creates NullPointer exception if auth is enabled > on the new node, please find reproducible steps here: > For example if bootstrap fails due to streaming errors like > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) > [apache-cassandra-3.0.16.jar:3.0.16] > at > 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) > [apache-cassandra-3.0.16.jar:3.0.16] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] > {quote} > then variable [StorageService.java::dataAvailable > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] > will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not > call [StorageService.java::finishJoiningRing > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933] > and as a result > [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/
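The failure mode the ticket describes can be reduced to a small, self-contained control-flow sketch. This is a deliberate simplification for illustration: the real {{joinTokenRing}} does far more, and the flag and method names below merely mirror the ones the description links to.

```java
public class BootstrapFlow {
    boolean joined = false;
    boolean authSetup = false;
    boolean nativeTransportEnabled = false;

    /**
     * Simplified model of joinTokenRing: when streaming fails, dataAvailable
     * is false, so finishJoiningRing (and with it doAuthSetup) is skipped --
     * yet native transport is enabled regardless, which is the inconsistent
     * state this ticket reports.
     */
    void joinTokenRing(boolean dataAvailable) {
        if (dataAvailable)
            finishJoiningRing();
        // The bug: transport comes up even when the node never finished joining.
        nativeTransportEnabled = true;
    }

    void finishJoiningRing() {
        joined = true;
        doAuthSetup();
    }

    void doAuthSetup() {
        authSetup = true;
    }

    public static void main(String[] args) {
        BootstrapFlow node = new BootstrapFlow();
        node.joinTokenRing(false); // streaming failed -> dataAvailable == false
        System.out.println("joined=" + node.joined
                + " authSetup=" + node.authSetup
                + " nativeTransport=" + node.nativeTransportEnabled);
    }
}
```

With auth never set up but the native transport accepting connections, any client request needing auth hits uninitialized state, which matches the NullPointerException the reporter mentions.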
[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515586#comment-16515586 ] Jason Brown commented on CASSANDRA-10735: - [~jahar.tyagi] I think you are having a client-side problem, and not on the server. This ticket describes functionality going into the server-side database for 4.0. You should probably contact the user@ ML for help. > Support netty openssl (netty-tcnative) for client encryption > > > Key: CASSANDRA-10735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10735 > Project: Cassandra > Issue Type: Improvement >Reporter: Andy Tolbert >Assignee: Jason Brown >Priority: Major > Fix For: 4.0 > > Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, > nettysslbench.png, nettysslbench_small.png, sslbench12-03.png > > > The java-driver recently added support for using netty openssl via > [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in > [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], this shows a > very measured improvement (numbers incoming on that ticket). It seems > likely that this can offer improvement if implemented C* side as well. > Since netty-tcnative has platform specific requirements, this should not be > made the default, but rather be an option that one can use. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor
[ https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515556#comment-16515556 ] mck commented on CASSANDRA-14356: - Committed as 717c108374 > LWTs keep failing in trunk after immutable refactor > --- > > Key: CASSANDRA-14356 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14356 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), > Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 > 14:01) >Reporter: Michael Burman >Assignee: Michael Burman >Priority: Major > Labels: LWT > Fix For: 4.0 > > Attachments: CASSANDRA-14356.diff > > > In the PaxosState, the original assert check is in the form of: > assert promised.update.metadata() == accepted.update.metadata() && > accepted.update.metadata() == mostRecentCommit.update.metadata(); > However, after the change to make TableMetadata immutable this no longer > works as these instances are not necessarily the same (if ever). This causes > the LWTs to fail although they're still correctly targeting the same table. > From IRC: > It's a bug alright. Though really, the assertion should be on the > metadata ids, cause TableMetadata#equals does more than what we want. > That is, replacing by .equals() is not ok. That would throw > on any change to a table metadata, while the spirit of the assumption was to > sanity check both updates were on the same table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Fix assertions in PaxosState and PrepareResponse after TableMetadata was made immutable
Repository: cassandra Updated Branches: refs/heads/trunk 255242237 -> 717c10837 Fix assertions in PaxosState and PrepareResponse after TableMetadata was made immutable Patch by Michael Burman; reviewed by Mick Semb Wever for CASSANDRA-14356 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/717c1083 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/717c1083 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/717c1083 Branch: refs/heads/trunk Commit: 717c108374a56897d10fcad41fe82b43e2192648 Parents: 2552422 Author: Mick Semb Wever Authored: Sun Jun 17 14:29:00 2018 +1000 Committer: Mick Semb Wever Committed: Mon Jun 18 20:03:27 2018 +1000 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/service/paxos/PaxosState.java | 2 +- src/java/org/apache/cassandra/service/paxos/PrepareResponse.java | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/717c1083/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 4ea32c9..fd236a2 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Fix assertions in LWTs after TableMetadata was made immutable (CASSANDRA-14356) * Abort compactions quicker (CASSANDRA-14397) * Support light-weight transactions in cassandra-stress (CASSANDRA-13529) * Make AsyncOneResponse use the correct timeout (CASSANDRA-14509) http://git-wip-us.apache.org/repos/asf/cassandra/blob/717c1083/src/java/org/apache/cassandra/service/paxos/PaxosState.java -- diff --git a/src/java/org/apache/cassandra/service/paxos/PaxosState.java b/src/java/org/apache/cassandra/service/paxos/PaxosState.java index 7d59374..6e02435 100644 --- a/src/java/org/apache/cassandra/service/paxos/PaxosState.java +++ b/src/java/org/apache/cassandra/service/paxos/PaxosState.java @@ -46,7 +46,7 @@ public class PaxosState public PaxosState(Commit promised, Commit accepted, Commit mostRecentCommit) { assert 
promised.update.partitionKey().equals(accepted.update.partitionKey()) && accepted.update.partitionKey().equals(mostRecentCommit.update.partitionKey()); -assert promised.update.metadata() == accepted.update.metadata() && accepted.update.metadata() == mostRecentCommit.update.metadata(); +assert promised.update.metadata().id.equals(accepted.update.metadata().id) && accepted.update.metadata().id.equals(mostRecentCommit.update.metadata().id); this.promised = promised; this.accepted = accepted; http://git-wip-us.apache.org/repos/asf/cassandra/blob/717c1083/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java -- diff --git a/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java b/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java index 2110dd7..4c7becc 100644 --- a/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java +++ b/src/java/org/apache/cassandra/service/paxos/PrepareResponse.java @@ -45,7 +45,7 @@ public class PrepareResponse public PrepareResponse(boolean promised, Commit inProgressCommit, Commit mostRecentCommit) { assert inProgressCommit.update.partitionKey().equals(mostRecentCommit.update.partitionKey()); -assert inProgressCommit.update.metadata() == mostRecentCommit.update.metadata(); +assert inProgressCommit.update.metadata().id.equals(mostRecentCommit.update.metadata().id); this.promised = promised; this.mostRecentCommit = mostRecentCommit; - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
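The one-line fixes above swap reference equality for id equality. A stand-alone illustration of why that matters after the immutable refactor — note this `TableMetadata` is a simplified stand-in for Cassandra's class, not the real one:

```java
import java.util.UUID;

// Stand-in for an immutable table descriptor identified by a stable id.
final class TableMetadata {
    final UUID id;
    final String comment; // an attribute that can differ between snapshots

    TableMetadata(UUID id, String comment) {
        this.id = id;
        this.comment = comment;
    }
}

public class MetadataAssertDemo {
    public static void main(String[] args) {
        UUID tableId = UUID.randomUUID();
        // Two snapshots of the same table, e.g. captured before and after an ALTER:
        // immutability means each schema change produces a fresh instance.
        TableMetadata promised = new TableMetadata(tableId, "v1");
        TableMetadata accepted = new TableMetadata(tableId, "v2");

        // Reference equality fails: distinct immutable instances.
        System.out.println("same object: " + (promised == accepted));          // false
        // Comparing stable ids still verifies both updates target one table,
        // without rejecting updates that merely saw different schema versions.
        System.out.println("same table:  " + promised.id.equals(accepted.id)); // true
    }
}
```

This is also why, per the IRC quote in the ticket, a full `.equals()` on the metadata would be too strict: it would trip the assertion on any schema change, while the id check only verifies both commits refer to the same table.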
[jira] [Updated] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor
[ https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-14356: Resolution: Fixed Status: Resolved (was: Testing) > LWTs keep failing in trunk after immutable refactor > --- > > Key: CASSANDRA-14356 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14356 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), > Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 > 14:01) >Reporter: Michael Burman >Assignee: Michael Burman >Priority: Major > Labels: LWT > Fix For: 4.0 > > Attachments: CASSANDRA-14356.diff > > > In the PaxosState, the original assert check is in the form of: > assert promised.update.metadata() == accepted.update.metadata() && > accepted.update.metadata() == mostRecentCommit.update.metadata(); > However, after the change to make TableMetadata immutable this no longer > works as these instances are not necessarily the same (if ever). This causes > the LWTs to fail although they're still correctly targeting the same table. > From IRC: > It's a bug alright. Though really, the assertion should be on the > metadata ids, cause TableMetadata#equals does more than what we want. > That is, replacing by .equals() is not ok. That would throw > on any change to a table metadata, while the spirit of the assumption was to > sanity check both updates were on the same table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor
[ https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515536#comment-16515536 ] Aleksey Yeschenko commented on CASSANDRA-14356: --- [~michaelsembwever] Change LGTM, ship it (: > LWTs keep failing in trunk after immutable refactor > --- > > Key: CASSANDRA-14356 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14356 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), > Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 > 14:01) >Reporter: Michael Burman >Assignee: Michael Burman >Priority: Major > Labels: LWT > Fix For: 4.0 > > Attachments: CASSANDRA-14356.diff > > > In the PaxosState, the original assert check is in the form of: > assert promised.update.metadata() == accepted.update.metadata() && > accepted.update.metadata() == mostRecentCommit.update.metadata(); > However, after the change to make TableMetadata immutable this no longer > works as these instances are not necessarily the same (if ever). This causes > the LWTs to fail although they're still correctly targeting the same table. > From IRC: > It's a bug alright. Though really, the assertion should be on the > metadata ids, cause TableMetadata#equals does more than what we want. > That is, replacing by .equals() is not ok. That would throw > on any change to a table metadata, while the spirit of the assumption was to > sanity check both updates were on the same table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14423) SSTables stop being compacted
[ https://issues.apache.org/jira/browse/CASSANDRA-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-14423: --- Reproduced In: 3.11.2, 3.11.0 (was: 3.11.0, 3.11.2) Reviewer: Stefan Podkowinski > SSTables stop being compacted > - > > Key: CASSANDRA-14423 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14423 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Fix For: 2.2.13, 3.0.17, 3.11.3 > > > So seeing a problem in 3.11.0 where SSTables are being lost from the view and > not being included in compactions/as candidates for compaction. It seems to > get progressively worse until there's only 1-2 SSTables in the view which > happen to be the most recent SSTables and thus compactions completely stop > for that table. > The SSTables seem to still be included in reads, just not compactions. > The issue can be fixed by restarting C*, as it will reload all SSTables into > the view, but this is only a temporary fix. User defined/major compactions > still work - not clear if they include the result back in the view but is not > a good work around. > This also results in a discrepancy between SSTable count and SSTables in > levels for any table using LCS. > {code:java} > Keyspace : xxx > Read Count: 57761088 > Read Latency: 0.10527088681224288 ms. > Write Count: 2513164 > Write Latency: 0.018211106398149903 ms. 
> Pending Flushes: 0 > Table: xxx > SSTable count: 10 > SSTables in each level: [2, 0, 0, 0, 0, 0, 0, 0, 0] > Space used (live): 894498746 > Space used (total): 894498746 > Space used by snapshots (total): 0 > Off heap memory used (total): 11576197 > SSTable Compression Ratio: 0.6956629530569777 > Number of keys (estimate): 3562207 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 87 > Local read count: 57761088 > Local read latency: 0.108 ms > Local write count: 2513164 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 86.33 > Bloom filter false positives: 43 > Bloom filter false ratio: 0.0 > Bloom filter space used: 8046104 > Bloom filter off heap memory used: 8046024 > Index summary off heap memory used: 3449005 > Compression metadata off heap memory used: 81168 > Compacted partition minimum bytes: 104 > Compacted partition maximum bytes: 5722 > Compacted partition mean bytes: 175 > Average live cells per slice (last five minutes): 1.0 > Maximum live cells per slice (last five minutes): 1 > Average tombstones per slice (last five minutes): 1.0 > Maximum tombstones per slice (last five minutes): 1 > Dropped Mutations: 0 > {code} > Also for STCS we've confirmed that SSTable count will be different to the > number of SSTables reported in the Compaction Bucket's. In the below example > there's only 3 SSTables in a single bucket - no more are listed for this > table. Compaction thresholds haven't been modified for this table and it's a > very basic KV schema. > {code:java} > Keyspace : yyy > Read Count: 30485 > Read Latency: 0.06708991307200263 ms. > Write Count: 57044 > Write Latency: 0.02204061776873992 ms. 
> Pending Flushes: 0 > Table: yyy > SSTable count: 19 > Space used (live): 18195482 > Space used (total): 18195482 > Space used by snapshots (total): 0 > Off heap memory used (total): 747376 > SSTable Compression Ratio: 0.7607394576769735 > Number of keys (estimate): 116074 > Memtable cell count: 0 > Memtable data size: 0 > Memtable off heap memory used: 0 > Memtable switch count: 39 > Local read count: 30485 > Local read latency: NaN ms > Local write count: 57044 > Local write latency: NaN ms > Pending flushes: 0 > Percent repaired: 79.76 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.0 > Bloom filter space used: 690912 > Bloom filter off heap memory used: 690760 > Index summary off heap memory used: 54736 > Compression metadata off heap memory used: 1880 > Compacted partition minimum bytes: 73 > Compacted partition maximum bytes: 124 > Compacted partition mean bytes: 96 > Average live cells per slice (last five minutes): NaN > Maximum live cells per slice (last five minutes): 0 > Average tombstones per slice (last five minutes): NaN > Maximum tombstones per slice (last five minutes): 0 > Dropped Mutations: 0 > {code} > {code:java} > Apr 27 03:10:39 cassandra[9263]: TRACE o.a.c.d.c.SizeTieredCompactionStrategy > Compaction buckets are > [[BigTableReader(path='/var/lib/cassa
[jira] [Created] (CASSANDRA-14528) Provide stacktraces for various error logs
Stefan Podkowinski created CASSANDRA-14528: -- Summary: Provide stacktraces for various error logs Key: CASSANDRA-14528 URL: https://issues.apache.org/jira/browse/CASSANDRA-14528 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stefan Podkowinski Assignee: Stefan Podkowinski Fix For: 4.x We should reintroduce some stack traces that have gone missing since CASSANDRA-13723 (ba87ab4e954ad2). The cleanest way would probably be to use {{String.format}} for any custom messages, e.g. {{logger.error(String.format("Error using param %s", param), e)}}, so we make this more implicit and robust for upcoming API changes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
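A note on the pattern proposed above: {{String.format}} takes {{%s}}/{{%d}} specifiers (SLF4J-style {{{}}} placeholders are not interpreted by it), and the throwable must be passed as a separate, final argument for the logger to print the stack trace. A minimal sketch of the idea using {{java.util.logging}} as a stand-in for Cassandra's actual logging setup:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class StacktraceLogging {
    private static final Logger logger = Logger.getLogger(StacktraceLogging.class.getName());

    /** Build the custom message eagerly with String.format (%s specifiers),
     *  keeping the throwable out of the message entirely. */
    static String format(String template, Object param) {
        return String.format(template, param);
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("boom");
        String msg = format("Error using param %s", "some_setting"); // hypothetical param name
        // Passing the throwable as its own argument (not baked into the message)
        // is what makes the logger emit the full stack trace.
        logger.log(Level.SEVERE, msg, e);
    }
}
```

Formatting the message separately from the throwable keeps the two concerns independent, which is what makes the pattern robust to future logging-API changes.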
[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515426#comment-16515426 ] jahar commented on CASSANDRA-10735: --- Hi, I just followed the instructions given on [https://docs.datastax.com/en/developer/java-driver/3.0/manual/ssl/] to use NettySSLOptions, but am getting _com.datastax.driver.core.exceptions.NoHostAvailableException_. My .crt, private key, and certificates are fine, as I have verified them using OpenSSL. I have tried a lot but am not able to find the root cause. JdkSSLOptions works fine, but when I use the Netty SSLOptions it fails. This is what I am using in code:

{code:java}
KeyStore ks = KeyStore.getInstance("JKS");
trustStore = new FileInputStream(theTrustStorePath);
ks.load(trustStore, theTrustStorePassword.toCharArray());
TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init(ks);
SslContextBuilder builder =
    SslContextBuilder.forClient()
                     .sslProvider(SslProvider.OPENSSL)
                     .trustManager(tmf)
                     .ciphers(theCipherSuites)
                     .keyManager(new File("mycert.pem"), new File("mykey.pem"));
SSLOptions sslOptions = new NettySSLOptions(builder.build());
return sslOptions;
{code}

The exception is thrown at:

{code:java}
mySession = myCluster.connect();
{code}

Any ideas or suggestions, please? 
> Support netty openssl (netty-tcnative) for client encryption > > > Key: CASSANDRA-10735 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10735 > Project: Cassandra > Issue Type: Improvement >Reporter: Andy Tolbert >Assignee: Jason Brown >Priority: Major > Fix For: 4.0 > > Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, > nettysslbench.png, nettysslbench_small.png, sslbench12-03.png > > > The java-driver recently added support for using netty openssl via > [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in > [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], this shows a > very measured improvement (numbers incoming on that ticket). It seems > likely that this can offer improvement if implemented C* side as well. > Since netty-tcnative has platform specific requirements, this should not be > made the default, but rather be an option that one can use. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
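For anyone debugging a setup like the one in the comment above: the driver-specific pieces (NettySSLOptions, SslContextBuilder, the OPENSSL provider) need the DataStax driver plus netty-tcnative on the classpath and a reachable cluster, but the plain JSSE trust-manager portion of the snippet can be exercised standalone to rule it out. A minimal sketch — the empty in-memory keystore here is a stand-in for a real truststore file:

```java
import java.security.KeyStore;
import javax.net.ssl.TrustManagerFactory;

public class TrustSetup {
    /** Initialise a TrustManagerFactory from an already-loaded KeyStore,
     *  mirroring the JSSE half of the snippet in the comment above. */
    static TrustManagerFactory trustManagersFor(KeyStore ks) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        return tmf;
    }

    public static void main(String[] args) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        ks.load(null, null); // empty in-memory truststore, stands in for the real .jks file
        TrustManagerFactory tmf = trustManagersFor(ks);
        System.out.println("trust manager algorithm: " + tmf.getAlgorithm());
    }
}
```

If this part works against the real truststore, the problem is more likely in the netty-tcnative native library loading or the PEM key/cert pair than in the truststore itself.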