[jira] [Updated] (CASSANDRA-14827) Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing"
[ https://issues.apache.org/jira/browse/CASSANDRA-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate McCall updated CASSANDRA-14827: Resolution: Fixed Status: Resolved (was: Patch Available) Published and verified display and links all render on the site. Thanks [~jrwest]! > Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing" > -- > > Key: CASSANDRA-14827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14827 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Labels: blog > Attachments: quicktheories-blogpost-v2.patch, > quicktheories-blogpost.patch, rendered.png > > > This blog post introduces > [QuickTheories|https://github.com/ncredinburgh/QuickTheories] and describes > usage of it when testing -CASSANDRA-13304.- > > SVN patch contained the post and a rendered screenshot are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
svn commit: r1844194 - in /cassandra/site: publish/ publish/blog/ publish/blog/2018/10/ publish/blog/2018/10/17/ src/_posts/
Author: zznate Date: Thu Oct 18 01:21:41 2018 New Revision: 1844194 URL: http://svn.apache.org/viewvc?rev=1844194&view=rev

Log: CASSANDRA-14827 - Add content and supporting directory structure for blog post

Added:
cassandra/site/publish/blog/2018/10/
cassandra/site/publish/blog/2018/10/17/
cassandra/site/publish/blog/2018/10/17/finding_bugs_with_property_based_testing.html
cassandra/site/src/_posts/2018-10-17-finding_bugs_with_property_based_testing.markdown
Modified:
cassandra/site/publish/blog/index.html
cassandra/site/publish/feed.xml

Added: cassandra/site/publish/blog/2018/10/17/finding_bugs_with_property_based_testing.html
URL: http://svn.apache.org/viewvc/cassandra/site/publish/blog/2018/10/17/finding_bugs_with_property_based_testing.html?rev=1844194&view=auto
==
--- cassandra/site/publish/blog/2018/10/17/finding_bugs_with_property_based_testing.html (added)
+++ cassandra/site/publish/blog/2018/10/17/finding_bugs_with_property_based_testing.html Thu Oct 18 01:21:41 2018
@@ -0,0 +1,260 @@

Finding Bugs in Cassandra's Internals with Property-based Testing
Posted on October 17, 2018 by the Apache Cassandra Community

As of September 1st, the Apache Cassandra community has shifted the focus of Cassandra 4.0 development from new feature work to testing, validation, and hardening, with the goal of releasing a stable 4.0 that every Cassandra user, from small deployments to large corporations, can deploy with confidence. There are several projects and methodologies that the community is undertaking to this end. One of these is the adoption of property-based testing, which was previously introduced here: http://cassandra.apache.org/blog/2018/08/21/testing_apache_cassandra.html. This post will take a look at a specific use of this approach and how it found a bug in a new feature meant to ensure data integrity between the client and Cassandra.

Detecting Corruption is a Property

In this post, we demonstrate property-based testing in Cassandra through the integration of the QuickTheories library (https://github.com/ncredinburgh/QuickTheories), introduced as part of the work done for CASSANDRA-13304 (https://issues.apache.org/jira/browse/CASSANDRA-13304).
This ticket modifies the framing of Cassandra's native client protocol to include checksums in addition to the existing, optional compression. Clients can opt in to this new feature to retain data integrity across the many hops between themselves and Cassandra. This is meant to address cases where hardware and protocol level checksums fail (due to underlying hardware issues), a case that has been seen in production. A description of the protocol changes can be found in the ticket, but for the purposes of this discussion the salient part is that two checksums are added: one that covers the length(s) of the data (if compressed there are two lengths), and one for the data itself. Before merging this feature, property-based testing using QuickTheories was used to uncover a bug in the calculation of the checksum over the lengths. This bug could have led to silent corruption at worst or unexpected errors during deserialization at best.

The test used to fi
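To make the shape of such a test concrete, below is a minimal, hypothetical QuickTheories property in the same spirit: it frames an arbitrary payload with a checksum over the length and a checksum over the data, flips one byte anywhere in the frame, and requires the corruption to be detected. The frame layout, class names and helper methods are invented for illustration and are not the actual CASSANDRA-13304 test or protocol format.

{code:java}
import static org.quicktheories.QuickTheory.qt;
import static org.quicktheories.generators.SourceDSL.integers;

import java.nio.ByteBuffer;
import java.util.zip.CRC32;

import org.junit.Test;

// Hypothetical, simplified property test; not the real CASSANDRA-13304 test or frame format.
public class ChecksumFramingPropertyTest
{
    @Test
    public void singleByteCorruptionIsAlwaysDetected()
    {
        qt().forAll(integers().between(1, 1 << 16),   // payload length
                    integers().all())                 // picks which byte to corrupt
            .check((length, corruption) -> {
                byte[] payload = new byte[length];
                for (int i = 0; i < length; i++)
                    payload[i] = (byte) (i * 31);

                ByteBuffer frame = frame(payload);

                // Flip the low bit of one byte somewhere in the frame, then require verify() to fail.
                int index = Math.floorMod(corruption, frame.remaining());
                frame.put(index, (byte) (frame.get(index) ^ 0x1));
                return !verify(frame);
            });
    }

    // Toy framing: [length][crc32 over length][payload][crc32 over payload].
    private static ByteBuffer frame(byte[] payload)
    {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + payload.length + 4);
        buf.putInt(payload.length);
        buf.putInt((int) crc32(buf.array(), 0, 4));
        buf.put(payload);
        buf.putInt((int) crc32(payload, 0, payload.length));
        buf.flip();
        return buf;
    }

    private static boolean verify(ByteBuffer frame)
    {
        byte[] bytes = new byte[frame.remaining()];
        frame.duplicate().get(bytes);
        ByteBuffer view = ByteBuffer.wrap(bytes);

        // Check the checksum over the length before trusting the length itself.
        if ((int) crc32(bytes, 0, 4) != view.getInt(4))
            return false;
        int length = view.getInt(0);
        if (length < 0 || bytes.length != 4 + 4 + length + 4)
            return false;
        // Then check the checksum over the data.
        return (int) crc32(bytes, 8, length) == view.getInt(8 + length);
    }

    private static long crc32(byte[] bytes, int offset, int length)
    {
        CRC32 crc = new CRC32();
        crc.update(bytes, offset, length);
        return crc.getValue();
    }
}
{code}

When a property like this fails, QuickTheories shrinks the failing (length, corruption) pair toward a minimal counterexample, which is what makes this style of test effective at pinpointing issues such as the lengths-checksum bug described above.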
[jira] [Updated] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached
[ https://issues.apache.org/jira/browse/CASSANDRA-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tania S Engel updated CASSANDRA-14831: -- Attachment: (was: image-2018-10-17-13-30-42-590.png) > Nodetool repair hangs with java.net.SocketException: End-of-stream reached > -- > > Key: CASSANDRA-14831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14831 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Tania S Engel >Priority: Major > Fix For: 3.11.1 > > Attachments: Cassandra - 14831 Logs.mht > > > Using Cassandra 3.11.1. > Ran >nodetool repair on a small 3 node cluster from node > 3eef. Node 9160 and 3f5e experienced a stream failure. > *NODE 9160:* > ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 > 01:45:00,400 StreamSession.java:593 - [Stream > #103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session > with peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e > *java.net.SocketException: End-of-stream reached* > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] > > *NODE 3f5e:* > ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 > 01:45:09,474 StreamSession.java:593 - [Stream > #103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session > with peer fd70:616e:6761:6561:ec4:7aff:fece:9160 > java.io.IOException: An existing connection was forcibly closed by the remote > host > at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152] > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152] > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152] > at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152] > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > ~[na:1.8.0_152] > at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) > ~[na:1.8.0_152] > at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) > ~[na:1.8.0_152] > at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) > ~[na:1.8.0_152] > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] > > *NODE 3EEF:* > ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - > [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the > following error > org.apache.cassandra.exceptions.RepairException: [repair > #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/{color:#33}XX{color}, > [(-8271925838625565988,-8266397600493941101], > (2290821710735817606,2299380749828706426] > …(-8701313305140908434,-8686533141993948378]]] Sync failed between > /fd70:616e:6761:6561:ec4:7aff:fece:9160 and > /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e > at > org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_152] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_152] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_152] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > [apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] > > ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - > Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range > [(-827192583862
[jira] [Updated] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached
[ https://issues.apache.org/jira/browse/CASSANDRA-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tania S Engel updated CASSANDRA-14831: -- Attachment: Cassandra - 14831 Logs.mht > Nodetool repair hangs with java.net.SocketException: End-of-stream reached > -- > > Key: CASSANDRA-14831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14831 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Tania S Engel >Priority: Major > Fix For: 3.11.1 > > Attachments: Cassandra - 14831 Logs.mht, > image-2018-10-17-13-30-42-590.png > > > Using Cassandra 3.11.1. > Ran >nodetool repair on a small 3 node cluster from node > 3eef. Node 9160 and 3f5e experienced a stream failure. > *NODE 9160:* > ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 > 01:45:00,400 StreamSession.java:593 - [Stream > #103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session > with peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e > *java.net.SocketException: End-of-stream reached* > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] > > *NODE 3f5e:* > ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 > 01:45:09,474 StreamSession.java:593 - [Stream > #103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session > with peer fd70:616e:6761:6561:ec4:7aff:fece:9160 > java.io.IOException: An existing connection was forcibly closed by the remote > host > at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152] > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152] > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152] > at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152] > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) > ~[na:1.8.0_152] > at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) > ~[na:1.8.0_152] > at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) > ~[na:1.8.0_152] > at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) > ~[na:1.8.0_152] > at > org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] > > *NODE 3EEF:* > ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - > [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the > following error > org.apache.cassandra.exceptions.RepairException: [repair > #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/{color:#33}XX{color}, > [(-8271925838625565988,-8266397600493941101], > (2290821710735817606,2299380749828706426] > …(-8701313305140908434,-8686533141993948378]]] Sync failed between > /fd70:616e:6761:6561:ec4:7aff:fece:9160 and > /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e > at > org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) > 
~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[apache-cassandra-3.11.1.jar:3.11.1] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_152] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [na:1.8.0_152] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_152] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) > [apache-cassandra-3.11.1.jar:3.11.1] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] > > ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - > Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for rang
[jira] [Created] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached
Tania S Engel created CASSANDRA-14831: - Summary: Nodetool repair hangs with java.net.SocketException: End-of-stream reached Key: CASSANDRA-14831 URL: https://issues.apache.org/jira/browse/CASSANDRA-14831 Project: Cassandra Issue Type: Bug Components: Repair Reporter: Tania S Engel Fix For: 3.11.1 Attachments: image-2018-10-17-13-30-42-590.png Using Cassandra 3.11.1. Ran >nodetool repair on a small 3 node cluster from node 3eef. Node 9160 and 3f5e experienced a stream failure. *NODE 9160:* ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 01:45:00,400 StreamSession.java:593 - [Stream #103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e *java.net.SocketException: End-of-stream reached* at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) ~[apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] *NODE 3f5e:* ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 01:45:09,474 StreamSession.java:593 - [Stream #103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with peer fd70:616e:6761:6561:ec4:7aff:fece:9160 java.io.IOException: An existing connection was forcibly closed by the remote host at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152] at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_152] at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) ~[na:1.8.0_152] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_152] at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_152] at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) ~[apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152] *NODE 3EEF:* ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the following error org.apache.cassandra.exceptions.RepairException: [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/{color:#33}XX{color}, [(-8271925838625565988,-8266397600493941101], (2290821710735817606,2299380749828706426] …(-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1] at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.1.jar:3.11.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_152] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152] ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range [(-8271925838625565988,-8266397600493941101],…(-6146831664074703724,-6117107236121156255], (4842256698807887573,4848113042863615717], (-8701313305140908434,-8686533141993948378]] failed with error [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/auditsearchlog,…(-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e org.apache.
[jira] [Commented] (CASSANDRA-14827) Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing"
[ https://issues.apache.org/jira/browse/CASSANDRA-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654179#comment-16654179 ] Jordan West commented on CASSANDRA-14827: - Looks good to me. Thanks! > Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing" > -- > > Key: CASSANDRA-14827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14827 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Labels: blog > Attachments: quicktheories-blogpost-v2.patch, > quicktheories-blogpost.patch, rendered.png > > > This blog post introduces > [QuickTheories|https://github.com/ncredinburgh/QuickTheories] and describes > usage of it when testing -CASSANDRA-13304.- > > SVN patch contained the post and a rendered screenshot are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10302) Track repair state for more reliable repair
[ https://issues.apache.org/jira/browse/CASSANDRA-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654152#comment-16654152 ] Tania S Engel commented on CASSANDRA-10302: --- Is there any hope that there will be better repair tracking in 4.0. It really would be wonderful to have a nodetool command to see any active running repairs (by the ActiveRepairService?). In our small test cluster of 3, nodetool repair typically takes 2 minutes, but with a stream failure it can hang, and 13 hours later you are left wondering. > Track repair state for more reliable repair > --- > > Key: CASSANDRA-10302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10302 > Project: Cassandra > Issue Type: Improvement >Reporter: Yuki Morishita >Assignee: Yuki Morishita >Priority: Major > > During repair, coordinator and replica exchange various messages. I've seen > cases that those messages sometimes get lost. > We've made repair message to be more durable (CASSANDRA-5393, etc) but still > messages seem to be lost and hang repair till messaging timeout reaches. > We can prevent this by tracking repair status on repair participants, and > periodically check state after certain period of times to make sure > everything is working fine. > We alse can add command / JMX API to query repair state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653988#comment-16653988 ] Ariel Weisberg edited comment on CASSANDRA-13241 at 10/17/18 8:12 PM:
--
Performance comparison of summing vs the larger compact sequence vs just fetching a long from memory. These numbers are low enough that I don't think it matters which we pick. For every lookup we do here we are going to do several microseconds of decompression and that is going to get much faster by virtue of decompressing less data. Decompression may also get faster due to being a better fit for cache.
{noformat}
[java] Benchmark                                                                                    Mode  Cnt    Score  Error  Units
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence                                    sample    2   80.500         ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.00    sample   78.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.50    sample   80.500  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.90    sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.95    sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.99    sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.999   sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.9999  sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p1.00    sample   83.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence                              sample    2  165.500         ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.00    sample  164.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.50    sample  165.500  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.90    sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.95    sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.99    sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.999   sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.9999  sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p1.00    sample  167.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory                                                     sample    2   56.000         ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.00                                     sample   51.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.50                                     sample   56.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.90                                     sample   61.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.95                                     sample   61.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.99                                     sample   61.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.999                                    sample   61.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.9999                                   sample   61.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p1.00
[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13241: --- Attachment: CompactSummingIntegerSequence.java CompactIntegerSequence.java > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth >Assignee: Ariel Weisberg >Priority: Major > Attachments: CompactIntegerSequence.java, > CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java > > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
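For readers who want to experiment with the setting discussed above, the sketch below shows one way to lower chunk_length_in_kb on an existing table. It assumes the DataStax Java driver 4.x; the contact point, datacenter, keyspace and table names are placeholders.

{code:java}
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;

// Hypothetical example of changing the compression chunk length on one table.
// Contact point, datacenter, keyspace and table are made-up placeholders.
public class LowerChunkLength
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder()
                                            .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
                                            .withLocalDatacenter("datacenter1")
                                            .build())
        {
            // Smaller chunks mean less data is decompressed (and read from disk) per point
            // lookup, at the cost of a larger compression offset map and a somewhat worse ratio.
            session.execute("ALTER TABLE ks.small_rows WITH compression = " +
                            "{'class': 'LZ4Compressor', 'chunk_length_in_kb': 4}");
        }
    }
}
{code}

Existing SSTables keep their old chunk length until they are rewritten (for example by compaction or upgradesstables), so the effect of the change is gradual.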
[jira] [Updated] (CASSANDRA-14827) Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing"
[ https://issues.apache.org/jira/browse/CASSANDRA-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate McCall updated CASSANDRA-14827: Attachment: quicktheories-blogpost-v2.patch Status: Patch Available (was: Open) Some edits, small formatting tweaks and moving the date up to 10/18. [~jrwest] take a look and if you are cool with it, i'll post asap. > Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing" > -- > > Key: CASSANDRA-14827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14827 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Labels: blog > Attachments: quicktheories-blogpost-v2.patch, > quicktheories-blogpost.patch, rendered.png > > > This blog post introduces > [QuickTheories|https://github.com/ncredinburgh/QuickTheories] and describes > usage of it when testing -CASSANDRA-13304.- > > SVN patch contained the post and a rendered screenshot are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14827) Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing"
[ https://issues.apache.org/jira/browse/CASSANDRA-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate McCall updated CASSANDRA-14827: Status: Open (was: Patch Available) > Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing" > -- > > Key: CASSANDRA-14827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14827 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Labels: blog > Attachments: quicktheories-blogpost.patch, rendered.png > > > This blog post introduces > [QuickTheories|https://github.com/ncredinburgh/QuickTheories] and describes > usage of it when testing -CASSANDRA-13304.- > > SVN patch contained the post and a rendered screenshot are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13241: --- Attachment: (was: CompactSummingIntegerSequence.java) > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth >Assignee: Ariel Weisberg >Priority: Major > Attachments: CompactIntegerSequenceBench.java > > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13241: --- Attachment: (was: CompactIntegerSequence.java) > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth >Assignee: Ariel Weisberg >Priority: Major > Attachments: CompactIntegerSequenceBench.java > > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14827) Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing"
[ https://issues.apache.org/jira/browse/CASSANDRA-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate McCall updated CASSANDRA-14827: Reviewer: Nate McCall Status: Patch Available (was: Open) > Blog Post: "Finding Bugs in Cassandra's Internals with Property-based Testing" > -- > > Key: CASSANDRA-14827 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14827 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Labels: blog > Attachments: quicktheories-blogpost.patch, rendered.png > > > This blog post introduces > [QuickTheories|https://github.com/ncredinburgh/QuickTheories] and describes > usage of it when testing -CASSANDRA-13304.- > > SVN patch contained the post and a rendered screenshot are attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14821: Status: Patch Available (was: Open) > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > This patch proposes an in-JVM Distributed Tester that can help to write > distributed tests in a single JVM and be able to control node behaviour in a > fine-grained way and set up nodes exactly how one needs it: configuration > settings, parameters, which are also controllable in runtime on a per node > basis, so each node can have its own unique state. > It fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. > |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14821: Description: This patch proposes an in-JVM Distributed Tester that can help to write distributed tests in a single JVM and be able to control node behaviour in a fine-grained way and set up nodes exactly how one needs it: configuration settings, parameters, which are also controllable in runtime on a per node basis, so each node can have its own unique state. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| was: This patch proposes an in-JVM Distributed Tester that can help to write distributed tests in a single JVM and be able to control node behaviour in a fine-grained way and set up nodes exactly how one needs it: configuration settings, parameters, which are also controllable in runtime on a per node basis, so each node can have its own unique state. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. Patch: |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > This patch proposes an in-JVM Distributed Tester that can help to write > distributed tests in a single JVM and be able to control node behaviour in a > fine-grained way and set up nodes exactly how one needs it: configuration > settings, parameters, which are also controllable in runtime on a per node > basis, so each node can have its own unique state. 
> It fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. > |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.
[jira] [Updated] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14821: Description: This patch proposes an in-JVM Distributed Tester that can help to write distributed tests in a single JVM and be able to control node behaviour in a fine-grained way and set up nodes exactly how one needs it: configuration settings, parameters, which are also controllable in runtime on a per node basis, so each node can have its own unique state. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. Patch: |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| was: This patch proposes an in-JVM Distributed Tester that can help to write distributed tests in a single JVM and be able to control node behaviour in a fine-grained way and set up nodes exactly how one needs it: configuration settings, parameters, which are also controllable in runtime on a per node basis, so each node can have its own unique state. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > This patch proposes an in-JVM Distributed Tester that can help to write > distributed tests in a single JVM and be able to control node behaviour in a > fine-grained way and set up nodes exactly how one needs it: configuration > settings, parameters, which are also controllable in runtime on a per node > basis, so each node can have its own unique state. > It fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. 
In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. > Patch: > |[patch|https://github.com/ifesdjeen/cassandra/tree/in-jvm-distributed-tests-2]|[tests|https://circleci.com/workflow-run/d88a1278-596c-4af1-9a03-998e9f6c78d3]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14821: Description: This patch proposes an in-JVM Distributed Tester that can help to write distributed tests in a single JVM and be able to control node behaviour in a fine-grained way and set up nodes exactly how one needs it: configuration settings, parameters, which are also controllable in runtime on a per node basis, so each node can have its own unique state. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. was: Currently, dtests are complex to write, hard to modify and slow to run. The only option to manipulate a cluster state is either to shut down nodes or run unreliable Byteman queries. In order to improve the situation, a new Distributed Tester is proposed. It fires up multiple Cassandra Instances in a single JVM. It is done through having distinct class loaders in order to work around the singleton problem in Cassandra. In order to be able to pass some information between the nodes, a common class loader is used that loads up java standard library and several helper classes. Tests look a lot like CQLTester tests would usually look like. Each Cassandra Instance, with its distinct class loader is using serialisation and class loading mechanisms in order to run instance-local queries and execute node state manipulation code, hooks, callbacks etc. First version mocks out Messaging Service and simplifies schema management by simply running schema change commands on each of the instances separately. Internode communication is mocked by passing ByteBuffers through shared class loader. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > This patch proposes an in-JVM Distributed Tester that can help to write > distributed tests in a single JVM and be able to control node behaviour in a > fine-grained way and set up nodes exactly how one needs it: configuration > settings, parameters, which are also controllable in runtime on a per node > basis, so each node can have its own unique state. > It fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. 
> Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
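As a rough illustration of the class-loader trick the description relies on (and not the actual patch), the hypothetical snippet below loads the same class through two isolated URLClassLoaders whose parent is only the platform loader, so each "node" gets its own copy of every static singleton; the Counter class is a stand-in for any Cassandra singleton holding mutable static state.

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical demo of per-"node" class loader isolation; not Cassandra's actual test harness.
public class IsolatedNodesDemo
{
    // Stands in for any Cassandra singleton with mutable static state.
    public static class Counter
    {
        private static int value = 0;
        public static void increment() { value++; }
        public static int get() { return value; }
    }

    public static void main(String[] args) throws Exception
    {
        URL[] classpath = Arrays.stream(System.getProperty("java.class.path").split(File.pathSeparator))
                                .map(IsolatedNodesDemo::toUrl)
                                .toArray(URL[]::new);

        // Each "node" re-loads application classes itself because its parent (the platform
        // loader) cannot see them, so every static singleton exists once per node.
        ClassLoader node1 = new URLClassLoader(classpath, ClassLoader.getPlatformClassLoader());
        ClassLoader node2 = new URLClassLoader(classpath, ClassLoader.getPlatformClassLoader());

        Class<?> counterOnNode1 = Class.forName(Counter.class.getName(), true, node1);
        Class<?> counterOnNode2 = Class.forName(Counter.class.getName(), true, node2);

        counterOnNode1.getMethod("increment").invoke(null);
        System.out.println(counterOnNode1.getMethod("get").invoke(null)); // 1
        System.out.println(counterOnNode2.getMethod("get").invoke(null)); // 0 -- isolated state
    }

    private static URL toUrl(String path)
    {
        try { return Paths.get(path).toUri().toURL(); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
{code}

In the real harness a small set of shared helper classes is loaded by a common parent so the test and the instances can exchange objects, which is the "common class loader" mentioned in the description.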
[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653988#comment-16653988 ] Ariel Weisberg commented on CASSANDRA-13241:
Performance comparison of summing vs the larger compact sequence vs just fetching a long from memory. These numbers are low enough that I don't think it matters which we pick. For every lookup we do here we are going to do several microseconds of decompression and that is going to get much faster by virtue of decompressing less data. Decompression may also get faster due to being a better fit for cache.
{noformat}
[java] Benchmark                                                                                    Mode  Cnt    Score  Error  Units
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence                                    sample    2   59.500         ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.00    sample   57.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.50    sample   59.500  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.90    sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.95    sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.99    sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.999   sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p0.9999  sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactIntegerSequence:benchCompactIntegerSequence·p1.00    sample   62.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence                              sample    2  147.000         ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.00    sample  146.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.50    sample  147.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.90    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.95    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.99    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.999   sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p0.9999  sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchCompactSummingntegerSequence:benchCompactSummingntegerSequence·p1.00    sample  148.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory                                                     sample    2   49.500         ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.00                                     sample   44.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.50                                     sample   49.500  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.90                                     sample   55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.95                                     sample   55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.99                                     sample   55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.999                                    sample   55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p0.9999                                   sample   55.000  ns/op
[java] CompactIntegerSequenceBench.benchMemory:benchMemory·p1.00                                     sample
[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 4kb
[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13241: --- Attachment: CompactIntegerSequenceBench.java > Lower default chunk_length_in_kb from 64kb to 4kb > - > > Key: CASSANDRA-13241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13241 > Project: Cassandra > Issue Type: Wish > Components: Core >Reporter: Benjamin Roth >Assignee: Ariel Weisberg >Priority: Major > Attachments: CompactIntegerSequence.java, > CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java > > > Having a too low chunk size may result in some wasted disk space. A too high > chunk size may lead to massive overreads and may have a critical impact on > overall system performance. > In my case, the default chunk size lead to peak read IOs of up to 1GB/s and > avg reads of 200MB/s. After lowering chunksize (of course aligned with read > ahead), the avg read IO went below 20 MB/s, rather 10-15MB/s. > The risk of (physical) overreads is increasing with lower (page cache size) / > (total data size) ratio. > High chunk sizes are mostly appropriate for bigger payloads pre request but > if the model consists rather of small rows or small resultsets, the read > overhead with 64kb chunk size is insanely high. This applies for example for > (small) skinny rows. > Please also see here: > https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY > To give you some insights what a difference it can make (460GB data, 128GB > RAM): > - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L > - Disk throughput: https://cl.ly/2a0Z250S1M3c > - This shows, that the request distribution remained the same, so no "dynamic > snitch magic": https://cl.ly/3E0t1T1z2c0J -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
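For readers unfamiliar with the shape of the attached benchmark, here is a minimal, hypothetical JMH skeleton in the same sample-time style as the results quoted above; the class, field and method names are invented, and the real CompactIntegerSequenceBench attached to the ticket may differ.

{code:java}
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical skeleton of a sample-time offset-lookup benchmark; not the attached class.
@State(Scope.Thread)
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class OffsetLookupBench
{
    private static final int COUNT = 1 << 20;
    private static final int BLOCK = 16;

    private long[] offsets;   // one absolute offset per chunk
    private long[] bases;     // one absolute offset per block of BLOCK chunks
    private int[]  deltas;    // per-chunk delta within its block
    private int    index;

    @Setup
    public void setup()
    {
        offsets = new long[COUNT];
        bases   = new long[COUNT / BLOCK];
        deltas  = new int[COUNT];
        long sum = 0;
        for (int i = 0; i < COUNT; i++)
        {
            if (i % BLOCK == 0)
                bases[i / BLOCK] = sum;
            int chunkLength = ThreadLocalRandom.current().nextInt(1, 64 * 1024);
            deltas[i] = chunkLength;
            sum += chunkLength;
            offsets[i] = sum;
        }
        index = COUNT / 2;
    }

    // Baseline: fetch the absolute offset straight from memory ("benchMemory" in the results above).
    @Benchmark
    public long plainLongLookup()
    {
        return offsets[index];
    }

    // Illustrative summing variant: block base plus a short run of deltas, trading a few adds
    // for a smaller footprint (in the spirit of the compact/summing sequence comparison).
    @Benchmark
    public long summedDeltaLookup()
    {
        long sum = bases[index / BLOCK];
        for (int i = (index / BLOCK) * BLOCK; i <= index; i++)
            sum += deltas[i];
        return sum;
    }
}
{code}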
[jira] [Commented] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653954#comment-16653954 ] Chris Lohfink commented on CASSANDRA-14654: --- |[branch|https://github.com/clohfink/cassandra/tree/compaction_allocs]|[unit|https://circleci.com/gh/clohfink/cassandra/428]|[dtest|https://circleci.com/gh/clohfink/cassandra/426]| > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Major > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow, with a lot of overhead per partition. There also tends to be an excess of objects created (i.e. 200-700 MB/s) per compaction thread. > The EncodingStats walks through all the partitions and with mergeWith it will create a new one per partition as it walks the potentially millions of partitions. In a test scenario of about 600-byte partitions and a couple hundred MB of data, this consumed ~16% of the heap pressure. Changing this to instead mutably track the min values and create one in an EncodingStats.Collector brought this down considerably (but not 100%, since UnfilteredRowIterator.stats() still creates one per partition). > The KeyCacheKey makes a full copy of the underlying byte array in ByteBufferUtil.getArray in its constructor. This is the dominating heap pressure as there are more sstables. By changing this to just keep the original, it completely eliminates the current dominator of the compactions and also improves read performance. > A minor tweak is also included for operators, for when compactions are behind on low-read clusters: make the preemptive opening setting a hot prop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
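The allocation pattern being described - an immutable per-partition merge versus a mutable collector - can be illustrated with a hedged sketch; these are not the actual Cassandra classes, and the field names are illustrative only.
{code:java}
// Hedged sketch of the two patterns discussed above, not the real EncodingStats.
final class StatsSketch
{
    final long minTimestamp;
    final int minLocalDeletionTime;

    StatsSketch(long minTimestamp, int minLocalDeletionTime)
    {
        this.minTimestamp = minTimestamp;
        this.minLocalDeletionTime = minLocalDeletionTime;
    }

    // "mergeWith" style: allocates a fresh object for every partition merged in.
    StatsSketch mergeWith(StatsSketch other)
    {
        return new StatsSketch(Math.min(minTimestamp, other.minTimestamp),
                               Math.min(minLocalDeletionTime, other.minLocalDeletionTime));
    }

    // "Collector" style: mutate running minima while iterating, allocate once at the end.
    static final class Collector
    {
        private long minTimestamp = Long.MAX_VALUE;
        private int minLocalDeletionTime = Integer.MAX_VALUE;

        void update(StatsSketch partitionStats)
        {
            minTimestamp = Math.min(minTimestamp, partitionStats.minTimestamp);
            minLocalDeletionTime = Math.min(minLocalDeletionTime, partitionStats.minLocalDeletionTime);
        }

        StatsSketch get()
        {
            return new StatsSketch(minTimestamp, minLocalDeletionTime);
        }
    }
}
{code}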
[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653787#comment-16653787 ] Jon Meredith commented on CASSANDRA-14790: -- You're right, I was reading it wrong and I agree it's not a bug - thanks for looking at it. I've already got a patch for the change, should it go on this ticket or open a new one? > LongBufferPoolTest burn test fails assertion > > > Key: CASSANDRA-14790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14790 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: Run under macOS 10.13.6, with patch (attached, but also > https://github.com/jonmeredith/cassandra/tree/failing-burn-test) >Reporter: Jon Meredith >Assignee: Jon Meredith >Priority: Major > Labels: pull-request-available > Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, > 0002-Initialize-before-running-LongBufferPoolTest.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The LongBufferPoolTest from the burn tests fails with an assertion error. I > added a build target to run individual burn tests, and \{jasobrown} gave a > fix for the uninitialized test setup (attached), however the test now fails > on an assertion about recycling buffers. > To reproduce (with patch applied) > {{ant burn-testsome > -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest > -Dtest.methods=testAllocate}} > Output > {{ [junit] Testcase: > testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}} > {{ [junit] null}} > {{ [junit] junit.framework.AssertionFailedError}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}} > All major branches from 3.0 and later have issues, however the trunk branch > also warns about references not being released before the reference is > garbage collected. 
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - > LEAK DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was > not released before the reference was garbage collected}} > {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - > Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}} > {{ [junit] Thread[pool-2-thread-24,5,main]}} > {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$Debug.(Ref.java:245)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$State.(Ref.java:175)}} > {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.(Ref.java:97)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}} > {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}} > {{ [junit] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}} > {{ [junit] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}} > {{ [junit] at java.lang.Thread.run(Thread.java:748)}} > > Perhaps the environment is not being set up correctly for the tests. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr.
[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653759#comment-16653759 ] Benedict commented on CASSANDRA-14790: -- I still think you're reading it wrong. Perhaps we should hop on a call to discuss it? FWIW, outside of this being a bug or not, your proposed change is absolutely fine - and an improvement over the current code. So we could simply make the change, since it is trivial. But I don't believe this is a _bug_ (which, if the behaviour were as you describe, it would certainly be in my book, even one of low impact). So, to your example. T1 has _successfully_ called {{allocateMoreChunks}}, right? So at step 3, it goes to the beginning of the loop to poll. {{null}} is returned, _but_ it immediately goes to invoke {{allocateMoreChunks}} again. This time it either succeeds or doesn't. There is by definition no 'one last attempt' performed by any thread until it has _itself_ witnessed {{MEMORY_USAGE_THRESHOLD}} being exceeded. > LongBufferPoolTest burn test fails assertion > > > Key: CASSANDRA-14790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14790 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: Run under macOS 10.13.6, with patch (attached, but also > https://github.com/jonmeredith/cassandra/tree/failing-burn-test) >Reporter: Jon Meredith >Assignee: Jon Meredith >Priority: Major > Labels: pull-request-available > Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, > 0002-Initialize-before-running-LongBufferPoolTest.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The LongBufferPoolTest from the burn tests fails with an assertion error. I > added a build target to run individual burn tests, and \{jasobrown} gave a > fix for the uninitialized test setup (attached), however the test now fails > on an assertion about recycling buffers. > To reproduce (with patch applied) > {{ant burn-testsome > -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest > -Dtest.methods=testAllocate}} > Output > {{ [junit] Testcase: > testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}} > {{ [junit] null}} > {{ [junit] junit.framework.AssertionFailedError}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}} > All major branches from 3.0 and later have issues, however the trunk branch > also warns about references not being released before the reference is > garbage collected. 
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - > LEAK DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was > not released before the reference was garbage collected}} > {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - > Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}} > {{ [junit] Thread[pool-2-thread-24,5,main]}} > {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$Debug.(Ref.java:245)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$State.(Ref.java:175)}} > {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.(Ref.java:97)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}} > {{ [junit] at > org.apache.cassandra.utils.mem
[jira] [Comment Edited] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653759#comment-16653759 ] Benedict edited comment on CASSANDRA-14790 at 10/17/18 3:51 PM: I still think you're reading it wrong. Perhaps we should hop on a call to discuss it? FWIW, outside of this being a bug or not, your proposed change is absolutely fine - and an improvement over the current code. So we could simply make the change, since it is trivial. But I don't believe this is a _bug_ (which, if the behaviour were as you describe, it would certainly be in my book, even one of low impact). So, to your example. T1 has _successfully_ called {{allocateMoreChunks}}, right? So at step 3, it goes to the beginning of the loop to poll. {{null}} is returned, _but_ it immediately goes to invoke {{allocateMoreChunks}} again (and again, etc, if necessary) There is by definition no 'one last attempt' performed by any thread until it has _itself_ witnessed {{MEMORY_USAGE_THRESHOLD}} being exceeded. was (Author: benedict): I still think you're reading it wrong. Perhaps we should hop on a call to discuss it? FWIW, outside of this being a bug or not, your proposed change is absolutely fine - and an improvement over the current code. So we could simply make the change, since it is trivial. But I don't believe this is a _bug_ (which, if the behaviour were as you describe, it would certainly be in my book, even one of low impact). So, to your example. T1 has _successfully_ called {{allocateMoreChunks}}, right? So at step 3, it goes to the beginning of the loop to poll. {{null}} is returned, _but_ it immediately goes to invoke {{allocateMoreChunks}} again. This time it either succeeds or doesn't. There is by definition no 'one last attempt' performed by any thread until it has _itself_ witnessed {{MEMORY_USAGE_THRESHOLD}} being exceeded. > LongBufferPoolTest burn test fails assertion > > > Key: CASSANDRA-14790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14790 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: Run under macOS 10.13.6, with patch (attached, but also > https://github.com/jonmeredith/cassandra/tree/failing-burn-test) >Reporter: Jon Meredith >Assignee: Jon Meredith >Priority: Major > Labels: pull-request-available > Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, > 0002-Initialize-before-running-LongBufferPoolTest.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The LongBufferPoolTest from the burn tests fails with an assertion error. I > added a build target to run individual burn tests, and \{jasobrown} gave a > fix for the uninitialized test setup (attached), however the test now fails > on an assertion about recycling buffers. 
> To reproduce (with patch applied) > {{ant burn-testsome > -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest > -Dtest.methods=testAllocate}} > Output > {{ [junit] Testcase: > testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}} > {{ [junit] null}} > {{ [junit] junit.framework.AssertionFailedError}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}} > All major branches from 3.0 and later have issues, however the trunk branch > also warns about references not being released before the reference is > garbage collected. > {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - > LEAK DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was > not released before the reference was garbage collected}} > {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - > Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}} > {{ [junit] Thread[pool-2-thread-24,5,main]}} > {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$Debug.(Ref.java:245)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$State.(Ref.java:175)}} > {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.(Ref.java:97)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get
[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653734#comment-16653734 ] Jon Meredith commented on CASSANDRA-14790: -- [~benedict] I think we're reading it the same way - my argument was that it can cause buffers to be allocated from the heap when MEMORY_USAGE_THRESHOLD has not been exceeded yet. I'd describe it as a benign race rather than beneficial. The calling thread has to pay the price of allocating chunks that other threads stole and then an extra allocation which could possibly result in a blocking system call to get more memory. Instead allocateMoreChunks could return one of the chunks to it's caller and add one less chunks to the queue. I'm not even sure it's worth changing anything, but [~djoshi3] wanted to see what you thought about it. --8<-- Here's the example I wrote up before I read your comment more carefully. Start with no allocations from any of the thread local or buffer pools yet. CHUNK_SIZE=64 KiB MACRO_CHUNK_SIZE = 1024 KiB MEMORY_USAGE_THRESHOLD = 16384 KiB (for the unit test) 1) T1 calls BufferPool.get(1) and ends up in GlobalPool:get. chunks.poll returns null so it calls allocateMoreChunks which allocates a macro chunk, divides it up into 16 (1024KiB / 64KiB) Chunks that are added to BufferPool.GlobalPool.chunks. 2) Between the adding the last chunk and the 'one last attempt' to pull it in Chunk.get, 16 other calls to GlobalPool::get take place on other threads, emptying GlobalPool.chunks 3) T1 returns from allocateMoreChunks, back in Chunk::get chunks.poll() returns null and which gets passed up the call chain with the null causing a call to BufferPool.allocate which allocates memory outside of the pool, despite the current pool memory usage being at ~1MiB, which is less than the usage threshold and should have been satisfied by the pool. As I said, I don't think it's really a big deal as memory allocated outside the pool should be freed/garbage collected just fine and the buffer pool is just an optimization. It's also possible for T1, T2 to both arrive in allocateMoreBuffers with BufferPool.GlobalPool.chunk empty and cause harmless allocation of extra buffers, but it looks like it uses atomics to make sure the MEMORY_USAGE_THRESHOLD invariant isn't exceeded. > LongBufferPoolTest burn test fails assertion > > > Key: CASSANDRA-14790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14790 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: Run under macOS 10.13.6, with patch (attached, but also > https://github.com/jonmeredith/cassandra/tree/failing-burn-test) >Reporter: Jon Meredith >Assignee: Jon Meredith >Priority: Major > Labels: pull-request-available > Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, > 0002-Initialize-before-running-LongBufferPoolTest.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The LongBufferPoolTest from the burn tests fails with an assertion error. I > added a build target to run individual burn tests, and \{jasobrown} gave a > fix for the uninitialized test setup (attached), however the test now fails > on an assertion about recycling buffers. 
> To reproduce (with patch applied) > {{ant burn-testsome > -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest > -Dtest.methods=testAllocate}} > Output > {{ [junit] Testcase: > testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}} > {{ [junit] null}} > {{ [junit] junit.framework.AssertionFailedError}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}} > All major branches from 3.0 and later have issues, however the trunk branch > also warns about references not being released before the reference is > garbage collected. > {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - > LEAK DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was > not released before the reference was garbage collected}} > {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - > Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}} > {{ [junit] Thread[pool-2-thread-24,5,main]}} > {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}} > {{ [junit] at > org.apache.c
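To make the loop being debated easier to follow, here is a hedged, heavily simplified model of the allocation path described in the comments above. It is not the actual org.apache.cassandra.utils.memory.BufferPool code; the class, field and constant names are illustrative, and the constants mirror the values quoted in the example.
{code:java}
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the global-pool allocation loop under discussion.
final class GlobalPoolSketch
{
    static final long CHUNK_SIZE = 64 * 1024;                  // 64 KiB
    static final long MACRO_CHUNK_SIZE = 1024 * 1024;          // 1024 KiB
    static final long MEMORY_USAGE_THRESHOLD = 16384L * 1024;  // 16384 KiB (unit test value)

    private final Queue<ByteBuffer> chunks = new ConcurrentLinkedQueue<>();
    private final AtomicLong memoryUsage = new AtomicLong();

    // Returns a pooled chunk, or null when the caller should allocate on heap.
    ByteBuffer get()
    {
        while (true)
        {
            ByteBuffer chunk = chunks.poll();
            if (chunk != null)
                return chunk;

            // Grow the pool and retry; a successful grower that loses its chunks to
            // other threads simply loops and grows again (Benedict's point above).
            if (!allocateMoreChunks())
                // Only a thread that has itself witnessed the threshold being exceeded
                // makes the "one last attempt" before falling back to the heap.
                return chunks.poll();
        }
    }

    // Adds one macro chunk split into CHUNK_SIZE pieces; false once the threshold is hit.
    private boolean allocateMoreChunks()
    {
        while (true)
        {
            long cur = memoryUsage.get();
            if (cur + MACRO_CHUNK_SIZE > MEMORY_USAGE_THRESHOLD)
                return false;
            if (memoryUsage.compareAndSet(cur, cur + MACRO_CHUNK_SIZE))
                break;
        }
        for (long i = 0; i < MACRO_CHUNK_SIZE / CHUNK_SIZE; i++)
            chunks.add(ByteBuffer.allocateDirect((int) CHUNK_SIZE));
        return true;
    }
}
{code}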
[jira] [Commented] (CASSANDRA-14789) Configuring nodetool from a file
[ https://issues.apache.org/jira/browse/CASSANDRA-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653632#comment-16653632 ] Jan Karlsson commented on CASSANDRA-14789: -- I had a look at how we can do this, but was not impressed with the options we have. Airline does not seem to jive (no pun intended) well with going into the code to fetch defaults from a file. Overriding the different parameters in the abstract class does not seem to be too smooth. Doing it in the script calling Nodetool might be a little cleaner. Sourcing in a file would allow manipulating the ARGS variable by adding lines like this: {{JMX_PORT=7199}} {{ARGS="$ARGS -h 127.0.0.2"}} There are a few concerns I have with this approach. Firstly, it might have some security risks associated with it, but file permissions can help with that. Secondly, we would practically be requiring the user to provide lines of bash script. I would like to avoid that, but I am not sure how to do so without having a map of all the available options and grepping the ARGS parameter for each option. All in all, it might not be as bad, considering that this is an optional feature for more advanced use cases. This solution is definitely quite non-intrusive, but it does mean that parameters could be placed twice into the command run. It is a little iffy but it should still work. > Configuring nodetool from a file > > > Key: CASSANDRA-14789 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14789 > Project: Cassandra > Issue Type: Improvement > Components: Tools >Reporter: Jan Karlsson >Assignee: Jan Karlsson >Priority: Minor > Fix For: 4.x > > > Nodetool has a lot of options that can be set. SSL can be configured through a file[1], but most other parameters must be provided when running the command. It would be helpful to be able to configure its parameters through a file, much like how cqlsh can be configured[2]. > > [1] https://issues.apache.org/jira/browse/CASSANDRA-9090 > [2] > [https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
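A hedged sketch of the "source a file from the script calling Nodetool" idea mentioned above; the config file location, variable names, and the launcher line are assumptions, not an agreed design.
{code:bash}
# In the nodetool wrapper script, before the Java class is invoked:
# read optional defaults from a user config file, letting explicit
# command-line arguments (still passed via "$@") come last and win.
NODETOOL_CONF="${NODETOOL_CONF:-$HOME/.cassandra/nodetoolrc}"
if [ -r "$NODETOOL_CONF" ]; then
    # the file contains plain shell assignments, e.g.
    #   JMX_PORT=7199
    #   ARGS="$ARGS -h 127.0.0.2"
    . "$NODETOOL_CONF"
fi

# illustrative launcher line; the real script's variable names may differ
exec "$JAVA" $JVM_OPTS -cp "$CLASSPATH" org.apache.cassandra.tools.NodeTool $ARGS "$@"
{code}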
[jira] [Commented] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653543#comment-16653543 ] Aaron Ploetz commented on CASSANDRA-14798: -- [~KurtG] Thanks for the tip! I applied your verbiage change, and updated the patch file. > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.patch > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me provide some stronger wording on partitioner selection. Specifically, in > further discouraging people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Ploetz updated CASSANDRA-14798: - Attachment: (was: 14798-trunk.txt) > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.patch > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me provide some stronger wording on partitioner selection. Specifically, in > further discouraging people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14798) Improve wording around partitioner selection
[ https://issues.apache.org/jira/browse/CASSANDRA-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Ploetz updated CASSANDRA-14798: - Attachment: 14798-trunk.patch > Improve wording around partitioner selection > > > Key: CASSANDRA-14798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14798 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Aaron Ploetz >Assignee: Aaron Ploetz >Priority: Trivial > Fix For: 4.0 > > Attachments: 14798-trunk.patch > > > Given some recent community interactions on Stack Overflow, Nate McCall asked > me provide some stronger wording on partitioner selection. Specifically, in > further discouraging people from using the other partitioners (namely, the > ByteOrderedPartitioner). > Right now, this is the language that I'm leaning toward: > {{# The partitioner is responsible for distributing groups of rows (by}} > {{# partition key) across nodes in the cluster. The partitioner can NOT be}} > {{# changed without reloading all data. If you are upgrading, you should set > this}} > {{# to the same partitioner that you are currently using.}} > {{#}} > {{# The default partitioner is the Murmur3Partitioner. Older partitioners}} > {{# such as the RandomPartitioner, ByteOrderedPartitioner, and}} > {{# OrderPreservingPartitioner have been included for backward compatibility > only.}} > {{# For new clusters, you should NOT change this value.}} > {{#}} > {{partitioner: org.apache.cassandra.dht.Murmur3Partitioner }} > I'm open to suggested improvements. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14826) cassandra spinning forever on 1 thread while initializing keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-14826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653500#comment-16653500 ] ingard mevåg commented on CASSANDRA-14826: -- [~KurtG] yeah sure. In this case we won't add any more nodes, and it works ok for the use case, but what should be improved is the single threaded whatever-is-going-on that takes over 1 hour on startup :) > cassandra spinning forever on 1 thread while initializing keyspace > -- > > Key: CASSANDRA-14826 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14826 > Project: Cassandra > Issue Type: Bug >Reporter: ingard mevåg >Assignee: Marcus Eriksson >Priority: Major > Attachments: Screen Shot 2018-10-16 at 14.51.27.png > > > When starting cassandra 3.0.17 it takes a long time to initialize a keyspace. > top shows 1 thread spinning at 100% cpu. Thread dump shows: > {code:java} > "main" - Thread t@1 > java.lang.Thread.State: RUNNABLE > at java.util.TimSort.mergeHi(TimSort.java:850) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.cassandra.db.compaction.LeveledManifest.canAddSSTable(LeveledManifest.java:243) > at > org.apache.cassandra.db.compaction.LeveledManifest.add(LeveledManifest.java:146) > - locked (a > org.apache.cassandra.db.compaction.LeveledManifest) > at > org.apache.cassandra.db.compaction.LeveledCompactionStrategy.addSSTable(LeveledCompactionStrategy.java:298) > at > org.apache.cassandra.db.compaction.CompactionStrategyManager.startup(CompactionStrategyManager.java:135) > at > org.apache.cassandra.db.compaction.CompactionStrategyManager.reload(CompactionStrategyManager.java:187) > - locked <742bfce7> (a > org.apache.cassandra.db.compaction.CompactionStrategyManager) > at > org.apache.cassandra.db.compaction.CompactionStrategyManager.(CompactionStrategyManager.java:75) > at > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:408) > at > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:363) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:579) > - locked <4e4f20d2> (a java.lang.Class) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:556) > at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.(Keyspace.java:305) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129) > - locked <5318346c> (a java.lang.Class) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:262) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653450#comment-16653450 ] Stefan Podkowinski commented on CASSANDRA-14821: Great, let's wrap up testing approaches then in CASSANDRA-14830 and see if we can create a guideline to help people to understand which approach to use for which kind of test case. We should have a set of different solutions by now that allows writing tests at different abstraction levels, which is good IMO. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > Currently, dtests are complex to write, hard to modify and slow to run. The > only option to manipulate a cluster state is either to shut down nodes or run > unreliable Byteman queries. > In order to improve the situation, a new Distributed Tester is proposed. It > fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14830) Update testing docs and provide high-level guide
Stefan Podkowinski created CASSANDRA-14830: -- Summary: Update testing docs and provide high-level guide Key: CASSANDRA-14830 URL: https://issues.apache.org/jira/browse/CASSANDRA-14830 Project: Cassandra Issue Type: New Feature Components: Documentation and Website Reporter: Stefan Podkowinski Assignee: Stefan Podkowinski We already have some details on [testing|https://cassandra.apache.org/doc/latest/development/testing.html] as part of our contribution pages. Let's update it with more recently added testing frameworks and also add a general overview of the various approaches and which one works best for the kind of tests someone would want to write. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate
Georg Dietrich created CASSANDRA-14829: -- Summary: Make stop-server.bat wait for Cassandra to terminate Key: CASSANDRA-14829 URL: https://issues.apache.org/jira/browse/CASSANDRA-14829 Project: Cassandra Issue Type: Improvement Components: Packaging Environment: Windows 10 Reporter: Georg Dietrich Fix For: 3.11.x, 4.x, 4.0.x While administering a single node Cassandra on Windows, I noticed that the stop-server.bat script returns before the cassandra process has actually terminated. For use cases like creating a script "shut down & create backup of data directory without having to worry about open files, then restart", it would be good to make stop-server.bat wait for Cassandra to terminate. All that is needed for that is to change in apache-cassandra-3.11.3\bin\stop-server.bat "start /B powershell /file ..." to "start /WAIT /B powershell /file ..." (additional /WAIT parameter). Does this sound reasonable? If yes, I'll create a pull request. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653424#comment-16653424 ] Alex Petrov commented on CASSANDRA-14821: - [~spo...@gmail.com] MessagingService mocks are great for the purposes you've listed in the original ticket (asserting that the outgoing messages are correct and a node proceeds correctly on response). In my view, intention here is rather orthogonal to MS mocks, since we're trying to capture end-to-end behaviour and interaction of multiple nodes (potentially with in different states / having different data). We can start off a wider discussion separately on how we could use both approaches together or which tests we could write that would use MS mocks, as both approaches might be useful for verifying different behaviours. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > Currently, dtests are complex to write, hard to modify and slow to run. The > only option to manipulate a cluster state is either to shut down nodes or run > unreliable Byteman queries. > In order to improve the situation, a new Distributed Tester is proposed. It > fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653361#comment-16653361 ] Stefan Podkowinski commented on CASSANDRA-14821: As already mentioned, I really understand the motivation behind this and the intention to replace dtests with a JVM based solution in the long run. My approach to that was implementing CASSANDRA-12016 a little more than two years ago. The main assumption behind it was that tests would be less complex and could be run much more efficiently, by always focusing on a single node and mocking it's interactions with a simulated cluster, instead of actually running separate Cassandra instances in parallel and manipulating instances to exchange messages in a certain way for a particular test case. Some tests implemented based on that approach turned out to be promising. But there are certainly also limitations and I hope that looking at the solution provided as part of this ticket, would help me to learn more about use cases when tests could not be fully simulated (for functional or practical reasons) and really need to be run as multi instance integration tests. But that's just my personal curiosity. What we should do nonetheless, is to eventually update the documentation and describe which testing approach would fit best for which use case (dtest py/jvm, unit, message mocking, property based, ..). I'd be happy to do that, once this ticket is done. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > Currently, dtests are complex to write, hard to modify and slow to run. The > only option to manipulate a cluster state is either to shut down nodes or run > unreliable Byteman queries. > In order to improve the situation, a new Distributed Tester is proposed. It > fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14821) Make it possible to run multi-node coordinator/replica tests in a single JVM
[ https://issues.apache.org/jira/browse/CASSANDRA-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653275#comment-16653275 ] Alex Petrov commented on CASSANDRA-14821: - Thank you [~spo...@gmail.com] for your feedback. I've added some more complex tests to make it more illustrative: one is illustrative for [CASSANDRA-13004] and one for failing read/repair (when node drops read-repair message). I wouldn't say that this is work in progress even by the time of first submission, but I agree that some additional description to get people more excited about it was due. I hoped it was covered in the initial issue description, but I'm also happy to elaborate. This testing framework will help us to introduce things that are otherwise difficult to reproduce. For example [CASSANDRA-13304] was quite hard to catch and only thanks to folks who have submitted a go program that would help to reproduce it we were able to catch it. Creating a dtest test for it would've been rather difficult. Similarly, read/repair dropping messages would've required to introduce Byteman scripts, which are much less reliable than direct code execution. Hope the latest version and this comment will be more helpful in terms of our motivation. > Make it possible to run multi-node coordinator/replica tests in a single JVM > > > Key: CASSANDRA-14821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14821 > Project: Cassandra > Issue Type: Test >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > Currently, dtests are complex to write, hard to modify and slow to run. The > only option to manipulate a cluster state is either to shut down nodes or run > unreliable Byteman queries. > In order to improve the situation, a new Distributed Tester is proposed. It > fires up multiple Cassandra Instances in a single JVM. It is done through > having distinct class loaders in order to work around the singleton problem > in Cassandra. In order to be able to pass some information between the nodes, > a common class loader is used that loads up java standard library and several > helper classes. Tests look a lot like CQLTester tests would usually look like. > Each Cassandra Instance, with its distinct class loader is using > serialisation and class loading mechanisms in order to run instance-local > queries and execute node state manipulation code, hooks, callbacks etc. > First version mocks out Messaging Service and simplifies schema management by > simply running schema change commands on each of the instances separately. > Internode communication is mocked by passing ByteBuffers through shared class > loader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
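For readers curious about the mechanism referred to in this ticket (a distinct class loader per Cassandra instance plus a small shared loader for passing data between nodes), a hedged sketch follows. It is only an illustration of the idea, not the framework's actual code, and all names are assumptions.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Hedged sketch: each in-JVM "node" gets its own loader for the Cassandra classes,
// so static singletons are not shared between nodes, while JDK classes and a small
// set of designated shared classes come from a common loader so instances can
// exchange simple values (e.g. ByteBuffers) across the boundary.
final class InstanceClassLoaderSketch extends URLClassLoader
{
    private final ClassLoader shared;

    InstanceClassLoaderSketch(URL[] cassandraClasspath, ClassLoader shared)
    {
        super(cassandraClasspath, null); // null parent: isolate Cassandra classes per instance
        this.shared = shared;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException
    {
        // delegate designated shared classes to the common loader,
        // load everything else (e.g. org.apache.cassandra.*) in isolation
        if (name.startsWith("shared."))
            return shared.loadClass(name);
        return super.findClass(name);
    }
}
{code}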
cassandra-builds git commit: Use new docker image for dtests
Repository: cassandra-builds
Updated Branches: refs/heads/master 82dbe9d4f -> d0f61be1c

Use new docker image for dtests

Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/d0f61be1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/d0f61be1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/d0f61be1

Branch: refs/heads/master
Commit: d0f61be1c978b4342849e97bf4e06d8086cd9c27
Parents: 82dbe9d
Author: Stefan Podkowinski
Authored: Wed Oct 17 10:24:55 2018 +0200
Committer: Stefan Podkowinski
Committed: Wed Oct 17 10:24:55 2018 +0200
--
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/d0f61be1/jenkins-dsl/cassandra_job_dsl_seed.groovy
--
diff --git a/jenkins-dsl/cassandra_job_dsl_seed.groovy b/jenkins-dsl/cassandra_job_dsl_seed.groovy
index 71a0d9d..79fe708 100644
--- a/jenkins-dsl/cassandra_job_dsl_seed.groovy
+++ b/jenkins-dsl/cassandra_job_dsl_seed.groovy
@@ -50,7 +50,7 @@ def dtestTargets = ['dtest', 'dtest-novnode', 'dtest-offheap', 'dtest-large']
 if(binding.hasVariable("CASSANDRA_DTEST_TEST_TARGETS")) {
 dtestTargets = "${CASSANDRA_DTEST_TEST_TARGETS}".split(",")
 }
-def dtestDockerImage = 'kjellman/cassandra-test:0.4.4'
+def dtestDockerImage = 'spod/cassandra-testing-ubuntu18-java11'
 if(binding.hasVariable("CASSANDRA_DOCKER_IMAGE")) {
 dtestDockerImage = "${CASSANDRA_DOCKER_IMAGE}"
 }

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion
[ https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653174#comment-16653174 ] Benedict commented on CASSANDRA-14790: -- [~jmeredithco] I haven't looked too closely at this, but I _think_ you may be reading this backwards. If {{allocateMoreChunks}} is successful, the loop continues and a 'safe' attempt to read the result is made - if a race occurs here, {{allocateMoreChunks}} will be invoked again. On the branch you pointed to, it would be acceptable to simply {{return null}}, immediately requiring the caller to allocate on heap, but as the comment suggests, we are polling the queue of chunks just in case we have had a _beneficial_ race wherein somebody has deposited a recycled chunk during our invocation of {{allocateMoreChunks}} > LongBufferPoolTest burn test fails assertion > > > Key: CASSANDRA-14790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14790 > Project: Cassandra > Issue Type: Test > Components: Testing > Environment: Run under macOS 10.13.6, with patch (attached, but also > https://github.com/jonmeredith/cassandra/tree/failing-burn-test) >Reporter: Jon Meredith >Assignee: Jon Meredith >Priority: Major > Labels: pull-request-available > Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, > 0002-Initialize-before-running-LongBufferPoolTest.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The LongBufferPoolTest from the burn tests fails with an assertion error. I > added a build target to run individual burn tests, and \{jasobrown} gave a > fix for the uninitialized test setup (attached), however the test now fails > on an assertion about recycling buffers. > To reproduce (with patch applied) > {{ant burn-testsome > -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest > -Dtest.methods=testAllocate}} > Output > {{ [junit] Testcase: > testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}} > {{ [junit] null}} > {{ [junit] junit.framework.AssertionFailedError}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}} > All major branches from 3.0 and later have issues, however the trunk branch > also warns about references not being released before the reference is > garbage collected. 
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - > LEAK DETECTED: a reference > (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was > not released before the reference was garbage collected}} > {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - > Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}} > {{ [junit] Thread[pool-2-thread-24,5,main]}} > {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$Debug.(Ref.java:245)}} > {{ [junit] at > org.apache.cassandra.utils.concurrent.Ref$State.(Ref.java:175)}} > {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.(Ref.java:97)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}} > {{ [junit] at > org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}} > {{ [junit] at > org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}} > {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}} > {{ [junit] at > java.util.concurrent.ThreadPoolExecutor.runWorker(Thre
[jira] [Commented] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653166#comment-16653166 ] Jacques-Henri Berthemet commented on CASSANDRA-14631: - I confirm it's working, thank you! > Add RSS support for Cassandra blog > -- > > Key: CASSANDRA-14631 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14631 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jacques-Henri Berthemet >Assignee: Jeff Beck >Priority: Major > Labels: blog > Attachments: 14631-site.txt, Screen Shot 2018-08-17 at 5.32.08 > PM.png, Screen Shot 2018-08-17 at 5.32.25 PM.png, feed404.png > > > It would be convenient to add RSS support to Cassandra blog: > [http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html] > And maybe also for other resources like new versions, but this ticket is > about blog. > > {quote}From: Scott Andreas > Sent: Wednesday, August 08, 2018 6:53 PM > To: [d...@cassandra.apache.org|mailto:d...@cassandra.apache.org] > Subject: Re: Apache Cassandra Blog is now live > > Please feel free to file a ticket (label: Documentation and Website). > > It looks like Jekyll, the static site generator used to build the website, > has a plugin that generates Atom feeds if someone would like to work on > adding one: [https://github.com/jekyll/jekyll-feed] > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14142) logs directory for gc.log doesn't exist on first start
[ https://issues.apache.org/jira/browse/CASSANDRA-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653135#comment-16653135 ] Alex Ott commented on CASSANDRA-14142: -- [~kirktrue] - I can rebase my patch onto trunk. By a proper patch, do you mean generating a diff & attaching it? Should I also add an entry to CHANGES.txt? Anything else? > logs directory for gc.log doesn't exist on first start > -- > > Key: CASSANDRA-14142 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14142 > Project: Cassandra > Issue Type: Bug > Components: Configuration > Environment: Unix & Windows environments, when starting freshly downloaded tarball >Reporter: Alex Ott >Priority: Trivial > Labels: lhf > > This was originally reported at > https://stackoverflow.com/questions/47976248/gc-log-file-error-when-running-cassandra. > This is a very minor problem related to the timing of 'logs' directory creation - when Cassandra starts for the first time, this directory doesn't exist, and it is only created when Cassandra starts to write the system.log & debug.log files. But this directory is referenced in the -Xloggc command line parameter of the JVM, causing the following warning: > {{Java HotSpot(TM) 64-Bit Server VM warning: Cannot open file bin/../logs/gc.log due to No such file or directory}} > The fix is to check for the existence of this directory in cassandra-env and create it if missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
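A hedged sketch of the kind of check being proposed for conf/cassandra-env.sh; the variable names and the exact location are assumptions, and the committed fix may differ.
{code:bash}
# Derive the log directory the same way the -Xloggc default does, then make sure
# it exists before the JVM tries to open gc.log inside it.
if [ "x$CASSANDRA_LOG_DIR" = "x" ]; then
    CASSANDRA_LOG_DIR="$CASSANDRA_HOME/logs"
fi
mkdir -p "$CASSANDRA_LOG_DIR"

JVM_OPTS="$JVM_OPTS -Xloggc:${CASSANDRA_LOG_DIR}/gc.log"
{code}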