[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542961#comment-16542961 ] Dimitar Dimitrov edited comment on CASSANDRA-13938 at 9/11/18 5:52 AM: --- {quote}The problem is that when {{CompressedInputStream#position()}} is called, the new position might be in the middle of a buffer. We need to remember that offset, and subtract that value when updating {{current}} in {{#reBuffer(boolean)}}. The reason is that those offset bytes get double counted on the first call to {{#reBuffer()}} after {{#position()}}, as we add the {{buffer.position()}} to {{current}}. {{current}} already accounts for those offset bytes from when {{#position()}} was called. {quote} [~jasobrown], isn't that equivalent (although a bit more complex) to just setting {{current}} to the last reached/read position in the stream when rebuffering? (i.e. {{current = streamOffset + buffer.position()}}). I might be missing something, but the role of {{currentBufferOffset}} seems to be solely to "align" {{current}} and {{streamOffset}} the first time after a new section is started. Then {{current += buffer.position() - currentBufferOffset}} expands to {{current = -current- + buffer.position() + streamOffset - -current- }} which is the same as {{current = streamOffset + buffer.position()}}. After that first time, {{current}} naturally follows {{streamOffset}} without the need for any adjustment, but it seems more natural to express this as {{streamOffset + buffer.position()}} instead of the new expression or the old {{current + buffer.position()}}. To me, it's also a bit more intuitive and easier to understand (hopefully it's also right in addition to intuitive :)). 
The equivalence above would hold true only if {{current}} and {{streamOffset}} don't change their values in the meantime, but I think this is ensured by the well-ordered sequential fashion in which the decompressing and the offset-bookkeeping functionality of {{CompressedInputStream}} run in the thread executing the corresponding {{StreamDeserializingTask}}. * The aforementioned well-ordered sequential fashion seems to be POSITION followed by 0-N repetitions of REBUFFER + DECOMPRESS, where the first REBUFFER might not update {{current}} with the above calculation in case {{current}} is already too far ahead (i.e. the new section does not start within the current buffer).
> Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL:
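The algebraic claim above can be checked mechanically. The sketch below is a standalone hypothetical (plain Java, not the actual {{CompressedInputStream}}); it assumes {{currentBufferOffset}} was captured as {{current - streamOffset}} when {{#position()}} landed mid-buffer, and verifies that the patch's update and the proposed simplification agree:

```java
// Hypothetical values: position() landed 27 bytes into a buffer whose
// first byte sits at streamOffset in the compressed stream.
public class OffsetEquivalence
{
    public static void main(String[] args)
    {
        long streamOffset = 4096;                          // buffer start in the stream
        long current = 4123;                               // set by position(), mid-buffer
        long currentBufferOffset = current - streamOffset; // 27, remembered at position()
        int bufferPosition = 300;                          // bytes consumed at reBuffer()

        // The patch's update: current += buffer.position() - currentBufferOffset
        long patched = current + bufferPosition - currentBufferOffset;
        // The proposed simplification: current = streamOffset + buffer.position()
        long simplified = streamOffset + bufferPosition;

        System.out.println(patched == simplified); // prints "true"
    }
}
```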
[jira] [Commented] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
[ https://issues.apache.org/jira/browse/CASSANDRA-14715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610127#comment-16610127 ] Cameron Zemek commented on CASSANDRA-14715: --- I should also point out that this means the timeouts don't get captured in the read timeout metric either, due to the timeout occurring on the close of the PartitionIterator returned by StorageProxy:read, where the timeouts are caught (see readRegular). > Read repairs can result in bogus timeout errors to the client > - > > Key: CASSANDRA-14715 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14715 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Cameron Zemek >Priority: Minor > > In RepairMergeListener:close() it does the following: > > {code:java} > try > { > FBUtilities.waitOnFutures(repairResults, > DatabaseDescriptor.getWriteRpcTimeout()); > } > catch (TimeoutException ex) > { > // We got all responses, but timed out while repairing > int blockFor = consistency.blockFor(keyspace); > if (Tracing.isTracing()) > Tracing.trace("Timed out while read-repairing after receiving all {} > data and digest responses", blockFor); > else > logger.debug("Timeout while read-repairing after receiving all {} > data and digest responses", blockFor); > throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true); > } > {code} > This propagates up and gets sent to the client, and we have customers get > confused because they see timeouts for CL ALL requiring ALL replicas even > though they have read_repair_chance = 0 and are using a LOCAL_* CL. > At minimum I suggest that, instead of using the consistency level of DataResolver > (which is always ALL with read repairs) for the timeout, it instead use > repairResults.size(). That is, blockFor = repairResults.size(). But saying it > received _blockFor - 1_ is still bogus. Fixing that would require more > changes. 
I was thinking maybe like so: > > {code:java} > public static void waitOnFutures(List<AsyncOneResponse> results, long ms, > MutableInt counter) throws TimeoutException > { > for (AsyncOneResponse result : results) > { > result.get(ms, TimeUnit.MILLISECONDS); > counter.increment(); > } > } > {code} > > > > Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says > _blockFor - 1_ for how many were received, which is also bogus. > > Steps used to reproduce: modify RepairMergeListener:close() to always > throw a timeout exception. With schema: > {noformat} > CREATE KEYSPACE weather WITH replication = {'class': > 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true; > CREATE TABLE weather.city ( > cityid int PRIMARY KEY, > name text > ) WITH bloom_filter_fp_chance = 0.01 > AND dclocal_read_repair_chance = 0.0 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {noformat} > Then using the following steps: > # ccm node1 cqlsh > # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra'); > # exit; > # ccm node1 flush > # ccm node1 stop > # rm -rf > ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* > # remove the sstable with the insert > # ccm node1 start > # ccm node1 cqlsh > # CONSISTENCY LOCAL_QUORUM; > # select * from weather.city where cityid = 1; > You get a result of: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 5 > responses." info={'received_responses': 5, 'required_responses': 6, > 'consistency': 'ALL'}{noformat} > But was expecting: > {noformat} > ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting > for replica nodes' responses] message="Operation timed out - received only 1 > responses." 
info={'received_responses': 1, 'required_responses': 2, > 'consistency': 'LOCAL_QUORUM'}{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
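A runnable sketch of the counter idea proposed above (assumptions: JDK {{AtomicInteger}} stands in for commons-lang {{MutableInt}}, and plain {{Future}}s stand in for {{AsyncOneResponse}}; this is not the committed fix). The counter survives the {{TimeoutException}}, so the catch block in {{RepairMergeListener#close()}} could report how many repair acks actually arrived instead of {{blockFor - 1}}:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RepairWaitSketch
{
    // Counts futures that complete in time; on timeout the counter keeps
    // the number of responses received so far.
    static void waitOnFutures(List<Future<Integer>> results, long ms, AtomicInteger counter)
        throws Exception
    {
        for (Future<Integer> result : results)
        {
            result.get(ms, TimeUnit.MILLISECONDS); // throws TimeoutException if late
            counter.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception
    {
        List<Future<Integer>> results = new ArrayList<>();
        results.add(CompletableFuture.completedFuture(1)); // one repair ack arrived
        results.add(new CompletableFuture<>());            // one never completes

        AtomicInteger received = new AtomicInteger();
        try
        {
            waitOnFutures(results, 10, received);
        }
        catch (TimeoutException e)
        {
            // The close() handler could pass received.get() as the 'received'
            // argument of ReadTimeoutException instead of blockFor - 1.
            System.out.println("received=" + received.get()); // prints "received=1"
        }
    }
}
```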
[jira] [Commented] (CASSANDRA-14702) Cassandra Write failed even when the required nodes to Ack(consistency) are up.
[ https://issues.apache.org/jira/browse/CASSANDRA-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610103#comment-16610103 ] Rohit Singh commented on CASSANDRA-14702: - Any update? > Cassandra Write failed even when the required nodes to Ack(consistency) are > up. > --- > > Key: CASSANDRA-14702 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14702 > Project: Cassandra > Issue Type: Bug >Reporter: Rohit Singh >Priority: Major > > Hi, > We have the following configuration in our project for Cassandra. > Total nodes in cluster: 5 > Replication factor: 3 > Consistency: LOCAL_QUORUM > We get the write timeout exception from Cassandra even when 2 nodes are up, and > why does the stack trace say that 3 replicas were required when the consistency > level only needs 2? > Below is the exception we got: > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout > during write query at consistency LOCAL_QUORUM (3 replica were required but > only 2 acknowledged the write) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:59) > at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:289) > at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:269) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
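For what it's worth, the arithmetic behind that stack trace can be sketched (assumption: this is a simplified illustration, not Cassandra's actual {{ConsistencyLevel}} code). LOCAL_QUORUM with a local replication factor of 3 blocks for 2 acks; the write handler also counts pending replicas (e.g. a node that is bootstrapping or being replaced), which is one known way a "3 replica were required" message can appear when only 2 live acks would otherwise be expected:

```java
public class QuorumSketch
{
    // LOCAL_QUORUM blocks for a majority of the local DC's replicas.
    static int localQuorum(int localReplicationFactor)
    {
        return localReplicationFactor / 2 + 1;
    }

    public static void main(String[] args)
    {
        int blockFor = localQuorum(3);     // 2 acks for RF 3
        int pending = 1;                   // hypothetical pending (moving) replica
        int required = blockFor + pending; // 3, matching the exception message

        System.out.println(blockFor + " -> " + required); // prints "2 -> 3"
    }
}
```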
[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105 ] Jason Brown commented on CASSANDRA-13938: - [~dimitarndimitrov], Thanks for your comments, and apologies for the late response. While your proposed simplification indeed clarifies the logic, unfortunately it doesn't resolve the bug (my dtest still fails - this is due to the need to reset some value, like the currentBufferOffset, after rebuffering). However, your observation about simplifying this patch (in particular eliminating {{currentBufferOffset}}) made me reconsider the needs of this class. Basically, we just need to correctly track the streamOffset for the current buffer, and that's all. When I ported this class from 3.11, I over-complicated the offsets and counters in the first version of this class (committed with CASSANDRA-12229), and then confused it again (while resolving the error) with the first patch. In short: as long as I correctly calculate streamOffset, that should satisfy the needs of the class. Thus, I eliminated both {{current}} and {{currentBufferOffset}}, and the result is clearer and correct. I've pushed a cleaned-up branch (which has been rebased to trunk). Please note that, as with the first patch, the majority of this patch is refactoring to clean up the class in general. I've also updated my dtest patch, as my version required a stress profile (based on [~zznate]'s original) to be committed as well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as before, I'm unable to get that to fail on trunk.) 
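The simplified bookkeeping described above might look roughly like the following (a hypothetical sketch, not the committed patch): track only the stream offset of the current buffer's first byte, advance it on each rebuffer, and derive the absolute position on demand instead of maintaining a separate {{current}} counter:

```java
public class StreamOffsetSketch
{
    private long streamOffset; // stream position of the current buffer's first byte
    private int bufferPos;     // read position within the current buffer
    private int bufferLen;     // length of the current buffer

    // On rebuffer, the new buffer starts where the old one ended.
    void reBuffer(int nextBufferLen)
    {
        streamOffset += bufferLen;
        bufferLen = nextBufferLen;
        bufferPos = 0;
    }

    void read(int n)
    {
        bufferPos += n; // simulate consuming n decompressed bytes
    }

    // Absolute stream position, derived rather than separately tracked.
    long position()
    {
        return streamOffset + bufferPos;
    }

    public static void main(String[] args)
    {
        StreamOffsetSketch s = new StreamOffsetSketch();
        s.reBuffer(100);                  // first buffer covers bytes 0..99
        s.read(40);
        System.out.println(s.position()); // prints "40"
        s.reBuffer(60);                   // second buffer covers bytes 100..159
        s.read(10);
        System.out.println(s.position()); // prints "110"
    }
}
```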
> Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13938 > Project: Cassandra > Issue Type: Bug > Components: Repair >Reporter: Nate McCall >Assignee: Jason Brown >Priority: Critical > Fix For: 4.x > > Attachments: 13938.yaml, test.sh > > > Running through a simple scenario to test some of the new repair features, I > was not able to make a repair command work. Further, the exception seemed to > trigger a nasty failure state that basically shuts down the netty connections > for messaging *and* CQL on the nodes transferring back data to the node being > repaired. The following steps reproduce this issue consistently. > Cassandra stress profile (probably not necessary, but this one provides a > really simple schema and consistent data shape): > {noformat} > keyspace: standard_long > keyspace_definition: | > CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', > 'replication_factor':3}; > table: test_data > table_definition: | > CREATE TABLE test_data ( > key text, > ts bigint, > val text, > PRIMARY KEY (key, ts) > ) WITH COMPACT STORAGE AND > CLUSTERING ORDER BY (ts DESC) AND > bloom_filter_fp_chance=0.01 AND > caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND > comment='' AND > dclocal_read_repair_chance=0.00 AND > gc_grace_seconds=864000 AND > read_repair_chance=0.00 AND > compaction={'class': 'SizeTieredCompactionStrategy'} AND > compression={'sstable_compression': 'LZ4Compressor'}; > columnspec: > - name: key > population: uniform(1..5000) # 50 million records available > - name: ts > cluster: gaussian(1..50) # Up to 50 inserts per record > - name: val > population: gaussian(128..1024) # varying size of value data > insert: > partitions: fixed(1) # only one insert per batch for individual partitions > select: fixed(1)/1 # each insert comes in one at a time > batchtype: UNLOGGED > queries: > single: > cql: select * from 
test_data where key = ? and ts = ? limit 1; > series: > cql: select key,ts,val from test_data where key = ? limit 10; > {noformat} > The commands to build and run: > {noformat} > ccm create 4_0_test -v git:trunk -n 3 -s > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4 > # flush the memtable just to get everything on disk > ccm node1 nodetool flush > ccm node2 nodetool flush > ccm node3 nodetool flush > # disable hints for nodes 2 and 3 > ccm node2 nodetool disablehandoff > ccm node3 nodetool disablehandoff > # stop node1 > ccm node1 stop > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4 > # wait 10 seconds > ccm node1 start > # Note that we are local to ccm's nodetool install 'cause repair preview is > not reported yet > node1/bin/nodetool repair --preview > node1/bin/nodetool repair standard_long test_data > {noformat} > The error outputs from the last
[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610105#comment-16610105 ] Jason Brown edited comment on CASSANDRA-13938 at 9/11/18 5:01 AM: -- [~dimitarndimitrov], Thanks for your comments, and apologies for the late response. While your proposed simplification indeed clarifies the logic, unfortunately it doesn't resolve the bug (my dtest still fails - this is due to the need to reset some value, like the currentBufferOffset, after rebuffering). However, your observation about simplifying this patch (in particular eliminating {{currentBufferOffset}}) made me reconsider the needs of this class. Basically, we just need to correctly track the streamOffset for the current buffer, and that's all. When I ported this class from 3.11, I over-complicated the offsets and counters in the first version of this class (committed with CASSANDRA-12229), and then confused it again (while resolving the error) with the first patch. In short: as long as I correctly calculate streamOffset, that should satisfy the needs of the class. Thus, I eliminated both {{current}} and {{currentBufferOffset}}, and the result is clearer and correct. I've pushed a cleaned-up branch (which has been rebased to trunk). Please note that, as with the first patch, the majority of this patch is refactoring to clean up the class in general. I've also updated my dtest patch, as my version required a stress profile (based on [~zznate]'s original) to be committed as well. (Note: my dtest branch also includes [~pauloricardomg]'s patch, but, as before, I'm unable to get that to fail on trunk.) 
[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610049#comment-16610049 ] Jordan West commented on CASSANDRA-14714: - It would be nice to have a workaround for this that doesn’t involve needing Java 11 installed on the machine. Is that being tracked as part of CASSANDRA-14712? I came across this while trying to run {{mvn-install}}. Fwiw, at least for {{mvn-install}}, removing this line fixes it: [https://github.com/apache/cassandra/blob/trunk/build.xml#L1069] > `ant artifacts` broken on trunk (4.0); creates no tar artifacts > --- > > Key: CASSANDRA-14714 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14714 > Project: Cassandra > Issue Type: Bug >Reporter: Michael Shuler >Priority: Blocker > Labels: Java11 > Fix For: 4.0 > > > `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. > Additionally, the target does not exit non-zero, so the result is: > {noformat} > <...> > artifacts: > BUILD SUCCESSFUL > {noformat}
[jira] [Created] (CASSANDRA-14715) Read repairs can result in bogus timeout errors to the client
Cameron Zemek created CASSANDRA-14715: - Summary: Read repairs can result in bogus timeout errors to the client Key: CASSANDRA-14715 URL: https://issues.apache.org/jira/browse/CASSANDRA-14715 Project: Cassandra Issue Type: Bug Components: Local Write-Read Paths Reporter: Cameron Zemek In RepairMergeListener:close() it does the following: {code:java} try { FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout()); } catch (TimeoutException ex) { // We got all responses, but timed out while repairing int blockFor = consistency.blockFor(keyspace); if (Tracing.isTracing()) Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor); else logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor); throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true); } {code} This propagates up and gets sent to the client, and we have customers get confused because they see timeouts for CL ALL requiring ALL replicas even though they have read_repair_chance = 0 and are using a LOCAL_* CL. At minimum I suggest that, instead of using the consistency level of DataResolver (which is always ALL with read repairs) for the timeout, it instead use repairResults.size(). That is, blockFor = repairResults.size(). But saying it received _blockFor - 1_ is still bogus. Fixing that would require more changes. I was thinking maybe like so: {code:java} public static void waitOnFutures(List<AsyncOneResponse> results, long ms, MutableInt counter) throws TimeoutException { for (AsyncOneResponse result : results) { result.get(ms, TimeUnit.MILLISECONDS); counter.increment(); } } {code} Likewise in SinglePartitionReadLifecycle:maybeAwaitFullDataRead() it says _blockFor - 1_ for how many were received, which is also bogus. Steps used to reproduce: modify RepairMergeListener:close() to always throw a timeout exception. 
With schema: {noformat} CREATE KEYSPACE weather WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true; CREATE TABLE weather.city ( cityid int PRIMARY KEY, name text ) WITH bloom_filter_fp_chance = 0.01 AND dclocal_read_repair_chance = 0.0 AND read_repair_chance = 0.0 AND speculative_retry = 'NONE'; {noformat} Then using the following steps: # ccm node1 cqlsh # INSERT INTO weather.city(cityid, name) VALUES (1, 'Canberra'); # exit; # ccm node1 flush # ccm node1 stop # rm -rf ~/.ccm/test_repair/node1/data0/weather/city-ff2fade0b18d11e8b1cd097acbab1e3d/mc-1-big-* # remove the sstable with the insert # ccm node1 start # ccm node1 cqlsh # CONSISTENCY LOCAL_QUORUM; # select * from weather.city where cityid = 1; You get result of: {noformat} ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 5 responses." info={'received_responses': 5, 'required_responses': 6, 'consistency': 'ALL'}{noformat} But was expecting: {noformat} ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}{noformat}
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609973#comment-16609973 ] Jason Brown commented on CASSANDRA-14346: - Somehow this got marked as Ready to Commit; switched back to Patch Available. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.x > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focused on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. 
I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity.
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: Patch Available (was: Awaiting Feedback)
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: Awaiting Feedback (was: In Progress)
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14346: Status: In Progress (was: Ready to Commit) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous updated CASSANDRA-14346: -- Status: Ready to Commit (was: Patch Available) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14503: - Reviewers: Dinesh Joshi > Internode connection management is race-prone > - > > Key: CASSANDRA-14503 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14503 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Sergio Bossa >Assignee: Jason Brown >Priority: Major > Labels: pull-request-available > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Following CASSANDRA-8457, internode connection management has been rewritten > to rely on Netty, but the new implementation in > {{OutboundMessagingConnection}} seems quite race-prone to me, in particular > in these two cases: > * {{#finishHandshake()}} racing with {{#close()}}: i.e. in such case the > former could run into an NPE if the latter nulls the {{channelWriter}} (but > this is just an example, other conflicts might happen). > * Connection timeout and retry racing with state changing methods: > {{connectionRetryFuture}} and {{connectionTimeoutFuture}} are cancelled when > handshaking or closing, but there's no guarantee those will actually be > cancelled (as they might already be running), so they might end up changing > the connection state concurrently with other methods (i.e. by unexpectedly > closing the channel or clearing the backlog). > Overall, the thread safety of {{OutboundMessagingConnection}} is very > difficult to assess given the current implementation: I would suggest > refactoring it into a single-threaded model, where all connection state changing > actions are enqueued on a single-threaded scheduler, so that state > transitions can be clearly defined and checked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
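The single-threaded model Sergio suggests can be sketched roughly as follows. All names here are illustrative rather than Cassandra's actual API, and a plain JDK single-thread executor stands in for a Netty event loop; the point is only that serializing every state transition through one thread makes the {{#finishHandshake()}} vs {{#close()}} race impossible by construction.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: every state-changing action is enqueued on one executor, so
// transitions are totally ordered and finishHandshake() can never observe
// a half-torn-down connection.
class ConnectionStateMachine
{
    enum State { CREATING_CHANNEL, READY, CLOSED }

    private final ExecutorService stateThread = Executors.newSingleThreadExecutor();
    private State state = State.CREATING_CHANNEL; // only read/written on stateThread

    void finishHandshake()
    {
        stateThread.execute(() -> {
            if (state == State.CLOSED)
                return; // close() already ran; ignore the late handshake
            state = State.READY;
        });
    }

    void close()
    {
        stateThread.execute(() -> state = State.CLOSED);
    }

    State currentState()
    {
        // Reads are also serialized through the state thread.
        try { return stateThread.submit(() -> state).get(); }
        catch (Exception e) { throw new RuntimeException(e); }
    }

    void shutdown() { stateThread.shutdown(); }
}
```

Because the executor runs tasks in submission order, a {{close()}} followed by a straggling {{finishHandshake()}} deterministically leaves the connection closed, with no locking in the callers.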
[jira] [Updated] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14714: Labels: Java11 (was: ) > `ant artifacts` broken on trunk (4.0); creates no tar artifacts > --- > > Key: CASSANDRA-14714 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14714 > Project: Cassandra > Issue Type: Bug >Reporter: Michael Shuler >Priority: Blocker > Labels: Java11 > Fix For: 4.0 > > > `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. > Additionally, the target does not exit non-zero, so the result is: > {noformat} > <...> > artifacts: > BUILD SUCCESSFUL > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14712) Cassandra 4.0 packaging support
[ https://issues.apache.org/jira/browse/CASSANDRA-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14712: Labels: Java11 (was: ) > Cassandra 4.0 packaging support > --- > > Key: CASSANDRA-14712 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14712 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Stefan Podkowinski >Priority: Major > Labels: Java11 > Fix For: 4.x > > > Currently it's not possible to build any native packages (.deb/.rpm) for > trunk. > cassandra-builds - docker/*-image.docker > * Add Java11 to debian+centos build image > * (packaged ant scripts won't work with Java 11 on centos, so we may have to > install ant from tarballs) > cassandra-builds - docker/build-*.sh > * set JAVA8_HOME to Java8 > * set JAVA_HOME to Java11 (4.0) or Java8 (<4.0) > cassandra - redhat/cassandra.spec > * Check if patches still apply after CASSANDRA-14707 > * Add fqltool as %files > We may also have to change the version handling in build.xml or build-*.sh, > depending how we plan to release packages during beta, or if we plan to do so > at all before GA. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Shuler resolved CASSANDRA-14714. Resolution: Not A Problem Thanks for the Jira pointer. With a local fix, I can build tar.gz artifacts successfully: {noformat} export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64 export JAVA8_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609740#comment-16609740 ] Michael Shuler commented on CASSANDRA-14714: {noformat} - *Experimental* support for Java 11 has been added. JVM options that differ between or are specific for Java 8 and 11 have been moved from jvm.options into jvm8.options and jvm11.options. IMPORTANT: Running C* on Java 11 is *experimental* and do it at your own risk. Compilation recommendations: configure Java 11 SDK via JAVA_HOME and Java 8 SDK via JAVA8_HOME. Release builds require Java 11 + Java 8. Development builds can use Java 8 without 11. {noformat} We'll see what I can work out here locally with some env vars. I found this issue when checking on linking to artifacts builds in Jenkins. Basic Jenkins slave usage means only one JDK version is available. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609722#comment-16609722 ] Stefan Podkowinski commented on CASSANDRA-14714: I've tried to wrap up some of the 4.0 related build/packaging issues in CASSANDRA-14712 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
[ https://issues.apache.org/jira/browse/CASSANDRA-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609717#comment-16609717 ] Michael Shuler commented on CASSANDRA-14714: {noformat} ((6ba2fb9395...)|BISECTING)mshuler@hana:~/git/cassandra$ git bisect bad 6ba2fb9395226491872b41312d978a169f36fcdb is the first bad commit commit 6ba2fb9395226491872b41312d978a169f36fcdb Author: Robert Stupp Date: Tue Sep 12 20:04:30 2017 +0200 Make C* compile and run on Java 11 and Java 8 patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-9608 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14714) `ant artifacts` broken on trunk (4.0); creates no tar artifacts
Michael Shuler created CASSANDRA-14714: -- Summary: `ant artifacts` broken on trunk (4.0); creates no tar artifacts Key: CASSANDRA-14714 URL: https://issues.apache.org/jira/browse/CASSANDRA-14714 Project: Cassandra Issue Type: Bug Reporter: Michael Shuler Fix For: 4.0 `ant artifacts` on the trunk (4.0) branch currently creates no tar artifacts. Additionally, the target does not exit non-zero, so the result is: {noformat} <...> artifacts: BUILD SUCCESSFUL {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14289) Document sstable tools
[ https://issues.apache.org/jira/browse/CASSANDRA-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609668#comment-16609668 ] Valerie Parham-Thompson commented on CASSANDRA-14289: - I've completed these documents, and am getting peer review. > Document sstable tools > -- > > Key: CASSANDRA-14289 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14289 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Hannu Kröger >Priority: Major > Attachments: gen-sstable-docs.py, sstabledocs.tar.gz > > > Following tools are missing in the documentation of cassandra tools on the > documentation site (http://cassandra.apache.org/doc/latest/tools/index.html): > * sstabledump > * sstableexpiredblockers > * sstablelevelreset > * sstableloader > * sstablemetadata > * sstableofflinerelevel > * sstablerepairedset > * sstablescrub > * sstablesplit > * sstableupgrade > * sstableutil > * sstableverify -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
[ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609614#comment-16609614 ] Marcus Eriksson commented on CASSANDRA-3200: While reviewing CASSANDRA-14693 I realised that the dtests for this were never committed, could you have a quick look [~bdeggleston]? https://github.com/krummas/cassandra-dtest/commits/marcuse/3200 and circle run: https://circleci.com/gh/krummas/cassandra/tree/marcuse%2Ffor_3200_dtests > Repair: compare all trees together (for a given range/cf) instead of by pair > in isolation > - > > Key: CASSANDRA-3200 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3200 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Marcus Eriksson >Priority: Minor > Labels: repair > Fix For: 4.0 > > > Currently, repair compares merkle trees by pair, in isolation from any other > tree. What that means concretely is that if I have three nodes A, B and C > (RF=3), with A and B in sync, but C having some range r inconsistent with both > A and B (since those are consistent), we will do the following transfers of r: > A -> C, C -> A, B -> C, C -> B. > The fact that we do both A -> C and C -> A is fine, because we cannot know > which one is more up to date between A and C. However, the transfer B -> C is > useless provided we do A -> C, since A and B are in sync. Not doing that transfer > is a 25% improvement in that case. With RF=5 and only one node > inconsistent with all the others, that's almost a 40% improvement, etc... > Given that this situation of one node not in sync while the others are is > probably fairly common (one node died, so it is behind), this could be a fair > improvement in what is transferred. In the case where we use repair to > completely rebuild a node, this will be a dramatic improvement, because it > will avoid the rebuilt node getting RF times the data it should get. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
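The transfer arithmetic in the description above can be made concrete with a toy calculation (illustrative code, not Cassandra's): group the replicas whose trees match for the range, then only stream between groups — any one member of a group can act as sender, but every member of the other group still needs to receive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the transfer counts discussed above. Keys are node names,
// values are a stand-in for "the content of the merkle tree for range r".
class RepairTransferCount
{
    // Current behaviour: every mismatched pair streams in both directions.
    static int pairwise(Map<String, String> treeByNode)
    {
        List<String> nodes = new ArrayList<>(treeByNode.keySet());
        int transfers = 0;
        for (int i = 0; i < nodes.size(); i++)
            for (int j = i + 1; j < nodes.size(); j++)
                if (!treeByNode.get(nodes.get(i)).equals(treeByNode.get(nodes.get(j))))
                    transfers += 2; // i -> j and j -> i
        return transfers;
    }

    // Proposed behaviour: compare all trees together, group nodes with equal
    // trees, and for each pair of distinct groups stream once per receiver.
    static int grouped(Map<String, String> treeByNode)
    {
        Map<String, Integer> groupSizes = new HashMap<>();
        for (String tree : treeByNode.values())
            groupSizes.merge(tree, 1, Integer::sum);
        List<Integer> sizes = new ArrayList<>(groupSizes.values());
        int transfers = 0;
        for (int i = 0; i < sizes.size(); i++)
            for (int j = i + 1; j < sizes.size(); j++)
                transfers += sizes.get(i) + sizes.get(j); // each member of each group receives once
        return transfers;
    }
}
```

For the RF=3 example (A and B in sync, C behind) this gives 4 pairwise transfers versus 3 grouped, the 25% saving; for RF=5 with one node behind, 8 versus 5, close to the 40% the ticket quotes.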
[jira] [Commented] (CASSANDRA-14705) ReplicaLayout follow-up
[ https://issues.apache.org/jira/browse/CASSANDRA-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609615#comment-16609615 ] Ariel Weisberg commented on CASSANDRA-14705: [~ifesdjeen] that branch you linked to in your PR is the wrong one, it's 14705 > ReplicaLayout follow-up > --- > > Key: CASSANDRA-14705 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14705 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Benedict >Priority: Major > > Clarify the new {{ReplicaLayout}} code, separating it into ReplicaPlan (for > what we want to do) and {{ReplicaLayout}} (for what we know about the > cluster), with well defined semantics (and comments in the rare cases those > semantics are weird) > Found and fixed some bugs: > - {{commitPaxos}} was using only live nodes, when needed to include down > - We were not writing to pending transient replicas > - On write, we were not hinting to full nodes with transient > replication enabled (since we filtered to {{liveOnly}}, in order to include > our transient replicas above {{blockFor}}) > - If we speculated, in {{maybeSendAdditionalReads}} (in read repair) > we would only consult the same node we had speculated too. This also applied > to {{maybeSendAdditionalWrites}} - and this issue was also true pre-TR. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14693) Follow-up: allow transient node to serve as repair coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609471#comment-16609471 ] Marcus Eriksson commented on CASSANDRA-14693: - the new class hierarchy looks great, just a minor comment that we could remove the parameter to {{startSync}} and instead make {{private final List> rangesToSync;}} protected and use that, makes it a bit clearer since we never call {{startSync}} with anything else > Follow-up: allow transient node to serve as repair coordinator > -- > > Key: CASSANDRA-14693 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14693 > Project: Cassandra > Issue Type: Task >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Minor > > Allow transient node to serve as a coordinator. > |[trunk|https://github.com/apache/cassandra/pull/257]|[utest|https://circleci.com/gh/ifesdjeen/cassandra/329]|[dtest|https://circleci.com/gh/ifesdjeen/cassandra/330]|[dtest-novnode|https://circleci.com/gh/ifesdjeen/cassandra/328]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14549) Transient Replication: support logged batches
[ https://issues.apache.org/jira/browse/CASSANDRA-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-14549: --- Labels: pull-request-available (was: ) > Transient Replication: support logged batches > - > > Key: CASSANDRA-14549 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14549 > Project: Cassandra > Issue Type: Sub-task >Reporter: Blake Eggleston >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
[ https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609190#comment-16609190 ] Saurabh commented on CASSANDRA-14711: - [~jasobrown] - Thanks for your response. We are in the process of planning the upgrade, but as it is a production cluster it will take time. We started seeing this issue just a few days ago and are trying to fix it. There were no application code or DB changes. As per the hs_err log file (attached), I can see a lot of threads in Blocked status and also 100% used HEAP regions. I have tried increasing the -Xms - 4G -> 8G -> 16G -Xmx - 4G -> 8G -> 16G but this did not help much; it just delayed the crash. Something is piling up in memory, but the Cassandra logs do not show any OOM errors either. > Apache Cassandra 3.2 crashing with exception > org.apache.cassandra.db.marshal.TimestampType.compareCustom > > > Key: CASSANDRA-14711 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14711 > Project: Cassandra > Issue Type: Bug >Reporter: Saurabh >Priority: Major > Attachments: hs_err_pid32069.log > > > Hi Team, > I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12. > Issue: > Cassandra is continuously crashing, generating a heap dump log. There > are no errors reported in system.log or debug.log. 
> Exception in hs_err_PID.log: > # Problematic frame: > # J 8283 C2 > org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I > (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334] > Java Threads: ( => current thread ) > 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon > [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)] > 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon > [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)] > 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon > [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)] > 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon > [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)] > 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon > : > : > lot of threads in BLOCKED status > Other Threads: > 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] > [id=32098] > 0x2b7d38fa9de0 WatcherThread [stack: > 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108] > VM state:not at safepoint (normal execution) > VM Mutex/Monitor currently owned by a thread: None > Heap: > garbage-first heap total 8388608K, used 6791168K [0x0003c000, > 0x0003c0404000, 0x0007c000) > region size 4096K, 785 young (3215360K), 55 survivors (225280K) > Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K > class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K > Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), > HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, > PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start) > AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, > 100% used [0x0003c000, 0x0003c040) > AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c040, 0x0003c080) > AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c080, 
0x0003c0c0) > AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, > 100% used [0x0003c0c0, 0x0003c100) > AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, > 100% used [0x0003c100, 0x0003c140) > AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, > 100% used [0x0003c140, 0x0003c180) > : > : > lot of such messages -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-13348) Duplicate tokens after bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski resolved CASSANDRA-13348. Resolution: Cannot Reproduce > Duplicate tokens after bootstrap > > > Key: CASSANDRA-13348 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13348 > Project: Cassandra > Issue Type: Bug >Reporter: Tom van der Woerdt >Assignee: Dikang Gu >Priority: Blocker > Fix For: 3.0.x > > > This one is a bit scary, and probably results in data loss. After a bootstrap > of a few new nodes into an existing cluster, two new nodes have chosen some > overlapping tokens. > In fact, of the 256 tokens chosen, 51 tokens were already in use on the other > node. > Node 1 log : > {noformat} > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 > StorageService.java:1160 - JOINING: waiting for ring information > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 > StorageService.java:1160 - JOINING: waiting for schema information to complete > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 > StorageService.java:1160 - JOINING: schema complete, ready to bootstrap > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 > StorageService.java:1160 - JOINING: waiting for pending range calculation > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 > StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 > StorageService.java:1160 - JOINING: getting bootstrap token > WARN [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,564 > TokenAllocation.java:61 - Selected tokens [, 2959334889475814712, > 3727103702384420083, 7183119311535804926, 6013900799616279548, > -1222135324851761575, 1645259890258332163, -1213352346686661387, > 7604192574911909354] > WARN [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 > TokenAllocation.java:65 - Replicated node load in 
datacentre before > allocation max 1.00 min 1.00 stddev 0. > WARN [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 > TokenAllocation.java:66 - Replicated node load in datacentre after allocation > max 1.00 min 1.00 stddev 0. > WARN [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 > TokenAllocation.java:70 - Unexpected growth in standard deviation after > allocation. > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:44,150 > StorageService.java:1160 - JOINING: sleeping 3 ms for pending range setup > INFO [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:43:14,151 > StorageService.java:1160 - JOINING: Starting to bootstrap... > {noformat} > Node 2 log: > {noformat} > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:51,937 > StorageService.java:971 - Joining ring by operator request > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 > StorageService.java:1160 - JOINING: waiting for ring information > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 > StorageService.java:1160 - JOINING: waiting for schema information to complete > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 > StorageService.java:1160 - JOINING: schema complete, ready to bootstrap > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 > StorageService.java:1160 - JOINING: waiting for pending range calculation > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 > StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 > StorageService.java:1160 - JOINING: getting bootstrap token > WARN [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,630 > TokenAllocation.java:61 - Selected tokens [.., 2890709530010722764, > -2416006722819773829, -5820248611267569511, -5990139574852472056, > 1645259890258332163, 9135021011763659240, -5451286144622276797, > 7604192574911909354] > WARN [RMI TCP 
Connection(380)-127.0.0.1] 2017-03-17 15:55:52,794 > TokenAllocation.java:65 - Replicated node load in datacentre before > allocation max 1.02 min 0.98 stddev 0. > WARN [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,795 > TokenAllocation.java:66 - Replicated node load in datacentre after allocation > max 1.00 min 1.00 stddev 0. > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:53,149 > StorageService.java:1160 - JOINING: sleeping 3 ms for pending range setup > INFO [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:56:23,149 > StorageService.java:1160 - JOINING: Starting to bootstrap... > {noformat} > eg. 7604192574911909354 has been chosen by both. > The joins were eight days apart, so I don't
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14503: Fix Version/s: 4.0 Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609099#comment-16609099 ] Jason Brown commented on CASSANDRA-14503: - Patch available here:

||14503||
|[branch|https://github.com/jasobrown/cassandra/tree/14503]|
|[utests dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/14503]|

Additionally, I've [created a Pull Request|https://github.com/apache/cassandra/pull/264] for review. Note: this patch will need to be rebased when CASSANDRA-13630 is committed, and to incorporate that ticket's ChannelWriter changes for large messages, but that should not affect this patch much (I've been keeping it in mind as I worked on this).

- OutboundMessagingConnection changes
-- All producer threads queue messages into the backlog, and messages are only consumed by a task on a fixed thread (the event loop). Producers will contend to schedule the consumer, but have no further involvement in sending a message (unlike the current implementation).
-- All netty-related activity (setting up a remote connection, connection-related callbacks and timeouts, consuming from the backlog, and writing to the channel and its associated callbacks) is handled on the event loop. OutboundMessagingConnection gets a reference to an event loop in its constructor and uses it for the duration of its lifetime.
-- Finally, forward-ported the queue-bounding functionality of CASSANDRA-13265. In short, we want to limit the size of the queued messages in order to not OOM. Thus, we schedule a task on the consumer thread that examines the queue looking for elements to prune. Further, I've added a naive upper bound to the queue so that producers drop messages before enqueuing if the backlog is in a *really* bad state. @djoshi3 has recommended bounding by message size rather than by message count, which I agree with, but I propose saving that for a follow-up ticket.
-- Cleaner, better documented, and better tested state machine to manage state transitions for the class.
- ChannelWriter and MessageOutHandler became much simpler, as we can control the flush behaviors from the OMC (instead of the previous complicated CW/MOH dance) because we're already on the event loop when consuming from the backlog and writing to the channel.
- I was able to clean up/remove a bunch of extra code due to this simplification as well (ExpiredException, OutboundMessagingParameters, MessageResult).
- Updated all the javadoc documentation for these changes (mostly OMC and ChannelWriter).
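The producer/consumer shape described in the notes above can be sketched as follows (names are invented for illustration, and a bare single-threaded executor stands in for the Netty event loop): producers only enqueue and race, via CAS, to schedule the drain task, while all dequeuing and channel writes happen on one thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class BacklogSender {
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean drainScheduled = new AtomicBoolean();
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private final StringBuilder channel = new StringBuilder(); // stand-in for the socket

    // Called from any producer thread: enqueue, then contend (via CAS) for
    // the right to schedule the consumer task. Losers just return.
    void send(String message) {
        backlog.add(message);
        if (drainScheduled.compareAndSet(false, true))
            eventLoop.submit(this::drain);
    }

    // Runs only on the event-loop thread.
    private void drain() {
        drainScheduled.set(false); // clear BEFORE polling, see note below
        String m;
        while ((m = backlog.poll()) != null)
            channel.append(m);     // single-threaded: no lock on the channel
    }

    // Wait for all pending drains, then stop; returns what was "written".
    String drainAndStop() {
        try { eventLoop.submit(() -> {}).get(); }
        catch (Exception e) { throw new RuntimeException(e); }
        eventLoop.shutdown();
        return channel.toString();
    }
}
```

Clearing {{drainScheduled}} before polling matters: a message enqueued after the final poll necessarily sees the flag false and schedules a fresh drain, so no message is stranded in the backlog.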
[jira] [Updated] (CASSANDRA-14503) Internode connection management is race-prone
[ https://issues.apache.org/jira/browse/CASSANDRA-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-14503: --- Labels: pull-request-available (was: )
[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients
[ https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609073#comment-16609073 ] Benedict commented on CASSANDRA-14708: -- Thanks. It looks like we've at least introduced a bug wrt adding hours and seconds to a date/timestamp across leap-second boundaries (and, if we introduce TZ support, across DST boundaries), but that's an issue for another ticket. You brought up the issue of leap seconds in that discussion, I can see, so it's a shame this wasn't accounted for in the eventual solution. On the topic of this ticket, I agree that making the type accept nanos exclusively is not the solution; that is a different type of duration. It might have been nice to use the JDK or Joda-Time nomenclature for some consistency, and call it a period (reserving duration for those operating exclusively on nanos/millis, much as in Go), but c'est la vie.

> protocol v5 duration wire format is overly complex and awkward to implement for clients
> ---
> Key: CASSANDRA-14708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14708
> Project: Cassandra
> Issue Type: Bug
> Reporter: Chris Bannister
> Priority: Major
>
> Protocol V5 defines the duration type to be on the wire as months, days and nanoseconds. Days and months require a timezone to make sense of the duration, and vary depending on the point in time from which they are applied.
> Go defines a [duration|https://golang.org/pkg/time/#Duration] type as nanoseconds in an int64, which can represent ~290 years. Java's [duration|https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html] does not have a way to handle months.
> I suggest that before 4.0 is released the duration format is converted to just be represented as nanoseconds.
[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
[ https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609050#comment-16609050 ] Jason Brown commented on CASSANDRA-14711: - So, the first thing to know is that 3.2 is an old, unsupported release. 3.11.3 is the currently supported 3.X release. > Apache Cassandra 3.2 crashing with exception > org.apache.cassandra.db.marshal.TimestampType.compareCustom > > > Key: CASSANDRA-14711 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14711 > Project: Cassandra > Issue Type: Bug >Reporter: Saurabh >Priority: Major > Attachments: hs_err_pid32069.log > > > Hi Team, > I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.. > Issue: > Cassandra is continuously crashing and generating a heap dump log. There > are no errors reported in system.log OR Debug.log. > Exception in hs_err_PID.log: > # Problematic frame: > # J 8283 C2 > org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I > (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334] > Java Threads: ( => current thread ) > 0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon > [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)] > 0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon > [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)] > 0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon > [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)] > 0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon > [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)] > 0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon > : > : > lot of threads in BLOCKED status > Other Threads: > 0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] > [id=32098] > 0x2b7d38fa9de0 WatcherThread [stack: > 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108] > VM state:not at safepoint (normal execution) > VM Mutex/Monitor currently owned 
by a thread: None > Heap: > garbage-first heap total 8388608K, used 6791168K [0x0003c000, > 0x0003c0404000, 0x0007c000) > region size 4096K, 785 young (3215360K), 55 survivors (225280K) > Metaspace used 40915K, capacity 42044K, committed 42368K, reserved 1087488K > class space used 4429K, capacity 4646K, committed 4736K, reserved 1048576K > Heap Regions: (Y=young(eden), SU=young(survivor), HS=humongous(starts), > HC=humongous(continues), CS=collection set, F=free, TS=gc time stamp, > PTAMS=previous top-at-mark-start, NTAMS=next top-at-mark-start) > AC 0 O TS 0 PTAMS 0x0003c040 NTAMS 0x0003c040 space 4096K, > 100% used [0x0003c000, 0x0003c040) > AC 0 O TS 0 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c040, 0x0003c080) > AC 0 O TS 9 PTAMS 0x0003c080 NTAMS 0x0003c080 space 4096K, > 100% used [0x0003c080, 0x0003c0c0) > AC 0 O TS 11 PTAMS 0x0003c0c0 NTAMS 0x0003c0c0 space 4096K, > 100% used [0x0003c0c0, 0x0003c100) > AC 0 O TS 11 PTAMS 0x0003c100 NTAMS 0x0003c100 space 4096K, > 100% used [0x0003c100, 0x0003c140) > AC 0 O TS 11 PTAMS 0x0003c140 NTAMS 0x0003c140 space 4096K, > 100% used [0x0003c140, 0x0003c180) > : > : > lot of such messages
[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients
[ https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609045#comment-16609045 ] Sylvain Lebresne commented on CASSANDRA-14708: -- bq. Do you have a link to the original discussions around its inclusion

Well, it's not exactly hard to find: CASSANDRA-11873.
[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients
[ https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609039#comment-16609039 ] Benedict commented on CASSANDRA-14708: -- {quote}I'll also note this all went in C* 3.10, so it's not like we can really change the goals of the duration type {quote} We can at least revisit it if it turns out not to make enough sense, and I'm not sure that it does. Do you have a link to the original discussions around its inclusion? Because it seems to treat the concept of durations in a confusing manner. At the very least, if it's accepting {{months}} and {{days}} as parameters, it should be accepting {{hours}} and {{seconds}}, because these do not occupy a consistent number of nanos across all points in time. Typically, a time library will offer facilities to work exclusively in millis/nanos, or in all date components, not mix the two half-heartedly. This has me generally worried about how we handle time in Cassandra.
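For reference, the JDK nomenclature mentioned above draws exactly this line: {{java.time.Period}} holds calendar components (years/months/days) whose effect depends on the date they're applied to, while {{java.time.Duration}} is an exact second/nanosecond count. A small illustration:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Period;

public class PeriodVsDuration {
    public static void main(String[] args) {
        // "One month" is not a fixed number of nanoseconds: its effect
        // depends on the date it is applied to.
        LocalDate jan31 = LocalDate.of(2018, 1, 31);
        LocalDate feb = jan31.plus(Period.ofMonths(1)); // 2018-02-28 (day clamped)
        LocalDate mar = feb.plus(Period.ofMonths(1));   // 2018-03-28, not 03-31

        // Duration is an exact amount: a "day" here is always 86400 seconds.
        Duration d = Duration.ofDays(31);

        System.out.println(feb + " " + mar + " " + d.getSeconds());
    }
}
```

This is why a months/days/nanos triple cannot be losslessly collapsed into a single nanosecond count: applying, then reversing, a calendar month is not even guaranteed to round-trip (both Jan 31 and Jan 28 plus one month land on Feb 28).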
[jira] [Comment Edited] (CASSANDRA-14298) cqlshlib tests broken on b.a.o
[ https://issues.apache.org/jira/browse/CASSANDRA-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608965#comment-16608965 ] Stefan Podkowinski edited comment on CASSANDRA-14298 at 9/10/18 10:15 AM: -- Thanks [~mkjellman]! Can you please attach your Dockerfile to CASSANDRA-14713? I'll then try to get rid of the ADDed resource dependencies. was (Author: spo...@gmail.com): Thanks [~mkjellman]! Can you please attach your Dockerimage file to CASSANDRA-14713? I'll then try to get rid of the ADDed resource dependencies.

> cqlshlib tests broken on b.a.o
> --
> Key: CASSANDRA-14298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14298
> Project: Cassandra
> Issue Type: Bug
> Components: Build, Testing
> Reporter: Stefan Podkowinski
> Assignee: Patrick Bannister
> Priority: Major
> Labels: cqlsh, dtest
> Attachments: CASSANDRA-14298-old.txt, CASSANDRA-14298.txt, cqlsh_tests_notes.md
>
> It appears that cqlsh-tests on builds.apache.org on all branches stopped working since we removed nosetests from the system environment. See e.g. [here|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-cqlsh-tests/458/cython=no,jdk=JDK%201.8%20(latest),label=cassandra/console]. Looks like we either have to make nosetests available again or migrate to pytest as we did with dtests. Giving pytest a quick try resulted in many errors locally, but I haven't inspected them in detail yet.
[jira] [Created] (CASSANDRA-14713) Add docker testing image to cassandra-builds
Stefan Podkowinski created CASSANDRA-14713: -- Summary: Add docker testing image to cassandra-builds Key: CASSANDRA-14713 URL: https://issues.apache.org/jira/browse/CASSANDRA-14713 Project: Cassandra Issue Type: New Feature Components: Testing Reporter: Stefan Podkowinski

Tests executed on builds.apache.org ({{docker/jenkins/jenkinscommand.sh}}) and circleCI ({{.circleci/config.yml}}) will currently use the same [cassandra-test|https://hub.docker.com/r/kjellman/cassandra-test/] docker image ([github|https://github.com/mkjellman/cassandra-test-docker]) by [~mkjellman]. We should manage this image on our own as part of cassandra-builds, to keep it updated. There's also an [Apache user|https://hub.docker.com/u/apache/?page=1] on docker hub for publishing images.
[jira] [Commented] (CASSANDRA-14708) protocol v5 duration wire format is overly complex and awkward to implement for clients
[ https://issues.apache.org/jira/browse/CASSANDRA-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608961#comment-16608961 ] Sylvain Lebresne commented on CASSANDRA-14708: -- The duration type was added primarily for doing aggregations over time, and if you want to aggregate things by month, you don't want that to be all messed up because you have to provide a time in nanoseconds, which gives you no way to get proper month boundaries. Overall, we cannot use nanoseconds for durations in the way durations are currently implemented and used (including user-visible duration values like {{3m2d5s}}). I just don't think our duration type and the similarly-named Golang one have the same purpose. It might be a shame they have the same name, but well... I'll also note this all went into C* 3.10, so it's not like we can really change the goals of the duration type now, even if we agreed this was a good idea (I don't).
[jira] [Created] (CASSANDRA-14712) Cassandra 4.0 packaging support
Stefan Podkowinski created CASSANDRA-14712: -- Summary: Cassandra 4.0 packaging support Key: CASSANDRA-14712 URL: https://issues.apache.org/jira/browse/CASSANDRA-14712 Project: Cassandra Issue Type: Bug Components: Packaging Reporter: Stefan Podkowinski Fix For: 4.x

Currently it's not possible to build any native packages (.deb/.rpm) for trunk.

cassandra-builds - docker/*-image.docker
* Add Java11 to the debian+centos build image
* (packaged ant scripts won't work with Java 11 on centos, so we may have to install ant from tarballs)

cassandra-builds - docker/build-*.sh
* set JAVA8_HOME to Java8
* set JAVA_HOME to Java11 (4.0) or Java8 (<4.0)

cassandra - redhat/cassandra.spec
* Check if patches still apply after CASSANDRA-14707
* Add fqltool as %files

We may also have to change the version handling in build.xml or build-*.sh, depending on how we plan to release packages during beta, or if we plan to do so at all before GA.
[jira] [Updated] (CASSANDRA-14704) Validate transient status on query
[ https://issues.apache.org/jira/browse/CASSANDRA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-14704: Status: Patch Available (was: Open) > Validate transient status on query > --- > > Key: CASSANDRA-14704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14704 > Project: Cassandra > Issue Type: Bug >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Major > > Validate transient status on query: > |[patch|https://github.com/apache/cassandra/pull/261]|[utest|https://circleci.com/gh/ifesdjeen/cassandra/393]|[dtest-novnode|https://circleci.com/gh/ifesdjeen/cassandra/394]|[dtest-vnode|https://circleci.com/gh/ifesdjeen/cassandra/392]|
[jira] [Commented] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
[ https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608828#comment-16608828 ] Saurabh commented on CASSANDRA-14711: - Cassandra config: java -ea -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true -Xms8G -Xmx16G -XX:+CMSClassUnloadingEnabled -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=10 -XX:ConcGCThreads=3 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/cassandra/log/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:CompileCommandFile=/data/tools/repository/apache-cassandra-3.2/conf/hotspot_compiler -javaagent:/data/tools/repository/apache-cassandra-3.2/lib/jamm-0.3.0.jar -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Djava.library.path=/data/tools/repository/apache-cassandra-3.2/lib/sigar-bin -Dcassandra.max_queued_native_transport_requests=4096 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/data/tools/repository/apache-cassandra-3.2/logs -Dcassandra.storagedir=/data/tools/repository/apache-cassandra-3.2/data -cp /data/tools/repository/apache-cassandra-3.2/conf
[jira] [Updated] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
[ https://issues.apache.org/jira/browse/CASSANDRA-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saurabh updated CASSANDRA-14711: Priority: Major (was: Minor)
[jira] [Created] (CASSANDRA-14711) Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
Saurabh created CASSANDRA-14711:
-----------------------------------

             Summary: Apache Cassandra 3.2 crashing with exception org.apache.cassandra.db.marshal.TimestampType.compareCustom
                 Key: CASSANDRA-14711
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14711
             Project: Cassandra
          Issue Type: Bug
            Reporter: Saurabh
         Attachments: hs_err_pid32069.log

Hi Team,

I am using Apache Cassandra 3.2 with Java 1.8.0_161-b12.

Issue: Cassandra is continuously crashing and generating a heap dump log. There are no errors reported in system.log or debug.log.

Exception in hs_err_PID.log:

# Problematic frame:
# J 8283 C2 org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
#

--- T H R E A D ---

Current thread (0x2b7d3a1033e0): JavaThread "SharedPool-Worker-1" daemon [_thread_in_Java, id=32216, stack(0x2b7e4085f000,0x2b7e408a)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x14914c69

Registers:
RAX=0x0001, RBX=0x, RCX=0x9f1fbef0, RDX=0x0004f8fdf798
RSP=0x2b7e4089e4b0, RBP=0x0001, RSI=0x14907800, RDI=0x
R8 =0xd469, R9 =0x, R10=0x0004a41764c8, R11=0x
R12=0x, R13=0x, R14=0xd469, R15=0x2b7d3a1033e0
RIP=0x2b7d3d417fb4, EFLAGS=0x00010283, CSGSFS=0x0033, ERR=0x0004
TRAPNO=0x000e

[error occurred during error reporting (printing register info), id 0xb]

Stack: [0x2b7e4085f000,0x2b7e408a], sp=0x2b7e4089e4b0, free space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8283 C2 org.apache.cassandra.db.marshal.TimestampType.compareCustom(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I (6 bytes) @ 0x2b7d3d417fb4 [0x2b7d3d417c80+0x334]
J 12970 C2 org.apache.cassandra.db.Slice$Bound.compareTo(Lorg/apache/cassandra/db/ClusteringComparator;Ljava/util/List;)I (119 bytes) @ 0x2b7d3e0291c0 [0x2b7d3e028900+0x8c0]
J 16245 C2 org.apache.cassandra.db.Slices$ArrayBackedSlices.intersects(Ljava/util/List;Ljava/util/List;)Z (46 bytes) @ 0x2b7d3e619cfc [0x2b7d3e619b20+0x1dc]
J 18878 C2 org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;Z)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator; (822 bytes) @ 0x2b7d3ebcabf4 [0x2b7d3ebc7be0+0x3014]
J 9377 C2 org.apache.cassandra.db.ReadCommand.executeLocally(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator; (219 bytes) @ 0x2b7d3d80cde8 [0x2b7d3d80c0a0+0xd48]
J 14198 C2 org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(Lorg/apache/cassandra/net/MessageIn;I)V (328 bytes) @ 0x2b7d3c8bcbd0 [0x2b7d3c8bca20+0x1b0]
J 9731 C2 org.apache.cassandra.net.MessageDeliveryTask.run()V (187 bytes) @ 0x2b7d3d158d60 [0x2b7d3d158bc0+0x1a0]
J 18999% C2 org.apache.cassandra.concurrent.SEPWorker.run()V (253 bytes) @ 0x2b7d3eaa10ec [0x2b7d3eaa0960+0x78c]
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x695ae6] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V [libjvm.so+0x695ff1] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V [libjvm.so+0x696497] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V [libjvm.so+0x731cb0] thread_entry(JavaThread*, Thread*)+0xa0
V [libjvm.so+0xa7eaa3] JavaThread::thread_main_inner()+0x103
V [libjvm.so+0xa7ebec] JavaThread::run()+0x11c
V [libjvm.so+0x92da28] java_start(Thread*)+0x108
C [libpthread.so.0+0x7e25] start_thread+0xc5

--- P R O C E S S ---

Java Threads: ( => current thread )
0x2b7da57924a0 JavaThread "MemtableReclaimMemory:52" daemon [_thread_blocked, id=117880, stack(0x2b7d917ff000,0x2b7d9184)]
0x2b7d39f6a9e0 JavaThread "PerDiskMemtableFlushWriter_0:52" daemon [_thread_blocked, id=117879, stack(0x2b7e4ea94000,0x2b7e4ead5000)]
0x2b7d39d0f520 JavaThread "MemtablePostFlush:53" daemon [_thread_blocked, id=117878, stack(0x2b7e407dd000,0x2b7e4081e000)]
0x2b7df31a9150 JavaThread "MemtableFlushWriter:52" daemon [_thread_blocked, id=117877, stack(0x2b7e406d9000,0x2b7e4071a000)]
0x2b7e53e60110 JavaThread "RMI TCP Connection(1795)-127.0.0.1" daemon
:
:
lot of threads in BLOCKED status

Other Threads:
0x2b7d38de5ea0 VMThread [stack: 0x2b7d8208d000,0x2b7d8218d000] [id=32098]
0x2b7d38fa9de0 WatcherThread [stack: 0x2b7d88ee9000,0x2b7d88fe9000] [id=32108]

VM