[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721 ]

Lerh Chuan Low edited comment on CASSANDRA-13938 at 6/4/18 4:08 AM:

Here's another stacktrace that may help - I've also been getting these while testing trunk in EC2. The steps I use are the same:
- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out and the nodes also end up in a bizarre situation with gossip that I will have to stop the entire cluster and then start them up one at a time (in a rolling restart they still won't be able to sort themselves out).

Sometimes it errors with {{stream can only read forward}} (as above and in the JIRA), but here's another stacktrace that has also showed up several times in some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172)
{code}
I get the feeling they may be related but I'm not sure...I can open a different Jira for this if you like, but otherwise hope it may point out more clues as to what is going on :/

was (Author: lerh low):
Here's another stacktrace that may help - I've also been getting these while testing trunk in EC2. The steps I use are the same:
- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out and the nodes also end up in a bizarre situation with gossip that I will have to stop the entire cluster and then start them up one at a time (in a rolling restart they still won't be able to sort themselves out).

Sometimes it errors with {{stream can only read forward}} (as above and in the JIRA), but here's another stacktrace that has also showed up several times in some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 CompressedCassandraStreamReader.java:110 - [Stream e12b9b10-6476-11e8-936f-35a28469245e] Error while reading partition null from stream on
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation from
[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721 ]

Lerh Chuan Low edited comment on CASSANDRA-13938 at 6/4/18 4:07 AM:

Here's another stacktrace that may help - I've also been getting these while testing trunk in EC2. The steps I use are the same:
- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out and the nodes also end up in a bizarre situation with gossip that I will have to stop the entire cluster and then start them up one at a time (in a rolling restart they still won't be able to sort themselves out).

Sometimes it errors with {{stream can only read forward}} (as above and in the JIRA), but here's another stacktrace that has also showed up several times in some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 CompressedCassandraStreamReader.java:110 - [Stream e12b9b10-6476-11e8-936f-35a28469245e] Error while reading partition null from stream on
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172)
{code}
I get the feeling they may be related but I'm not sure...I can open a different Jira for this if you like, but otherwise hope it may point out more clues as to what is going on :/

was (Author: lerh low):
Here's another stacktrace that may help - I've also been getting these while testing trunk in EC2. The steps I use are the same:
- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out and the nodes also end up in a bizarre situation with gossip that I will have to stop the entire cluster and then start them up one at a time (in a rolling restart they still won't be able to sort themselves out).

Sometimes it errors with {{stream can only read forward}}, but here's another stacktrace that has also showed up several times in some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 CompressedCassandraStreamReader.java:110 - [Stream e12b9b10-6476-11e8-936f-35a28469245e] Error while
[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721 ]

Lerh Chuan Low commented on CASSANDRA-13938:

Here's another stacktrace that may help - I've also been getting these while testing trunk in EC2. The steps I use are the same:
- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out and the nodes also end up in a bizarre situation with gossip that I will have to stop the entire cluster and then start them up one at a time (in a rolling restart they still won't be able to sort themselves out).

Sometimes it errors with {{stream can only read forward}}, but here's another stacktrace that has also showed up several times in some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 CompressedCassandraStreamReader.java:110 - [Stream e12b9b10-6476-11e8-936f-35a28469245e] Error while reading partition null from stream on
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR [Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172)
{code}
I get the feeling they may be related but I'm not sure...I can open a different Jira for this if you like, but otherwise hope it may point out more clues as to what is going on :/

> Default repair is broken, crashes other nodes participating in repair (in trunk)
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Repair
> Reporter: Nate McCall
> Assignee: Jason Brown
> Priority: Critical
> Attachments: 13938.yaml, test.sh
>
> Running through a simple scenario to test some of the new repair features, I
> was not able to make a repair command work. Further, the exception seemed to
> trigger a nasty failure state that basically shuts down the netty connections
> for messaging *and* CQL on the nodes transferring back data to the node being
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not
[jira] [Created] (CASSANDRA-14494) Investigate possibility of a cqlsh terminfo
Patrick Bannister created CASSANDRA-14494:

Summary: Investigate possibility of a cqlsh terminfo
Key: CASSANDRA-14494
URL: https://issues.apache.org/jira/browse/CASSANDRA-14494
Project: Cassandra
Issue Type: Sub-task
Environment: This behavior has been observed in xterm on CentOS 7.5 platforms. The test_cqlsh_output.py unit tests (pylib/cqlshlib/test/test_cqlsh_output.py) are a good place to see it in action.
Reporter: Patrick Bannister
Fix For: 4.x

Summary: investigate whether we could use a cqlsh-specific terminfo file to prevent use of the set-meta-mode escape sequence in xterm without breaking colors. If it works, see if we can install it in an appropriate place using Python distutils. If yes to both, generate a cqlsh terminfo file and work it into the install process.

Long detailed explanation:

In some more recent environments, in Python REPL applications that use the readline module, the set meta mode escape sequence is output before each prompt. This escape sequence has caused problems for some applications, and in our case, some of our cqlsh unit tests (pylib/cqlshlib/test/test_cqlsh_output.py) choke on this output because of the way our tests are designed to detect the cqlsh prompt. This behavior was observed on a CentOS 7.5 platform.

The set-meta-mode escape sequence normally appears as "[?1034h" in output; it's normally defined as the bytes 1b 5b 3f 31 30 33 34 68. The exact value of the escape sequence is configurable and may be found on a GNU/Linux platform by running the command:
{code:java}
tput smm | hexdump
{code}
If this command gives no output, then the set meta mode sequence is not defined on this platform for the terminal in use. Refer to the xterm and terminfo man pages for more information on this sequence.

There are easier ways to solve this problem for the sake of the unit test, but if time allows, I'd like to look into this to achieve a more consistent output behavior for cqlsh on GNU/Linux platforms.
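As a rough illustration of one of the "easier ways" for the unit tests, the captured output could be filtered before the prompt is matched. This is only a sketch under assumptions: it assumes the sequence has the default byte value quoted in the ticket (1b 5b 3f 31 30 33 34 68), and the names SET_META_MODE and strip_set_meta_mode are hypothetical, not part of cqlshlib or its test suite.

```python
# Hypothetical helper, not part of cqlshlib: remove the xterm set-meta-mode
# escape sequence from captured output before matching the cqlsh prompt.

# Default value of the sequence as quoted in the ticket: ESC [ ? 1 0 3 4 h
SET_META_MODE = bytes.fromhex("1b5b3f3130333468")


def strip_set_meta_mode(output: bytes, sequence: bytes = SET_META_MODE) -> bytes:
    """Return output with every occurrence of sequence removed."""
    if not sequence:
        # An empty sequence means smm is undefined for this terminal,
        # so there is nothing to strip.
        return output
    return output.replace(sequence, b"")


if __name__ == "__main__":
    raw = SET_META_MODE + b"cqlsh> "
    print(strip_set_meta_mode(raw))
```

Because the ticket notes that the exact value is configurable, a real test would probably need to pass in the value reported for the terminal in use rather than rely on the default; the second parameter is there for that reason.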
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14355) Memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499444#comment-16499444 ]

Abdul Patel commented on CASSANDRA-14355:

I have seen nodetool info reporting high memory usage a week after upgrading to 3.11.2. I rebooted all nodes and had no issues for 1 week, then the same thing happened again, with no errors in the error log. Is this related to this issue?

> Memory leak
> -----------
>
> Key: CASSANDRA-14355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Debian Jessie, OpenJDK 1.8.0_151
> Reporter: Eric Evans
> Priority: Major
> Fix For: 3.11.3
>
> Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 14-24-50.png
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions. Similar to
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the
> {{threadLocals}} member of the instances of
> {{io.netty.util.concurrent.FastThreadLocalThread}}.