[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-06-03 Thread Lerh Chuan Low (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721
 ] 

Lerh Chuan Low edited comment on CASSANDRA-13938 at 6/4/18 4:08 AM:


Here's another stacktrace that may help - I've also been getting these while 
testing trunk in EC2. The steps I use are the same: 
 - Disable hintedhandoff
 - Take out 1 node
 - Run stress for 10 mins, then run repair

It will error out, and the nodes also end up in a bizarre state with gossip 
such that I have to stop the entire cluster and then start the nodes up one at 
a time (in a rolling restart they still won't be able to sort themselves out). 
Sometimes it errors with {{stream can only read forward}} (as above and in the 
JIRA), but here's another stacktrace that has also shown up several times on 
some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR 
[Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 
StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation 
from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172){code}
I get the feeling they may be related but I'm not sure...I can open a different 
Jira for this if you like, but otherwise hope it may point out more clues as to 
what is going on :/ 
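
The reproduction steps listed in the comment can be written down as a small command plan. This is only a sketch: the comment names the steps but not the exact commands, so the {{nodetool}}, {{systemctl}} and {{cassandra-stress}} invocations below, the service name and the host argument are all assumptions, not what was actually run.

```python
from typing import List


def repro_plan(down_node: str, stress_minutes: int = 10) -> List[List[str]]:
    """Return the repro steps as argv lists, in the order given in the comment.

    Every command here is an assumption about how each step might be carried
    out; the comment only names the steps themselves.
    """
    return [
        # Step 1: disable hinted handoff (would be run on each node).
        ["nodetool", "disablehandoff"],
        # Step 2: take one node out (assumes a systemd unit named "cassandra").
        ["ssh", down_node, "sudo", "systemctl", "stop", "cassandra"],
        # Step 3a: run stress for the given number of minutes.
        ["cassandra-stress", "write", f"duration={stress_minutes}m"],
        # Step 3b: run repair with default options.
        ["nodetool", "repair"],
    ]


if __name__ == "__main__":
    # Print the plan instead of executing it; "node3.example" is a placeholder.
    for argv in repro_plan("node3.example"):
        print(" ".join(argv))
```

Keeping the steps as data rather than executing them directly makes it easy to review or adjust the commands before pointing them at a real cluster.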



[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-06-03 Thread Lerh Chuan Low (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721
 ] 

Lerh Chuan Low edited comment on CASSANDRA-13938 at 6/4/18 4:07 AM:


Here's another stacktrace that may help - I've also been getting these while 
testing trunk in EC2. The steps I use are the same: 
 - Disable hintedhandoff
 - Take out 1 node
 - Run stress for 10 mins, then run repair

It will error out, and the nodes also end up in a bizarre state with gossip 
such that I have to stop the entire cluster and then start the nodes up one at 
a time (in a rolling restart they still won't be able to sort themselves out). 
Sometimes it errors with {{stream can only read forward}} (as above and in the 
JIRA), but here's another stacktrace that has also shown up several times on 
some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN 
[Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 
CompressedCassandraStreamReader.java:110 - [Stream 
e12b9b10-6476-11e8-936f-35a28469245e] Error while reading partition null from 
stream on
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR 
[Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 
StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation 
from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172){code}
I get the feeling they may be related but I'm not sure...I can open a different 
Jira for this if you like, but otherwise hope it may point out more clues as to 
what is going on :/ 



[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2018-06-03 Thread Lerh Chuan Low (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499721#comment-16499721
 ] 

Lerh Chuan Low commented on CASSANDRA-13938:


Here's another stacktrace that may help - I've also been getting these while 
testing trunk in EC2. The steps I use are the same: 

- Disable hintedhandoff
- Take out 1 node
- Run stress for 10 mins, then run repair

It will error out, and the nodes also end up in a bizarre state with gossip 
such that I have to stop the entire cluster and then start the nodes up one at 
a time (in a rolling restart they still won't be able to sort themselves out). 
Sometimes it errors with {{stream can only read forward}}, but here's another 
stacktrace that has also shown up several times on some of the failed nodes:
{code:java}
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: WARN 
[Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,440 
CompressedCassandraStreamReader.java:110 - [Stream 
e12b9b10-6476-11e8-936f-35a28469245e] Error while reading partition null from 
stream on
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: ERROR 
[Stream-Deserializer-35.155.140.194:39371-28daf76d] 2018-05-31 02:07:24,445 
StreamingInboundHandler.java:210 - [Stream channel: 28daf76d] stream operation 
from 35.155.140.194:39371 failed
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: 
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 1711542017
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:20)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.util.ByteBufferUtils.checkRange(ByteBufferUtils.java:14)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:48)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:162)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.decompress(CompressedInputStream.java:163)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CompressedInputStream.reBuffer(CompressedInputStream.java:119)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readShort(RebufferingInputStream.java:164)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedShort(RebufferingInputStream.java:170)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.io.util.TrackedDataInputPlus.readUnsignedShort(TrackedDataInputPlus.java:139)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:367)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:377)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader$StreamDeserializer.newPartition(CassandraStreamReader.java:199)
May 31 02:07:24 ip-10-0-18-230 cassandra[6034]: at 
org.apache.cassandra.db.streaming.CassandraStreamReader.writePartition(CassandraStreamReader.java:172){code}
I get the feeling they may be related but I'm not sure...I can open a different 
Jira for this if you like, but otherwise hope it may point out more clues as to 
what is going on :/ 

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Repair
> Reporter: Nate McCall
> Assignee: Jason Brown
> Priority: Critical
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not 

[jira] [Created] (CASSANDRA-14494) Investigate possibility of a cqlsh terminfo

2018-06-03 Thread Patrick Bannister (JIRA)
Patrick Bannister created CASSANDRA-14494:
-

 Summary: Investigate possibility of a cqlsh terminfo
 Key: CASSANDRA-14494
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14494
 Project: Cassandra
  Issue Type: Sub-task
 Environment: This behavior has been observed in xterm on CentOS 7.5 
platforms. The test_cqlsh_output.py unit tests 
(pylib/cqlshlib/test/test_cqlsh_output.py) are a good place to see it in action.
Reporter: Patrick Bannister
 Fix For: 4.x


Summary: investigate whether we could use a cqlsh-specific terminfo file to 
prevent use of the set-meta-mode escape sequence in xterm without breaking 
colors. If it works, see if we can install it in an appropriate place using 
Python distutils. If yes to both, generate a cqlsh terminfo file and work it 
into the install process.

Long detailed explanation:

In some more recent environments, in Python REPL applications that use the 
readline module, the set meta mode escape sequence is output before each 
prompt. This escape sequence has caused problems for some applications, and in 
our case, some of our cqlsh unit tests 
(pylib/cqlshlib/test/test_cqlsh_output.py) choke on this output because of the 
way our tests are designed to detect the cqlsh prompt. This behavior was 
observed on a CentOS 7.5 platform.

The set-meta-mode escape sequence normally appears as "[?1034h" in output; it's 
normally defined as the bytes 1b 5b 3f 31 30 33 34 68.  The exact value of the 
escape sequence is configurable and may be found on a GNU/Linux platform by 
running the command:
{code:java}
tput smm | hexdump{code}
If this command gives no output, then the set meta mode sequence is not defined 
on this platform for the terminal in use. Refer to the xterm and terminfo man 
pages for more information on this sequence.
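
As a rough illustration of the byte values above, the following sketch decodes them and removes the sequence from captured output before any prompt matching is done. The helper name {{strip_smm}} and the sample prompt string are made up for this example; it is not how the cqlsh tests currently handle the output.

```python
# Bytes quoted above for the usual set-meta-mode sequence: ESC [ ? 1 0 3 4 h
SMM = bytes.fromhex("1b 5b 3f 31 30 33 34 68")


def strip_smm(output: bytes, smm: bytes = SMM) -> bytes:
    """Remove every occurrence of the set-meta-mode sequence from output.

    The sequence is configurable per terminal, so a different value can be
    passed in via the smm parameter.
    """
    return output.replace(smm, b"")


if __name__ == "__main__":
    captured = SMM + b"cqlsh> "   # hypothetical captured prompt output
    print(SMM)                    # b'\x1b[?1034h'
    print(strip_smm(captured))    # b'cqlsh> '
```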

There are easier ways to solve this problem for the sake of the unit test, but 
if time allows, I'd like to look into this to achieve a more consistent output 
behavior for cqlsh on GNU/Linux platforms.
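
For the terminfo idea itself, one possible starting point is to take a decompiled terminfo entry and drop the meta-mode capabilities from it. The sketch below assumes the entry is in one-capability-per-line form (for example as printed by {{infocmp -1 xterm}}) and that removing {{smm}} and {{rmm}} is enough to stop the sequence from being emitted, which is exactly what this ticket proposes to investigate. The function name and the sample entry are illustrative only.

```python
def drop_meta_mode(terminfo_source: str) -> str:
    """Return terminfo source text without the smm/rmm string capabilities.

    Expects one capability per line, each ending in a comma.
    """
    kept = []
    for line in terminfo_source.splitlines():
        # The capability name is everything before the first "=".
        name = line.strip().split("=", 1)[0]
        if name in ("smm", "rmm"):
            continue  # skip set/reset meta mode
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily shortened entry; a real one has many more lines.
    sample = (
        "xterm-cqlsh|hypothetical cqlsh terminal entry,\n"
        "\tkm,\n"
        "\tcolors#8,\n"
        "\trmm=\\E[?1034l,\n"
        "\tsmm=\\E[?1034h,\n"
        "\tsgr0=\\E(B\\E[m,\n"
    )
    print(drop_meta_mode(sample), end="")
```

The filtered text could then be compiled with {{tic}} into a private terminfo directory; where to install it and whether colors survive are the open questions.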



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14355) Memory leak

2018-06-03 Thread Abdul Patel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499444#comment-16499444
 ] 

Abdul Patel commented on CASSANDRA-14355:
-

I have seen nodetool info report high memory usage about a week after 
upgrading to 3.11.2. I rebooted all nodes and had no issues for a week, then 
the same thing happened again, with no errors in the error log.

Is this related to this issue?

> Memory leak
> ---
>
> Key: CASSANDRA-14355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Debian Jessie, OpenJDK 1.8.0_151
> Reporter: Eric Evans
> Priority: Major
> Fix For: 3.11.3
>
> Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 
> 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 
> 14-24-50.png
>
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions.  Similar to 
> CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the 
> {{threadLocals}} member of the instances of 
> {{io.netty.util.concurrent.FastThreadLocalThread}}.


