[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302740#comment-15302740 ]
Paulo Motta commented on CASSANDRA-10992:
-----------------------------------------

From the thread dump it seems the stream session is hung in {{StreamReader.drain}}, more specifically while trying to do {{CompressedInputStream.read}}, which blocks forever on {{Queue.take()}}:

{noformat}
Thread 16969: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Compiled frame)
 - java.util.concurrent.ArrayBlockingQueue.take() @bci=20, line=403 (Compiled frame)
 - org.apache.cassandra.streaming.compress.CompressedInputStream.read() @bci=31, line=95 (Compiled frame)
 - java.io.InputStream.read(byte[], int, int) @bci=43, line=170 (Compiled frame)
 - java.io.InputStream.skip(long) @bci=44, line=224 (Interpreted frame)
 - org.apache.cassandra.streaming.StreamReader.drain(java.io.InputStream, long) @bci=11, line=158 (Interpreted frame)
 - org.apache.cassandra.streaming.compress.CompressedStreamReader.read(java.nio.channels.ReadableByteChannel) @bci=577, line=129 (Compiled frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=64, line=48 (Compiled frame)
 - org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=4, line=38 (Compiled frame)
 - org.apache.cassandra.streaming.messages.StreamMessage.deserialize(java.nio.channels.ReadableByteChannel, int, org.apache.cassandra.streaming.StreamSession) @bci=41, line=56 (Compiled frame)
 - org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run() @bci=24, line=257 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
{noformat}

The compressed input stream works with an auxiliary thread that reads compressed chunks from the socket stream and adds them to a data buffer queue, which is consumed by {{CompressedStreamReader}} during reads. If there is an exception reading from the socket, the reader thread adds a poison pill to the data buffer queue, which throws an {{IOException}} on the next read. Upon receiving an exception on read, {{CompressedStreamReader}} tries to drain the socket. The drain performs an additional read on the data buffer queue, which is now empty, so it blocks forever and the stream session hangs.

From my understanding, we only drain the socket so that the stream session can be retried later. But since we never retry on {{IOException}}, we shouldn't try to drain the socket when getting an {{IOException}} on {{CompressedInputStream}}. WDYT [~yukim]?

We should perhaps go further in a separate ticket and reconsider the stream retry mechanism: is there any situation where retry is working?

[~mlowicki] do you see any {{Error while reading compressed input stream}} or {{Error while reading partition}} warning in the system.log?

> Hanging streaming sessions
> --------------------------
>
> Key: CASSANDRA-10992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10992
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.12, Debian Wheezy
> Reporter: mlowicki
> Assignee: Paulo Motta
> Fix For: 2.1.12
>
> Attachments: apache-cassandra-2.1.12-SNAPSHOT.jar, db1.ams.jstack, db6.analytics.jstack
>
>
> I've started recently running repair using [Cassandra Reaper|https://github.com/spotify/cassandra-reaper] (built-in {{nodetool repair}} doesn't work for me - CASSANDRA-9935). It behaves fine but I've noticed hanging streaming sessions:
> {code}
> root@db1:~# date
> Sat Jan 9 16:43:00 UTC 2016
> root@db1:~# nt netstats -H | grep total
> Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
> Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
> Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
> Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
> Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
> Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
> Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
> Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
> Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
> Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> root@db1:~# date
> Sat Jan 9 17:45:42 UTC 2016
> root@db1:~# nt netstats -H | grep total
> Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
> Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
> Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
> Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
> Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
> Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
> Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
> Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
> Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
> Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> {code}
> Such sessions are left even when repair job is long time done (confirmed by checking Reaper's and Cassandra's logs). {{streaming_socket_timeout_in_ms}} in cassandra.yaml is set to default value (3600000).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
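
As an illustration of the failure mode described in the comment above (the first read consumes the poison pill and throws, then the drain reads from an empty queue), here is a minimal Java sketch. It is not Cassandra's code: the class {{PoisonPillDrainDemo}} and its methods are hypothetical, and it uses a timed {{poll()}} where the thread dump shows {{take()}}, so that the demo terminates instead of hanging.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified model of the hang; NOT Cassandra's code. All names are hypothetical.
final class PoisonPillDrainDemo {
    // Sentinel that the (simulated) socket reader thread enqueues on failure.
    private static final byte[] POISON_PILL = new byte[0];

    private final BlockingQueue<byte[]> dataBuffer = new ArrayBlockingQueue<>(16);

    // Simulates the reader thread hitting an exception on the socket.
    void onSocketError() throws InterruptedException {
        dataBuffer.put(POISON_PILL);
    }

    // Consumer-side read. Returns null on timeout; an untimed take() would
    // block forever at that point instead of returning.
    byte[] nextChunk(long timeoutMillis) throws IOException, InterruptedException {
        byte[] chunk = dataBuffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (chunk == null) {
            return null;
        }
        if (chunk == POISON_PILL) {
            throw new IOException("socket read failed");
        }
        return chunk;
    }

    // Reads once (hits the poison pill), then attempts a drain-style second read.
    static String simulate(long timeoutMillis) {
        PoisonPillDrainDemo stream = new PoisonPillDrainDemo();
        try {
            stream.onSocketError();
            try {
                stream.nextChunk(timeoutMillis);
                return "first read succeeded";
            } catch (IOException firstFailure) {
                // The pill is already consumed, so the queue is empty here.
                try {
                    byte[] drained = stream.nextChunk(timeoutMillis);
                    return drained == null ? "drain would block" : "drain read data";
                } catch (IOException secondFailure) {
                    return "drain failed fast";
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(200));
    }
}
```

Running {{main}} prints {{drain would block}}. With an untimed {{take()}} in place of the timed {{poll()}}, the second read would never return, which is consistent with the {{ArrayBlockingQueue.take()}} frame in the thread dump.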