[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651433#comment-17651433
 ] 

Jon Meredith commented on CASSANDRA-18110:
------------------------------------------

tl;dr an event listener added to collect data for the streaming status vtable 
is causing lock contention and starving the stream deserializer thread which 
causes a TCP user timeout on the sender that fails the stream.

Details - CASSANDRA-17390 exposed streaming status over a virtual table by 
listening to all streaming events with class {{StreamingState}}. It maintains a 
{{SessionInfo}} structure for each peer active in streaming, tracking a 
{{ProgressInfo}} for each streamed file that tracks transferred/total bytes for 
the file and updates it on every streaming event calling 
{{org.apache.cassandra.streaming.StreamingState.Sessions#create}}.

Streaming events include {{FILE_PROGRESS}} events generated on the 
{{StreamDeserializingTask}} for every section (64kb) read by 
{{CompressedStreamReader}}, and for every file components for 
{{CassandraEntireSSTableStreamReader}}. They happen frequently during heavy 
streaming activity.

In the event handler, {{Sessions.create}} recreates a summary 
{{org.apache.cassandra.streaming.StreamingState.Sessions}} object by iterating 
over all of the active session/files it knows about (multiple times) summing 
received/total for bytes and #files.

In one heap dump investigated the top three sessions has 5612, 1780 and 1528 
received files. Generating the summary takes longer as the bootstrap proceeds 
and  creates contention in the synchronized 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent method. 

Method synchronization is unfair and causes starvation for some of the 
{{StreamDeserializerTask}} threads that consume the {{AsyncStreamingInputPlus}} 
slowing calls to netty to read from the streaming channel in {{rebuffer}}.

On the unfairly scheduled threads, eventually socket reads slow down enough to 
keep the TCP receive queue above the high water mark for 300s preventing an ACK 
of pending bytes to the sender, which trips the TCP user timeout on the sender 
causing it to close it before completing the transfer and failing the stream.

Raising the streaming TCP user timeout or disabling it entirely is still the 
correct workaround for this issue until it can be resolved.
{code:java}
"Stream-Deserializer-/1.2.3.4:7000-deadbeef" #176 daemon prio=5 os_prio=0 
cpu=299135.87ms elapsed=8341.21s tid=0x00007f25e2064a00 nid=0xe02f runnable  
[0x00007f25b03a4000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.cassandra.streaming.SessionInfo.getTotalSizeInProgress(SessionInfo.java:172)
        at 
org.apache.cassandra.streaming.SessionInfo.getTotalSizeReceived(SessionInfo.java:125)
        at 
org.apache.cassandra.streaming.StreamingState$Sessions.create(StreamingState.java:379)
        at 
org.apache.cassandra.streaming.StreamingState.handleStreamEvent(StreamingState.java:255)
        - locked <0x0000000604ac7bf0> (a 
org.apache.cassandra.streaming.StreamingState)
        at 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent(StreamResultFuture.java:218)
        - locked <0x000000060c3e57a0> (a 
org.apache.cassandra.streaming.StreamResultFuture)
        at 
org.apache.cassandra.streaming.StreamResultFuture.handleProgress(StreamResultFuture.java:208)
        at 
org.apache.cassandra.streaming.StreamSession.progress(StreamSession.java:1096)
        at 
org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:96)
        at 
org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
        - locked <0x000000064a637510> (a 
org.apache.cassandra.db.streaming.CassandraIncomingFile)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:55)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:41)
        at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
        at 
org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:59)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
{code}
other streaming threads blocked on the lock
{code:java}
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent(StreamResultFuture.java:214)
        - waiting to lock <0x000000060c3e57a0> (a 
org.apache.cassandra.streaming.StreamResultFuture)
        at 
org.apache.cassandra.streaming.StreamResultFuture.handleProgress(StreamResultFuture.java:208)
        at 
org.apache.cassandra.streaming.StreamSession.progress(StreamSession.java:1096)
        at 
org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:96)
        at 
org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
        - locked <0x00000006e0f7aa58> (a 
org.apache.cassandra.db.streaming.CassandraIncomingFile)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:55)
        at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:41)
        at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
        at 
org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:59)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
{code}

> Streaming fails during multiple concurrent host replacements
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-18110
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Bootstrap and Decommission
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>        at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>        at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>        at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>        at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>        at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>        at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>        at java.lang.Thread.run(Thread.java:829) [?:?]
>        Suppressed: java.nio.channels.ClosedChannelException
>                at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>                at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>                at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]
>                at 
> org.apache.cassandra.streaming.async.NettyStreamingChannel$1.close(NettyStreamingChannel.java:141)
>  ~[cassandra.jar]
>                at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:312)
>  [cassandra.jar]
>                at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>                at 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>                at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>                at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>                at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>                at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>                at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: io.netty.channel.unix.Errors$NativeIoException: 
> writevAddresses(..) failed: Connection timed out
> {code}
> Running a debug build with extra logging added to 
> org.apache.cassandra.streaming.StreamSession#progress shows long periods 
> where the streaming task does not log any progress events of up to 15 mins in 
> duration. After these stalls the next attempt to write hits EOF.
> Example caught by extra logging in 
> org.apache.cassandra.streaming.StreamSession#progress
> {code:java}
> INFO  2022-12-07T15:10:56,149 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 22216704/25673094 bytes (86%) 
> s
> ent to idx:0/1.2.3.4 for file /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 
> bytes 22216704 out of total 25673094
> # ... 15min delay during which other streams can be seen transmitting to the 
> same and other hosts
> INFO  2022-12-07T15:25:29,614 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 22282240/25673094 bytes (86%) 
> sent to idx:0/1.2.3.4 for file 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db bytes 22282240 out of total 
> 25673094
> INFO  2022-12-07T15:25:29,615 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 22347776/25673094 bytes (87%) 
> sent to idx:0/1.2.3.4 for file 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db bytes 22347776 out of total 
> 25673094
> INFO  2022-12-07T15:25:29,626 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 23855104/25673094 bytes (92%) 
> sent to idx:0/1.2.3.4 for file 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db bytes 23855104 out of total 
> 25673094
> INFO  2022-12-07T15:25:29,627 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 23920640/25673094 bytes (93%) 
> sent to idx:0/1.2.3.4 for file 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db bytes 23920640 out of total 
> 25673094
> # ...12min delay while other streams continue to transmit
> INFO  2022-12-07T15:37:29,775 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:1082 - Got stream progress event 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db 23986176/25673094 bytes (93%) 
> sent to idx:0/1.2.3.4 for file 
> /datadir/ks/tbl-xxx/ks-tbl-nb-87763-big-Data.db bytes 23986176 out of total 
> 25673094
> {code}
> The issue has not succesfully been reproduced in other clusters or in a 
> dtest, but is reliable reproducible on the cluster. The hosts being used are 
> relatively large and have an available CPU count of 32 causing the streaming 
> system to allow up to 32 concurrent streams.
> [https://github.com/apache/cassandra/blob/81c616826ab3c8aa96467771d0e074a874efdd77/src/java/org/apache/cassandra/streaming/async/StreamingMultiplexedChannel.java#L93]
> One possible explanation is that the stream rate limiter is not being fair 
> and blocking sends long enough for TCP user timeout to fire and cause the 
> kernel to close the socket which is detected on the next write, another is 
> something is taking unusually long flushing the socket in 
> AsyncChannelOutputPlus.beginFlush.
> Restarting the instances with the streaming TCP user timeout disabled on the 
> cluster (set to zero in the boostrapping instances config, and updated with 
> JMX on the other instances) allowed the host-replacements to complete 
> bootstrap successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to