[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-07 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655745#comment-17655745
 ] 

Abe Ratnofsky commented on CASSANDRA-18110:
---

Left review with a few questions on GitHub

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.NettyStreamingChannel$1.close(NettyStreamingChannel.java:141)
>  ~[cassandra.jar]

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655595#comment-17655595
 ] 

David Capwell commented on CASSANDRA-18110:
---

Update: I added a feature flag to disable tracking, and made it so the costs 
are lower and more predictable (file progress does require a ObjectLongHashMap 
to track per-file progress)

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655544#comment-17655544
 ] 

David Capwell commented on CASSANDRA-18110:
---

.bq I'm referring to the pacing of updates to the vtable data, not the 
streaming data. In 
org.apache.cassandra.streaming.StreamingState#handleStreamEvent, only update 
this.sessions on a FILE_PROGRESS event if it hasn't been updated in X millis. 
For other event types, update this.sessions right away.

We could, but that doesn't avoid the 15m to build a single event you and Jon 
saw.

.bq An even better solution would be to not re-create this.sessions but instead 
update it based on the deltas

yep, that's what I did, already sent out patch doing this!

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655542#comment-17655542
 ] 

David Capwell commented on CASSANDRA-18110:
---

pushed update that avoids the double counting of prepare and file progress 
events, this memory cost for file progress is less than before this patch, but 
it no longer is just updating a counter (we need the delta between events)

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncCha

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655536#comment-17655536
 ] 

Abe Ratnofsky commented on CASSANDRA-18110:
---

> Changing the pacing could have side effects non of us are aware of...

I'm referring to the pacing of updates to the vtable data, not the streaming 
data. In org.apache.cassandra.streaming.StreamingState#handleStreamEvent, only 
update this.sessions on a FILE_PROGRESS event if it hasn't been updated in X 
millis. For other event types, update this.sessions right away.

This is my first thought at least - would reduce the time spent updating 
sessions when FILE_PROGRESS is frequent (which it should be). 

 

An even better solution would be to not re-create this.sessions but instead 
update it based on the deltas, since the incoming event should only update for 
a single ProgressInfo (peer, direction, file, etc) and the handler is already 
synchronized.

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLo

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655530#comment-17655530
 ] 

David Capwell commented on CASSANDRA-18110:
---

more fun, SessionPreparedEvent is seen twice; we double post the same event!

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.NettyStreamingChannel$1.close(NettyStreamingChann

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655525#comment-17655525
 ] 

David Capwell commented on CASSANDRA-18110:
---

sent out a PR, but seems my bytesReceived/bytesSent needs to be changed... this 
one is a bit more annoying to deal with 

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.Nett

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654713#comment-17654713
 ] 

David Capwell commented on CASSANDRA-18110:
---

bq. For #3 - wouldn't disabling tracking mean that the virtual table isn't 
usable?

Correct, this allows you to break the vtable if the feature causes things to 
not be stable.

bq. I'm also in favor of #4.

I am working on a patch doing both #3 and #4; so fix the issue by doing less 
work, and allow opt-out... 

bq. I was also thinking of changing the pacing (via debounce) for individual 
file progress events, so a progress event does not trigger handleStreamEvent if 
it was called within the last X millis.

I don't think we have enough people familiar with Streaming to make that 
safe... Even the vtable was a problem as no one is around who knows streaming!  
Changing the pacing could have side effects non of us are aware of... 

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-04 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654633#comment-17654633
 ] 

Abe Ratnofsky commented on CASSANDRA-18110:
---

For #3 - wouldn't disabling tracking mean that the virtual table isn't usable? 
This would put us in a situation where you can either use the vtable or stream 
successfully on larger clusters, which seems to defeat the purpose of the 
vtable.

 

I'm also in favor of #4.

 

I was also thinking of changing the pacing (via debounce) for individual file 
progress events, so a progress event does not trigger handleStreamEvent if it 
was called within the last X millis.

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654630#comment-17654630
 ] 

David Capwell commented on CASSANDRA-18110:
---

[~jonmeredith] if you don't mind I can take this as you found the issue was 
caused by the vtable I worked on.

Looking at the code / stack trace the issue looks to be in 
org.apache.cassandra.streaming.StreamingState#handleStreamEvent the line 

{code}
sessions = Sessions.create(streamProgress.values());
{code}

The single usage of this field is 
org.apache.cassandra.db.virtual.StreamingVirtualTable#updateDataSet, which is 
only needed when the vtable is requested...

There are a few options I think we can take (can take multiple)

1) don't be eager and have 
org.apache.cassandra.streaming.StreamingState#sessions build on-demand from 
org.apache.cassandra.streaming.StreamingState#streamProgress
2) org.apache.cassandra.streaming.StreamingState#onSuccess and 
org.apache.cassandra.streaming.StreamingState#onFailure could build the 
sessions eagerly (like it does today), but push this logic to another thread
3) feature flag to allow tracking to be disabled (why I didn't do this in the 
first place)
4) Sessions is just received/sent so why do we need to compute using 100% of 
events?  can we not just use a counter?

Given that building 1 session can take 15m, then #1 just breaks the vtable... 
for this reason I am in favor of #3 and #4... ill look closer to see if I can 
write #4 and see if there are any tradeoffs I don't see yet

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
> 

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2023-01-03 Thread C. Scott Andreas (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654100#comment-17654100
 ] 

C. Scott Andreas commented on CASSANDRA-18110:
--

This one's a hall-of-famer for sure.

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1.x
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.NettyStreamingChannel$1.close(NettyStreamingChannel.java:141)
>  ~[cassandra.jar]
>

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2022-12-22 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651433#comment-17651433
 ] 

Jon Meredith commented on CASSANDRA-18110:
--

tl;dr an event listener added to collect data for the streaming status vtable 
is causing lock contention and starving the stream deserializer thread which 
causes a TCP user timeout on the sender that fails the stream.

Details - CASSANDRA-17390 exposed streaming status over a virtual table by 
listening to all streaming events with class {{StreamingState}}. It maintains a 
{{SessionInfo}} structure for each peer active in streaming, tracking a 
{{ProgressInfo}} for each streamed file that tracks transferred/total bytes for 
the file and updates it on every streaming event calling 
{{org.apache.cassandra.streaming.StreamingState.Sessions#create}}.

Streaming events include {{FILE_PROGRESS}} events generated on the 
{{StreamDeserializingTask}} for every section (64kb) read by 
{{CompressedStreamReader}}, and for every file components for 
{{CassandraEntireSSTableStreamReader}}. They happen frequently during heavy 
streaming activity.

In the event handler, {{Sessions.create}} recreates a summary 
{{org.apache.cassandra.streaming.StreamingState.Sessions}} object by iterating 
over all of the active session/files it knows about (multiple times) summing 
received/total for bytes and #files.

In one heap dump investigated the top three sessions has 5612, 1780 and 1528 
received files. Generating the summary takes longer as the bootstrap proceeds 
and  creates contention in the synchronized 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent method. 

Method synchronization is unfair and causes starvation for some of the 
{{StreamDeserializerTask}} threads that consume the {{AsyncStreamingInputPlus}} 
slowing calls to netty to read from the streaming channel in {{rebuffer}}.

On the unfairly scheduled threads, eventually socket reads slow down enough to 
keep the TCP receive queue above the high water mark for 300s preventing an ACK 
of pending bytes to the sender, which trips the TCP user timeout on the sender 
causing it to close it before completing the transfer and failing the stream.

Raising the streaming TCP user timeout or disabling it entirely is still the 
correct workaround for this issue until it can be resolved.
{code:java}
"Stream-Deserializer-/1.2.3.4:7000-deadbeef" #176 daemon prio=5 os_prio=0 
cpu=299135.87ms elapsed=8341.21s tid=0x7f25e2064a00 nid=0xe02f runnable  
[0x7f25b03a4000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.cassandra.streaming.SessionInfo.getTotalSizeInProgress(SessionInfo.java:172)
at 
org.apache.cassandra.streaming.SessionInfo.getTotalSizeReceived(SessionInfo.java:125)
at 
org.apache.cassandra.streaming.StreamingState$Sessions.create(StreamingState.java:379)
at 
org.apache.cassandra.streaming.StreamingState.handleStreamEvent(StreamingState.java:255)
- locked <0x000604ac7bf0> (a 
org.apache.cassandra.streaming.StreamingState)
at 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent(StreamResultFuture.java:218)
- locked <0x00060c3e57a0> (a 
org.apache.cassandra.streaming.StreamResultFuture)
at 
org.apache.cassandra.streaming.StreamResultFuture.handleProgress(StreamResultFuture.java:208)
at 
org.apache.cassandra.streaming.StreamSession.progress(StreamSession.java:1096)
at 
org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:96)
at 
org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
- locked <0x00064a637510> (a 
org.apache.cassandra.db.streaming.CassandraIncomingFile)
at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:55)
at 
org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:41)
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
at 
org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:59)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
{code}
other streaming threads blocked on the lock
{code:java}
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.cassandra.streaming.StreamResultFuture.fireStreamEvent(StreamResultFuture.java:214)
- waiting to lock <0x00060c3e57a0> (a 
org.apache.cassandra.streaming.StreamResultFuture)
at 
org.apache.cassandra.streaming.StreamResultFuture.handleProgress(StreamResultFuture.java:208)
at 
org.apache.cassandra.streaming.StreamSession.progress(StreamSession.java

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2022-12-12 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646377#comment-17646377
 ] 

Jon Meredith commented on CASSANDRA-18110:
--

No success reproducing this issue in a test environment still.

Given that all attempts have failed to reproduce and have completed 
investigating our theories how this could happen without finding anything so 
far, this should not block the release.

Downgrading severity to Normal and will update this ticket with any future 
findings.

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
> Fix For: 4.1
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
> 

[jira] [Commented] (CASSANDRA-18110) Streaming fails during multiple concurrent host replacements

2022-12-11 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645812#comment-17645812
 ] 

Jon Meredith commented on CASSANDRA-18110:
--

To clarify the workaround, on any new nodes being started, ensure the config is 
set with
 
{code:java}
internode_streaming_tcp_user_timeout: 0s
{code}
and on existing nodes in the cluster, the timeout can be reset with JMX without 
restarting
{code:java}
org.apache.cassandra.db/StorageService/InternodeStreamingTcpUserTimeoutInMS{code}
before the bootstrap attempt.

> Streaming fails during multiple concurrent host replacements
> 
>
> Key: CASSANDRA-18110
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18110
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Urgent
> Fix For: 4.1
>
>
> Running four concurrent host replacements on a 4.1.0 development cluster has 
> repeatably failed to complete bootstrap with all four hosts failing bootsrrap 
> and staying in JOINING, logging the message.
> {code:java}
> ERROR 2022-12-07T21:15:48,860 [main] 
> org.apache.cassandra.service.StorageService:2019 - Error while waiting on 
> bootstrap to complete. Bootstrap will have to be restarted.
> {code}
> Bootstrap fails as the the FileStreamTasks on the streaming followers 
> encounter an EOF while transmitting the files.
> {code:java}
> ERROR 2022-12-07T15:49:39,164 [NettyStreaming-Outbound-/1.2.3.4.7000:2] 
> org.apache.cassandra.streaming.StreamSession:718 - [Stream 
> #8d313690-7674-11ed-813f-95c261b64a82] Streaming error occurred on session 
> with peer 1.2.3.4:7000 through 1.2.3.4:40292
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:124)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:89)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:120)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:88)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:177)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:39)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.streaming.async.StreamingMultiplexedChannel$FileStreamTask.run(StreamingMultiplexedChannel.java:311)
>  [cassandra.jar]
>at 
> org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) 
> [cassandra.jar]
>at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) 
> [cassandra.jar]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.58.Final.jar:4.1.58.Final]
>at java.lang.Thread.run(Thread.java:829) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:82)
>  ~[cassandra.jar]
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(Async