[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079258#comment-17079258
 ] 

Sergio Bossa commented on CASSANDRA-15667:
------------------------------------------

Thanks [~e.dimitrova] for chiming in.
{quote}From what I recall the bootstrap was sometimes completing too fast 
before the streaming is really interrupted from the byteman code and we didn't 
really have an instrument to control that.
{quote}
I went through the "resumable bootstrap" test, and unfortunately I don't see 
how the bootstrap could ever complete before the byteman script is invoked: 
that script [makes node 1 fail before it starts to stream 
files|https://github.com/ekaterinadimitrova2/cassandra-dtest/blob/b56887d67c353d6d69cd60cfd74859405fa37685/byteman/4.0/stream_failure.btm#L10],
so node 3 cannot finish bootstrapping until it has received all files from 
both nodes, and that will never happen because the script causes node 1 to 
fail.

So why did the test fail?

I believe that's because of this issue: node 3 correctly saw its streaming 
session as completed (after node 1 finished streaming with an error) but 
*not* as failed. The "completed" state is read from the actual session state, 
while the "failed" state is read from the {{SessionInfo}} state, and that 
inconsistency is what we're fixing here.
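
To make the mismatch concrete, here is a minimal single-threaded sketch. It is not Cassandra code: the class {{StreamStateRaceSketch}}, its fields and both {{maybeComplete*}} methods are hypothetical stand-ins, and the interleaving is frozen by hand at the moment where the session state has already moved to failed but the info snapshot has not yet been updated. The "consistent" variant only shows one possible way to read both facts from the same source and is not necessarily the fix adopted in this ticket.

```java
import java.util.List;

// Simplified, hypothetical model of the race; not the actual Cassandra classes.
class StreamStateRaceSketch {
    enum State { STREAMING, COMPLETE, FAILED }
    enum Outcome { PENDING, SUCCESS, FAILURE }

    // One streaming session whose state is tracked in two separately updated places.
    static final class Session {
        volatile State sessionState = State.STREAMING; // models the actual session state
        volatile State infoState = State.STREAMING;    // models the SessionInfo snapshot
    }

    // "Completed" check: reads the actual session state.
    static boolean hasActiveSessions(List<Session> sessions) {
        return sessions.stream().anyMatch(s -> s.sessionState == State.STREAMING);
    }

    // Inconsistent: completion comes from sessionState, failure from infoState.
    static Outcome maybeCompleteInconsistent(List<Session> sessions) {
        if (hasActiveSessions(sessions)) return Outcome.PENDING;
        boolean failed = sessions.stream().anyMatch(s -> s.infoState == State.FAILED);
        return failed ? Outcome.FAILURE : Outcome.SUCCESS;
    }

    // Consistent: both completion and failure are read from sessionState.
    static Outcome maybeCompleteConsistent(List<Session> sessions) {
        if (hasActiveSessions(sessions)) return Outcome.PENDING;
        boolean failed = sessions.stream().anyMatch(s -> s.sessionState == State.FAILED);
        return failed ? Outcome.FAILURE : Outcome.SUCCESS;
    }

    public static void main(String[] args) {
        // Session with node 2 finished normally; both views agree.
        Session fromNode2 = new Session();
        fromNode2.sessionState = State.COMPLETE;
        fromNode2.infoState = State.COMPLETE;

        // Session with node 1 failed, but only the session state was updated so far.
        Session fromNode1 = new Session();
        fromNode1.sessionState = State.FAILED; // infoState is still STREAMING

        List<Session> sessions = List.of(fromNode1, fromNode2);
        System.out.println(maybeCompleteInconsistent(sessions)); // prints SUCCESS
        System.out.println(maybeCompleteConsistent(sessions));   // prints FAILURE
    }
}
```

In this frozen interleaving the inconsistent check reports success even though one session failed, which matches the "completed but not failed" observation on node 3.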

That said, I would still propose re-introducing the original 
{{resumable_bootstrap_test}}, because it's an important enough feature to 
deserve its own test, and because it uses 3 nodes, which increases the 
chances of detecting errors/races.

Thoughts?

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Streaming and Messaging
>            Reporter: Sergio Bossa
>            Assignee: Massimiliano Tomassi
>            Priority: Normal
>             Fix For: 4.0
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, i.e. apparent in some dtest spurious failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

