github-actions[bot] commented on code in PR #62056:
URL: https://github.com/apache/doris/pull/62056#discussion_r3027854713
##########
be/src/exec/operator/hashjoin_build_sink.cpp:
##########
@@ -840,12 +851,20 @@ Status HashJoinBuildSinkOperatorX::sink(RuntimeState*
state, Block* in_block, bo
RETURN_IF_ERROR(local_state.build_asof_index(*local_state._shared_state->build_block));
local_state.init_short_circuit_for_probe();
} else if (!local_state._should_build_hash_table) {
- // the instance which is not build hash table, it's should wait the
signal of hash table build finished.
- // but if it's running and signaled == false, maybe the source
operator have closed caused by some short circuit
- // return eof will make task marked as wake_up_early
- // todo: remove signaled after we can guarantee that wake up eraly is
always set accurately
- if (!_signaled || local_state._terminated) {
- return Status::Error<ErrorCode::END_OF_FILE>("source have closed");
+ // The non-builder instance waits for the builder (task 0) to finish
building the hash table.
+ // If _signaled is false, either the builder hasn't finished yet, or
the builder was
+ // terminated (woken up early) without building the hash table — in
both cases, return EOF.
+ //
+ // The close() Defer in the builder conditionally sets _signaled=true
ONLY when the builder
+ // was NOT terminated (i.e., the hash table was actually built). When
the builder is
+ // terminated, _signaled stays false, so non-builders always hit this
guard and return EOF
+ // safely — never reaching the std::visit on an uninitialized
(monostate) hash table.
+ //
+ // At this point, termination is reflected solely through the value of
_signaled: a
+ // terminated builder never sets _signaled to true. Checking
!_signaled is therefore
+ // sufficient and serves as the real guard against racing with an
uninitialized hash table.
+ if (!_signaled) {
Review Comment:
Please add coverage for this interleaving. The bug only reproduces when
`enable_share_hash_table_for_broadcast_join=true`, task 0 is woken up early,
and another build task reaches this `!_signaled` guard before trying to reuse
the shared hash table. I could not find a new test for that path:
`regression-test/suites/query_p0/join/test_terminate.groovy` exercises a
shuffle join, and the existing `hashjoin_build_sink` BE tests only run
`task_idx == 0`. Since this is a pipeline/kernel race fix, AGENTS.md requires a
focused regression or unit test here so the crash does not silently come back
in a later refactor.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]