rtpsw commented on PR #36499: URL: https://github.com/apache/arrow/pull/36499#issuecomment-1625688058
> I would expect when the processing thread invokes AdvanceAndMemoize, it would find the batch_ is NULLPTR

This case doesn't always occur, though it is indeed the common one. It occurs when the input-receiving thread invalidates the key hasher while the processing thread is _not_ executing `HashesFor` concurrently. In this case, the invalidation sets `KeyHasher::batch_` to `nullptr`, and the processing thread finds this `nullptr` in a later invocation of `HashesFor`.

The other case (also [explained here](https://github.com/apache/arrow/pull/36499#discussion_r1255982569)) is when the processing thread is executing `HashesFor` concurrently with the input-receiving thread executing `Invalidate`. In this case, `KeyHasher::batch_` may first be set to `nullptr` by the invalidation and then be set to the batch passed to `HashesFor`, near the end of that function. So, `AdvanceAndMemoize` may find either a `nullptr` or some batch in `KeyHasher::batch_`.

> What do you mean by "finds hashes that are incorrect for the above batch" here?

I meant a key hasher cache miss, i.e., that the condition in [this if-statement](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L499) evaluates to `false`.
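
To make the two interleavings concrete, here is a minimal single-threaded sketch. It is a simplified model and not the actual `KeyHasher` code from `asof_join_node.cc`: `Batch`, `KeyHasherModel`, the `before_publish` hook, the placeholder hash, and the two helper functions are all hypothetical names introduced for illustration. The hook runs just before the cache tag is stored, so one thread can replay what a concurrent `Invalidate` would do at that point.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a record batch; only the key column matters here.
struct Batch {
  std::vector<int64_t> keys;
};

// Simplified model of the key hasher cache (an assumption, not Acero's class).
class KeyHasherModel {
 public:
  // Called by the input-receiving thread in the real design.
  void Invalidate() { batch_.store(nullptr); }

  // Called by the processing thread. `before_publish` is a hypothetical hook
  // that runs right before the cache tag is stored.
  const std::vector<uint64_t>& HashesFor(
      const Batch* batch, const std::function<void()>& before_publish = {}) {
    assert(batch != nullptr);
    if (batch_.load() == batch) {
      return hashes_;  // cache hit: reuse the memoized hashes
    }
    // Cache miss: recompute the hashes with a placeholder hash function.
    hashes_.clear();
    for (int64_t key : batch->keys) {
      hashes_.push_back(static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL);
    }
    if (before_publish) before_publish();
    batch_.store(batch);  // tag the cache with the batch, near the end
    return hashes_;
  }

  const Batch* cached_batch() const { return batch_.load(); }

 private:
  std::atomic<const Batch*> batch_{nullptr};
  std::vector<uint64_t> hashes_;
};

// Case 1: Invalidate() runs while HashesFor() is not executing.
const Batch* TagAfterSequentialInvalidate(const Batch& batch) {
  KeyHasherModel hasher;
  hasher.HashesFor(&batch);
  hasher.Invalidate();
  return hasher.cached_batch();
}

// Case 2: Invalidate() lands in the middle of HashesFor().
const Batch* TagAfterOverlappingInvalidate(const Batch& batch) {
  KeyHasherModel hasher;
  hasher.HashesFor(&batch, [&hasher] { hasher.Invalidate(); });
  return hasher.cached_batch();
}
```

In this model, the first helper leaves the cache tag at `nullptr`, while the second leaves it pointing at the batch, because the store at the end of `HashesFor` overwrites the `nullptr` written by the invalidation. That is why a later reader of the tag can observe either value.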