rtpsw commented on PR #36499:
URL: https://github.com/apache/arrow/pull/36499#issuecomment-1625688058

   >  I would expect when the processing thread invokes AdvanceAndMemoize, it 
would find the batch_ is NULLPTR
   
   This case doesn't always occur, though it is indeed the common one. It occurs 
when the input-receiving thread invalidates the key hasher while the processing 
thread is _not_ executing `HashesFor` concurrently. In this case, the 
invalidation sets `KeyHasher::batch_` to `nullptr`, and the processing thread 
finds this `nullptr` in a later invocation of `HashesFor`. 
   
   The other case (also [explained 
here](https://github.com/apache/arrow/pull/36499#discussion_r1255982569)) is 
when the processing thread is executing `HashesFor` concurrently with the 
input-receiving thread executing `Invalidate`. In this case, 
`KeyHasher::batch_` may be set to `nullptr` first by the invalidation and then 
set to the batch passed to `HashesFor`, near the end of this function.
   
   So, `AdvanceAndMemoize` may find either a `nullptr` or some batch in 
`KeyHasher::batch_`.
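
   To make the two cases concrete, here is a minimal sketch. It is not the 
actual Arrow code: `KeyHasherSketch`, `Batch`, `BeginHashesFor`, and 
`EndHashesFor` are hypothetical names, the hash is a toy identity function, and 
`HashesFor` is split into two halves so that the interleaving can be replayed 
deterministically on a single thread instead of relying on real thread timing.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch; only its identity matters here.
struct Batch {
  std::vector<uint64_t> keys;
};

// Simplified model of a key hasher that caches hashes for one batch.
// Names mirror the discussion, but this is an assumption-laden sketch.
class KeyHasherSketch {
 public:
  // Called by the input-receiving thread when the batch changes.
  void Invalidate() { batch_.store(nullptr); }

  // First half of HashesFor: check the cache, recompute on a miss.
  // Returns true on a cache hit.
  bool BeginHashesFor(const Batch* batch) {
    assert(batch != nullptr);
    if (batch_.load() == batch) return true;  // cache hit
    batch_.store(nullptr);                    // drop the stale cache entry
    hashes_ = batch->keys;                    // toy "hash": identity
    return false;
  }

  // Second half of HashesFor: record which batch the hashes belong to.
  void EndHashesFor(const Batch* batch) { batch_.store(batch); }

  const Batch* cached_batch() const { return batch_.load(); }

 private:
  std::atomic<const Batch*> batch_{nullptr};
  std::vector<uint64_t> hashes_;
};

// Case 1: Invalidate runs while the processing thread is outside HashesFor.
const Batch* InvalidateOutsideHashesFor(KeyHasherSketch& hasher,
                                        const Batch& batch) {
  hasher.BeginHashesFor(&batch);
  hasher.EndHashesFor(&batch);
  hasher.Invalidate();           // input-receiving thread
  return hasher.cached_batch();  // a later call observes nullptr
}

// Case 2: Invalidate lands between the two halves of HashesFor.
const Batch* InvalidateDuringHashesFor(KeyHasherSketch& hasher,
                                       const Batch& batch) {
  hasher.BeginHashesFor(&batch);
  hasher.Invalidate();           // input-receiving thread, concurrent
  hasher.EndHashesFor(&batch);   // overwrites the nullptr set just above
  return hasher.cached_batch();  // a later call observes the batch
}
```

   In this sketch the first function leaves `batch_` as `nullptr`, while the 
second leaves it pointing at the batch that was passed in, which matches the 
two outcomes described above.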
   
   > What do you mean by " finds hashes that are incorrect for the above batch" 
here?
   
   I meant a key hasher cache miss, i.e., that the condition in [this 
if-statement](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L499)
 evaluates to `false`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

