rtpsw commented on PR #36499: URL: https://github.com/apache/arrow/pull/36499#issuecomment-1623678477
>Is KeyHasher thread-safe? AFAIU, key_hasher_ can be used from two threads at a time... GetKey calling key_hasher_->HashesFor and Push calling key_hasher_->Invalidate. [This doc](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L493-L494) addresses thread-safety of `Invalidate` and `HashesFor`. > I also don't understand why you call Invalidate explicitly, while the KeyHasher is supposed to invalidate automatically. Presumably, you mean [this invocation](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L502) of `Invalidate`. Its for safety - it keeps the KeyHasher invalid if any code (now or in the future) fails between that point and [the clean return](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L518-L519). I suspect that with the explicit invalidation in place, we could just have `if (batch)` [here](https://github.com/apache/arrow/blob/fd949f8fb416025dc9d77c035c1cac5f45ed1b94/cpp/src/arrow/acero/asof_join_node.cc#L499). I could look into this separately. Given past experience, I'd prefer to make small steps when dealing with as-of-join-node's concurrency. > However, I see that KeyHasher is storing a raw RecordBatch pointer, which is problematic. If the previous std::shared_ptr<RecordBatch> was destroyed, another std::shared_ptr<RecordBatch> could re-allocate the same underlying RecordBatch pointer, which is unlikely but not impossible. The explicit invalidation was introduced to resolve exactly this case, which was observed on macOS. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
