rtpsw commented on code in PR #13880:
URL: https://github.com/apache/arrow/pull/13880#discussion_r954636007


##########
cpp/src/arrow/compute/exec/asof_join_node.cc:
##########
@@ -294,10 +452,22 @@ class InputState {
   // Index of the time col
   col_index_t time_col_index_;
   // Index of the key col
-  col_index_t key_col_index_;
+  vec_col_index_t key_col_index_;
+  // Type id of the time column
+  Type::type time_type_id_;
+  // Type id of the key column
+  std::vector<Type::type> key_type_id_;
+  // Hasher for key elements
+  mutable KeyHasher* key_hasher_;
+  // True if hashing is mandatory
+  bool must_hash_;
+  // True if null by-key values are expected
+  bool nullable_by_key_;

Review Comment:
   As implemented now, the faster path just applies `norm_value` to an integral 
value, while assuming it is not null, so it cannot work for a nullable column. 
It's possible to implement the fast path differently, such that the null bit is 
accounted for, by normalizing to a wider type (e.g., a 32-bit nullable value 
would be normalized to a 64-bit value); this would require special handling for 
64-bit values (e.g., normalizing to a yet wider value, or resorting to hashing) 
and I opted to go with the current implementation. Let me know if you feel 
strongly about avoiding a nullable option being exposed to the user.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to