zanmato1984 commented on code in PR #45515:
URL: https://github.com/apache/arrow/pull/45515#discussion_r1954772398
##########
cpp/src/arrow/compute/key_map_internal.h:
##########
@@ -81,31 +81,70 @@ class ARROW_EXPORT SwissTable {
   void num_inserted(uint32_t i) { num_inserted_ = i; }
-  uint8_t* blocks() const { return blocks_->mutable_data(); }
-
   uint32_t* hashes() const {
     return reinterpret_cast<uint32_t*>(hashes_->mutable_data());
   }
   /// \brief Extract group id for a given slot in a given block.
   ///
-  inline uint64_t extract_group_id(const uint8_t* block_ptr, int slot,
-                                   uint64_t group_id_mask) const;
+  static uint32_t extract_group_id(const uint8_t* block_ptr, int local_slot,
+                                   int num_group_id_bits) {
+    // Extract group id using aligned 32-bit read.
Review Comment:
For the record, there are three places that extract group ids:
1) Here: extracts a single group id and is used publicly by the Swiss join;
currently uses an aligned read plus masking.
2) `extract_group_ids`: extracts a vector of group ids and is used internally;
uses an aligned read without masking (the number of bits is fixed at compile
time as a template parameter).
3) `grow_double`: extracts a single group id inside a large loop, where it is
inlined; uses an unaligned read plus masking.
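To make the three variants easier to compare, here is a minimal C++ sketch of each load strategy. It assumes a simplified, hypothetical block layout (8 status bytes followed by 8 group ids packed little-endian at 8, 16, or 32 bits each) and a little-endian host. All names (`ExtractAligned`, `ExtractFixedWidth`, `ExtractUnaligned`, `StoreGroupId`) are made up for illustration and are not Arrow's API, and loads go through `std::memcpy` for portability rather than mirroring whatever load the SwissTable code actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified block layout used only for this sketch:
// 8 status bytes, then 8 group ids packed little-endian, each
// num_group_id_bits wide (8, 16, or 32). Assumes a little-endian host.
constexpr int kStatusBytes = 8;

// Mask with the low `bits` bits set (avoids the undefined 1u << 32).
inline uint32_t LowMask(int bits) {
  return bits >= 32 ? ~uint32_t{0} : ((uint32_t{1} << bits) - 1);
}

// 1) Aligned 32-bit read + masking. The width divides 32, so an id never
//    straddles two 32-bit words.
inline uint32_t ExtractAligned(const uint8_t* block_ptr, int local_slot,
                               int num_group_id_bits) {
  assert(32 % num_group_id_bits == 0);
  const int bit_offset = local_slot * num_group_id_bits;
  uint32_t word;
  std::memcpy(&word, block_ptr + kStatusBytes + (bit_offset / 32) * 4,
              sizeof(word));
  return (word >> (bit_offset % 32)) & LowMask(num_group_id_bits);
}

// 2) Width known at compile time: load the id as its own integer type,
//    so no mask is needed.
template <int kBits>
uint32_t ExtractFixedWidth(const uint8_t* block_ptr, int local_slot) {
  static_assert(kBits == 8 || kBits == 16 || kBits == 32, "unsupported width");
  using T = std::conditional_t<
      kBits == 8, uint8_t, std::conditional_t<kBits == 16, uint16_t, uint32_t>>;
  T id;
  std::memcpy(&id, block_ptr + kStatusBytes + local_slot * sizeof(T),
              sizeof(T));
  return id;
}

// 3) Unaligned 32-bit read starting at the id's first byte + masking.
//    For narrow ids this reads up to 3 bytes past the group-id area,
//    so the caller must provide readable padding after the block.
inline uint32_t ExtractUnaligned(const uint8_t* block_ptr, int local_slot,
                                 int num_group_id_bits) {
  assert(num_group_id_bits % 8 == 0 && num_group_id_bits <= 32);
  const int bit_offset = local_slot * num_group_id_bits;
  uint32_t word;
  std::memcpy(&word, block_ptr + kStatusBytes + bit_offset / 8, sizeof(word));
  return (word >> (bit_offset % 8)) & LowMask(num_group_id_bits);
}

// Hypothetical test helper: writes `id` byte by byte in the packed
// little-endian layout the extractors above assume.
inline void StoreGroupId(uint8_t* block_ptr, int local_slot,
                         int num_group_id_bits, uint32_t id) {
  assert(num_group_id_bits % 8 == 0 && num_group_id_bits <= 32);
  uint8_t* dst = block_ptr + kStatusBytes + local_slot * num_group_id_bits / 8;
  for (int i = 0; i < num_group_id_bits / 8; ++i) {
    dst[i] = static_cast<uint8_t>(id >> (8 * i));
  }
}
```

Under these assumptions all three return the same id; the difference is only in the load. 1) reads a word at a 4-byte offset and masks, 3) reads at the id's first byte and therefore needs a few bytes of readable padding past the block, and 2) drops the mask entirely because the width is a compile-time constant.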
I think we should at least keep 2) as is, because it makes perfect sense. The
other two, 1) and 3), can be unified on a single approach, either aligned or
unaligned.
What do you think?