Dandandan edited a comment on issue #418:
URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-850491025


   @ravlio 
   
   You are very right, that part is suboptimal.
   There are also a couple of other issues in the hash aggregates:
   
   * Keys are created per row and indexed by `Vec<u8>`. Each `Vec<u8>` value will be hashed, but it is also rehashed as the hash table grows (Rust's `HashMap` doesn't remember hash values by default, so it has to recompute them every time the capacity of the table is exhausted).
   We should do something similar (vectorized hashing) in the hash aggregates as we do in the hash join. There the `create_hashes` function generates `u64` hash values for each row in one simple loop, without downcasting / converting per row. The challenging part of this is detecting hash collisions and making sure that check is fast too.
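
   A minimal sketch of that idea (hypothetical helper names, a single `i64` column for simplicity; the collision check is deliberately left out, since that is exactly the hard part):

   ```rust
   use std::collections::HashMap;

   /// Compute a u64 hash per row in one tight loop, without per-row
   /// downcasting (a stand-in mixing function, not the real `create_hashes`).
   fn hash_column(values: &[i64], seed: u64) -> Vec<u64> {
       values
           .iter()
           .map(|v| {
               let mut h = (*v as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ seed;
               h ^= h >> 32;
               h
           })
           .collect()
   }

   /// Map each precomputed hash to a group index; the hash is computed once
   /// per row and reused, instead of rehashing a `Vec<u8>` key.
   /// NOTE: this ignores hash collisions, which a real implementation must
   /// resolve by comparing the actual key values.
   fn group_indices(hashes: &[u64]) -> Vec<usize> {
       let mut table: HashMap<u64, usize> = HashMap::new();
       let mut next_group: usize = 0;
       hashes
           .iter()
           .map(|h| {
               *table.entry(*h).or_insert_with(|| {
                   let g = next_group;
                   next_group += 1;
                   g
               })
           })
           .collect()
   }
   ```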
   
   * Hash aggregates with small groups generate a lot of small allocations, dynamically typed scalar values, and trait objects for the aggregate states. In some parts, intermediate `Vec`s are also allocated, which show up in profiles as well.
   I've addressed some performance issues in different parts earlier, but there are still large gains possible, mainly for the more challenging queries. I think what would be best in the long run is building a mutable, typed array for the aggregation states, and keeping only the *offsets* into that array in a hash table structure. This would allow updating the states in a vectorized way, like this (pseudo code):
   
   ```rust
   // sum
   for (&offset, &val) in batch_offsets.iter().zip(batch_vals.iter()) {
       states[offset] += val;
   }
   // count
   for &offset in batch_offsets.iter() {
       states[offset] += 1;
   }
   // avg
   for (&offset, &val) in batch_offsets.iter().zip(batch_vals.iter()) {
       state_avg_sum[offset] += val;
       state_avg_count[offset] += 1;
   }
   ```
   
   After the hash aggregation, the state arrays don't need any further processing, as the `offset` values are assigned in increasing order as new groups appear, so the states are already in group order. Only the avg here needs to run a `divide` at the end, which is fast because we can use the arrow kernel directly.
   (Currently the group by values are converted to arrays again, which also takes time.)
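
   For example, the avg finalization could look roughly like this (hypothetical state vectors and function name; assumes the `divide` kernel from the arrow crate's arithmetic module):

   ```rust
   use arrow::array::Float64Array;
   use arrow::compute::kernels::arithmetic::divide;

   fn finalize_avg(state_avg_sum: Vec<f64>, state_avg_count: Vec<f64>) -> Float64Array {
       // The typed state vectors are already in group order, so they can be
       // turned into arrays directly, without converting row by row.
       let sums = Float64Array::from(state_avg_sum);
       let counts = Float64Array::from(state_avg_count);

       // avg = sum / count, using the arrow arithmetic kernel directly
       divide(&sums, &counts).expect("divide failed")
   }
   ```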

