ic4y commented on pull request #1520:
URL: 
https://github.com/apache/arrow-datafusion/pull/1520#issuecomment-1004941980


   From
   ```rust
   struct Accumulators {
   
       map: RawTable<(u64, usize)>,
   
       group_states: Vec<GroupState>,
   }
   ```
   To
   ```rust
   struct Accumulators {
   
       map: RawTable<(u64, usize)>,
   
       group_states: BumpVec<GroupState>,
   }
   ```
   
   Allocating group_states with bumpalo greatly reduces the time spent dropping group_states when group cardinality is high. With this change, the drop time for group_states barely registers in the pprof profile.
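
To illustrate why drop time grows with group cardinality, here is a minimal stdlib-only sketch. It is not the PR's code and does not use bumpalo's API. `OwnedGroupState`, `ArenaGroupState`, `ArenaGroups`, and the helper functions are hypothetical names, and the shared byte buffer is only a stand-in for an arena. The idea it shows: when each group owns its own heap buffer, dropping N groups runs N destructors, whereas plain-data states backed by one shared buffer are released in bulk.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical per-group state that owns its own heap buffer,
/// so dropping N groups runs N destructors (and N frees).
struct OwnedGroupState {
    key: Vec<u8>,
    drops: Rc<Cell<usize>>, // counts destructor calls, for illustration only
}

impl Drop for OwnedGroupState {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Arena-style state: plain data that points into a shared buffer.
#[derive(Clone, Copy)]
struct ArenaGroupState {
    key_offset: usize,
    key_len: usize,
}

/// All keys live in one contiguous buffer that is freed in one step.
struct ArenaGroups {
    key_bytes: Vec<u8>,
    states: Vec<ArenaGroupState>,
}

impl ArenaGroups {
    fn new() -> Self {
        Self { key_bytes: Vec::new(), states: Vec::new() }
    }

    /// Appends a group and returns its index.
    fn push(&mut self, key: &[u8]) -> usize {
        let state = ArenaGroupState {
            key_offset: self.key_bytes.len(),
            key_len: key.len(),
        };
        self.key_bytes.extend_from_slice(key);
        self.states.push(state);
        self.states.len() - 1
    }

    /// Returns the key bytes of the group at `idx`.
    fn key(&self, idx: usize) -> &[u8] {
        let s = self.states[idx];
        &self.key_bytes[s.key_offset..s.key_offset + s.key_len]
    }
}

/// Builds `n` owned states, drops them, and returns how many destructors ran.
fn owned_drop_count(n: usize) -> usize {
    let drops = Rc::new(Cell::new(0));
    let groups: Vec<OwnedGroupState> = (0..n as u64)
        .map(|i| OwnedGroupState {
            key: i.to_le_bytes().to_vec(),
            drops: Rc::clone(&drops),
        })
        .collect();
    assert_eq!(groups.iter().map(|g| g.key.len()).sum::<usize>(), n * 8);
    drop(groups); // one destructor call per group
    drops.get()
}

/// Builds `n` arena-style groups; dropping the result frees two buffers in total.
fn build_arena(n: usize) -> ArenaGroups {
    let mut groups = ArenaGroups::new();
    for i in 0..n as u64 {
        groups.push(&i.to_le_bytes());
    }
    groups
}
```

In this sketch, `owned_drop_count(n)` does per-group work on drop, while dropping the value returned by `build_arena(n)` releases two buffers regardless of `n`. With 50 million groups, that per-group work is what shows up as `drop_in_place<GroupState>` in a profile.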
   
   
   The test data set has 350 million rows in total, with 50 million distinct user_id values.
   SQL: `select count(1) from (select user_id from event group by user_id) a`
   
   **master:**
        drop_in_place<GroupState> takes 6s (50%), total 14s
        
   
![image](https://user-images.githubusercontent.com/83933160/148085458-434bf55e-f6d4-45d7-8c59-e12cb4479a7b.png)
   
   **bumpalo:**
        drop_in_place<GroupState> takes 0s (not counted), total 8s (about 40% less time)
   
![image](https://user-images.githubusercontent.com/83933160/148085572-441104d5-0b90-4959-9da3-7c37d5e0efdd.png)
   
   On the TPC-H benchmark there is almost no difference. I think the reason is that the group cardinality there is not high enough.
   
![image](https://user-images.githubusercontent.com/83933160/148085917-c4439fd5-2fad-486e-a8b8-a09d3beb98c8.png)
   
   
   
   
   
   

