Dandandan opened a new pull request #9214:
URL: https://github.com/apache/arrow/pull/9214


   I think the ability to repartition an in-memory table is useful: the 
repartitioning only needs to be applied once, and it is quite cheap. 
This can be very useful for in-memory analytics.
   
   The speedup from repartitioning is large (mainly on aggregates). On my 
8-core machine I see 6-7x on queries 1 and 12 versus a single partition, and a 
somewhat smaller difference on query 5, when using 16 partitions; CPU 
utilization is also very high.
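
   To illustrate the general idea, here is a minimal standalone sketch. It is 
not the code in this PR and does not use the DataFusion API: batches are 
modelled as plain `Vec<i64>`, and `repartition` and `parallel_sum` are 
hypothetical helper names. It assumes round-robin distribution of batches and 
one thread per partition, which is only one possible strategy.

```rust
use std::thread;

/// Distribute batches round-robin over `n` partitions.
/// Hypothetical helper: a "batch" is just a Vec<i64> in this sketch.
fn repartition(batches: Vec<Vec<i64>>, n: usize) -> Vec<Vec<Vec<i64>>> {
    assert!(n > 0, "need at least one partition");
    let mut partitions: Vec<Vec<Vec<i64>>> = (0..n).map(|_| Vec::new()).collect();
    for (i, batch) in batches.into_iter().enumerate() {
        // Batch i goes to partition i mod n; this happens once, up front.
        partitions[i % n].push(batch);
    }
    partitions
}

/// Aggregate each partition on its own thread, then merge the partial sums.
fn parallel_sum(partitions: Vec<Vec<Vec<i64>>>) -> i64 {
    // Collecting the handles first ensures all workers are started
    // before any of them is joined.
    let handles: Vec<thread::JoinHandle<i64>> = partitions
        .into_iter()
        .map(|part| thread::spawn(move || part.iter().flatten().sum::<i64>()))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

   The repartitioning cost is paid once when the table is set up, after which 
every aggregate over the table can run one worker per partition and merge the 
partial results.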
   
   @jorgecarleitao maybe this is of interest to you, as you mentioned you are 
looking into multi-threading. I think this would be a "high level" way to get 
more parallelism. I think we could also repartition from some optimizer rules 
and/or dynamically, similar to what is described here: 
https://issues.apache.org/jira/browse/ARROW-9464



