[GitHub] [arrow-datafusion] jon-chuang commented on issue #1221: Task assignment between Scheduler and Executors

GitBox Sun, 14 Nov 2021 11:34:37 -0800


jon-chuang commented on issue #1221:
URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-968350452



   Regarding shuffling, I saw in some benchmarks for [TiDB's distributed query 
engine](https://www.youtube.com/watch?v=mmzoSkEhYrA) (which, incidentally, also 
relies on columnar storage) that an MPP-style shuffle seemed to perform better 
than the MapReduce-style shuffle of Apache Spark. There are some open questions, 
such as whether Java could be the cause of this discrepancy, but it may still 
be worth thinking about how to optimize the shuffles.
   
   I don't know enough about DataFusion to say whether it takes data movement 
into account when generating query plans.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
