yjshen commented on pull request #1104: URL: https://github.com/apache/arrow-datafusion/pull/1104#issuecomment-946339508
> also cc @yjshen in case we missed any item needed from your native spark executor work.

Thanks, @houqp. I think what I need most is covered by the `Resource Management` section. I'm currently prototyping a memory-limited version of `SortExec`.

On the Ballista side, I think broadcast join would be a great addition. We could also add a sort-based shuffle writer to keep memory usage low, and have it write a single map output file per task, which avoids creating too many small files when the number of output partitions is large.
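To make the single-output-file idea concrete, here is a rough sketch, not a proposed Ballista API. `Record` and `write_sorted_shuffle` are hypothetical names. The sketch assumes every record for a map task fits in memory, whereas a real memory-limited writer would spill sorted runs and merge them. Records are sorted by target partition and written to one sink, together with an offset index that lets a reader locate each partition's byte range.

```rust
use std::io::{self, Write};

/// Hypothetical record: a target output partition plus an opaque payload.
pub struct Record {
    pub partition: usize,
    pub payload: Vec<u8>,
}

/// Writes all records of one map task to a single sink, grouped by partition.
///
/// Returns an index with `num_partitions + 1` entries: partition `p` occupies
/// bytes `offsets[p]..offsets[p + 1]` of the output.
pub fn write_sorted_shuffle<W: Write>(
    mut records: Vec<Record>,
    num_partitions: usize,
    out: &mut W,
) -> io::Result<Vec<u64>> {
    // Stable sort: records keep their input order within a partition.
    records.sort_by_key(|r| r.partition);

    let mut offsets = vec![0u64; num_partitions + 1];
    let mut written = 0u64;
    // Next partition whose start offset has not been recorded yet.
    let mut next = 0usize;

    for r in &records {
        if r.partition >= num_partitions {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "partition id out of range",
            ));
        }
        // Record the start offset of this partition and of any empty
        // partitions that were skipped before it.
        while next <= r.partition {
            offsets[next] = written;
            next += 1;
        }
        out.write_all(&r.payload)?;
        written += r.payload.len() as u64;
    }

    // Trailing empty partitions and the final end offset.
    while next <= num_partitions {
        offsets[next] = written;
        next += 1;
    }
    Ok(offsets)
}
```

With this layout each task produces one data file plus one small index, instead of one file per output partition, so the file count grows with the number of tasks rather than tasks times partitions.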
