robtandy commented on issue #46: URL: https://github.com/apache/datafusion-ray/issues/46#issuecomment-2596051911
The current implementation i have streams object references out of each partition through an intermediate Actor to hold the queues. Then subsequent stages connect to the intermediate actor and read streaming object references from the queue. They then have to `ray.get()` on the references to materialize the IPC encoded batch. I was really hoping for more performance and scale out of this approach, and I particularly like that it reuses the RepartitionExec's as implemented in DataFusion. But alas. My work isn't documented well yet but if you have time to discuss it sync, I'd appreciate your eyes on it and how we can improve it. Let me know if you have time to pair program for a bit so I can demo. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org