robtandy commented on issue #46: URL: https://github.com/apache/datafusion-ray/issues/46#issuecomment-2596051911
The current implementation I have streams object references out of each partition through an intermediate actor that holds the queues. Subsequent stages then connect to that intermediate actor and read the streamed object references from the queue. They then have to call `ray.get()` on each reference to materialize the IPC-encoded batch. I was really hoping for more performance and scale out of this approach, and I particularly like that it reuses the RepartitionExec operators as implemented in DataFusion. But alas. My work isn't documented well yet, but if you have time to discuss it sync, I'd appreciate your eyes on it and on how we can improve it. Let me know if you have time to pair program for a bit so I can demo.
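
To make the data flow described above a bit more concrete, here is a rough, standard-library-only model of it. This is not the datafusion-ray code and it does not use Ray at all: `ObjectStore`, `QueueHolder`, `produce`, `consume`, `DONE`, and `run_demo` are all hypothetical names invented for this sketch. `ObjectStore.put`/`get` stand in for `ray.put()`/`ray.get()`, `QueueHolder` stands in for the intermediate actor, and plain byte strings stand in for IPC-encoded batches.

```python
import queue
import threading
import uuid

DONE = None  # hypothetical end-of-stream sentinel for a partition


class ObjectStore:
    """Stand-in for an object store: maps a reference id to stored bytes."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, value: bytes) -> str:
        # Store the batch and hand back a small reference to it.
        ref = uuid.uuid4().hex
        with self._lock:
            self._objects[ref] = value
        return ref

    def get(self, ref: str) -> bytes:
        # Materialize the batch behind a reference.
        with self._lock:
            return self._objects[ref]


class QueueHolder:
    """Stand-in for the intermediate actor: one queue of references per partition."""

    def __init__(self, num_partitions: int):
        self._queues = [queue.Queue() for _ in range(num_partitions)]

    def push(self, partition: int, ref):
        self._queues[partition].put(ref)

    def pull(self, partition: int):
        # Blocks until the upstream stage has pushed something.
        return self._queues[partition].get()


def produce(store, holder, partition, batches):
    """Upstream stage: stream references (not the data itself) into the holder."""
    for batch in batches:
        holder.push(partition, store.put(batch))
    holder.push(partition, DONE)


def consume(store, holder, partition):
    """Downstream stage: read references, then materialize each batch."""
    batches = []
    while True:
        ref = holder.pull(partition)
        if ref is DONE:
            break
        batches.append(store.get(ref))
    return batches


def run_demo():
    store = ObjectStore()
    holder = QueueHolder(num_partitions=2)
    inputs = {0: [b"batch-0a", b"batch-0b"], 1: [b"batch-1a"]}
    producers = [
        threading.Thread(target=produce, args=(store, holder, p, b))
        for p, b in inputs.items()
    ]
    for t in producers:
        t.start()
    results = {p: consume(store, holder, p) for p in inputs}
    for t in producers:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only the shape of the hand-off: only small references travel through the queue holder, and every batch costs the consumer an extra lookup to materialize it, which is the step that maps to `ray.get()` in the real setup.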
