pantShrey commented on PR #21882:
URL: https://github.com/apache/datafusion/pull/21882#issuecomment-4415353823

   @alamb Thank you so much for the review! I scoped out the SortMergeJoin 
migration today, specifically looking at bitwise_stream.rs and 
process_key_match_with_filter, to see what it would take.
   
   Because SortMergeJoin currently reads from the spill file via a synchronous 
for loop inside a hand-rolled poll state machine, making the read path truly 
async requires a major rewrite. That code runs inside a poll function rather 
than an async fn, so we can't just .await the stream. Instead, we would likely 
need to store the SendableRecordBatchStream in the execution state and manually 
persist variables like matched_count across Poll::Pending yields.
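
   To make the shape of that rewrite concrete, here is a minimal, std-only 
sketch of the pattern. It is not the actual SortMergeJoin code: MockSpillStream, 
FilterState, poll_matched, and drive are hypothetical names, and the mock 
stream stands in for a SendableRecordBatchStream, with Vec<bool> playing the 
role of per-row filter results.

```rust
use std::collections::VecDeque;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an async spill reader (not a DataFusion type).
/// `Some(batch)` is a batch of per-row filter results; `None` simulates the
/// reader not being ready yet.
struct MockSpillStream {
    script: VecDeque<Option<Vec<bool>>>,
}

impl MockSpillStream {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Vec<bool>>> {
        match self.script.pop_front() {
            Some(Some(batch)) => Poll::Ready(Some(batch)),
            Some(None) => {
                // Not ready: ask to be polled again, then yield.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            None => Poll::Ready(None), // stream exhausted
        }
    }
}

/// State that has to live in the operator instead of on the stack, because
/// the poll function returns to its caller on every `Poll::Pending`.
struct FilterState {
    stream: MockSpillStream,
    matched_count: usize,
}

impl FilterState {
    /// Hand-rolled replacement for a synchronous `for batch in reader` loop.
    fn poll_matched(&mut self, cx: &mut Context<'_>) -> Poll<usize> {
        loop {
            match self.stream.poll_next(cx) {
                Poll::Ready(Some(batch)) => {
                    self.matched_count += batch.iter().filter(|m| **m).count();
                }
                Poll::Ready(None) => return Poll::Ready(self.matched_count),
                // Yield; `matched_count` survives because it is a field.
                Poll::Pending => return Poll::Pending,
            }
        }
    }
}

/// Waker that does nothing, used only to drive the sketch without a runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls until completion; returns (matched rows, number of Pending yields).
fn drive(state: &mut FilterState) -> (usize, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut pending_yields = 0;
    loop {
        match state.poll_matched(&mut cx) {
            Poll::Ready(total) => return (total, pending_yields),
            Poll::Pending => pending_yields += 1,
        }
    }
}
```

   Calling drive on a FilterState whose script interleaves batches with None 
entries returns the total match count together with the number of Pending 
yields. The running count is only correct because it is stored in the struct; 
a local variable would be reset on every re-poll.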
   
   Since ParadeDB is hoping to unblock their Postgres integration next week, 
I'm worried that a state machine rewrite of this scale would stall them.
   
   Would you be open to merging this core abstraction first (with 
open_sync_reader marked as #[deprecated])? I can open a dedicated tracking 
issue for the SortMergeJoin async migration and tackle it as a fast follow-up 
PR.
   
   I am happy to defer to your judgment if you feel the tech debt must be 
addressed first!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

