2010YOUY01 commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2585043689
Although we currently spill column-wise record batches, I think this will change to row-wise batches in the future, so it would be better to benchmark and optimize spilling the Arrow Row format in this issue as well.

The reason is that spilling involves sorting, writing out sorted runs, and reading those runs back for merging. Both sorting and merging benefit from the row format. The current implementation performs several unnecessary conversions between row and column formats, which could become inefficient.

The preferred approach would be:
1. Convert to the row format and sort.
2. Keep the row format until the final output. (I believe `Sort` will benefit more from this than `Aggregate`, because it does 2 phases of merging while `Aggregate` does only 1.)

This is tracked by https://github.com/apache/datafusion/issues/7053
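
To make the proposed flow concrete, here is a toy sketch of "convert to rows once, sort, spill sorted runs, merge, convert back to columns once". It is not DataFusion's or arrow-rs's implementation. It assumes a hypothetical two-column schema (an int64 key and a UTF-8 string), in-memory lists standing in for spill files, and made-up helper names (`encode_row`, `decode_row`, `sort_and_spill`, `merge_runs`). The only idea it borrows from the Arrow Row format is that rows are encoded as byte strings whose bytewise order matches the sort order.

```python
import heapq
import struct

# Hypothetical schema for illustration: (int64 key, str payload),
# sorted ascending by key, then by payload.

def encode_row(key, payload):
    """Encode one row so that bytewise comparison matches (key, payload) order."""
    # Adding 2**63 flips the sign bit, so signed int64 values order
    # correctly when compared as unsigned big-endian bytes.
    biased = (key + (1 << 63)) & 0xFFFFFFFFFFFFFFFF
    # UTF-8 preserves code point order; the string is the last field,
    # so no terminator is needed in this toy encoding.
    return struct.pack(">Q", biased) + payload.encode("utf-8")

def decode_row(row):
    """Inverse of encode_row: recover (key, payload) from the encoded bytes."""
    (biased,) = struct.unpack(">Q", row[:8])
    return biased - (1 << 63), row[8:].decode("utf-8")

def sort_and_spill(keys, payloads, run_size):
    """Step 1: convert columns to rows once, then emit sorted runs of rows."""
    rows = [encode_row(k, p) for k, p in zip(keys, payloads)]
    runs = []
    for start in range(0, len(rows), run_size):
        # Each run stays in row format; a real system would write it to disk.
        runs.append(sorted(rows[start:start + run_size]))
    return runs

def merge_runs(runs):
    """Step 2: merge the runs in row format, convert to columns only at the end."""
    out_keys, out_payloads = [], []
    for row in heapq.merge(*runs):  # k-way merge comparing raw bytes
        key, payload = decode_row(row)
        out_keys.append(key)
        out_payloads.append(payload)
    return out_keys, out_payloads

# Usage: three rows, spilled as runs of at most two rows each.
runs = sort_and_spill([3, -1, 2], ["c", "b", "x"], run_size=2)
print(merge_runs(runs))  # ([-1, 2, 3], ['b', 'x', 'c'])
```

In this sketch the column-to-row conversion happens once in `sort_and_spill` and the row-to-column conversion once in `merge_runs`; nothing in between leaves the row format. That is the property the comment asks for.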