ozankabak commented on issue #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1454874706
@jaylmiller, I haven't studied the sort code yet, so I'd like to ask a few quick questions to further my understanding first.

Let's say we have `P` partitions, each having `N` rows in total (across all batches), and let's say the batch size is `B`. When we have `preserve_partitioning`, is it accurate to say we do the following?

1. Coalesce batches (for every partition independently), since sort needs to operate on all of the data. If so, we would end up with `P` datasets of size `N`.
2. Perform `P` row conversions and `P` sorts on these `N`-long datasets.

Is there a third, output-related step I'm missing?
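To make the two steps in the question concrete, here is a minimal Rust sketch of the process as described above. It is not DataFusion's actual `SortExec` code. It assumes a simplified model where a "batch" is just a `Vec<i64>` of sort keys, and `coalesce`, `to_row`, `sort_partition`, and `sort_preserving_partitions` are hypothetical names. `to_row` is a toy stand-in for row conversion (an order-preserving byte encoding), not the arrow-rs row format.

```rust
/// Hypothetical simplified model: a "batch" is just a Vec of i64 sort keys.
type Batch = Vec<i64>;

/// Step 1 (as described in the question): concatenate all batches of a
/// single partition into one N-row dataset.
fn coalesce(batches: &[Batch]) -> Vec<i64> {
    batches.iter().flat_map(|b| b.iter().copied()).collect()
}

/// Toy stand-in for row conversion: encode each value as bytes whose
/// lexicographic order matches numeric order (flip the sign bit, big-endian).
fn to_row(v: i64) -> [u8; 8] {
    ((v as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Step 2: one row conversion and one sort per partition.
fn sort_partition(batches: &[Batch]) -> Vec<i64> {
    let data = coalesce(batches);
    // Convert every value once, keeping the original value alongside its key.
    let mut rows: Vec<([u8; 8], i64)> = data.iter().map(|&v| (to_row(v), v)).collect();
    // Sort by the encoded bytes only.
    rows.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    rows.into_iter().map(|(_, v)| v).collect()
}

/// With preserve_partitioning, each of the P partitions is handled
/// independently: P partitions in, P sorted partitions out.
fn sort_preserving_partitions(partitions: &[Vec<Batch>]) -> Vec<Vec<i64>> {
    partitions.iter().map(|p| sort_partition(p)).collect()
}

fn main() {
    // P = 2 partitions, N = 4 rows each, B = 2 rows per batch.
    let partitions = vec![
        vec![vec![3, -1], vec![2, 0]],
        vec![vec![10, 7], vec![-5, 8]],
    ];
    let sorted = sort_preserving_partitions(&partitions);
    // prints [[-1, 0, 2, 3], [-5, 7, 8, 10]]
    println!("{:?}", sorted);
}
```

In this model each partition arrives as `N / B` batches, is coalesced into a single `N`-row dataset, converted and sorted once, and the output stays split into `P` independently sorted partitions.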