lidavidm commented on pull request #12252: URL: https://github.com/apache/arrow/pull/12252#issuecomment-1021485326
Basically, you are getting "very unlucky." ConsumingSinkNode does not serialize calls to Consume, and all data in the batches is ordered by the partition column. Each batch gets partitioned, and the threads _happen_ to line up so that each thread writes out its batch for partition A, then for partition B, and so on. So even if only one file can be open at a time, we never need to open a second file for any partition.
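To make the ordering argument concrete, here is a toy model. It is not Arrow's dataset writer code. The function `count_files` is hypothetical, and it assumes the writer closes the least recently used file when the open-file limit is hit. It counts how many files each partition ends up with for a given sequence of partition writes. When the threads line up (all A writes, then all B writes, and so on), a limit of one open file still yields one file per partition. A different interleaving of the same writes forces files to be reopened.

```python
from collections import Counter, OrderedDict


def count_files(writes, max_open_files):
    """Hypothetical model of a partitioned writer with an open-file limit.

    `writes` is the order in which partition keys arrive at the writer.
    Assumption: when the limit is reached, the least recently used file is
    closed, and later data for that partition must go to a new file.
    """
    open_files = OrderedDict()  # partition key -> None, ordered by recency
    files_per_partition = Counter()
    for key in writes:
        if key in open_files:
            open_files.move_to_end(key)  # reuse the open file; mark it most recent
            continue
        if len(open_files) >= max_open_files:
            open_files.popitem(last=False)  # close the least recently used file
        open_files[key] = None
        files_per_partition[key] += 1  # a new file is started for this partition
    return dict(files_per_partition)


# Two threads, each with one batch partitioned into A, B, C.
aligned = ["A", "A", "B", "B", "C", "C"]      # threads happen to line up
interleaved = ["A", "B", "C", "A", "B", "C"]  # each thread writes its whole batch in turn

print(count_files(aligned, max_open_files=1))      # {'A': 1, 'B': 1, 'C': 1}
print(count_files(interleaved, max_open_files=1))  # {'A': 2, 'B': 2, 'C': 2}
```

In this model only the interleaved schedule exercises the "open a second file for a partition" path, which is why a test relying on it can pass or fail depending on thread timing.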
