[GitHub] [arrow] lidavidm commented on pull request #12252: ARROW-15438: [Python] Flaky test test_write_dataset_max_open_files

GitBox Tue, 25 Jan 2022 10:29:17 -0800


lidavidm commented on pull request #12252:
URL: https://github.com/apache/arrow/pull/12252#issuecomment-1021485326



   Basically you are getting "very unlucky". ConsumingSinkNode does not 
serialize calls to Consume, and all data in the batches is ordered by the 
partition column. Each batch gets partitioned, and then the threads _happen_ 
to line up such that each thread writes out its batch for partition A, then 
for partition B, and so on. So even if only one file can be open at a time, 
we never need to open a second file for a partition.
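
   To illustrate the effect, here is a toy model in plain Python. It is not 
Arrow's dataset writer: `count_files_opened` is a hypothetical helper, and the 
eviction rule (close the least recently used file once the limit is reached, 
and start a new file if that partition is written again) is an assumption made 
for this sketch.

```python
from collections import OrderedDict


def count_files_opened(write_order, max_open_files):
    """Return how many files get created per partition in the toy model.

    write_order: sequence of partition keys, one per write, in arrival order.
    max_open_files: maximum number of files open at the same time.
    """
    open_files = OrderedDict()  # partition -> True, ordered by last use
    files_per_partition = {}
    for partition in write_order:
        if partition in open_files:
            # File is still open: reuse it and mark it as recently used.
            open_files.move_to_end(partition)
            continue
        if len(open_files) >= max_open_files:
            # Limit reached: close the least recently used file (assumption).
            open_files.popitem(last=False)
        # Opening a partition that was closed earlier creates a new file.
        open_files[partition] = True
        files_per_partition[partition] = files_per_partition.get(partition, 0) + 1
    return files_per_partition


# Writes that happen to arrive grouped by partition: one file each.
lucky = count_files_opened(["A", "A", "A", "B", "B", "B"], max_open_files=1)

# Writes interleaved across partitions: every partition is reopened.
interleaved = count_files_opened(["A", "B", "A", "B"], max_open_files=1)
```

   In the grouped case each partition ends up with a single file even though 
the limit is 1, which is the ordering the flaky test run happened to hit. In 
the interleaved case each partition ends up with two files.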


