Weston Pace created ARROW-14164:
-----------------------------------

             Summary: [C++][Dataset] Enhance dataset writer to allow file-per-batch
                 Key: ARROW-14164
                 URL: https://issues.apache.org/jira/browse/ARROW-14164
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace

The dataset writer currently groups incoming batches into large files, with the size of each file controlled by max_rows_per_file. In the PR for this work, [~jorisvandenbossche] recommended an option where each incoming batch creates a new file. This would give the user fine-grained control over how many files are created, provided they are doing a very basic scan/filter/project and not using more sophisticated nodes that may resize batches.
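
To make the difference concrete, here is a toy model of the two policies. It is only a sketch and does not use the Arrow API: the function names GroupedFileSizes and FilePerBatchSizes are hypothetical, and it assumes that the grouping writer slices a batch when it crosses the max_rows_per_file limit and that max_rows_per_file is positive.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy model, not Arrow code. Each function returns the row count of every
// output file for a given sequence of incoming batch sizes.

// Grouping policy (assumed behavior): rows are packed into the current file
// until it holds max_rows_per_file rows, then a new file is started. A batch
// that crosses the limit is split between files.
std::vector<int64_t> GroupedFileSizes(const std::vector<int64_t>& batch_rows,
                                      int64_t max_rows_per_file) {
  std::vector<int64_t> files;
  int64_t current = 0;  // rows already written to the open file
  for (int64_t rows : batch_rows) {
    while (rows > 0) {
      const int64_t take = std::min(rows, max_rows_per_file - current);
      current += take;
      rows -= take;
      if (current == max_rows_per_file) {
        files.push_back(current);  // file is full, close it
        current = 0;
      }
    }
  }
  if (current > 0) files.push_back(current);  // close the last partial file
  return files;
}

// Proposed file-per-batch policy: every incoming batch becomes its own file,
// so the file count equals the batch count.
std::vector<int64_t> FilePerBatchSizes(const std::vector<int64_t>& batch_rows) {
  return batch_rows;
}
```

With batches of 3, 4 and 5 rows and a limit of 5 rows per file, the grouping model produces files whose boundaries do not line up with the batches, while the file-per-batch model produces exactly one file per batch. That one-to-one mapping is what gives the user control over the file count, as long as no upstream node resizes the batches.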