Weston Pace created ARROW-14164:
-----------------------------------

             Summary: [C++][Dataset] Enhance dataset writer to allow file-per-batch
                 Key: ARROW-14164
                 URL: https://issues.apache.org/jira/browse/ARROW-14164
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


The dataset writer currently groups incoming batches into large files whose size
is controlled by max_rows_per_file. In the PR for this work, [~jorisvandenbossche]
recommended an option where each incoming batch creates a new file.

This would give the user fine-grained control over how many files are created
(provided they are doing a very basic scan/filter/project and not using any
more sophisticated nodes that may resize batches).
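To make the difference between the two policies concrete, here is a hypothetical,
simplified model in C++ that only counts rows per output file. GroupIntoFiles,
FilePerBatch and Rechunk are illustrative names, not Arrow APIs. The model also
assumes that a batch which would overflow the current file is split at the
max_rows_per_file boundary; treat that as a simplification, not as a description
of the actual writer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// HYPOTHETICAL model, not the Arrow dataset writer API. Each function takes the
// row counts of incoming batches and returns the row counts of output files.

// Current behavior (simplified): rows accumulate into one file until
// max_rows_per_file is reached. ASSUMPTION: a batch that would overflow the
// current file is split at the boundary and the rest starts a new file.
std::vector<int64_t> GroupIntoFiles(const std::vector<int64_t>& batch_rows,
                                    int64_t max_rows_per_file) {
  assert(max_rows_per_file > 0);  // otherwise the loop below never progresses
  std::vector<int64_t> files;
  int64_t current = 0;  // rows written to the currently open file
  for (int64_t rows : batch_rows) {
    while (rows > 0) {
      int64_t room = max_rows_per_file - current;
      int64_t take = rows < room ? rows : room;
      current += take;
      rows -= take;
      if (current == max_rows_per_file) {  // file is full: close it
        files.push_back(current);
        current = 0;
      }
    }
  }
  if (current > 0) files.push_back(current);  // close the last partial file
  return files;
}

// Proposed option: every incoming batch becomes its own file, so the number of
// files equals the number of batches that reach the writer.
std::vector<int64_t> FilePerBatch(const std::vector<int64_t>& batch_rows) {
  return batch_rows;
}

// HYPOTHETICAL stand-in for an upstream node that resizes batches: it
// concatenates all rows and re-slices them into chunks of chunk_rows.
std::vector<int64_t> Rechunk(const std::vector<int64_t>& batch_rows,
                             int64_t chunk_rows) {
  assert(chunk_rows > 0);
  int64_t total = 0;
  for (int64_t rows : batch_rows) total += rows;
  std::vector<int64_t> out;
  while (total > 0) {
    int64_t take = total < chunk_rows ? total : chunk_rows;
    out.push_back(take);
    total -= take;
  }
  return out;
}
```

For example, batches of 100, 250, 50 and 300 rows with max_rows_per_file = 256
yield three files of 256, 256 and 188 rows under grouping, but four files (one
per batch) under the proposed option. If an upstream node re-chunks the same 700
rows into 128-row batches, file-per-batch yields six files instead, which is why
the control only holds for simple scan/filter/project plans.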



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
