[ https://issues.apache.org/jira/browse/ARROW-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403455#comment-17403455 ]
Weston Pace commented on ARROW-10439:
-------------------------------------

https://github.com/apache/arrow/pull/10955 (as part of ARROW-13650) adds a `max_rows_per_file` option.

Max bytes is a little trickier (`table.nbytes` is the in-memory size, and I assume one would want the on-disk size), although it is doable: the file writers should be able to keep track of how many bytes they've written, but they don't do this today. I'd prefer to avoid max bytes unless someone has a need for it, though.

> [C++][Dataset] Add max file size as a dataset writing option
> ------------------------------------------------------------
>
>                 Key: ARROW-10439
>                 URL: https://issues.apache.org/jira/browse/ARROW-10439
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Ben Kietzman
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: beginner, dataset, query-engine
>             Fix For: 6.0.0
>
>
> This should be specified as a row limit.
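
For readers unfamiliar with what a per-file row limit implies, here is a minimal pure-Python sketch of the rollover behaviour. It is an illustration only, not Arrow's implementation: the function `split_by_max_rows` and its parameters are hypothetical names, and rows are plain Python objects rather than Arrow record batches.

```python
from typing import Any, Iterable, Iterator, List


def split_by_max_rows(
    batches: Iterable[List[Any]], max_rows_per_file: int
) -> Iterator[List[Any]]:
    """Regroup incoming batches into per-file chunks.

    Hypothetical helper: each yielded list stands in for one output file
    and holds at most `max_rows_per_file` rows.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")

    current: List[Any] = []  # rows buffered for the file being filled
    for batch in batches:
        start = 0
        while start < len(batch):
            # Take only as many rows as still fit in the current file.
            room = max_rows_per_file - len(current)
            taken = batch[start:start + room]
            current.extend(taken)
            start += len(taken)
            if len(current) == max_rows_per_file:
                yield current  # file is full: roll over to a new one
                current = []
    if current:
        yield current  # final, possibly partial, file


if __name__ == "__main__":
    incoming = [[1, 2, 3], [4, 5, 6, 7], [8]]
    for i, rows in enumerate(split_by_max_rows(incoming, 3)):
        print(f"file {i}: {rows}")
```

A row count is known before anything is encoded, which is why this kind of limit is straightforward. A byte limit would instead need each writer to report how much it has actually written to disk, since the in-memory size and the encoded size can differ.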