[ 
https://issues.apache.org/jira/browse/ARROW-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403455#comment-17403455
 ] 

Weston Pace commented on ARROW-10439:
-------------------------------------

https://github.com/apache/arrow/pull/10955 (as part of ARROW-13650) adds a 
`max_rows_per_file` option.  Max bytes is a little trickier: `table.nbytes` is 
the in-memory size, and I assume one would want the on-disk size.  It is 
doable, though (the file writers should be able to keep track of how many 
bytes they've written, but they don't do this today).  I'd prefer to avoid 
max bytes unless someone has a need for it.
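
To make the "keep track of how many bytes they've written" idea concrete, 
here is a minimal sketch in plain Python.  It is not Arrow's writer code; 
`CountingSink` and `write_with_byte_limit` are hypothetical names, and the 
chunks stand in for already-encoded (on-disk format) bytes.

```python
import io


class CountingSink:
    """Hypothetical wrapper: counts bytes written to a binary file-like object."""

    def __init__(self, raw):
        self.raw = raw
        self.bytes_written = 0

    def write(self, data):
        # Count what actually reaches the sink, not the in-memory size
        # of whatever structure produced `data`.
        n = self.raw.write(data)
        self.bytes_written += n
        return n


def write_with_byte_limit(chunks, max_bytes_per_file, open_sink=io.BytesIO):
    """Write chunks in order, rolling over to a new sink once the current
    one has reached max_bytes_per_file. Returns the list of sinks used."""
    sinks = []
    current = None
    for chunk in chunks:
        # The check happens before the write, so this is a soft limit:
        # a file may exceed the limit by up to one chunk.
        if current is None or current.bytes_written >= max_bytes_per_file:
            current = CountingSink(open_sink())
            sinks.append(current)
        current.write(chunk)
    return sinks
```

Because the check runs between chunks, the limit is only approximate.  A 
hard limit would need the encoded size of a chunk before writing it, which 
is part of why a row limit is the simpler option.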

> [C++][Dataset] Add max file size as a dataset writing option
> ------------------------------------------------------------
>
>                 Key: ARROW-10439
>                 URL: https://issues.apache.org/jira/browse/ARROW-10439
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Ben Kietzman
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: beginner, dataset, query-engine
>             Fix For: 6.0.0
>
>
> This should be specified as a row limit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
