[ https://issues.apache.org/jira/browse/ARROW-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292859#comment-17292859 ]

Romain Francois commented on ARROW-9293:
----------------------------------------

Assuming this comes after [https://github.com/apache/arrow/pull/8650], it boils 
down to vec_to_arrow() accepting some sort of chunking policy, which in turn 
means that the converter API needs something too (this is essentially 
https://issues.apache.org/jira/browse/ARROW-5628).
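
For illustration only, here is a rough sketch (not the current code) of what 
vec_to_arrow() taking a chunk_size could look like; vec_slice_to_array() is a 
made-up placeholder for whatever range-aware conversion the converter API would 
have to expose:

    #include <Rinternals.h>

    #include <algorithm>
    #include <memory>
    #include <vector>

    #include <arrow/array.h>
    #include <arrow/chunked_array.h>
    #include <arrow/type.h>

    // Hypothetical helper: convert elements [offset, offset + length) of `x`
    // into a single Array of the given type.
    std::shared_ptr<arrow::Array> vec_slice_to_array(
        SEXP x, int64_t offset, int64_t length,
        const std::shared_ptr<arrow::DataType>& type);

    // Sketch of a chunking policy: convert fixed-size slices of the R vector
    // independently and collect them into a ChunkedArray.
    std::shared_ptr<arrow::ChunkedArray> vec_to_arrow_chunked(
        SEXP x, const std::shared_ptr<arrow::DataType>& type,
        int64_t chunk_size) {
      int64_t n = XLENGTH(x);
      std::vector<std::shared_ptr<arrow::Array>> chunks;
      for (int64_t offset = 0; offset < n; offset += chunk_size) {
        int64_t length = std::min(chunk_size, n - offset);
        // Each slice becomes its own chunk, so e.g. a character vector never
        // has to grow a single utf8 array past the 32-bit offset limit.
        chunks.push_back(vec_slice_to_array(x, offset, length, type));
      }
      return std::make_shared<arrow::ChunkedArray>(std::move(chunks), type);
    }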

The API we use now in the converter goes through: 

    Status Extend(SEXP x, int64_t size) override;

which means "ingest x, which has this many elements". We need some way to 
express "ingest that range of elements from x". The Chunker class, at least in 
its current form, does not help. 
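
To make that concrete, one possible shape (the range-aware variant below is a 
hypothetical sketch, not an existing signature):

    // Current shape: "ingest all `size` elements of x".
    Status Extend(SEXP x, int64_t size) override;

    // Hypothetical range-aware variant: "ingest `length` elements of x
    // starting at `offset`", which is what a chunking policy would need
    // to call in a loop.
    Status ExtendRange(SEXP x, int64_t offset, int64_t length);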

> [R] Add chunk_size to Table$create()
> ------------------------------------
>
>                 Key: ARROW-9293
>                 URL: https://issues.apache.org/jira/browse/ARROW-9293
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Romain Francois
>            Priority: Major
>             Fix For: 4.0.0
>
>
> While working on ARROW-3308, I noticed that write_feather has a chunk_size 
> argument, which by default will write batches of 64k rows into the file. In 
> principle, a chunking strategy like this would prevent the need to bump up to 
> large_utf8 when ingesting a large character vector because you'd end up with 
> many chunks that each fit into a regular utf8 type. However, the way the 
> function works, the data.frame is converted to a Table with all ChunkedArrays 
> containing a single chunk first, which is where the large_utf8 type gets set. 
> But if Table$create() could be instructed to make multiple chunks, this would 
> be resolved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
