[ 
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270940#comment-17270940
 ] 

Laurent commented on ARROW-11120:
---------------------------------

I looked into this briefly, and the issue might be caused by a combination of 
the API exposed by the R arrow package and R's performance when creating many 
R6 objects.

The R constructor for ChunkedArray expects a list of Array objects.  In my 
example a ChunkedArray has ~2200 chunks. Getting R to build that many dummy 
Array objects (`arrow::Array$create(1)`) takes over half a second. Multiplied 
by 18 (the number of columns in my tables), that comes to slightly over 10 
seconds (almost half of the 24 seconds observed).
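
As a rough back-of-the-envelope check of those numbers, here is a small 
Python sketch. It only uses the figures quoted above; "over half a second" is 
taken as exactly 0.5 s, so every result is a lower bound, and the variable 
names are mine, not part of any Arrow API.

```python
# Figures quoted in this comment (0.5 s is a lower bound for
# "over half a second", so the derived values are lower bounds too).
chunks_per_column = 2200      # approximate number of chunks in one ChunkedArray
columns = 18                  # number of columns in the tables
seconds_per_column = 0.5      # time to build ~2200 dummy Array objects in R
total_observed = 24.0         # total conversion time observed, in seconds

# Time spent only on creating the R6 Array objects, across all columns.
overhead = seconds_per_column * columns

# Average cost of creating a single Array object, in milliseconds.
per_object_ms = seconds_per_column / chunks_per_column * 1000

# Fraction of the observed total that this overhead accounts for.
share = overhead / total_observed

print(f"object-creation overhead: at least {overhead:.1f} s")
print(f"cost per Array object:    at least {per_object_ms:.2f} ms")
print(f"share of observed total:  at least {share:.0%}")
```

The per-object cost is small, so the overhead comes from the number of 
objects (chunks times columns) rather than from any single call.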

It feels like a pair of functions `pyarrow.ChunkedArray._export_to_c()` and 
`arrow:::ImportChunkedArray()` would be needed.

> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-11120
>                 URL: https://issues.apache.org/jira/browse/ARROW-11120
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Wes McKinney
>            Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if 
> anything) to be able to pass data structures using the C interface between 
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort 
> of the Python version of reticulate. Unit tests will then validate that it's 
> working



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
