[ 
https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270998#comment-17270998
 ] 

Laurent commented on ARROW-11120:
---------------------------------

For what it's worth, calling the pyarrow table's `combine_chunks()` to merge 
the chunks results in a significant performance improvement: the conversion 
takes 75 ms instead of 24 s after that.

A few comments about the API in relation to this:

- `pyarrow.lib.Table` has a method `combine_chunks()`, but there does not seem 
to be a way to "re-chunk" (say, go from 2,200 chunks to 10 chunks).
- There is no apparent way to specify the number of chunks when creating the 
table from a dataset using `to_table()`: 
{code:python}
tbl = dataset.to_table(filter=ds.field('tip_amount') > 10)
{code}
The named argument `batch_size` does not appear to have any effect on the 
number of chunks.


> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-11120
>                 URL: https://issues.apache.org/jira/browse/ARROW-11120
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Wes McKinney
>            Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if 
> anything) to be able to pass data structures using the C interface between 
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort 
> of the Python version of reticulate. Unit tests will then validate that it's 
> working



--
This message was sent by Atlassian Jira
(v8.3.4#803005)