[ https://issues.apache.org/jira/browse/ARROW-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270998#comment-17270998 ]
Laurent commented on ARROW-11120:
---------------------------------

For what it's worth, calling the pyarrow table's `combine_chunks()` to undo the chunking results in a significant performance improvement: after that, the conversion takes 75 ms instead of 24 s.

A few comments about the API in relation to this:

- `pyarrow.lib.Table` has a method `combine_chunks()`, but there does not seem to be a way to "re-chunk" (say, go from 2,200 chunks to 10 chunks).
- There is no apparent way to specify the number of chunks when creating the table from a dataset using `to_table()`:

{code:python}
tbl = dataset.to_table(filter=ds.field('tip_amount') > 10)
{code}

The named argument `batch_size` does not appear to have any effect on the number of chunks.

> [Python][R] Prove out plumbing to pass data between Python and R using rpy2
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-11120
>                 URL: https://issues.apache.org/jira/browse/ARROW-11120
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Wes McKinney
>            Priority: Major
>
> Per discussion on the mailing list, we should see what is required (if
> anything) to be able to pass data structures using the C interface between
> Python and R from the perspective of the Python user using rpy2. rpy2 is sort
> of the Python version of reticulate. Unit tests will then validate that it's
> working.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)