James Lamb created ARROW-3205:
---------------------------------

             Summary: [R] Minimum working example round-tripping a data frame 
from R to plasma to pandas
                 Key: ARROW-3205
                 URL: https://issues.apache.org/jira/browse/ARROW-3205
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
            Reporter: James Lamb


I see tremendous opportunity for interoperability between Python and R (two 
popular languages for data scientists) using Arrow as an interchange format.

To make this concrete and get developers in those languages interested, I think 
it would be valuable to create a minimum working example that writes an R data 
frame into plasma and reads it back into *pandas* in a separate Python 
process, and vice versa.

I could, for example, envision reading a CSV into a *data.table* in R to do 
some cleaning and feature engineering, writing that object to *plasma*, then 
kicking off multiple parallel Python processes that each read it from plasma 
to search a space of models. This could demonstrate the benefits of replacing 
"load this dataset from a file 50 times" with "read off this range of memory 
in plasma".
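
As a rough illustration of the "read off this range of memory" idea, here is a 
sketch that uses only Python's standard library. It does not use plasma or 
Arrow at all: `multiprocessing.shared_memory` stands in for the object store, a 
packed array of doubles stands in for a data frame column, and the names 
`publish`, `column_sum`, and `score` are hypothetical helpers invented for this 
example. A real example would presumably replace them with the plasma client 
and an Arrow table.

```python
import struct
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory


def publish(values):
    """Copy a column of floats into a new shared-memory block (stand-in for a plasma put)."""
    payload = struct.pack(f"{len(values)}d", *values)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def column_sum(block_name, n_values):
    """Attach to an existing block by name and sum the column, with no file I/O."""
    shm = shared_memory.SharedMemory(name=block_name)
    try:
        return sum(struct.unpack_from(f"{n_values}d", shm.buf, 0))
    finally:
        shm.close()


def score(task):
    """Hypothetical 'model': scale the column sum by a candidate parameter."""
    block_name, n_values, alpha = task
    return alpha * column_sum(block_name, n_values)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.5]
    shm = publish(data)  # written once by the producer
    try:
        tasks = [(shm.name, len(data), alpha) for alpha in (0.1, 0.5, 1.0, 2.0)]
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(list(pool.map(score, tasks)))  # each worker reads the same block
    finally:
        shm.close()
        shm.unlink()  # release the block once all workers are done
```

The producer writes the data once, and every worker attaches to it by name 
instead of re-reading a file, which is the pattern I would hope an R-to-plasma-
to-pandas example could show across languages.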

 

I believe pretty strongly that a tangible example like this would meaningfully 
improve the R community's interest in and engagement with the Arrow project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
