[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26759.
----------------------------------
    Resolution: Done

> Arrow optimization in SparkR's interoperability
> -----------------------------------------------
>
>                 Key: SPARK-26759
>                 URL: https://issues.apache.org/jira/browse/SPARK-26759
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SparkR, SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> Arrow 0.12.0 has been released and it includes an R API. We could use it to
> optimize Spark DataFrame <> R DataFrame interoperability.
> For instance, see the examples below:
>  - {{dapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> # Apply the function to each partition, handed over as an R data.frame.
> collect(dapply(df,
>                function(r.data.frame) {
>                  data.frame(r.data.frame$gear)
>                },
>                structType("gear long")))
> {code}
>  - {{gapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> # Per "gear" group, flag the rows whose disp is below the group mean.
> collect(gapply(df,
>                "gear",
>                function(key, group) {
>                  data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
>                },
>                structType("gear double, disp boolean")))
> {code}
>  - R DataFrame -> Spark DataFrame
> {code:java}
> createDataFrame(mtcars)
> {code}
>  - Spark DataFrame -> R DataFrame
> {code:java}
> collect(df)
> head(df)
> {code}
> Currently, some of the communication paths between the R side and the JVM
> side have to buffer the data and flush it all at once due to ARROW-4512.
> Fixing that is not a goal of this umbrella.
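
For anyone trying the examples above, here is a minimal sketch of switching the Arrow path on in SparkR. It assumes Spark 3.0.0 or later and that the arrow R package is installed where the R code runs. The configuration key spark.sql.execution.arrow.sparkr.enabled is taken from the Spark 3.0 SparkR documentation rather than from this message, and "local[*]" is only a placeholder master.

```r
library(SparkR)

# Assumption: Spark >= 3.0.0 with the 'arrow' R package available.
# The Arrow optimization is off by default and is enabled per session.
sparkR.session(
  master = "local[*]",  # placeholder; use your own cluster master
  sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true")
)

# R data.frame -> Spark DataFrame
df <- createDataFrame(mtcars)

# Spark DataFrame -> R data.frame
local_df <- collect(df)
head(local_df)

sparkR.session.stop()
```

With the option left at its default, the same calls should still work, just without going through Arrow.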