[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability
[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-26759:
---
Fix Version/s: 3.0.0

> Arrow optimization in SparkR's interoperability
> ---
>
> Key: SPARK-26759
> URL: https://issues.apache.org/jira/browse/SPARK-26759
> Project: Spark
> Issue Type: Umbrella
> Components: SparkR, SQL
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: release-notes
> Fix For: 3.0.0
>
>
> Arrow 0.12.0 has been released and includes an R API. We could use it to optimize Spark DataFrame <> R DataFrame interoperability.
> For instance, see the examples below:
> - {{dapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(dapply(df,
>                function(r.data.frame) {
>                  data.frame(r.data.frame$gear)
>                },
>                structType("gear long")))
> {code}
> - {{gapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(gapply(df,
>                "gear",
>                function(key, group) {
>                  data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
>                },
>                structType("gear double, disp boolean")))
> {code}
> - R DataFrame -> Spark DataFrame
> {code:java}
> createDataFrame(mtcars)
> {code}
> - Spark DataFrame -> R DataFrame
> {code:java}
> collect(df)
> head(df)
> {code}
> Currently, some communication paths between the R side and the JVM side have to buffer the data and flush it all at once because of ARROW-4512. I don't plan to fix that under this umbrella.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
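The description lists the conversion paths that Arrow should speed up, but it does not say how the optimization is switched on. Below is a minimal sketch under several assumptions:

- a Spark 3.0 build with SparkR on the library path;
- the CRAN `arrow` package already installed;
- the session configuration key `spark.sql.execution.arrow.sparkr.enabled`.

The configuration key comes from the SparkR documentation for Spark 3.0. It is not named anywhere in this ticket. The `local[*]` master is only for illustration.

```r
# Minimal sketch. Assumptions: Spark 3.0+, SparkR loadable, and the CRAN
# "arrow" package installed (e.g. install.packages("arrow")).
library(SparkR)

# Arrow-based conversion in SparkR is off by default; enable it for this session.
sparkR.session(
  master = "local[*]",
  sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true")
)

# R data.frame -> Spark DataFrame
df <- createDataFrame(mtcars)

# Spark DataFrame -> R data.frame
local_df <- collect(df)
head(local_df)

# The gapply example from the ticket, run with the flag enabled
result <- collect(gapply(df,
                         "gear",
                         function(key, group) {
                           data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
                         },
                         structType("gear double, disp boolean")))
head(result)

sparkR.session.stop()
```

With the flag left at its default of false, the same calls use SparkR's regular, non-Arrow serialization.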
[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability
[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-26759:
---
Description:
Arrow 0.12.0 has been released and includes an R API. We could use it to optimize Spark DataFrame <> R DataFrame interoperability.
For instance, see the examples below:
- {{dapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(dapply(df,
               function(r.data.frame) {
                 data.frame(r.data.frame$gear)
               },
               structType("gear long")))
{code}
- {{gapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(gapply(df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))
{code}
- R DataFrame -> Spark DataFrame
{code:java}
createDataFrame(mtcars)
{code}
- Spark DataFrame -> R DataFrame
{code:java}
collect(df)
head(df)
{code}
Currently, some communication paths between the R side and the JVM side have to buffer the data and flush it all at once because of ARROW-4512. I don't plan to fix that under this umbrella.

was:
Arrow 0.12.0 has been released and includes an R API. We could use it to optimize Spark DataFrame <> R DataFrame interoperability.
For instance, see the examples below:
- {{dapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(dapply(df,
               function(r.data.frame) {
                 data.frame(r.data.frame$gear)
               },
               structType("gear long")))
{code}
- {{gapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(gapply(df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))
{code}
- R DataFrame -> Spark DataFrame
{code:java}
createDataFrame(mtcars)
{code}
- Spark DataFrame -> R DataFrame
{code:java}
collect(df)
head(df)
{code}
[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability
[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-26759:
---
Description:
Arrow 0.12.0 has been released and includes an R API. We could use it to optimize Spark DataFrame <> R DataFrame interoperability.
For instance, see the examples below:
- {{dapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(dapply(df,
               function(r.data.frame) {
                 data.frame(r.data.frame$gear)
               },
               structType("gear long")))
{code}
- {{gapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(gapply(df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))
{code}
- R DataFrame -> Spark DataFrame
{code:java}
createDataFrame(mtcars)
{code}
- Spark DataFrame -> R DataFrame
{code:java}
collect(df)
head(df)
{code}

was:
Arrow 0.12.0 has been released and includes an R API. We could use it to optimize Spark DataFrame <> R DataFrame interoperability.
[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability
[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-26759:
---
Component/s: SQL