[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549996#comment-15549996 ]
Hossein Falaki commented on SPARK-17790:
----------------------------------------

Thanks for pointing it out. SPARK-6235 seems to be an umbrella ticket. This one can be a subtask of it.

> Support for parallelizing data.frame larger than 2GB
> ----------------------------------------------------
>
>                 Key: SPARK-17790
>                 URL: https://issues.apache.org/jira/browse/SPARK-17790
>             Project: Spark
>          Issue Type: Story
>          Components: SparkR
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>
> This issue is a more specific version of SPARK-17762.
> Supporting arguments larger than 2GB is more general and arguably harder to
> do because the limit exists in both R and the JVM (because we receive data as
> a ByteArray). However, to support parallelizing R data.frames that are larger
> than 2GB we can do what PySpark does.
> PySpark uses files to transfer bulk data between Python and the JVM. It has
> worked well for the large community of Spark Python users.
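
For illustration only, the file-based transfer mentioned in the issue description could look roughly like the sketch below. It is not PySpark's or SparkR's actual implementation: the function names (`write_batches`, `read_batches`, `round_trip`), the pickle serialization, and the 4-byte length-prefixed framing are all assumptions made for this example. The point it tries to show is that the producer writes the data to a temporary file in small framed batches and the consumer streams them back, so neither side has to hold the whole payload in a single byte array.

```python
import os
import pickle
import struct
import tempfile


def write_batches(records, path, batch_size=2):
    """Write records to `path` as length-prefixed pickled batches (hypothetical format)."""
    count = 0
    with open(path, "wb") as f:
        for start in range(0, len(records), batch_size):
            payload = pickle.dumps(records[start:start + batch_size])
            # 4-byte big-endian length prefix, then the payload itself.
            # Each batch must stay under 2GB, but the file as a whole need not.
            f.write(struct.pack(">i", len(payload)))
            f.write(payload)
            count += 1
    return count


def read_batches(path):
    """Stream records back from a file written by write_batches."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # end of file: no more batches
            (length,) = struct.unpack(">i", header)
            yield from pickle.loads(f.read(length))


def round_trip(records, batch_size=2):
    """Write records to a temp file, read them back, and clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        n_batches = write_batches(records, path, batch_size)
        return n_batches, list(read_batches(path))
    finally:
        os.remove(path)
```

In a real setup the reader would be a separate process (the JVM in Spark's case) that is handed only the file path; here both halves run in one Python process to keep the sketch self-contained.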