[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550414#comment-15550414 ]
Felix Cheung edited comment on SPARK-17790 at 10/6/16 12:34 AM:
----------------------------------------------------------------

Yes. Driver R and Driver JVM should be on the same machine. I have not checked recently, but there might be projects that change how the Backend is connected, and those could be affected by this.

was (Author: felixcheung):
Yes.

> Support for parallelizing R data.frame larger than 2GB
> ------------------------------------------------------
>
>                 Key: SPARK-17790
>                 URL: https://issues.apache.org/jira/browse/SPARK-17790
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>
> This issue is a more specific version of SPARK-17762.
> Supporting larger than 2GB arguments is more general and arguably harder to
> do because the limit exists both in R and JVM (because we receive data as a
> ByteArray). However, to support parallelizing R data.frames that are larger
> than 2GB we can do what PySpark does.
> PySpark uses files to transfer bulk data between Python and JVM. It has
> worked well for the large community of Spark Python users.
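
For illustration only, the file-based transfer described in the issue could be sketched roughly as follows. This is not PySpark's or SparkR's actual code. The function names (`write_batches`, `read_batches`, `round_trip`), the length-prefixed framing, and the use of pickle are all assumptions made for the example, and the reader is written in Python as a stand-in for the JVM side. The point is only that the data is written to a local file in small batches, so no single in-memory byte array has to hold the whole payload.

```python
import os
import pickle
import struct
import tempfile


def write_batches(rows, batch_size, path):
    """Serialize rows to `path` as length-prefixed pickled batches (hypothetical format)."""
    with open(path, "wb") as f:
        for start in range(0, len(rows), batch_size):
            payload = pickle.dumps(rows[start:start + batch_size])
            # 4-byte big-endian length prefix, followed by the batch bytes.
            f.write(struct.pack(">i", len(payload)))
            f.write(payload)


def read_batches(path):
    """Yield batches back from the file one at a time (stand-in for the JVM reader)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # end of file: no more batches
            (length,) = struct.unpack(">i", header)
            yield pickle.loads(f.read(length))


def round_trip(rows, batch_size=2):
    """Write rows to a temporary file, read them back, and clean up the file."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_batches(rows, batch_size, path)
        return [row for batch in read_batches(path) for row in batch]
    finally:
        os.remove(path)
```

Because the writer and reader share a local file path, a scheme like this only works when both processes can see the same filesystem, which is consistent with the comment above that Driver R and Driver JVM should be on the same machine.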