[jira] [Commented] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429895#comment-15429895 ]

Apache Spark commented on SPARK-6832:
-------------------------------------

User 'krishnakalyan3' has created a pull request for this issue:
https://github.com/apache/spark/pull/14741

> Handle partial reads in SparkR JVM to worker communication
> ----------------------------------------------------------
>
> Key: SPARK-6832
> URL: https://issues.apache.org/jira/browse/SPARK-6832
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Reporter: Shivaram Venkataraman
> Priority: Minor
>
> After we move to use socket between R worker and JVM, it's possible that
> readBin() in R will return partial results (for example, interrupted by
> signal).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426798#comment-15426798 ]

Shivaram Venkataraman commented on SPARK-6832:
----------------------------------------------

I think we can add a new method `readBinFully` and then replace calls to `readBin` with that method.

Regarding simulating this: I think you could try to manually send a signal (using something like kill -s SIGCHLD) to an R process while it is reading a large amount of data using readBin.
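
A minimal sketch of what a `readBinFully` helper along the lines suggested above might look like, written in R. It is not the code from the pull request. The retry limit `maxEmptyReads`, the wrapper `readIntFully`, and the in-memory `rawConnection` used to exercise it are assumptions made for illustration only. The idea is to read raw bytes in a loop until the requested count has arrived, and to decode the value only once the buffer is complete.

```r
# Hypothetical helper (not from the Spark code base): keep calling readBin()
# until `nbytes` raw bytes have arrived, or give up after `maxEmptyReads`
# consecutive reads that return nothing.
readBinFully <- function(con, nbytes, maxEmptyReads = 3L) {
  buf <- raw(0)
  emptyReads <- 0L
  while (length(buf) < nbytes) {
    # Ask only for the bytes that are still missing.
    chunk <- readBin(con, what = "raw", n = nbytes - length(buf))
    if (length(chunk) == 0L) {
      emptyReads <- emptyReads + 1L
      if (emptyReads >= maxEmptyReads) {
        stop(sprintf("expected %d bytes but received only %d",
                     nbytes, length(buf)))
      }
    } else {
      emptyReads <- 0L
      buf <- c(buf, chunk)
    }
  }
  buf
}

# Hypothetical wrapper: read one 4-byte big-endian integer through the helper.
# readBin() accepts a raw vector in place of a connection, so the decoding
# step works on the fully assembled buffer.
readIntFully <- function(con) {
  readBin(readBinFully(con, 4L), what = "integer", n = 1L, size = 4L,
          endian = "big")
}

# Exercise the helper on an in-memory connection instead of a socket.
con <- rawConnection(as.raw(c(0, 0, 1, 2)))
value <- readIntFully(con)
close(con)
```

On a real socket a short read would normally be followed by more data on the next call. The empty-read limit is there only so that the loop cannot spin forever if the peer has closed the connection.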
[jira] [Commented] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426012#comment-15426012 ]

Krishna Kalyan commented on SPARK-6832:
---------------------------------------

[~shivaram], [~davies],
I see there are 7 occurrences of readBin. From what I understand, I need to wrap them in a retry method. Is this understanding correct?

Another question I had: how do I test partial reads, or simulate them, on my local system?

Thanks,
Krishna
[jira] [Commented] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414262#comment-15414262 ]

Shivaram Venkataraman commented on SPARK-6832:
----------------------------------------------

[~KrishnaKalyan3] Thanks for looking at this issue. The problem we ran into when we opened the bug is discussed in https://github.com/amplab-extras/SparkR-pkg/pull/193#issuecomment-78144164 and the comments following that. I think the goal here was to add a retry mechanism around readBin that would be resilient against signals.
[jira] [Commented] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413972#comment-15413972 ]

Krishna Kalyan commented on SPARK-6832:
---------------------------------------

[~shivaram]
I see that changes need to be made in `R/pkg/R/deserialize.R`, specifically in the `readString` function. However, I don't understand how to go about changing it to handle partial reads. I have gone through `https://stat.ethz.ch/R-manual/R-devel/library/base/html/readBin.html`.

Thanks,
Krishna