[ https://issues.apache.org/jira/browse/SPARK-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-16299:
------------------------------------

    Assignee: Apache Spark

Capture errors from R workers in daemon.R to avoid deletion of R session temporary directory
---------------------------------------------------------------------------------------------

                 Key: SPARK-16299
                 URL: https://issues.apache.org/jira/browse/SPARK-16299
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 1.6.2
            Reporter: Sun Rui
            Assignee: Apache Spark

Running the SparkR unit tests randomly fails with the following error:

{code}
Failed -------------------------------------------------------------------------
1. Error: pipeRDD() on RDDs (@test_rdd.R#428) ----------------------------------
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 792.0 failed 1 times, most recent failure: Lost task 0.0 in stage 792.0
(TID 1493, localhost): org.apache.spark.SparkException: R computation failed with
[1] 1
[1] 1
[1] 2
[1] 2
[1] 3
[1] 3
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
ignoring SIGPIPE signal
Calls: source ... <Anonymous> -> lapply -> lapply -> FUN -> writeRaw -> writeBin
Execution halted
cannot open the connection
Calls: source ... computeFunc -> FUN -> system2 -> writeLines -> file
In addition: Warning message:
In file(con, "w") :
  cannot open file '/tmp/Rtmp0Gr1aU/file2de3efc94b3': No such file or directory
Execution halted
	at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
	at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

This is related to the daemon R worker mode. By default, SparkR launches one R daemon worker per executor and forks R workers from the daemon when necessary.

The problem with forking R workers is that all forked R processes share a single session temporary directory, as documented at https://stat.ethz.ch/R-manual/R-devel/library/base/html/tempfile.html. When any forked R worker exits, whether normally or because of an error, R's cleanup procedure deletes that temporary directory. This affects the still-running forked R workers, because any temporary files they created under the directory are removed along with it. It also affects all R workers forked from the daemon afterwards: if they use tempdir() or tempfile(), they fail to create temporary files under the already-deleted session temporary directory, which is the "cannot open file '/tmp/Rtmp0Gr1aU/...'" failure shown above.

So for the daemon mode to work, this problem must be circumvented. In the current daemon.R, R workers exit directly, skipping R's cleanup procedure so that the shared temporary directory is not deleted:

{code}
source(script)
# Set SIGUSR1 so that child can exit
tools::pskill(Sys.getpid(), tools::SIGUSR1)
# Exit the forked child without running R's cleanup, so the shared
# session temporary directory survives.
parallel:::mcexit(0L)
{code}

However, daemon.R has a bug: when an execution error occurs in an R worker, R's error handling eventually reaches the cleanup procedure anyway.
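To illustrate this failure mode outside of Spark, here is a minimal standalone sketch (not daemon.R code; it assumes a Unix-alike R where the parallel package can fork):

{code}
library(parallel)

# The parent creates a file under the shared session temporary directory.
f <- tempfile()
writeLines("data", f)

# Fork a child that exits through R's normal cleanup path. The cleanup
# deletes the session temporary directory, which the child shares with
# the parent because of fork().
job <- mcparallel(quit(save = "no"))
mccollect(job)

# The parent's temporary file is gone, and later tempfile() paths would
# point into the already-deleted directory.
file.exists(f)  # FALSE
{code}

This is the same sequence of events as a forked SparkR worker hitting an error and unwinding into R's cleanup.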
So try() should be used in daemon.R to catch any error from an R worker, so that the worker still exits directly:

{code}
# try() keeps an error raised inside the worker from unwinding into
# R's cleanup procedure, which would delete the shared tempdir.
try(source(script))
# Set SIGUSR1 so that child can exit
tools::pskill(Sys.getpid(), tools::SIGUSR1)
parallel:::mcexit(0L)
{code}
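For completeness, a standalone sketch (again an illustration, not daemon.R itself) of why try() is sufficient here: it converts the error into a value, so execution falls through to the exit calls instead of unwinding into R's cleanup:

{code}
# Hypothetical worker body that fails.
worker <- function() stop("simulated worker error")

# try() captures the error as a "try-error" object instead of letting
# R's error handling terminate the script (and run cleanup on exit).
res <- try(worker(), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Control reaches this point, so the worker can still exit directly
# via tools::pskill()/parallel:::mcexit() as daemon.R does.
cat("reached the exit path\n")
{code}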