[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Sure, thanks @felixcheung. I will let you know if I see something odds! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 merged to master. let's keep an eye on this, ok? thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Yes, I believe these were all worth being tested. I have veen 9 times with the current change and I believe it is enough. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 ah cool, this https://github.com/apache/spark/pull/18465#issuecomment-313671617 was what I meant and want to see if it recovers (or dies) properly. have you seen enough test passes running in jenkins? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 The comments here are a bit messy. I think https://github.com/apache/spark/pull/18465#issuecomment-313559240 summarises all results with some links. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 I rushed to read the comment. I also try to get rid of that try/catch in c++ as below: **R test alone** ``` vi tmp.R ``` copy and paste the codes in **Before** and **After** and then ran ``` Rscript tmp.R ``` Before ```R library(Rcpp) cppFunction('double takeLog(double val) { throw std::range_error("Inadmissible value"); return NA_REAL; // not reached }') for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { takeLog(-1.0) print("unreachable") tools::pskill(child, tools::SIGUSR1) } } print("end") Sys.sleep(10L) ``` After ```R library(Rcpp) cppFunction('double takeLog(double val) { throw std::range_error("Inadmissible value"); return NA_REAL; // not reached }') for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { takeLog(-1.0) print("unreachable") } children <- suppressWarnings(parallel:::selectChildren(timeout = 0)) if (is.integer(children)) { lapply(children, function(child) { print(parallel:::readChild(child)) tools::pskill(child, tools::SIGUSR1) }) } } print("end") Sys.sleep(10L) ``` The symptoms are similar with https://github.com/apache/spark/pull/18465#issuecomment-313049544 **End to end** ``` vi takeLog.cpp ``` copy and paste ```cpp #include using namespace Rcpp; // [[Rcpp::export]] double takeLog(double val) { throw std::range_error("Inadmissible value"); return NA_REAL; // not reached } ``` With SparkR: ```R func <- function(key, x) { library(Rcpp) path <- "/Users/hyukjinkwon/Desktop/workspace/repos/forked/spark/takeLog.cpp" sourceCpp(path) takeLog(-1.0) } df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) collect(gapply(df, "a", func, schema(df))) # ... 30 times collect(gapply(df, "a", function(key, x) { x }, schema(df))) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Doh! Not inside try/catch. Sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 I tested above both on MacOS and CentOS. Basically, they are same but it is, as expected, worse on CentOS causing a crash in the master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 I referred this - http://adv-r.had.co.nz/Rcpp.html and your link. I did as below: **R test alone** ``` vi tmp.R ``` copy and paste the codes in **Before** and **After** and then ran ``` Rscript tmp.R ``` Before ```R library(Rcpp) cppFunction('double takeLog(double val) { try { if (val <= 0.0) { // log() not defined here throw std::range_error("Inadmissible value"); } return log(val); } catch(std::exception &ex) { forward_exception_to_r(ex); } catch(...) { ::Rf_error("c++ exception (unknown reason)"); } return NA_REAL; // not reached }') for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { takeLog(-1.0) print("unreachable") tools::pskill(child, tools::SIGUSR1) } } print("end") Sys.sleep(10L) ``` After ```R library(Rcpp) cppFunction('double takeLog(double val) { try { if (val <= 0.0) { // log() not defined here throw std::range_error("Inadmissible value"); } return log(val); } catch(std::exception &ex) { forward_exception_to_r(ex); } catch(...) { ::Rf_error("c++ exception (unknown reason)"); } return NA_REAL; // not reached }') for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { takeLog(-1.0) print("unreachable") } children <- suppressWarnings(parallel:::selectChildren(timeout = 0)) if (is.integer(children)) { lapply(children, function(child) { print(parallel:::readChild(child)) tools::pskill(child, tools::SIGUSR1) }) } } print("end") Sys.sleep(10L) ``` The symptoms are similar with https://github.com/apache/spark/pull/18465#issuecomment-313049544 **End to end** I could not do this as I did above with `cppFunction` due to such errors below: ``` Error in as.character(node[[1]]) : cannot coerce type 'builtin' to vector of type 'character' ``` So, I did as below: ``` vi takeLog.cpp ``` copy and paste ```cpp #include using namespace Rcpp; // [[Rcpp::export]] double takeLog(double val) { try { if (val <= 0.0) { // log() not defined here throw std::range_error("Inadmissible value"); } return log(val); } catch(std::exception &ex) { forward_exception_to_r(ex); } catch(...) { ::Rf_error("c++ exception (unknown reason)"); } return NA_REAL; // not reached } ``` And then ran below with SparkR: ```R func <- function(key, x) { library(Rcpp) path <- "/.../spark/takeLog.cpp" sourceCpp(path) takeLog(-1.0) } df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) collect(gapply(df, "a", func, schema(df))) ... 30 times collect(gapply(df, "a", function(key, x) { x }, schema(df))) ``` The symptoms are also similar with https://github.com/apache/spark/pull/18465#issuecomment-313055990 for both before/after. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79308/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #79308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79308/testReport)** for PR 18465 at commit [`c08ccd5`](https://github.com/apache/spark/commit/c08ccd59f438fce1f841aa70f760ffb9dc24cf50). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #79308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79308/testReport)** for PR 18465 at commit [`c08ccd5`](https://github.com/apache/spark/commit/c08ccd59f438fce1f841aa70f760ffb9dc24cf50). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 (simply rebased) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Mac OS - [ ] jobs with many stages - https://github.com/apache/spark/pull/18320#issuecomment-310312766 - [ ] jobs with long stages - https://github.com/apache/spark/pull/18320#issuecomment-310312766 - [ ] NULL when we expect it - https://github.com/apache/spark/pull/18320#issuecomment-309940653 - [ ] sleep before connection - https://github.com/apache/spark/pull/18320#discussion_r122687051 - [ ] full tests - [ ] dapply/gapply described in the PR CentOS - [ ] jobs with many stages - https://github.com/apache/spark/pull/18320#issuecomment-310312766 - [ ] jobs with long stages - https://github.com/apache/spark/pull/18320#issuecomment-310312766 - [ ] NULL when we expect it - https://github.com/apache/spark/pull/18320#issuecomment-309940653 - [ ] sleep before connection - https://github.com/apache/spark/pull/18320#discussion_r122687051 - [ ] full tests - [ ] dapply/gapply described in the PR +https://github.com/apache/spark/pull/18465#issuecomment-313306125 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Yea, I get your point and thanks for details. Will go thought the list first and then maybe give a shot for that --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 hey, thanks! I think it's reasonable. I do want to check out what happens with a native error/exception (ie. [here](http://gallery.rcpp.org/articles/intro-to-exceptions/) a std:: exception *not* inside a try/catch block in c++) as per this https://github.com/apache/spark/pull/18465#issuecomment-312936691 but that would be a stretch goal. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 @felixcheung and @shivaram, do you guys see it sounds okay to go through the checklist I used before for the current state? (of course, not asking a sing-off but just a rough judgement) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Also, just checked via manually calling `stop()` as end-to-end tests after building it. ``` df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) collect(gapply(df, "a", function(key, x) { stop() }, schema(df))) ... 30 times collect(gapply(df, "a", function(key, x) { x }, schema(df))) ``` The daemon does not die and the pipes were not left open at all in this case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 @felixcheung, it looks the codes below: ```R for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { stop("unexpected failure") } children <- suppressWarnings(parallel:::selectChildren(timeout = 0)) if (is.integer(children)) { lapply(children, function(child) { print(parallel:::readChild(child)) tools::pskill(child, tools::SIGUSR1) }) } } ``` works fine. Sounds `parallel:::selectChildren` sometimes prints warnings such as "Bad file descriptor" and looks _few_ pipes are left open but it still does not at least die. I also resembled the original way - ```R for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { tools::pskill(child, tools::SIGUSR1) stop("unexpected failure") } children <- parallel:::selectChildren(timeout = 0) if (is.integer(children)) { lapply(children, function(child) { print(parallel:::readChild(child)) }) } } ``` In my local, this gets easily dead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Sounds good to.check. I will be back after investigating. BTW, I guess the original state does not handle that case too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 thinking more about this, how would it handle the case when the forked process is in a bad state? say, if we put `stop()` in the gapply UDF. Or something more serious say some sort of native error (Rcpp?). Would that state be reflected there in the returned terminated process id list? Can `pskill` still be effective in those cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Just FYI, -9 error has occurred consistently without particularly increasing or decreasing given my observations. I think code changes are not particularly related with it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Just to be clear, -9 error has occured consistently without paricularly increasing or decreasing given my observations. I don't believe R change is partocularly related with it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 I see, quite possibly it is bubbled up more because of that change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 FWIW, I don't know but I guess it happens randomly in the middle of any tests. My wild guess is it is related with triggering many tests (or maybe rebasing a lot to trigger the build). I saw it broke all other builds with -9 in all other PRs when it happens once. Resently I mapped the unknown codes directly to be printed as above because I don't know the reason. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/18465 @felixcheung are these failures happening from the gapply tests ? Also do we have a way to map the error code to an error reason ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 btw, even when the earlier commit is revert seems like there's still a high number of test failure with `fails due to an unknown error code, -9.` - do you know how that might be related? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 > isn't being "terminated" enough? that we have to pskill it again? Indeed, this is the point. I really tested so many times but `exit` does not terminate (I think I might have to change the term used in the comments in my PR, maybe to 'kill'). This can be tested as below: ``` vi tmp.R ``` ```r for(i in 0:50) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { # Send no data parallel:::mcexit(0L) } } Sys.sleep(10) ``` ``` Rscript tmp.R ``` and, `ps -fe | grep /exec/R` shows the children processes. > the original issue with leaking was thought to be related to child process getting stuck and not terminated properly? I think yes, it is. It looked they are killed but somehow pipes (file descriptors) were left open. This hit the ulimit of open files and then it went an error like `Resource temporarily unavailable fork`. This looked because the children were killed by itself before or while calling `exit()` and left incomplete. > would that manifest again under this new behavior? in other words, would it get into a state where integer is never returned to the master? In the original behaviour, I think the problem is `exit`is not being properly called or terminated during `exit`. The new change here I think it makes sure the children call `exit` properly. I think this test makes sure what I said: ```r read.pids <- function(children) { lapply(children, function(child) { print(parallel:::readChild(child)) }) } kill <- function(children) { lapply(children, function(child) { tools::pskill(child, tools::SIGUSR1) }) } p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { parallel:::mcexit(0L) } Sys.sleep(3) print("Children exited - new behaviour here") children <- parallel:::selectChildren(timeout = 0) if (is.integer(children)) { read.pids(children) kill(children) } p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { tools::pskill(Sys.getpid(), tools::SIGUSR1) parallel:::mcexit(0L) } Sys.sleep(3) print("Children killed itself - old behaviour here") children <- parallel:::selectChildren(timeout = 0) if (is.integer(children)) { read.pids(children) } p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { Sys.sleep(1) } Sys.sleep(3) print("Children killed by parent without exiting") children <- parallel:::selectChildren(timeout = 0) if (is.integer(children)) { kill(children) # wait for kill Sys.sleep(1) read.pids(children) } ``` In my local, this prints the pid when `mcexit` is called properly. ```r [1] "Children exited - new behaviour here" [1] 12992 [[1]] [1] FALSE [1] "Children killed itself - old behaviour here" [1] "Children killed by parent without exiting" ``` And here we kill it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 hmm, this is a fairly small but crucial change. One point we were discussing, earlier, we are saying that we don't know if the child is going away, or merely slow starting, so we explicitly send and look for a exit code. I guess this condition is covered in the new change. According to the doc ``` readChild and readChildren return a raw vector with a "pid" attribute if data were available, integer vector of length one with the process ID if a child terminated ``` So I guess it is safer to act only if we get an integer (which, according to above, the child terminated), but to clarify - isn't being "terminated" enough? that we have to pskill it again? - the original issue with leaking was thought to be related to child process getting stuck and *not* terminated properly? would that manifest again under this new behavior? in other words, would it get into a state where integer is never returned to the master? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/18465 Thanks @HyukjinKwon - I will try to look at this later tonight --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78900/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78900/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78899/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78899 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78899/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 If these tests all pass, 8 times with Jenkins pass. I believe this solves the flakiness if so. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78900/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78899/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78896/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78896/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 Also tested with this: ```r kill <- function(children) { lapply(children, function(child) { pid <- parallel:::readChild(child) if (is.integer(pid)) { if (child == pid) { print(paste("child PID:", child, "and read PID:", pid)) tools::pskill(child, tools::SIGUSR1) } else { print(paste("child PID:", child, "and read PID:", pid, "not killing")) } } else { print(paste("child PID:", child, "and sent data:", unserialize(pid))) } }) } for(i in 0:100) { for(i in 0:50) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { # Send no data parallel:::mcexit(0L) } else { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { # Send custom data parallel:::sendMaster("arbitrary") parallel:::sendMaster(123) parallel:::mcexit(0L) } } } for(i in 0:50) { c <- parallel:::selectChildren(timeout = 0) if (is.integer(c)) { invisible(kill(c)) } } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 For race condition concern, I tested this as below: ``` vi tmp.R ``` copied and pasted: ```r kill <- function(children) { lapply(children, function(child) { pid <- parallel:::readChild(child) if (is.integer(pid)) { if (child == pid) { print(paste("child PID:", child, "and read PID:", pid)) tools::pskill(child, tools::SIGUSR1) } else { print(paste("child PID:", child, "and read PID:", pid, "not killing")) } } else { print(paste("child PID:", child, "and sent data:", unserialize(pid))) } }) } for(i in 0:1) { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { # Send no data parallel:::mcexit(0L) } else { p <- parallel:::mcfork() if (inherits(p, "masterProcess")) { # Send custom data parallel:::sendMaster("arbitrary") parallel:::sendMaster(123) parallel:::mcexit(0L) } } c <- parallel:::selectChildren(timeout = 0) if (is.integer(c)) { invisible(kill(c)) } } ``` and ran: ``` Rscript tmp.R ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 To cut this short, the new approach is roughly From ```R exitCode <- 1 ... data <- parallel:::readChild(child) if (is.raw(data)) { if (unserialize(data) == exitCode) { ... } } ... parallel:::mcexit(0L, send = exitCode) ``` to ```R pid <- parallel:::readChild(child) if (is.integer(pid)) { if (child == pid) { ... } } ... parallel:::mcexit(0L) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #78896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78896/testReport)** for PR 18465 at commit [`9ff89a7`](https://github.com/apache/spark/commit/9ff89a7859cb9f427fc774f33c3521c7d962b723). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465 cc @felixchung and @shivaram, I ran some simple tests on Mac OS (described one in https://github.com/apache/spark/pull/18320). Likewise, I would like to be sure before going through the checklist - https://github.com/apache/spark/pull/18320#issuecomment-310594551. Could you take a look please? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org