GitHub user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18465

I referred to http://adv-r.had.co.nz/Rcpp.html and your link. Here is what I did:

**R test alone**

```
vi tmp.R
```

I copied and pasted the code in **Before** and **After** below into it, and then ran:

```
Rscript tmp.R
```

Before

```R
library(Rcpp)

cppFunction('double takeLog(double val) {
  try {
    if (val <= 0.0) {  // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL;  // not reached
}')

for(i in 0:10000) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
    takeLog(-1.0)
    print("unreachable")
    tools::pskill(child, tools::SIGUSR1)
  }
}

print("end")
Sys.sleep(10L)
```

After

```R
library(Rcpp)

cppFunction('double takeLog(double val) {
  try {
    if (val <= 0.0) {  // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL;  // not reached
}')

for(i in 0:10000) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
    takeLog(-1.0)
    print("unreachable")
  }
  children <- suppressWarnings(parallel:::selectChildren(timeout = 0))
  if (is.integer(children)) {
    lapply(children, function(child) {
      print(parallel:::readChild(child))
      tools::pskill(child, tools::SIGUSR1)
    })
  }
}

print("end")
Sys.sleep(10L)
```

The symptoms are similar to https://github.com/apache/spark/pull/18465#issuecomment-313049544

**End to end**

I could not do this the same way as above with `cppFunction`, because of the error below:

```
Error in as.character(node[[1]]) :
  cannot coerce type 'builtin' to vector of type 'character'
```

So, I did the following instead:

```
vi takeLog.cpp
```

and copied and pasted:

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double takeLog(double val) {
  try {
    if (val <= 0.0) {  // log() not defined here
      throw std::range_error("Inadmissible value");
    }
    return log(val);
  } catch(std::exception &ex) {
    forward_exception_to_r(ex);
  } catch(...) {
    ::Rf_error("c++ exception (unknown reason)");
  }
  return NA_REAL;  // not reached
}
```

Then I ran the following with SparkR:

```R
func <- function(key, x) {
  library(Rcpp)
  path <- "/.../spark/takeLog.cpp"
  sourceCpp(path)
  takeLog(-1.0)
}

df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", func, schema(df)))
# ... (the line above, 30 times)
collect(gapply(df, "a", function(key, x) { x }, schema(df)))
```

The symptoms are also similar to https://github.com/apache/spark/pull/18465#issuecomment-313055990, for both before and after.
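For reference, the cleanup in the **After** script sends `SIGUSR1` to each child with `tools::pskill()`. At the OS level that is a `kill()` on the child's PID, and in general the parent also has to reap the child with `waitpid()` so it does not linger as a zombie. The following is a minimal C++ sketch of that fork/kill/reap sequence. It assumes a POSIX system and that `SIGUSR1` still has its default (terminate) disposition. `fork_and_kill` is a hypothetical helper for illustration only, not how R's `parallel` package implements `mcfork` or `pskill`.

```cpp
#include <signal.h>    // kill, SIGUSR1
#include <sys/types.h> // pid_t
#include <sys/wait.h>  // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>    // fork, pause

// Hypothetical helper: fork a child that just waits, terminate it with
// SIGUSR1 (roughly what tools::pskill(child, tools::SIGUSR1) does from R),
// then reap it. Assumes SIGUSR1 has its default disposition (terminate).
// Returns the signal that ended the child, 0 if it exited normally,
// or -1 on error.
int fork_and_kill() {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        // Child: wait for a signal. Even if SIGUSR1 arrives before pause()
        // is reached, the default action still terminates the child.
        for (;;) {
            pause();
        }
    }
    // Parent: signal the child, then reap it so no zombie is left behind.
    if (kill(pid, SIGUSR1) != 0) {
        return -1;
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Under those assumptions, `fork_and_kill()` should return `SIGUSR1`, showing that the child was terminated by the signal rather than exiting on its own.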