[ https://issues.apache.org/jira/browse/SPARK-29777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hossein Falaki updated SPARK-29777:
-----------------------------------
    Description: 

The following code block reproduces the issue:

{code:java}
library(SparkR)
sparkR.session()

spark_df <- createDataFrame(na.omit(airquality))

cody_local2 <- function(param2) {
  10 + param2
}

cody_local1 <- function(param1) {
  cody_local2(param1)
}

result <- cody_local2(5)

calc_df <- dapplyCollect(spark_df, function(x) {
  cody_local2(20)
  cody_local1(5)
})

print(result)
{code}

We get the following error message:

{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 27, 10.0.174.239, executor 0): org.apache.spark.SparkException: R computation failed with
Error in cody_local2(param1) : could not find function "cody_local2"
Calls: compute -> computeFunc -> cody_local1
{code}

Compare that to this code block that succeeds:

{code:java}
calc_df <- dapplyCollect(spark_df, function(x) {
  cody_local2(20)
  #cody_local1(5)
})
{code}


> SparkR::cleanClosure aggressively removes 
a function required by user function
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29777
>                 URL: https://issues.apache.org/jira/browse/SPARK-29777
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.4.4
>            Reporter: Hossein Falaki
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
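
A possible workaround until the closure capture is fixed (a sketch, not a confirmed fix) is to define the helper functions inside the function passed to dapplyCollect, so they are serialized as part of the UDF's own body rather than pruned by cleanClosure when it walks the enclosing environment:

{code:java}
# Workaround sketch (untested against this exact repro): helpers defined
# inside the UDF body ship to the executors with the closure itself, so
# cleanClosure has nothing to strip from the enclosing environment.
calc_df <- dapplyCollect(spark_df, function(x) {
  cody_local2 <- function(param2) {
    10 + param2
  }
  cody_local1 <- function(param1) {
    cody_local2(param1)
  }
  cody_local2(20)
  cody_local1(5)
})
{code}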