[ https://issues.apache.org/jira/browse/SPARK-21959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21959.
-------------------------------
    Resolution: Invalid

There is no detail on the job and no indication that this is a problem in Spark; the application is simply running out of memory. You optimized your application and it now works, which is not something to report as a JIRA issue.

> Python RDD goes into never-ending garbage collection when spark-submit
> is triggered in Oozie
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21959
>                 URL: https://issues.apache.org/jira/browse/SPARK-21959
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Submit
>    Affects Versions: 2.1.0
>         Environment: Head nodes: 2, 8 cores, 55 GB per node
>                      Worker nodes: 5, 4 cores, 28 GB per node
>            Reporter: VP
>   Original Estimate: 30h
>  Remaining Estimate: 30h
>
> When the job is submitted through spark-submit, the code executes fine.
> But when the job is launched through Oozie, any triggered PythonRDD gets
> stuck in a garbage collection cycle that never finishes.
> When the RDD is replaced by a DataFrame, the code executes fine.
> We need to understand the root cause of why garbage collection runs
> indefinitely only when the job is called through Oozie.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)