[ https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Sterkhov updated SPARK-33620:
---------------------------------------
Description: Hello, I have a problem with high memory usage: the job reads about 2000 GB of input from HDFS. With about 300 GB of input the task starts and completes, but we need this to work without such a limit. Please help.

!VlwWJ.png|width=644,height=150!

!mgg1s.png|width=651,height=182!

This is my code:

{code:scala}
var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}

hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum), outTable, isMasterData)
{code}

> Task not started after filtering
> --------------------------------
>
>                 Key: SPARK-33620
>                 URL: https://issues.apache.org/jira/browse/SPARK-33620
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.4.7
>            Reporter: Vladislav Sterkhov
>            Priority: Major
>         Attachments: VlwWJ.png, mgg1s.png
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
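One thing worth noting about the loop above: each iteration chains another `++` onto `allTrafficRDD`, so the final RDD's lineage grows with the number of input paths, which can put pressure on the driver for large path lists. Building the filtered RDDs first and combining them with a single `SparkContext.union` keeps the DAG flat. A minimal sketch under assumptions: `loadAllTraffic` is a hypothetical name, and the single-argument `isValidTraffic`/`filterTraffic` below are stand-ins for the reporter's own helpers (the real ones also take an `isMasterData` flag).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object TrafficLoadSketch {
  // Hypothetical stand-ins for the reporter's own helpers, for illustration only.
  def isValidTraffic(rdd: RDD[String]): Boolean = !rdd.isEmpty()
  def filterTraffic(rdd: RDD[String]): RDD[String] = rdd.filter(_.nonEmpty)

  def loadAllTraffic(sc: SparkContext, paths: Seq[String]): RDD[String] = {
    // Read, validate, and filter each path first.
    val parts = paths
      .map(sc.textFile(_))
      .filter(isValidTraffic)
      .map(filterTraffic)
    // One union over the whole sequence instead of N chained ++ calls,
    // which keeps the RDD lineage flat.
    if (parts.isEmpty) sc.emptyRDD[String] else sc.union(parts)
  }
}
```

The `repartition` before the Hive insert would stay as in the original code; this only changes how the per-path RDDs are combined.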