need to find out why driver keeps crashing

2019-10-20 Thread Manuel Sopena Ballesteros
Dear Apache Spark community, My Spark driver crashes and the logs do not give enough explanation of why it happens: INFO [2019-10-21 16:33:37,045] ({pool-6-thread-7} SchedulerFactory.java[jobStarted]:109) - Job 20190926-163704_913596201 started by scheduler interpreter_2100843352 DEBUG

Re: pyspark - memory leak leading to OOM after submitting 100 jobs?

2019-10-20 Thread Jungtaek Lim
Honestly, I'd recommend you spend your time looking into the issue by taking a memory dump at some interval and comparing the differences (or at least share these dump files with the community, redacted if necessary). Otherwise someone has to try to reproduce it without a reproducer and may not even be able to reproduce
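Jungtaek's suggestion concerns JVM heap dumps (e.g. taken with `jmap`); for the Python side of a PySpark driver, the same interval-snapshot-and-diff idea can be sketched with the standard-library `tracemalloc` module. This is a minimal illustration, not code from the thread; `run_job` is a hypothetical stand-in for one Spark job:

```python
import tracemalloc

tracemalloc.start()

def run_job(n):
    # Hypothetical stand-in for one Spark job: allocate, use, and drop memory.
    data = [str(i) * 10 for i in range(n)]
    return len(data)

# Snapshot before the job loop, then again after some interval of work.
baseline = tracemalloc.take_snapshot()
for _ in range(10):
    run_job(10_000)
snapshot = tracemalloc.take_snapshot()

# Diff the two snapshots: the top entries show which source lines
# retain the most additional memory since the baseline.
for stat in snapshot.compare_to(baseline, "lineno")[:5]:
    print(stat)
```

Repeating this every N jobs and comparing successive diffs is the Python-side analogue of diffing periodic heap dumps: a genuine leak shows up as the same allocation sites growing monotonically across snapshots.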

pyspark - memory leak leading to OOM after submitting 100 jobs?

2019-10-20 Thread Paul Wais
Dear List, I've observed some sort of memory leak when using pyspark to run ~100 jobs in local mode. Each job is essentially a create RDD -> create DF -> write DF sort of flow. The RDD and DFs go out of scope after each job completes, hence I call this issue a "memory leak." Here's pseudocode: