Re: Executors idle, driver heap exploding and maxing only 1 cpu core

2019-05-29 Thread Akshay Bhardwaj
Hi, A few thoughts to add to Nicholas' apt reply. We were loading multiple files from AWS S3 in our Spark application. When the Spark step that loads the files is called, the driver spends significant time fetching the exact paths of the files from AWS S3. Especially because we specified S3 paths like regex

Re: Executors idle, driver heap exploding and maxing only 1 cpu core

2019-05-23 Thread Nicholas Hakobian
One potential case that can cause this is the optimizer being a little overzealous in determining whether a table can be broadcast or not. Have you checked the UI or query plan to see if any steps include a BroadcastHashJoin? It's possible that the optimizer thinks that it should be able to fit the
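
A minimal sketch of the check suggested above, assuming the physical plan has already been captured as text (for example, copied from the output of `df.explain()` or from the SQL tab of the Spark UI). The helper names `find_broadcast_joins` and `disable_auto_broadcast` and the sample plan are hypothetical and only for illustration. `spark.sql.autoBroadcastJoinThreshold` is a real Spark SQL setting, and setting it to -1 turns automatic broadcast joins off, which can help confirm whether a broadcast is what is loading the driver.

```python
import re

# Physical-plan operator names that indicate a broadcast join.
BROADCAST_OPERATORS = ("BroadcastHashJoin", "BroadcastNestedLoopJoin")


def find_broadcast_joins(plan_text):
    """Return the broadcast-join operator names that appear in a plan dump."""
    return [
        op for op in BROADCAST_OPERATORS
        if re.search(r"\b" + op + r"\b", plan_text)
    ]


def disable_auto_broadcast(spark):
    """Disable automatic broadcast joins for an existing SparkSession.

    `spark` is assumed to be a live pyspark.sql.SparkSession; this function
    is not called in the sketch below, so it runs without a cluster.
    """
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")


# Hypothetical plan text, shaped like the output of df.explain().
sample_plan = """
== Physical Plan ==
*(2) BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
:- *(2) Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
"""

print(find_broadcast_joins(sample_plan))
```

If the list is non-empty, rerunning the job after calling `disable_auto_broadcast(spark)` is one way to test whether the broadcast is responsible for the driver's heap growth.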

Executors idle, driver heap exploding and maxing only 1 cpu core

2019-05-23 Thread Ashic Mahtab
Hi, We have a rather long-winded Spark application we inherited with many stages. When we run on our Spark cluster, things start off well enough. Workers are busy, lots of progress made, etc. However, 30 minutes into processing, we see CPU usage of the workers drop drastically. At this