Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-09-11 Thread Teja
We took a heap dump from the live job. To our surprise, 85% of the memory is occupied by `org.apache.spark.scheduler.LiveListenerBus`. Here are a few pictures for context

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-09-11 Thread Teja
Sorry for the poor formatting.

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Mridul Muralidharan
Hi, 50% of driver time being spent in GC just for the listener bus sounds very high for a 30G heap. Did you try taking a heap dump to see what is occupying so much memory? This will help us determine whether the memory usage is due to some user code/library holding references to large objects/graph of

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Waleed Fateem
Hi Teja, The only thought I have is to consider decreasing the spark.scheduler.listenerbus.eventqueue.capacity parameter. That should decrease the driver memory pressure, but of course you'll probably end up dropping events more frequently, meaning you can't really trust anything you
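
As a rough illustration of the suggestion above, the setting could be passed to spark-submit as a `--conf` argument. The sketch below only builds that argument list. The helper name `listener_bus_args` and the value 5000 are hypothetical and not from the thread. The key is the one named in the reply; older releases such as the 2.2.x line may expose it under a different name (possibly `spark.scheduler.listenerbus.eventqueue.size`), so check the configuration page for the version in use.

```python
# Key exactly as named in the reply above.
CAPACITY_KEY = "spark.scheduler.listenerbus.eventqueue.capacity"


def listener_bus_args(capacity):
    """Build spark-submit arguments that set the listener bus queue capacity.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    if capacity <= 0:
        raise ValueError("capacity must be greater than 0")
    return ["--conf", f"{CAPACITY_KEY}={capacity}"]


# 5000 is an arbitrary example value, not a recommendation.
print(" ".join(listener_bus_args(5000)))
```

A smaller queue bounds how many events can wait in driver memory at once, at the cost of more dropped events when listeners fall behind.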

LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Teja
We have ~120 executors with 5 cores each, for a very long-running job that crunches ~2.5 TB of data and applies a large number of filters in its query. Currently, we have ~30k partitions, which works out to ~90 MB per partition. We are using Spark v2.2.2 as of now. The major problem we are facing is due to GC on the
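
The sizing figures in the original post can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes), evenly sized partitions, and one core per task; the function name `job_sizing` is hypothetical.

```python
import math


def job_sizing(executors, cores_per_executor, data_bytes, partitions):
    """Return (task_slots, mb_per_partition, waves) for a job.

    Assumes one core per task and evenly sized partitions.
    """
    task_slots = executors * cores_per_executor
    mb_per_partition = data_bytes / partitions / 1e6
    # Number of scheduling rounds needed to run every partition once.
    waves = math.ceil(partitions / task_slots)
    return task_slots, mb_per_partition, waves


# Figures from the post: ~120 executors, 5 cores each, ~2.5 TB, ~30k partitions.
slots, mb, waves = job_sizing(120, 5, 2.5e12, 30_000)
print(f"{slots} task slots, {mb:.1f} MB per partition, {waves} waves")
```

Under these assumptions the per-partition size comes out in the 80 to 90 MB range, consistent with the ~90 MB quoted. Every task also produces scheduler events, so a stage with ~30k tasks sends a steady stream of events through the listener bus.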