Hi there,
Also for the benefit of others, if you attempt to use any version of Hadoop >
3.2.0 (such as 3.2.1), you will need to update the version of Google Guava used
by Apache Spark to that consumed by Hadoop.
Hadoop 3.2.1 requires guava-27.0-jre.jar. The latest is guava-29.0-jre.jar, which
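A minimal sketch of the jar swap, assuming a standard binary distribution where Spark's bundled jars live under $SPARK_HOME/jars (the exact Guava version Spark ships with varies by release, so check first):

```shell
# Sketch, not authoritative: inspect the bundled Guava version before removing it.
cd "$SPARK_HOME/jars"
ls guava-*.jar                 # note the version Spark currently ships
mv guava-*.jar /tmp/           # keep a backup rather than deleting outright
# Fetch the Guava version Hadoop 3.2.1 expects from Maven Central.
curl -O https://repo1.maven.org/maven2/com/google/guava/guava/27.0-jre/guava-27.0-jre.jar
```
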
Hi,
50% of driver time being spent in GC just for the listener bus sounds very
high for a 30G heap.
Did you try taking a heap dump to see what is occupying so much memory?
That would help us determine whether the memory usage is due to some user
code/library holding references to large objects/graph of
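For reference, a heap dump of the driver can be taken with the JDK's jmap tool (assuming JDK tools are on PATH; `<driver-pid>` stands for the Spark driver's process id):

```shell
# Dump only live objects in binary format; the file can then be opened in
# Eclipse MAT or VisualVM to inspect the dominator tree and see which
# objects retain the most heap.
jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>
```
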
Hi Teja,
The only thought I have is to consider decreasing
the spark.scheduler.listenerbus.eventqueue.capacity parameter. That should
reduce driver memory pressure, but of course you'll probably end up
dropping events more frequently, meaning you can't really trust
anything you
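For example, the capacity (which defaults to 10000) could be lowered at submit time; the value 5000 here is just an illustration, not a recommendation:

```shell
# Smaller queue -> less driver memory retained by buffered events,
# but a higher chance of dropped listener events under load.
spark-submit \
  --conf spark.scheduler.listenerbus.eventqueue.capacity=5000 \
  ...
```
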
Hi,
Is there any plan to remove the limitation mentioned below?
*Streaming aggregation doesn't support group aggregate pandas UDF*
We want to run our data modelling jobs in real time using Spark 3.0 and Kafka
2.4, and need support for custom grouped-aggregate pandas UDFs on stream
windows.
Is there
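For what it's worth, the aggregate logic itself is usually plain pandas; in batch Spark it would be registered as a grouped-aggregate pandas UDF, and a common workaround in streaming is to apply the same function per micro-batch via foreachBatch instead. A minimal sketch of such a function (the name and weighted-mean logic are illustrative, not from the original post):

```python
import pandas as pd

# Stand-in for the body of a grouped-aggregate pandas UDF. In batch Spark
# this would be wrapped with pandas_udf(..., functionType=GROUPED_AGG);
# in Structured Streaming, where that wrapper is not supported for
# streaming aggregation, the same logic can run inside foreachBatch.
def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    return float((values * weights).sum() / weights.sum())

print(weighted_mean(pd.Series([10.0, 20.0]), pd.Series([1.0, 3.0])))  # 17.5
```
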
We have ~120 executors with 5 cores each for a very long-running job that
crunches ~2.5 TB of data and has a great many filters in its query. Currently
we have ~30k partitions, which works out to ~90MB per partition.
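As a quick sanity check on those figures (assuming decimal terabytes; binary TiB gives a slightly higher number):

```python
# 2.5 TB spread over 30k partitions.
total_bytes = 2.5e12
partitions = 30_000
mb_per_partition = total_bytes / partitions / 1e6
print(round(mb_per_partition))  # ~83 MB, in the same ballpark as the quoted ~90 MB
```
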
We are using Spark v2.2.2 as of now. The major problem we are facing is due
to GC on the