
I cached a table on a large EMR cluster, and its cached size is 62 MB.
So I know how large the table is once it is cached.

But when I try to cache the same table on a smaller cluster, which still
has a total of 3 GB of driver memory and two executors with close to 2.5 GB
of memory, the job keeps failing with JVM out-of-memory errors.
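As a rough sanity check on the memory budget, the sketch below estimates how much of one executor's heap Spark's unified memory manager can use for storage. It assumes Spark 2.x+ defaults (300 MB reserved, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5) and roughly 2.5 GB of heap per executor; the function name `unified_memory_mb` is hypothetical, and the real heap the JVM reports will be somewhat smaller than the configured value.

```python
# Rough estimate of Spark's unified memory regions for one executor.
# Assumes Spark 2.x+ defaults: 300 MB reserved memory,
# spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5.
RESERVED_MB = 300


def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified_mb, protected_storage_mb) for a given executor heap.

    unified_mb is the pool shared by execution and storage;
    protected_storage_mb is the part of it that cached blocks can keep
    without being evicted by execution.
    """
    # Spark refuses to start executors whose heap is below 1.5x the reserve.
    if heap_mb < 1.5 * RESERVED_MB:
        raise ValueError("heap must be at least 1.5x the reserved memory")
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction
    return unified, unified * storage_fraction


if __name__ == "__main__":
    heap_mb = 2560  # hypothetical: "close to 2.5 GB" per executor
    unified, storage = unified_memory_mb(heap_mb)
    print(f"unified: {unified:.0f} MB, protected storage: {storage:.0f} MB")
```

Under these assumptions each executor has about 1356 MB of unified memory, of which about 678 MB is protected for storage, so on paper a table that occupies 62 MB once cached should fit comfortably.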

Is there something that I am missing?

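from pyspark.sql import SparkSession

sparkSession = (
    SparkSession.builder
    .config("spark.rdd.compress", "true")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # ...plus JVM options for GC logging: "... -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
    .getOrCreate()
)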

testdf = sparkSession.sql("select * from tablename")

This causes a JVM out-of-memory error.

Gourav Sengupta

