Hi,

I cached a table on a large EMR cluster and its cached size is 62 MB.
So I know how big the table is once it is cached.
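
(For reference, the cached size can also be read programmatically from Spark's
monitoring REST API instead of the UI. This is only a sketch; the driver host,
port and application id below are placeholders, not values from my job:)

    import requests

    # Sketch only: host, port and application id are placeholders.
    # The monitoring REST API lists cached RDDs with their in-memory size.
    app_id = "application_xxxx_yyyy"
    url = "http://driver-host:4040/api/v1/applications/{}/storage/rdd".format(app_id)
    for rdd in requests.get(url).json():
        print(rdd["name"], rdd["memoryUsed"])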

But when I try to cache the same table on a smaller cluster, which still has
3 GB of driver memory and two executors with close to 2.5 GB of memory, the
job keeps failing with JVM out-of-memory errors.
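
(For context, the values below are just the memory figures mentioned above
written out as the standard Spark settings; they are a sketch, not a copy of
the cluster's actual configuration:)

    from pyspark.sql import SparkSession

    # Sketch only: figures as described above, expressed as Spark configs.
    # Note that spark.driver.memory has no effect if set after the driver JVM
    # has already started; it normally goes on spark-submit or spark-defaults.conf.
    spark = SparkSession.builder \
        .config("spark.driver.memory", "3g") \
        .config("spark.executor.memory", "2500m") \
        .config("spark.executor.instances", "2") \
        .appName("memory-sketch") \
        .getOrCreate()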

Is there something that I am missing?

CODE:
=================================================================
import pyspark
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder \
    .config("spark.rdd.compress", "true") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.executor.extraJavaOptions",
            "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
    .appName("test") \
    .enableHiveSupport() \
    .getOrCreate()

testdf = sparkSession.sql("select * from tablename")

# persist() is lazy; the table is only materialised in memory on the first action
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)
=================================================================

This causes a JVM out-of-memory error.


Regards,
Gourav Sengupta
