Hi,

I cached a table on a large EMR cluster and it came to 62 MB in memory, so I know the size of the table once cached.
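(For reference, the cached size can be read off the Storage tab of the Spark UI; the same figure is also exposed by Spark's monitoring REST API. A minimal sketch of how I check it — the driver host is a placeholder, 4040 is the default UI port:)

=================================================================
import requests

base = "http://driver-host:4040/api/v1"  # placeholder host, default UI port

# The application UI lists the running app; take its id.
app_id = requests.get(base + "/applications").json()[0]["id"]

# Every cached RDD/DataFrame appears here with its in-memory footprint.
for rdd in requests.get(base + "/applications/{0}/storage/rdd".format(app_id)).json():
    print(rdd["name"], rdd["memoryUsed"], "bytes in memory")
=================================================================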
But when I try to cache the same table on a smaller cluster, which still has 3 GB of driver memory and two executors with close to 2.5 GB of memory each, the job keeps failing with JVM out-of-memory errors. Is there something that I am missing?

CODE:
=================================================================
import pyspark
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder \
    .config("spark.rdd.compress", "true") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.executor.extraJavaOptions",
            "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
    .appName("test") \
    .enableHiveSupport() \
    .getOrCreate()

testdf = sparkSession.sql("select * from tablename")

# Note: in PySpark 2.x, MEMORY_ONLY_SER is a deprecated alias of MEMORY_ONLY.
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)

# persist() is lazy; the error shows up once an action materializes the cache.
testdf.count()
=================================================================

This is what throws the JVM out-of-memory error.

Regards,
Gourav Sengupta