Maybe the StorageLevel in
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER) should be
changed. Also check your config "spark.memory.storageFraction", whose default
value is 0.5.
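A minimal sketch of that suggestion, assuming a Spark 2.x PySpark session and
keeping the placeholder table name from your snippet; the MEMORY_AND_DISK
level and the storageFraction value below are illustrative, not tuned
settings:

=================================================================
from pyspark.sql import SparkSession
from pyspark import StorageLevel

# Illustrative settings (assumptions, not values from this thread):
# spark.memory.storageFraction defaults to 0.5; raising it reserves more of
# the unified memory pool for cached blocks.
sparkSession = SparkSession.builder \
    .config("spark.rdd.compress", "true") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.memory.storageFraction", "0.6") \
    .appName("test").enableHiveSupport().getOrCreate()

testdf = sparkSession.sql("select * from tablename")

# MEMORY_AND_DISK stores partitions that do not fit in memory on disk
# instead of dropping them from the cache.
testdf.persist(StorageLevel.MEMORY_AND_DISK)
testdf.count()  # an action is needed to actually materialize the cache
=================================================================

The idea is only that a spill-to-disk level is a safer default on a small
cluster than a memory-only one; whether it actually avoids the OOM still
depends on the executor heap size.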

2017-07-28 3:04 GMT+08:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

> Hi,
>
> I cached a table in a large EMR cluster and it has a size of 62 MB, so I
> know the size of the table while cached.
>
> But when I try to cache the table in a smaller cluster, which still has a
> total of 3 GB of driver memory and two executors with close to 2.5 GB of
> memory, the job keeps failing with JVM out-of-memory errors.
>
> Is there something that I am missing?
>
> CODE:
> =================================================================
> import pyspark
>
> sparkSession = spark.builder \
>     .config("spark.rdd.compress", "true") \
>     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>     .config("spark.executor.extraJavaOptions",
>             "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
>     .appName("test").enableHiveSupport().getOrCreate()
>
> testdf = sparkSession.sql("select * from tablename")
> testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)
> =================================================================
>
> This causes a JVM out-of-memory error.
>
>
> Regards,
> Gourav Sengupta
>
