Hi All,
My Spark configuration is the following (note: in my original code I had three separate .config calls for 'spark.driver.extraJavaOptions'; since later calls overwrite earlier ones, I have consolidated them here into a single call, and dropped the '-Xmx4g -Xms4g' flags because driver heap size is already set via 'spark.driver.memory'):

spark = SparkSession.builder.master(mesos_ip) \
    .config('spark.executor.cores', '3') \
    .config('spark.executor.memory', '8g') \
    .config('spark.es.scroll.size', '10000') \
    .config('spark.network.timeout', '600s') \
    .config('spark.executor.heartbeatInterval', '60s') \
    .config('spark.driver.cores', '3') \
    .config('spark.driver.extraJavaOptions', '-XX:+UseG1GC -XX:MaxDirectMemorySize=1024m') \
    .config('spark.executor.extraJavaOptions', '-XX:+UseG1GC') \
    .config('spark.files.overwrite', 'true') \
    .config('spark.kryoserializer.buffer', '70') \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.broadcast.compress', 'true') \
    .config('spark.shuffle.compress', 'true') \
    .config('spark.shuffle.spill.compress', 'true') \
    .config('spark.driver.memory', '8g') \
    .config('spark.cores.max', '12') \
    .config('spark.sql.shuffle.partitions', '6000') \
    .config('es.nodes', es_nodes) \
    .config('es.port', es_port) \
    .config('spark.sql.autoBroadcastJoinThreshold', -1) \
    .config('spark.es.mapping.date.rich', 'false') \
    .config('spark.mesos.executor.memoryOverhead', 1000) \
    .getOrCreate()

Here we set 'spark.sql.shuffle.partitions' as high as 6000, as suggested in https://stackoverflow.com/questions/35948714/joining-a-large-and-a-ginormous-spark-dataframe

We are running a SQL join on two indices, each larger than 10 GB, and getting the following error:

org.apache.spark.shuffle.FetchFailedException: Too large frame: 2500596250
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    ...
Is there a concrete solution to this problem? Any help would be appreciated.

Best,
Ashutosh