IMO your JSON cannot be read in parallel at all, so the only lever Spark gives you is memory.
I'd say at some step it has to fit in both a single executor and the driver. I'd try something like 20 GB for both the driver and the executors, with a dynamic number of executors, and then repartition that fat JSON once it's parsed. A rough sketch is below the quoted message.

2018-06-05 22:40 GMT+02:00 raksja <shanmugkr...@gmail.com>:

> Yes, I would say that's the first thing I tried. The thing is, even though I
> provide more executors and more memory to each, this process gets an OOM in
> only one task, which is stuck and unfinished.
>
> I don't think it's splitting the load to other tasks.
>
> I had 11 blocks on that file I stored in HDFS, and I got 11 partitions in my
> dataframe. When I did show(1), it spun up 11 tasks; 10 passed quickly, 1 got
> stuck and OOMed.
>
> Also, I repartitioned to 1000 and that didn't help either.