IMO your JSON cannot be read in parallel at all, so the only lever Spark
leaves you is memory.

I'd say at some step the whole thing has to fit in a single executor, and in
the driver as well. I'd try something like 20GB for both the driver and the
executors, together with dynamic executor allocation, and only then
repartition that fat JSON.
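To make that concrete, here is a rough Scala sketch of what I have in mind.
The 20g figures, the path and the partition count are placeholders, and I'm
assuming the file is one big multi-line JSON document read with the multiLine
option; adapt it to however you actually read it:

// Sketch only: memory figures, path and partition count are placeholders.
// Driver memory has to be set when the JVM is launched, e.g. at submit time:
//   spark-submit --driver-memory 20g --executor-memory 20g ...
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fat-json")
  .config("spark.executor.memory", "20g")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")  // needed for dynamic allocation
  .getOrCreate()

// A single multi-line JSON document is parsed by one task, so that task has
// to hold the whole thing; only after the parse can you spread the rows out.
val df = spark.read
  .option("multiLine", "true")      // assuming one big document, not JSON Lines
  .json("hdfs:///path/to/fat.json") // placeholder path
  .repartition(200)                 // spread the parsed rows across the cluster
  .cache()

df.show(1)

With dynamic allocation enabled, the shuffle that repartition triggers will
request extra executors on its own instead of you having to guess a fixed
number up front.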




2018-06-05 22:40 GMT+02:00 raksja <shanmugkr...@gmail.com>:

> Yes, I would say that's the first thing I tried. The thing is, even though I
> give it more executors and more memory for each, the job still hits an OOM
> in a single task that stays stuck and never finishes.
>
> I don't think the load is being split across the other tasks.
>
> The file I stored in HDFS had 11 blocks and I got 11 partitions in my
> dataframe; when I did show(1), it spun up 11 tasks, 10 finished quickly and
> 1 got stuck and OOMed.
>
> I also repartitioned to 1000 and that didn't help either.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
