It's happening in the executor:
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 25800"...
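For context, those "kill -9" lines come from the -XX:OnOutOfMemoryError flag visible on the executor JVM in the log above: the executor process is killed the moment its heap fills up. A hedged sketch of overriding the executor JVM options to capture a heap dump instead (the flag values and dump path are illustrative, not anything this thread prescribes):

    import org.apache.spark.sql.SparkSession

    // Assumption: set before executors launch; spark.executor.extraJavaOptions
    // is appended to each executor JVM's command line.
    val spark = SparkSession.builder()
      .config("spark.executor.extraJavaOptions",
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof")
      .getOrCreate()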
I might have missed it, but can you tell whether the OOM is happening in the driver or the executor? Also, it would be good if you could post the actual exception.
On Tue 5 Jun, 2018, 1:55 PM Nicolas Paris wrote:
IMO your JSON cannot be read in parallel at all, so Spark only leaves you to play with memory.
I'd say that at some step it has to fit entirely in both one executor and the driver.
I'd try something like 20GB for both the driver and the executors, using a dynamic number of executors.
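A hedged sketch of that sizing (the app name and exact values are illustrative; in practice the driver heap must be set at launch time, e.g. spark-submit --driver-memory 20g, since it cannot be changed after the driver JVM starts):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("huge-json-read")                          // illustrative name
      .config("spark.executor.memory", "20g")             // per-executor heap
      .config("spark.dynamicAllocation.enabled", "true")  // let Spark scale the executor count
      .config("spark.shuffle.service.enabled", "true")    // required for dynamic allocation on YARN
      .getOrCreate()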
Yes, I would say that's the first thing I tried. The thing is, even though I provide more executors and more memory for each, this process gets an OOM in only one task, which stays stuck and unfinished.
I don't think it's splitting the load across the other tasks.
I had 11 blocks for that file I stored in HDFS.
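That matches how line-based splitting works: the file does split into HDFS blocks, but a record cannot end mid-line, so the one task whose split contains the start of that single giant line ends up reading the whole 1.5 GB by itself. A quick hedged sanity check of what the read actually produced (the path is the one from this thread):

    val hdf = spark.read.json("/user/tmp/hugedatafile")
    // Reports several partitions (one-ish per block), yet a single task
    // still parses the entire one-line document.
    println(hdf.rdd.getNumPartitions)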
Have you played with the driver/executor memory configuration?
Increasing it should avoid the OOM.
2018-06-05 22:30 GMT+02:00 raksja :
Agreed that gzip is non-splittable, but the question I have and the examples I posted above all refer to an uncompressed file: a single JSON file with an array of objects on one continuous line.
Yes, it's in the right format, as we are able to process it in Python.
Also, I agree that JSONL would work if we split that
[{},{},...]
array of objects into something like this:
{}
{}
{}
But since I get the data from another system in this form, which I cannot control, my question is whether it's possible to read it as-is.
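For what it's worth, a hedged sketch of reading the file as-is: Spark 2.2+ has a multiLine option that parses JSON spanning multiple lines, including a single top-level array. The catch is that a multiLine file is handled by a single task, so the whole document still has to fit in one executor's heap; writing it back out once as JSONL restores parallelism for all later reads (the output path is illustrative):

    // Single-task parse of the whole-array file; needs a large executor heap once.
    val hdf = spark.read.option("multiLine", "true").json("/user/tmp/hugedatafile")

    // One-time conversion to line-delimited JSON for parallel downstream reads.
    hdf.write.json("/user/tmp/hugedatafile_jsonl")  // illustrative output path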
If it's one 33 MB file which decompresses to 1.5 GB, then there is also a chance you need to split the inputs, since gzip is a non-splittable compression format.
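As a hedged illustration of why that matters: a gzipped input cannot be split, so the read collapses to a single task no matter how many executors are available. Repartitioning right after the read at least spreads the rows for later stages, though the initial decompress-and-parse is still one task (the .gz path and partition count are illustrative):

    val df = spark.read.json("/user/tmp/hugedatafile.gz") // one task decompresses and parses
      .repartition(64)                                    // parallelism for downstream stages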
On Tue, Jun 5, 2018 at 11:55 AM Anastasios Zouzias wrote:
Are you sure that your JSON file has the right format?
spark.read.json(...) expects a file where *each line is a JSON object*.
My wild guess is that

    val hdf = spark.read.json("/user/tmp/hugedatafile")
    hdf.show(2) or hdf.take(1) gives OOM

tries to fetch all the data into the driver. Can you
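A hedged illustration of that default contract, in case it helps: with no options, each input line must be a complete JSON object (JSONL), and a file shaped that way parses in parallel (the .jsonl path is hypothetical):

    // Expected default shape, one object per line:
    //   {"id":"1","entityMetadata":{"lastChange":"..."}}
    //   {"id":"2","entityMetadata":{"lastChange":"..."}}
    val jsonl = spark.read.json("/user/tmp/data.jsonl")  // hypothetical JSONL copy of the data
    jsonl.show(2)  // small actions stay cheap once the file is line-delimited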
I have a JSON file which is a continuous array of objects of similar type, [{},{}...], about 1.5 GB uncompressed and 33 MB gzip-compressed.
This is uploaded as hugedatafile to HDFS, and it is not a JSONL file; it's a whole regular JSON file:
[{"id":"1","entityMetadata":{"lastChange":"2018-05-11