It's happening in the executor:
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 25800"...
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Yes, I would say that's the first thing I tried. The thing is, even though I
provide more executors and more memory to each, this process gets an OOM in
only one task, which is stuck and unfinished.
I don't think it's splitting the load across the other tasks.
I had 11 blocks for that file as stored in HDFS.
Agreed that gzip is non-splittable, but the question I have and the examples I
posted above all refer to an uncompressed file: a single JSON file with an
array of objects on one continuous line.
Yes, it's in the right format, as we are able to process it in Python.
I also agree that JSONL would work if we split that
[{},{},...]
array of objects into something like this:
{}
{}
{}
But since I get the data from another system in this form and cannot control
that, my question is whether it's possible to process it as-is.
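For reference, a minimal sketch of that array-to-JSONL preprocessing in plain Python (file names are hypothetical; json.load pulls the entire array into memory, so it only works on a machine with enough RAM for the uncompressed file):

```python
import json

def array_to_jsonl(in_path, out_path):
    """Rewrite a single JSON array file as one object per line (JSONL)."""
    # Loads the whole array at once -- fine for a one-off conversion,
    # not for files larger than available memory.
    with open(in_path) as src:
        records = json.load(src)
    with open(out_path, "w") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")

# Tiny demonstration with a two-record array on one continuous line:
with open("array_demo.json", "w") as f:
    f.write('[{"id":"1"},{"id":"2"}]')
array_to_jsonl("array_demo.json", "array_demo.jsonl")
```

Once the data is in JSONL form, each line is an independent record, so Spark's line-oriented JSON reader can split the file across tasks.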
I have a JSON file which is one continuous array of objects of a similar type,
[{},{}...], about 1.5 GB uncompressed and 33 MB gzip compressed.
This hugedatafile is uploaded to HDFS, and it is not a JSONL file; it is a
whole regular JSON file:
[{"id":"1","entityMetadata":{"lastChange":"2018-05-11
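For what it's worth, Spark 2.2+ can parse such a file with the multiLine option, but note the caveat in the comments — this is a sketch assuming an existing SparkSession named spark and a hypothetical HDFS path:

```python
# Requires Spark 2.2+ for the multiLine option on the JSON reader.
df = (spark.read
      .option("multiLine", "true")   # parse the whole file, not line by line
      .json("hdfs:///path/to/hugedatafile"))
# Caveat: with multiLine=true each input file is handled as a single unit
# by one task, so the full 1.5 GB array is parsed on one executor -- which
# is consistent with the single stuck/OOM task described above.
```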
OK, so when should one use which?
Do you have any recommendation?
When you say Spark uses it, did you mean this:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala?
InProcessLauncher would just start a subprocess, as you mentioned earlier.
How about this one, does it make a REST API call to
Thanks for the reply.
Have you tried submitting a Spark job directly to YARN using YarnClient?
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html
I'm not sure whether it's performant and scalable.
Hi Marcelo,
I'm facing the same issue when making spark-submits from an EC2 instance and
reaching the native memory limit sooner. We have #1, but we are still on
Spark 2.1.0, so I couldn't try #2.
If InProcessLauncher wouldn't use the native memory, would it instead overload
the memory of the parent process?
Is