Yes, thanks, that sorted out the issue.
On 30/07/15 09:26, Akhil Das wrote:
sc.parallelize takes a second parameter, which is the total number of
partitions. Are you using that?
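For what it's worth, a minimal sketch of that second argument; the data and the partition count of 100 are purely illustrative:

```scala
// Spread a collection over an explicit number of partitions (illustrative values).
val ids = sc.parallelize(1 to 1000000, 100)
println(ids.partitions.length)   // prints 100
```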
Thanks
Best Regards
On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios
kostas.koug...@googlemail.com
Yes, YARN was terminating the executor because the off-heap memory limit
was exceeded.
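For reference, a minimal sketch of how that off-heap headroom is usually raised; the figures are placeholders, and on YARN the setting can equally be passed with --conf at submit time rather than in code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder numbers: the off-heap headroom YARN enforces is controlled by
// spark.yarn.executor.memoryOverhead (in MB on Spark 1.x), not by the executor -Xmx.
val conf = new SparkConf()
  .setAppName("xml-ingest")                            // hypothetical app name
  .set("spark.executor.memory", "4g")                  // on-heap executor memory
  .set("spark.yarn.executor.memoryOverhead", "1024")   // extra off-heap headroom, in MB
val sc = new SparkContext(conf)
```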
On 13/07/15 06:55, Ruslan Dautkhanov wrote:
the executor receives a SIGTERM (from whom???)
From YARN Resource Manager.
Check if YARN fair scheduler preemption and/or speculative execution
are turned on,
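A quick way to check the speculation side from within the job (the fair-scheduler preemption flag, yarn.scheduler.fair.preemption, lives in YARN's own configuration and is not visible through SparkConf):

```scala
// Read the speculation flag from the running context's configuration.
val speculationOn = sc.getConf.getBoolean("spark.speculation", false)
println(s"speculative execution enabled: $speculationOn")
```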
It was the memoryOverhead. It runs OK with more of it, but do you know
which libraries could affect this? I find it strange that it needs 4g
for a task that processes some XML files. The tasks themselves require
less Xmx.
Cheers
On 13/07/15 06:29, Jong Wook Kim wrote:
Based on my
of memory, e.g. the billion laughs XML:
https://en.wikipedia.org/wiki/Billion_laughs
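For illustration, a scaled-down "billion laughs" document held in a plain Scala string; each entity expands to ten copies of the previous one, so a few extra levels make a tiny file blow up to gigabytes when a DTD-aware parser resolves the entities:

```scala
// Scaled-down sketch of the billion-laughs shape (only two expansion levels here).
val tinyLaughs =
  """<?xml version="1.0"?>
    |<!DOCTYPE lolz [
    |  <!ENTITY lol "lol">
    |  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
    |  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
    |]>
    |<lolz>&lol2;</lolz>""".stripMargin
```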
-Ewan
On 13/07/15 10:11, Konstantinos Kougios wrote:
It was the memoryOverhead. It runs OK with more of it, but do you
know which libraries could affect this? I find it strange that it
needs 4g for a task
Seems you're correct:
2015-07-07 17:21:27,245 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=38506,containerID=container_1436262805092_0022_01_03]
is running beyond virtual memory limits. Current usage: 4.3 GB of 4.5 GB
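Rough arithmetic behind a cap like the one in that warning, as a sketch with illustrative numbers: YARN sizes the container as the executor heap plus the memory overhead, and the virtual-memory cap is that figure times yarn.nodemanager.vmem-pmem-ratio (2.1 by default); the default overhead fraction differs between Spark versions.

```scala
// Illustrative figures only; substitute the real spark.executor.memory and overhead.
val executorHeapMb = 4096
val overheadMb     = math.max(384, (0.10 * executorHeapMb).toInt)
val containerMb    = executorHeapMb + overheadMb
println(s"container size ≈ $containerMb MB, vmem cap ≈ ${(containerMb * 2.1).toInt} MB")
```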
Once again I am trying to read a directory tree using binary files.
My directory tree has a root dir ROOTDIR and subdirs where the files are
located, i.e.
ROOTDIR/1
ROOTDIR/2
ROOTDIR/..
ROOTDIR/100
A total of 1 million files split into 100 subdirectories.
Using binaryFiles requires too much memory on
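A sketch of reading the tree described above; the hdfs:///ROOTDIR path and the partition count are placeholders. The second argument (minPartitions) spreads the million files over many more tasks, so no single task has to materialise too many of them at once:

```scala
// Read (path, PortableDataStream) pairs; 1000 is an illustrative minPartitions value.
val files = sc.binaryFiles("hdfs:///ROOTDIR/*", 1000)
files.map { case (path, stream) => (path, stream.toArray().length) }   // path -> file size
  .take(5)
  .foreach(println)
```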
Hi Marchelo,
The data are collected in, say, class C; c.id is the ID of each item.
But that ID might appear more than once across those 1 million XML
files, so I am doing a reduceByKey(). Even if I had multiple binaryFiles
RDDs, wouldn't I have to ++ those in order to correctly
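A rough sketch of the aggregation being described; class C, the stand-in parser, and the merge function are all hypothetical, and only the key-by-id plus reduceByKey pattern is the point:

```scala
// Hypothetical record type and parser standing in for the author's real XML handling.
case class C(id: String, values: Seq[String])

def parse(bytes: Array[Byte]): C = {
  val text = new String(bytes, "UTF-8")
  C(text.take(8), Seq(text))                               // stand-in: key on a content prefix
}

val merged = sc.binaryFiles("hdfs:///ROOTDIR/*")           // placeholder path
  .map { case (_, stream) => parse(stream.toArray()) }     // one C per file
  .map(c => (c.id, c))                                     // key by the id field
  .reduceByKey((a, b) => C(a.id, a.values ++ b.values))    // merge records sharing an id
```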
Now I am profiling the executor.
There seems to be a memory leak.
20 mins after the run there were:
157k byte[] allocated for 75MB.
519k java.lang.ref.Finalizer for 31MB,
291k java.util.zip.Inflater for 17MB
487k java.util.zip.ZStreamRef for 11MB
An hour after the run I got:
186k byte[]
:01, Konstantinos Kougios wrote:
Now I am profiling the executor.
There seems to be a memory leak.
20 mins after the run there were:
157k byte[] allocated for 75MB.
519k java.lang.ref.Finalizer for 31MB,
291k java.util.zip.Inflater for 17MB
487k java.util.zip.ZStreamRef for 11MB
An hour after
After 2 hours of running, I now see 10 GB of long[], across 1.3 million instances.
So probably information about the files again.
Thanks, I did that and now I am getting an out-of-memory error, but I am
not sure where it occurs. It can't be on the Spark executor, as I have 28GB
allocated to it. It is not the driver either, because I run that locally and
monitor it via jvisualvm. Unfortunately I can't JMX-monitor Hadoop.
From the
No luck, I'm afraid. After giving the namenode 16GB of RAM, I am still
getting an out-of-memory exception, a somewhat different one:
15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw
exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
:///path/to/files/*).count()
in the spark-shell and verify that part works?
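The sanity check being suggested, as it would look in the spark-shell; the path is a placeholder for the real location of the files:

```scala
// Count the files to verify that the binaryFiles read itself works.
val n = sc.binaryFiles("file:///path/to/files/*").count()
println(n)
```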
Ewan
-Original Message-
From: Konstantinos Kougios [mailto:kostas.koug...@googlemail.com]
Sent: 08 June 2015 15:40
To: Ewan Leith; user@spark.apache.org
Subject: Re: spark timesout maybe due to binaryFiles