Re: sc.parallelize(512k items) doesn't always use 64 executors

2015-07-30 Thread Konstantinos Kougios
yes, thanks, that sorted out the issue. On 30/07/15 09:26, Akhil Das wrote: sc.parallelize takes a second parameter which is the total number of partitions; are you using that? Thanks Best Regards On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios kostas.koug...@googlemail.com
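
For reference, a minimal spark-shell sketch of the fix described above, assuming the goal is 64 partitions so all 64 executors get work (the item count and partition number are illustrative):

    val items = 0 until 512000              // stand-in for the real 512k items
    val rdd = sc.parallelize(items, 64)     // second argument = number of partitions
    println(rdd.partitions.length)          // should print 64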

Re: RECEIVED SIGNAL 15: SIGTERM

2015-07-13 Thread Konstantinos Kougios
yes, YARN was terminating the executor because the off-heap memory limit was exceeded. On 13/07/15 06:55, Ruslan Dautkhanov wrote: the executor receives a SIGTERM (from whom???) From YARN Resource Manager. Check if yarn fair scheduler preemption and/or speculative execution are turned on,
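
A hedged sketch of the checks suggested here: spark.speculation is the standard Spark property, while YARN fair-scheduler preemption is a cluster-side setting that cannot be changed from inside the application (app name and values are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("xml-job")                 // illustrative app name
      .set("spark.speculation", "false")     // rule out speculative-execution kills
    // Fair-scheduler preemption (yarn.scheduler.fair.preemption) lives in the
    // ResourceManager's yarn-site.xml / fair-scheduler.xml, so check it there.
    val sc = new SparkContext(conf)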

Re: RECEIVED SIGNAL 15: SIGTERM

2015-07-13 Thread Konstantinos Kougios
it was the memoryOverhead. It runs OK with more of that, but do you know which libraries could affect this? I find it strange that it needs 4g for a task that processes some xml files. The tasks themselves require less Xmx. Cheers On 13/07/15 06:29, Jong Wook Kim wrote: Based on my
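
A minimal sketch of raising the overhead on Spark 1.x under YARN; the figures are illustrative, not a recommendation:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")                   // heap (Xmx) for each executor
      .set("spark.yarn.executor.memoryOverhead", "1024")    // off-heap allowance in MB;
                                                            // the default is a small fraction of executor memory (384 MB minimum)
    // equivalent at submit time:
    //   spark-submit --conf spark.yarn.executor.memoryOverhead=1024 ...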

Re: RECEIVED SIGNAL 15: SIGTERM

2015-07-13 Thread Konstantinos Kougios
of memory. e.g. the billion laughs xml: https://en.wikipedia.org/wiki/Billion_laughs -Ewan On 13/07/15 10:11, Konstantinos Kougios wrote: it was the memoryOverhead. It runs ok with more of that, but do you know which libraries could affect this? I find it strange that it needs 4g for a task
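
A sketch of one way to guard against entity-expansion bombs such as billion laughs when parsing each file inside a task; StAX is shown here as an assumption, the job's actual parser may differ:

    import javax.xml.stream.XMLInputFactory
    import java.io.StringReader

    val xml = "<root/>"                       // placeholder for one file's content
    val factory = XMLInputFactory.newInstance()
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)                     // reject DTDs outright
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, java.lang.Boolean.FALSE) // no external entities
    val reader = factory.createXMLStreamReader(new StringReader(xml))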

Re: is it possible to disable -XX:OnOutOfMemoryError=kill %p for the executors?

2015-07-08 Thread Konstantinos Kougios
seems you're correct: 2015-07-07 17:21:27,245 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=38506,containerID=container_1436262805092_0022_01_03] is running beyond virtual memory limits. Current usage: 4.3 GB of 4.5 GB
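
Rough arithmetic that could produce a container limit like the 4.5 GB in the log above; the settings are assumptions for illustration, the real job's values may differ:

    val executorMemoryMb = 4096                          // e.g. --executor-memory 4g   (assumption)
    val overheadMb       = 512                           // e.g. spark.yarn.executor.memoryOverhead=512 (assumption)
    val containerLimitMb = executorMemoryMb + overheadMb // 4608 MB, roughly the 4.5 GB limit YARN enforces
    // Once the container crosses that limit, YARN itself kills it, so the JVM's
    // -XX:OnOutOfMemoryError=kill %p handler is not the thing doing the killing.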

binaryFiles() for 1 million files, too much memory required

2015-07-01 Thread Konstantinos Kougios
Once again I am trying to read a directory tree using binaryFiles(). My directory tree has a root dir ROOTDIR and subdirs where the files are located, i.e. ROOTDIR/1 ROOTDIR/2 ROOTDIR/.. ROOTDIR/100. A total of 1 million files split into 100 subdirs. Using binaryFiles requires too much memory on
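
A sketch of one possible workaround given the layout described above: list one subdirectory per binaryFiles() call so the driver never holds metadata for all 1 million files at once. The per-file work and the OUTDIR output path are placeholders:

    (1 to 100).foreach { i =>
      val files = sc.binaryFiles(s"hdfs:///ROOTDIR/$i")                           // one subdirectory at a time
      val sizes = files.map { case (path, data) => (path, data.toArray().length) } // placeholder per-file work
      sizes.saveAsTextFile(s"hdfs:///OUTDIR/$i")                                  // write each batch instead of collecting
    }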

Re: spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey()

2015-06-11 Thread Konstantinos Kougios
Hi Marchelo, the data are collected in, say, class C. c.id is the id of each of those data items. But that id might appear more than once in those 1 mil xml files, so I am doing a reduceByKey(). Even if I had multiple binaryFiles RDDs, wouldn't I have to ++ those in order to correctly
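
A sketch of the shape of job being discussed, with hypothetical names (C, parseXml, ROOTDIR) standing in for the real ones; union plays the role of the "++" step and reduceByKey merges records sharing an id:

    import org.apache.spark.input.PortableDataStream

    case class C(id: String, count: Long)
    def parseXml(data: PortableDataStream): C = C("some-id", 1L)   // hypothetical stand-in for the real parser

    val perDir  = (1 to 100).map(i => sc.binaryFiles(s"hdfs:///ROOTDIR/$i"))
    val allData = sc.union(perDir)                                 // the "++" / union step
    val merged  = allData
      .map { case (_, data) => parseXml(data) }
      .map(c => (c.id, c))
      .reduceByKey((a, b) => C(a.id, a.count + b.count))           // merge duplicates of the same id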

Re: spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey()

2015-06-11 Thread Konstantinos Kougios
Now I am profiling the executor. There seems to be a memory leak. 20 mins after the run there were: 157k byte[] allocated for 75MB, 519k java.lang.ref.Finalizer for 31MB, 291k java.util.zip.Inflater for 17MB, 487k java.util.zip.ZStreamRef for 11MB. An hour after the run I got: 186k byte[]
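
The Finalizer / Inflater / ZStreamRef growth above is the typical signature of compressed streams being left for the finalizer to clean up, since java.util.zip classes historically rely on finalization to release native memory. A sketch of reading a PortableDataStream with an explicit close (readOneFile is a hypothetical helper):

    import org.apache.spark.input.PortableDataStream

    def readOneFile(data: PortableDataStream): Array[Byte] = {
      val in = data.open()                    // DataInputStream over the file's bytes
      try {
        val out = new java.io.ByteArrayOutputStream()
        val buf = new Array[Byte](8192)
        var n = in.read(buf)
        while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
        out.toByteArray
      } finally {
        in.close()                            // release it now instead of waiting for the finalizer
      }
    }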

Re: spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey()

2015-06-11 Thread Konstantinos Kougios
:01, Konstantinos Kougios wrote: Now I am profiling the executor. There seems to be a memory leak. 20 mins after the run there were: 157k byte[] allocated for 75MB, 519k java.lang.ref.Finalizer for 31MB, 291k java.util.zip.Inflater for 17MB, 487k java.util.zip.ZStreamRef for 11MB. An hour after

Re: spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey()

2015-06-11 Thread Konstantinos Kougios
After 2h of running, now I got a 10GB long[], 1.3 mil instances of long[]. So probably information about the files again.

Re: spark times out maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
Thanks, did that and now I am getting an out-of-memory error, but I am not sure where it occurs. It can't be on the Spark executor, as I have 28GB allocated to it. It is not the driver, because I run this locally and monitor it via jvisualvm. Unfortunately I can't jmx-monitor hadoop. From the

Re: spark times out maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
No luck, I am afraid. After giving the namenode 16GB of RAM, I am still getting an out-of-memory exception, though a somewhat different one: 15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded at
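
The "User class threw exception" line is logged by the YARN ApplicationMaster, so if the job runs in yarn-cluster mode this is the driver heap (not the namenode) hitting the GC overhead limit. A small check one can run inside the driver, with the submit-time knob noted in a comment (the 8g figure is illustrative):

    // confirm how much heap the driver/AM actually received
    val driverMaxHeapMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
    println(s"driver max heap: $driverMaxHeapMb MB")
    // raising it has to happen at submit time, e.g.
    //   spark-submit --master yarn-cluster --driver-memory 8g ...
    // (setting spark.driver.memory inside the job is too late: the driver JVM is already running)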

Re: spark times out maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
:///path/to/files/*).count() in the spark-shell and verify that part works? Ewan -Original Message- From: Konstantinos Kougios [mailto:kostas.koug...@googlemail.com] Sent: 08 June 2015 15:40 To: Ewan Leith; user@spark.apache.org Subject: Re: spark times out maybe due to binaryFiles
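
For reference, the suggestion quoted above written out as a complete spark-shell line; the scheme and path are placeholders, since the original message is truncated before them:

    val n = sc.binaryFiles("hdfs:///path/to/files/*").count()
    println(s"binaryFiles listed and counted $n files")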