Re: spark-local dir running out of space during long ALS run

2015-02-16 Thread Davies Liu
For the last question: you can trigger a GC in the JVM from Python with:

sc._jvm.System.gc()
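
For illustration, a rough way to keep this running periodically in the driver (a hedged sketch: the 30-minute interval and the helper name are arbitrary, and `sc` is assumed to already exist):

    import threading

    # Hedged sketch: re-arm a daemon timer so the driver JVM is asked to GC
    # every 30 minutes, which lets Spark clean up files behind RDDs and
    # broadcasts that have already been collected.
    def periodic_jvm_gc(sc, interval_sec=1800):
        sc._jvm.System.gc()
        t = threading.Timer(interval_sec, periodic_jvm_gc, args=(sc, interval_sec))
        t.daemon = True
        t.start()

    periodic_jvm_gc(sc)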

On Mon, Feb 16, 2015 at 4:08 PM, Antony Mayi wrote:
> thanks, that looks promising but I can't find any reference giving me more
> details - can you please point me to something? Also, is it possible to force
> a GC from pyspark (as I am using pyspark)?
>
> thanks,
> Antony.
>
>
> On Monday, 16 February 2015, 21:05, Tathagata Das wrote:
>
>
>
> Correct, brute-force cleanup is not useful. Since Spark 1.0, Spark can do
> automatic cleanup of files based on which RDDs are no longer used and get
> garbage collected by the JVM. That would be the best way, but it depends on
> the JVM GC characteristics. If you force a GC periodically in the driver,
> that might help you get rid of files on the workers that are no longer needed.
>
> TD
>
> On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi wrote:
>
> spark.cleaner.ttl is not the right way - it seems to be designed mainly for
> streaming. Although it keeps the disk usage under control, it also causes
> loss of RDDs and broadcasts that are required later, leading to a crash.
>
> is there any other way?
> thanks,
> Antony.
>
>
> On Sunday, 15 February 2015, 21:42, Antony Mayi wrote:
>
>
>
> spark.cleaner.ttl ?
>
>
> On Sunday, 15 February 2015, 18:23, Antony Mayi wrote:
>
>
>
> Hi,
>
> I am running a bigger ALS job on Spark 1.2.0 on YARN (CDH 5.3.0) - ALS is
> using about 3 billion ratings and I am doing several trainImplicit() runs in
> a loop within one Spark session. I have a four-node cluster with 3TB of disk
> space on each node. Before starting the job, less than 8% of the disk space
> is used. While ALS is running I can see the disk usage growing rapidly,
> mainly because of files being stored under
> yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
> After about 10 hours the disk usage hits 90% and YARN kills the affected
> containers.
>
> Am I missing some cleanup step while looping over the several trainImplicit()
> calls? Taking 4*3TB of disk space seems immense.
>
> thanks for any help,
> Antony.


Re: spark-local dir running out of space during long ALS run

2015-02-16 Thread Antony Mayi
thanks, that looks promising but I can't find any reference giving me more
details - can you please point me to something? Also, is it possible to force
a GC from pyspark (as I am using pyspark)?
thanks,
Antony.


Re: spark-local dir running out of space during long ALS run

2015-02-16 Thread Tathagata Das
Correct, brute-force cleanup is not useful. Since Spark 1.0, Spark can do
automatic cleanup of files based on which RDDs are no longer used and get
garbage collected by the JVM. That would be the best way, but it depends on
the JVM GC characteristics. If you force a GC periodically in the driver,
that might help you get rid of files on the workers that are no longer needed.

TD
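
For illustration, a hedged sketch of what this reference-based cleanup can look like in a PySpark loop over trainImplicit() runs (the alpha sweep and the evaluate() helper are placeholders, and sc and ratings are assumed to already exist; this is not code from the thread):

    from pyspark.mllib.recommendation import ALS

    # Hedged sketch: dropping every reference to a finished model and then
    # forcing a driver GC lets Spark's reference-based cleanup delete the
    # shuffle and broadcast files that backed it.
    results = []
    for alpha in (1.0, 10.0, 40.0):              # hypothetical parameter sweep
        model = ALS.trainImplicit(ratings, 20, iterations=10, alpha=alpha)
        results.append(evaluate(model))          # evaluate() is a placeholder
        del model                                # leave no reference behind
        sc._jvm.System.gc()                      # nudge the cleaner to run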

On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi wrote:

> spark.cleaner.ttl is not the right way - it seems to be designed mainly for
> streaming. Although it keeps the disk usage under control, it also causes
> loss of RDDs and broadcasts that are required later, leading to a crash.
>
> is there any other way?
> thanks,
> Antony.
>
>
> On Sunday, 15 February 2015, 21:42, Antony Mayi wrote:
>
>
>
> spark.cleaner.ttl ?
>
>
> On Sunday, 15 February 2015, 18:23, Antony Mayi wrote:
>
>
>
> Hi,
>
> I am running a bigger ALS job on Spark 1.2.0 on YARN (CDH 5.3.0) - ALS is
> using about 3 billion ratings and I am doing several trainImplicit() runs in
> a loop within one Spark session. I have a four-node cluster with 3TB of disk
> space on each node. Before starting the job, less than 8% of the disk space
> is used. While ALS is running I can see the disk usage growing rapidly,
> mainly because of files being stored under
> yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
> After about 10 hours the disk usage hits 90% and YARN kills the affected
> containers.
>
> Am I missing some cleanup step while looping over the several trainImplicit()
> calls? Taking 4*3TB of disk space seems immense.
>
> thanks for any help,
> Antony.


Re: spark-local dir running out of space during long ALS run

2015-02-16 Thread Antony Mayi
spark.cleaner.ttl is not the right way - it seems to be designed mainly for
streaming. Although it keeps the disk usage under control, it also causes loss
of RDDs and broadcasts that are required later, leading to a crash.

is there any other way?
thanks,
Antony.


Re: spark-local dir running out of space during long ALS run

2015-02-15 Thread Antony Mayi
spark.cleaner.ttl ? 
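
(For reference, enabling it would look roughly like the sketch below - the 3600-second value is arbitrary - but as the later replies in this thread note, the time-based purge also removes RDDs and broadcasts that are still needed.)

    from pyspark import SparkConf, SparkContext

    # For reference only: spark.cleaner.ttl (Spark 1.x) purges metadata and
    # files older than the given number of seconds. The value is arbitrary,
    # and the rest of this thread found the approach too aggressive here.
    conf = SparkConf().setAppName("als-runs").set("spark.cleaner.ttl", "3600")
    sc = SparkContext(conf=conf)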


spark-local dir running out of space during long ALS run

2015-02-15 Thread Antony Mayi
Hi,
I am running a bigger ALS job on Spark 1.2.0 on YARN (CDH 5.3.0) - ALS is using
about 3 billion ratings and I am doing several trainImplicit() runs in a loop
within one Spark session. I have a four-node cluster with 3TB of disk space on
each node. Before starting the job, less than 8% of the disk space is used.
While ALS is running I can see the disk usage growing rapidly, mainly because
of files being stored under
yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
After about 10 hours the disk usage hits 90% and YARN kills the affected
containers.

Am I missing some cleanup step while looping over the several trainImplicit()
calls? Taking 4*3TB of disk space seems immense.

thanks for any help,
Antony.
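
For context, the workload described above is roughly of this shape (a hedged reconstruction: the path, rating format and rank sweep are placeholders, not details from the original post):

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    # Rough reconstruction with placeholder values: one SparkContext, several
    # trainImplicit() runs in a loop. Each run leaves shuffle files under the
    # YARN usercache spark-local directories mentioned above until they are
    # cleaned up.
    sc = SparkContext(appName="als-runs")
    ratings = (sc.textFile("hdfs:///path/to/ratings")
                 .map(lambda line: line.split(","))
                 .map(lambda p: Rating(int(p[0]), int(p[1]), float(p[2])))
                 .cache())

    models = [ALS.trainImplicit(ratings, rank, iterations=10)
              for rank in (10, 20, 40)]          # hypothetical rank sweep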