Re: python : Out of memory: Kill process

2015-03-30 Thread Eduardo Cusa
Hi, I changed my process flow. Now I process one file per hour instead of processing everything at the end of the day. This decreased the memory consumption. Regards, Eduardo On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu dav...@databricks.com wrote: Could you narrow down to a step which cause the
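
The change described above (one batch per hour instead of one per day) can be sketched roughly as follows. This is only an illustration: the `logs/YYYY-MM-DD/HH/name.json` path layout, the helper names `group_by_hour` and `process_hourly`, and the use of `len` as the per-batch job are all assumptions, not details from the thread.

```python
from collections import OrderedDict


def group_by_hour(paths):
    """Group paths by their 'YYYY-MM-DD/HH' directory (hypothetical layout)."""
    batches = OrderedDict()
    for path in sorted(paths):
        # The last three path components are assumed to be day, hour, file name.
        day, hour, _name = path.split("/")[-3:]
        batches.setdefault("%s/%s" % (day, hour), []).append(path)
    return batches


def process_hourly(paths, process):
    """Call process() once per hourly batch instead of once for the whole day."""
    return [(hour, process(batch)) for hour, batch in group_by_hour(paths).items()]


paths = [
    "logs/2015-03-26/00/a.json",
    "logs/2015-03-26/00/b.json",
    "logs/2015-03-26/01/a.json",
]
# Stand-in for the real job: each hourly batch is simply counted here.
hourly_counts = process_hourly(paths, len)
```

In a real job, `process` would load and aggregate one hour of data, so the working set is bounded by the largest hour rather than by the whole day.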

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
The last try was without log2.cache() and I am still getting out of memory. I am using the following conf, maybe it helps: conf = (SparkConf() .setAppName("LoadS3") .set("spark.executor.memory", "13g") .set("spark.driver.memory", "13g") .set("spark.driver.maxResultSize", "2g")

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
I am running on EC2: 1 master: 4 CPUs, 15 GB RAM (2 GB swap); 2 slaves: 4 CPUs, 15 GB RAM. The uncompressed dataset size is 15 GB. On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memory. I ran the
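
A rough back-of-the-envelope check on these numbers, not a diagnosis. It assumes that every node has the 15 GB quoted above, that the JVM is given the 13g heap from the configuration posted elsewhere in this thread, and that up to one Python worker per CPU runs on a node. PySpark's Python workers are separate processes outside the JVM heap, so they share whatever is left over with the operating system.

```python
def headroom_gb(node_ram_gb, jvm_heap_gb):
    """RAM left on a node once the JVM heap is fully used."""
    return node_ram_gb - jvm_heap_gb


node_ram_gb = 15      # per node, from the thread (assumed to apply to each node)
jvm_heap_gb = 13      # executor/driver memory value quoted in the thread
python_workers = 4    # assumption: one Python worker per CPU

left_gb = headroom_gb(node_ram_gb, jvm_heap_gb)
per_worker_gb = left_gb / python_workers
print("headroom: %d GB, about %.1f GB per Python worker" % (left_gb, per_worker_gb))
```

Under these assumptions only about 2 GB per node remain outside the JVM heap, which would be consistent with the kernel killing a `python` process rather than the JVM.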

Re: python : Out of memory: Kill process

2015-03-26 Thread Davies Liu
Could you try removing the line `log2.cache()`? On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: I am running on EC2: 1 master: 4 CPUs, 15 GB RAM (2 GB swap); 2 slaves: 4 CPUs, 15 GB RAM. The uncompressed dataset size is 15 GB. On Thu, Mar 26, 2015

Re: python : Out of memory: Kill process

2015-03-26 Thread Davies Liu
Could you narrow it down to the step which causes the OOM, with something like: log2 = self.sqlContext.jsonFile(path) log2.count() ... out.count() ... On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: The last try was without log2.cache() and I am still getting out of
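
The suggestion above (force each stage with a `count()` and see which one fails) can be wrapped in a small helper. This is a minimal sketch: `run_steps` and the stand-in steps are hypothetical and use plain Python so it runs without a cluster. Because the kernel's OOM killer terminates the process without raising a Python exception, the helper logs and flushes before each step; the last "starting" line without a matching "finished" line points at the step to inspect.

```python
import sys
import time


def run_steps(steps, log=sys.stderr):
    """Run (name, thunk) pairs in order, logging before and after each one."""
    completed = []
    for name, thunk in steps:
        print("starting %s" % name, file=log)
        log.flush()  # write the line out before the step can get the process killed
        started = time.time()
        result = thunk()
        elapsed = time.time() - started
        print("finished %s -> %r (%.1fs)" % (name, result, elapsed), file=log)
        log.flush()
        completed.append((name, result))
    return completed


# Stand-in steps. With Spark, each thunk would force one stage,
# e.g. lambda: log2.count() and lambda: out.count().
demo_steps = [
    ("load", lambda: len(range(1000))),
    ("transform", lambda: sum(1 for i in range(1000) if i % 2 == 0)),
]
demo_result = run_steps(demo_steps)
```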

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memory. I ran the same code as before. Do I need to make any changes? On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com wrote: With batchSize = 1, I think it will become even worse. I'd suggest to go with 1.3, have a

Re: python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi Davies, I am running 1.1.0. Now I'm following this thread, which recommends setting the batchSize parameter to 1: http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html If this does not work I will install 1.2.1 or 1.3. Regards On Wed, Mar 25, 2015 at 3:39 PM, Davies

Re: python : Out of memory: Kill process

2015-03-25 Thread Davies Liu
What version of Spark are you running? There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3. [1] https://issues.apache.org/jira/browse/SPARK-6055 On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi Guys, I am running the following

Re: python : Out of memory: Kill process

2015-03-25 Thread Davies Liu
With batchSize = 1, I think it will become even worse. I'd suggest going with 1.3 and getting a taste of the new DataFrame API. On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi Davies, I am running 1.1.0. Now I'm following this thread that recommend use