Hi, I changed my process flow.
Now I am processing one file per hour, instead of processing everything at the
end of the day.
This decreased the memory consumption.
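For context on why the per-hour flow helps: only one file's worth of records has to be in memory at a time, instead of the whole day's data. A minimal plain-Python sketch of the idea (the file names and the `process` function here are hypothetical, not the actual job):

```python
# Sketch: per-file (incremental) processing vs. end-of-day batch.
# With the incremental style, peak memory is one file's records.

def read_file(name):
    # stand-in for reading one hourly log file
    return [f"{name}-rec{i}" for i in range(3)]

def process(records):
    # hypothetical aggregation: count records
    return len(records)

hourly_files = ["log-00h", "log-01h", "log-02h"]

# end-of-day style: every record from every file held in memory at once
all_records = [r for f in hourly_files for r in read_file(f)]
daily_total = process(all_records)

# per-hour style: each file is read, processed, and discarded in turn
incremental_total = sum(process(read_file(f)) for f in hourly_files)

assert daily_total == incremental_total  # same result, lower peak memory
```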
Regards
Eduardo
On Thu, Mar 26, 2015 at 3:16 PM, Davies Liu dav...@databricks.com wrote:
Could you narrow down to a step which causes the OOM?
The last try was without log2.cache() and I am still getting out of memory.
I am using the following conf, maybe it helps:
conf = (SparkConf()
        .setAppName("LoadS3")
        .set("spark.executor.memory", "13g")
        .set("spark.driver.memory", "13g")
        .set("spark.driver.maxResultSize", "2g"))
I am running on EC2:
1 Master: 4 CPU, 15 GB RAM (2 GB swap)
2 Slaves: 4 CPU, 15 GB RAM
The uncompressed dataset size is 15 GB.
On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Davies, I upgraded to 1.3.0 and I am still getting Out of Memory.
I ran the same code as before; do I need to make any changes?
Could you try to remove the line `log2.cache()`?
On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
I am running on EC2:
1 Master: 4 CPU, 15 GB RAM (2 GB swap)
2 Slaves: 4 CPU, 15 GB RAM
The uncompressed dataset size is 15 GB.
On Thu, Mar 26, 2015, Davies Liu dav...@databricks.com wrote:
Could you narrow down to a step which causes the OOM? Something like:
log2 = self.sqlContext.jsonFile(path)
log2.count()
...
out.count()
...
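The idea behind Davies' suggestion is to force each transformation with an action like `count()`, so the failing step can be isolated. The same debugging pattern can be sketched outside Spark with plain Python (the stage functions and input here are made up for illustration):

```python
# Sketch: isolate the stage that blows up by forcing each one in turn
# and checking how many records survive it, analogous to calling
# count() after each Spark transformation.

def parse(lines):
    # analogous to sqlContext.jsonFile(path): raw input -> structured rows
    return [line.split(",") for line in lines]

def filter_valid(rows):
    # analogous to an intermediate transformation
    return [r for r in rows if len(r) == 2]

lines = ["a,1", "b,2", "broken"]

stage1 = parse(lines)
print(len(stage1))   # force stage 1, like log2.count() -> 3

stage2 = filter_valid(stage1)
print(len(stage2))   # force stage 2, like out.count() -> 2
```

If a stage fails when it is forced, the problem is in that stage (or the data it receives), not in the ones after it.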
On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Davies, I upgraded to 1.3.0 and I am still getting Out of Memory.
I ran the same code as before; do I need to make any changes?
On Wed, Mar 25, 2015 at 4:00 PM, Davies Liu dav...@databricks.com wrote:
With batchSize = 1, I think it will become even worse.
I'd suggest to go with 1.3, have a taste for the new DataFrame API.
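A rough illustration of why `batchSize = 1` can make things worse: serializing records one at a time pays the serialization overhead per record, instead of amortizing it over a batch. This sketch uses plain `pickle`, not PySpark's actual serializer, just to show the per-record overhead:

```python
import pickle

records = list(range(1000))

# batchSize = 1: every record is pickled on its own, so each one
# carries the full pickle header/opcode overhead
per_record_bytes = sum(len(pickle.dumps(r)) for r in records)

# larger batch: one header amortized over many records
batched_bytes = len(pickle.dumps(records))

print(per_record_bytes > batched_bytes)  # True: per-record costs more
```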
Hi Davies, I am running 1.1.0.
Now I'm following this thread, which recommends using the batchSize parameter = 1:
http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html
If this does not work I will install 1.2.1 or 1.3.
Regards
On Wed, Mar 25, 2015 at 3:39 PM, Davies Liu dav...@databricks.com wrote:
What's the version of Spark you are running?
There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3.
[1] https://issues.apache.org/jira/browse/SPARK-6055
On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Guys, I am running the following
With batchSize = 1, I think it will become even worse.
I'd suggest to go with 1.3, have a taste for the new DataFrame API.
On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa
eduardo.c...@usmediaconsulting.com wrote:
Hi Davies, I am running 1.1.0.
Now I'm following this thread, which recommends using the batchSize parameter = 1:
http://apache-spark-user-list.1001560.n3.nabble.com/pySpark-memory-usage-td3022.html