Re: Spark APIs memory usage?

2015-07-19 Thread Akhil Das
This is what happens when you create a DataFrame: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L430. In your case, rdd1.values.flatMap(fun) will be executed at that point, since createDataFrame has to look at the data to infer a schema.
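For concreteness, a minimal sketch of that behaviour (PySpark 1.4-era API; the SparkContext setup, rdd1 contents, and fun are stand-ins, since the real ones are not shown in the thread):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="createDataFrame-demo")
    sqlCtx = SQLContext(sc)

    rdd1 = sc.parallelize([(i, (i, i * 2)) for i in range(100)])
    fun = lambda v: [v]                # stand-in for the real flatMap function

    rdd2 = rdd1.values().flatMap(fun)  # lazy: nothing has run yet
    # Without an explicit schema, createDataFrame inspects actual rows to
    # infer column types, so the flatMap above is forced to execute here.
    df = sqlCtx.createDataFrame(rdd2)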

Re: Spark APIs memory usage?

2015-07-18 Thread Harit Vishwakarma
Even if I remove the numpy calls (no matrices loaded), the same exception comes. Can anyone tell what createDataFrame does internally? Are there any alternatives for it?
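One thing that may help here (a suggestion on my part, not something confirmed in the thread): pass an explicit schema, so createDataFrame does not need to evaluate the RDD to infer column types. The field names and types below are invented for illustration:

    from pyspark.sql.types import StructType, StructField, LongType, DoubleType

    schema = StructType([
        StructField("id", LongType(), False),
        StructField("value", DoubleType(), True),
    ])
    # With the schema supplied up front, no inference pass over rdd2 is
    # needed; the heavy work is deferred until an action such as the save.
    df = sqlCtx.createDataFrame(rdd2, schema)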

Re: Spark APIs memory usage?

2015-07-17 Thread Akhil Das
Can you paste the code? How much memory does your system have, and how big is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)? Thanks, Best Regards
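A sketch of that persist suggestion, assuming the DataFrame built earlier in the thread; MEMORY_AND_DISK spills partitions that don't fit in memory to disk instead of failing:

    from pyspark import StorageLevel

    df = sqlCtx.createDataFrame(rdd2)
    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()  # first action materializes df under the chosen storage level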

Re: Spark APIs memory usage?

2015-07-17 Thread Harit Vishwakarma
1. load 3 matrices of size ~ 1 X 1 using numpy
2. rdd2 = rdd1.values().flatMap(fun)  # rdd1 has roughly 10^7 tuples
3. df = sqlCtx.createDataFrame(rdd2)
4. df.save()  # in parquet format

It throws an exception in the createDataFrame() call. I don't know what exactly it is creating. Everything in memory?
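A runnable approximation of those four steps (the real matrices, fun, and rdd1 are not shown in the thread, so toy stand-ins are used and the data is shrunk):

    import numpy as np

    mat_a = np.random.rand(100, 100)  # stand-ins for the three matrices
    mat_b = np.random.rand(100, 100)
    mat_c = np.random.rand(100, 100)

    rdd1 = sc.parallelize([(i, i) for i in range(10 ** 5)])  # thread: ~10^7

    def fun(v):
        # hypothetical flatMap body; note it captures mat_a in its closure,
        # so the array is pickled and shipped with every task
        return [(v, float(mat_a[v % 100, v % 100]))]

    rdd2 = rdd1.values().flatMap(fun)
    df = sqlCtx.createDataFrame(rdd2)       # exception reportedly thrown here
    df.save("/tmp/out.parquet", "parquet")  # 1.4-era save API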

Re: Spark APIs memory usage?

2015-07-17 Thread Harit Vishwakarma
Thanks. The code is running on a single machine, and it still doesn't answer my question.
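Since everything runs in one JVM in local mode, the driver's heap bounds the whole job. One knob worth checking (not discussed further in the thread, so only a suggestion) is driver memory; note it cannot be changed from inside an already-running application, so it has to be passed at launch:

    spark-submit --driver-memory 8g your_script.py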

Re: Spark APIs memory usage?

2015-07-17 Thread Akhil Das
I suspect it's the numpy matrices filling up memory. Thanks, Best Regards
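If the numpy matrices are referenced inside fun, they get pickled into the task closure and shipped with every task. A common mitigation (my suggestion, not something from the thread) is a broadcast variable, so each worker caches a single copy; mat_a stands for one of the matrices from step 1:

    bc = sc.broadcast(mat_a)

    def fun(v):
        m = bc.value  # one cached copy per worker, not one per task
        return [(v, float(m[v % 100, v % 100]))]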

Spark APIs memory usage?

2015-07-17 Thread Harit Vishwakarma
Hi, I used the createDataFrame API of SQLContext in Python, and I am getting an OutOfMemoryException. I am wondering if it is creating the whole DataFrame in memory? I did not find any documentation describing the memory usage of Spark APIs. The documentation given is nice, but a little more detail (especially on memory usage) would help.

Re: Spark APIs memory usage?

2015-07-17 Thread ayan guha
You can bump up the number of partitions while creating the RDD you are using for the df.
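A sketch of what that looks like in code; the partition count of 1000 is arbitrary, and tuples stands for the original input data:

    # at creation time:
    rdd1 = sc.parallelize(tuples, numSlices=1000)
    # or for an RDD that already exists:
    rdd2 = rdd1.values().flatMap(fun).repartition(1000)

More, smaller partitions mean no single task has to hold as much data at once.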