This is what happens when you create a DataFrame:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L430
In your case, rdd1.values().flatMap(fun) will be executed.
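To illustrate why the call triggers work: when no schema is passed, PySpark's createDataFrame has to look at actual rows to infer column types, and that forces the lazy flatMap to run. Below is a minimal sketch of supplying the schema up front instead. The (key, value) row layout and both function names are assumptions made for illustration; the thread does not show what fun returns, and this may or may not avoid the exception reported here.

```python
def explicit_schema():
    """Schema for hypothetical (key, value) rows.

    The column names and types are assumptions; adjust them to whatever
    `fun` really emits. pyspark is imported lazily so that this module
    can be loaded on a machine without Spark installed.
    """
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType
    return StructType([
        StructField("key", StringType(), True),
        StructField("value", DoubleType(), True),
    ])


def create_without_inference(sql_ctx, rdd, schema):
    """Build a DataFrame from `rdd` using a caller-supplied schema.

    With a schema given, createDataFrame does not need to infer the
    column types from the data.
    """
    return sql_ctx.createDataFrame(rdd, schema)
```

Usage would look like df = create_without_inference(sqlCtx, rdd2, explicit_schema()), where sqlCtx and rdd2 are the objects from the original post.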
Even if I remove the numpy calls (no matrices loaded), the same exception
occurs.
Can anyone tell me what createDataFrame does internally? Are there any
alternatives to it?
On Fri, Jul 17, 2015 at 6:43 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
I suspect it's numpy filling up memory.
Thanks
Can you paste the code? How much memory does your system have and how big
is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)?
Thanks
Best Regards
On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma
harit.vishwaka...@gmail.com wrote:
Thanks,
Code is running on a single machine.
1. load 3 matrices of size ~ 1 X 1 using numpy.
2. rdd2 = rdd1.values().flatMap( fun ) # rdd1 has roughly 10^7 tuples
3. df = sqlCtx.createDataFrame(rdd2)
4. df.save() # in parquet format
It throws an exception in the createDataFrame() call. I don't know what exactly it
is creating? everything
Thanks,
Code is running on a single machine.
And it still doesn't answer my question.
On Fri, Jul 17, 2015 at 4:52 PM, ayan guha guha.a...@gmail.com wrote:
You can bump up the number of partitions when creating the RDD you are using
for the DataFrame.
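A rough sketch of that suggestion, under stated assumptions: the thread does not show how rdd1 is created, so this uses repartition() on the existing RDD as a stand-in for setting the partition count at creation time. The helper suggest_num_partitions and its default of 100000 rows per partition are hypothetical, not tuned values.

```python
import math


def suggest_num_partitions(total_rows, rows_per_partition=100000):
    """Hypothetical helper: choose a partition count so that each
    partition holds roughly `rows_per_partition` input rows."""
    if total_rows <= 0:
        return 1
    return max(1, int(math.ceil(total_rows / float(rows_per_partition))))


def build_dataframe(sql_ctx, rdd1, fun, total_rows):
    """Repartition before flatMap so each task processes a smaller slice.

    `sql_ctx`, `rdd1` and `fun` correspond to sqlCtx, rdd1 and fun in the
    original post; `total_rows` is the approximate size of rdd1.
    """
    num_partitions = suggest_num_partitions(total_rows)
    rdd2 = rdd1.values().repartition(num_partitions).flatMap(fun)
    return sql_ctx.createDataFrame(rdd2)
```

For the roughly 10^7 tuples mentioned in the thread, this picks 100 partitions. Note that repartition() shuffles the data, so asking for more partitions when the RDD is first created is usually cheaper if that is possible.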
On 17 Jul 2015 21:03, Harit Vishwakarma
On Fri, Jul 17, 2015 at 5:46 PM, Harit Vishwakarma
harit.vishwaka...@gmail.com wrote:
Hi,
I used the createDataFrame API of SQLContext in Python and am getting an
OutOfMemoryException. I am wondering if it is creating the whole DataFrame in
memory?
I did not find any documentation describing the memory usage of Spark APIs.
The documentation given is nice, but a little more detail (especially on memory