Best Regards
On Fri, Jul 17, 2015 at 5:46 PM, Harit Vishwakarma
harit.vishwaka...@gmail.com wrote:
1. load 3 matrices of size ~ 1 X 1 using numpy.
2. rdd2 = rdd1.values().flatMap( fun ) # rdd1 has roughly 10^7 tuples
3. df = sqlCtx.createDataFrame(rdd2)
4. df.save() # in parquet
(StorageLevel.MEMORY_AND_DISK)?
Thanks
Best Regards
On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma
harit.vishwaka...@gmail.com wrote:
Thanks,
Code is running on a single machine.
And it still doesn't answer my question.
On Fri, Jul 17, 2015 at 4:52 PM, ayan guha guha.a...@gmail.com wrote:
You can bump up the number of partitions while creating the RDD you are using
for the df
On 17 Jul 2015 21:03, Harit Vishwakarma harit.vishwaka
usage/ data distribution etc.) will really help.
--
Regards
Harit Vishwakarma