Re: Running ALS on comparatively large RDD

2016-03-11 Thread Deepak Gopalakrishnan
ta source are you reading > from? How much driver and executor memory have you provided to Spark? > > > > On Fri, 11 Mar 2016 at 09:21 Deepak Gopalakrishnan <dgk...@gmail.com> > wrote: > >> 1. I'm using about 1 million users against few thousand products. I >> bas
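For readers following along: executor memory can be set in the application's SparkConf, while driver memory normally has to be supplied to spark-submit (e.g. --driver-memory 8g) because the driver JVM is already running when user code executes. A minimal sketch, with an assumed app name and an illustrative 8g value:

    import org.apache.spark.{SparkConf, SparkContext}

    // Executor memory must be set before the SparkContext is created.
    // Driver memory is typically passed on the spark-submit command line instead,
    // since it has to be known before the driver JVM starts.
    val conf = new SparkConf()
      .setAppName("als-large-rdd")           // hypothetical application name
      .set("spark.executor.memory", "8g")    // illustrative value; size to your nodes
    val sc = new SparkContext(conf)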

Re: Running ALS on comparatively large RDD

2016-03-10 Thread Deepak Gopalakrishnan
ngs, # users and # products) > 2. Spark cluster set up and version > > Thanks > > On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan <dgk...@gmail.com> > wrote: > >> Hello All, >> >> I've been running Spark's ALS on a dataset of users and rated items
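For context, the kind of job being described is ALS over an RDD of ratings. A minimal MLlib sketch (the RDD-based API is shown here, though the poster may equally be using the DataFrame-based ml.recommendation.ALS; the input path, rank, iteration count, and lambda are placeholder assumptions, not the poster's settings):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // sc: an existing SparkContext
    // Assumed input format: userId,productId,rating per line (hypothetical path).
    val ratings: RDD[Rating] = sc.textFile("s3://bucket/ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(",")
      Rating(user.toInt, product.toInt, rating.toDouble)
    }.cache()

    // ALS.train(ratings, rank, iterations, lambda)
    val model = ALS.train(ratings, 10, 10, 0.01)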

Re: Mapper side join with DataFrames API

2016-03-05 Thread Deepak Gopalakrishnan
Hello Guys, No help yet. Could someone help with a reply to the above question on SO? Thanks Deepak On Fri, Mar 4, 2016 at 5:32 PM, Deepak Gopalakrishnan <dgk...@gmail.com> wrote: > Have added this to SO, can you guys share any thoughts? > > > http://stackoverflow.com/

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
On Thu, Mar 3, 2016 at 7:06 AM, Deepak Gopalakrishnan <dgk...@gmail.com> wrote: > Hello, > > I'm using 1.6.0 on EMR > > On Thu, Mar 3, 2016 at 12:34 AM, Yong Zhang <java8...@hotmail.com> wrote: > >> What versi
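For completeness, the version can also be confirmed from inside the running application with a one-line check:

    // Prints the Spark version the application is actually running against.
    println(sc.version)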

Re: Mapper side join with DataFrames API

2016-03-02 Thread Deepak Gopalakrishnan
and yet there is a spill (as in the screenshots). Any idea why? > > Thanks > Deepak > > On Wed, Mar 2, 2016 at 5:14 AM, Michael Armbrust <mich...@databricks.com> > wrote: > > It's helpful to always include the output of df.explain(true) when you are > asking about
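For reference, a map-side (broadcast) join with the DataFrames API, together with the plan output being requested, looks roughly like the sketch below; largeDF, smallDF, and the join key are made-up names:

    import org.apache.spark.sql.functions.broadcast

    // Broadcasting the small table lets the join happen map-side,
    // avoiding a shuffle of the large table.
    val joined = largeDF.join(broadcast(smallDF), Seq("key"))

    // The physical plan shows whether Spark actually chose a
    // BroadcastHashJoin or fell back to a shuffle-based join.
    joined.explain(true)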

Fwd: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
spilling sort data. I'm a little surprised that this happens even when I have enough free memory. Any inputs will be greatly appreciated! Thanks -- Regards, *Deepak Gopalakrishnan* *Mobile*:+918891509774 *Skype* : deepakgk87 http://myexps.blogspot.com
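One setting that often matters in this situation (offered as a general pointer, not as the resolution reached in this thread): if the smaller table exceeds the broadcast threshold, Spark SQL falls back to a shuffle-based join, and the resulting sort can spill to disk even when the executors report plenty of free heap. In Spark 1.6 the threshold is controlled like this; the 100 MB figure is only an example:

    // Default is 10 MB; tables above the threshold are shuffled rather than broadcast.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)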

Re: Timeout Error

2015-04-27 Thread Deepak Gopalakrishnan
Zhu zsxw...@gmail.com wrote: The configuration key should be spark.akka.askTimeout for this timeout. The time unit is seconds. Best Regards, Shixiong(Ryan) Zhu 2015-04-26 15:15 GMT-07:00 Deepak Gopalakrishnan dgk...@gmail.com: Hello, Just to add a bit more context: I have done
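A sketch of where that setting goes when building the context (the 300-second value is only an example; the spark.akka.* keys apply to the Akka-based RPC layer in Spark 1.x):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("timeout-example")          // hypothetical application name
      .set("spark.akka.askTimeout", "300")    // value is in seconds, per the reply above
    val sc = new SparkContext(conf)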

Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
issue. I have an r3.xlarge and 2 m3.large instances. Can anyone suggest a way to fix this? -- Regards, *Deepak Gopalakrishnan* *Mobile*:+918891509774 *Skype* : deepakgk87 http://myexps.blogspot.com

Re: Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
performance should be for this amount of data, but you could try to increase the timeout with the property spark.akka.timeout to see if that helps. Bryan On Sun, Apr 26, 2015 at 6:57 AM, Deepak Gopalakrishnan dgk...@gmail.com wrote: Hello All, I'm trying to process a 3.5GB file on standalone
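The same mechanism applies to the key suggested here; a minimal sketch with an example value (600 seconds is an assumption, not a tested recommendation):

    import org.apache.spark.SparkConf

    // spark.akka.timeout covers general node-to-node communication in Spark 1.x;
    // spark.akka.askTimeout (suggested in the 2015-04-27 reply above) covers remote ask operations.
    val conf = new SparkConf()
      .set("spark.akka.timeout", "600")       // seconds, example value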

Re: Spark timeout issue

2015-04-26 Thread Deepak Gopalakrishnan
, 2015 at 12:42 PM, Deepak Gopalakrishnan dgk...@gmail.com wrote: Hello All, I'm trying to process a 3.5GB file in standalone mode using Spark. I could run my Spark job successfully on a 100MB file and it works as expected. But, when I try to run it on the 3.5GB file, I run into the below