Re: UI for spark machine learning.

2017-08-24 Thread Akhil Das
How many iterations are you doing on the data? Like Jörn said, you don't necessarily need a billion samples for linear regression.
On Tue, Aug 22, 2017 at 6:28 PM, Sea aj wrote:
> Jorn,
>
> My question is not about the model type but instead, the spark capability
> on reusing

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
Jorn,
My question is not about the model type but about Spark's ability to reuse an already trained ML model when training a new model.
On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke wrote:
> Is it really required to have one billion samples for just linear
>

Re: UI for spark machine learning.

2017-08-22 Thread Jörn Franke
Is it really required to have one billion samples for just linear regression? Your model would probably do equally well with far fewer samples. Have you checked bias and variance when training on a much smaller random sample?
> On 22. Aug 2017, at 12:58, Sea aj wrote:
>
> I have a
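The check suggested above (compare training and validation error on random subsamples of increasing size) could be sketched roughly as follows. This is a plain-Python illustration on synthetic data, not Spark code: `fit_line`, `mse`, `learning_curve`, and `make_points` are hypothetical helpers, and the single-feature model and noise level are assumptions made only to keep the example small. On a real Spark DataFrame the subsample would come from something like `df.sample(...)` instead.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(model, points):
    """Mean squared error of the fitted line on a set of points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


def learning_curve(train, valid, sizes, seed=0):
    """Fit on random subsamples and report (size, train error, validation error)."""
    rng = random.Random(seed)
    rows = []
    for size in sizes:
        sample = rng.sample(train, size)
        model = fit_line(sample)
        rows.append((size, mse(model, sample), mse(model, valid)))
    return rows


# Synthetic stand-in for the real data: y = 3x + 0.5 plus Gaussian noise.
_rng = random.Random(42)


def make_points(n):
    points = []
    for _ in range(n):
        x = _rng.uniform(-1.0, 1.0)
        points.append((x, 3.0 * x + 0.5 + _rng.gauss(0.0, 0.1)))
    return points


train = make_points(20000)
valid = make_points(2000)
curve = learning_curve(train, valid, [100, 1000, 10000])
for size, train_err, valid_err in curve:
    print(size, round(train_err, 4), round(valid_err, 4))
```

If the validation error stops improving well before the full data size and stays close to the training error, adding more rows is unlikely to help a linear model much, which is the point being made in this reply.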

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
I have a large DataFrame of 1 billion rows of type LabeledPoint. I tried to train a linear regression model on the df, but it failed due to lack of memory, although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU cores. I decided to split my data into multiple chunks and train the model in
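The idea of splitting the data into chunks and carrying the trained model from one chunk to the next could look roughly like the sketch below. It is a plain-Python stochastic gradient descent illustration on synthetic data, not the Spark API: `sgd_pass` and `train_in_chunks` are hypothetical helpers, and the step size, chunk sizes, and true coefficients are assumptions chosen for the example. Whether a given Spark estimator can be seeded with previous weights depends on the API and version (the older RDD-based `LinearRegressionWithSGD.train` takes an `initialWeights` argument, as far as I know), so check the documentation for the release you run.

```python
import random


def sgd_pass(weights, bias, chunk, step):
    """One pass of plain SGD for squared loss over (features, label) pairs."""
    for features, label in chunk:
        pred = bias + sum(w * x for w, x in zip(weights, features))
        err = pred - label
        weights = [w - step * err * x for w, x in zip(weights, features)]
        bias -= step * err
    return weights, bias


def train_in_chunks(chunks, n_features, step=0.05):
    """Train chunk by chunk, starting each chunk from the previous result."""
    weights, bias = [0.0] * n_features, 0.0
    for chunk in chunks:
        # Warm start: the model from earlier chunks is the starting point here.
        weights, bias = sgd_pass(weights, bias, chunk, step)
    return weights, bias


# Synthetic stand-in for the real data: y = 2*x0 - 1*x1 + 0.5 plus noise.
rng = random.Random(7)


def make_chunk(n):
    chunk = []
    for _ in range(n):
        features = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        label = 2.0 * features[0] - 1.0 * features[1] + 0.5 + rng.gauss(0.0, 0.05)
        chunk.append((features, label))
    return chunk


chunks = [make_chunk(1000) for _ in range(10)]
weights, bias = train_in_chunks(chunks, n_features=2)
print([round(w, 2) for w in weights], round(bias, 2))
```

Only one chunk needs to be processed at a time, and the only state carried between chunks is the weight vector and bias. Note that this works because SGD updates are incremental; a solver that fits each chunk from scratch would not combine chunks this way.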

Re: UI for spark machine learning.

2017-07-10 Thread Jayant Shekhar
Hello Mahesh,
We have built one. You can download it here: https://www.sparkflows.io/download
Feel free to ping me for any questions, etc.
Best Regards,
Jayant
On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <mahesh_sawai...@persistent.com> wrote:
> Hi,
>
>
> 1) Is anyone aware of any

UI for spark machine learning.

2017-07-09 Thread Mahesh Sawaiker
Hi,
1) Is anyone aware of any workbench kind of tool to run ML jobs in Spark? Specifically, the tool could be something like a web application that is configured to connect to a Spark cluster. The user is able to select input training sets, probably from HDFS, train and then run predictions,