How many iterations are you doing on the data? Like Jörn said, you don't
necessarily need a billion samples for linear regression.
On Tue, Aug 22, 2017 at 6:28 PM, Sea aj wrote:
Jorn,
My question is not about the model type but about Spark's ability to reuse
an already trained ML model when training a new model.
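To make the question concrete, here is a minimal sketch of "reusing a trained model" in plain Python rather than Spark: fit on one chunk, then pass the learned weights in as the starting point for the next chunk. The helper name `sgd_fit`, the synthetic data, and the step size and epoch count are all illustrative assumptions, not anything from this thread. In Spark, the older RDD-based `LinearRegressionWithSGD.train` takes an `initialWeights` argument that serves the same purpose; whether it is available depends on the Spark version, so check the documentation for yours.

```python
import random

def sgd_fit(chunk, weights, step=0.05, epochs=20):
    """Plain SGD for y ~ w0 + w1 * x on one chunk, starting from `weights`."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in chunk:
            err = (w0 + w1 * x) - y   # prediction error on this point
            w0 -= step * err          # gradient step for the intercept
            w1 -= step * err * x      # gradient step for the slope
    return (w0, w1)

rng = random.Random(0)

# Synthetic data (assumption): y = 2 + 3x + small Gaussian noise, x in [0, 1).
xs = [rng.random() for _ in range(4000)]
data = [(x, 2.0 + 3.0 * x + rng.gauss(0, 0.1)) for x in xs]

# Split into chunks, as if the full data set did not fit in memory at once.
chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]

weights = (0.0, 0.0)                    # cold start for the first chunk only
for chunk in chunks:
    weights = sgd_fit(chunk, weights)   # warm start from the previous result

print(weights)
```

Each call only ever sees one chunk, so peak memory is bounded by the chunk size; the trade-off is that later chunks can pull the weights away from what earlier chunks taught, which matters if the chunks are not drawn from the same distribution.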
On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke wrote:
Is it really required to have one billion samples for just linear regression?
Probably your model would do equally well with far fewer samples. Have you
checked bias and variance when training on a much smaller random sample?
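One way to run that check, sketched in plain Python on synthetic data (the data-generating line, the sample sizes, and the helper names `fit_line`, `mse`, and `make_points` are assumptions made for illustration): fit on random subsamples of increasing size and compare the error on the training sample with the error on a held-out set. If the held-out error stops improving well before the full data set is used, more rows are unlikely to help.

```python
import random

def fit_line(points):
    """Closed-form least squares for y ~ a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mse(model, points):
    """Mean squared error of the fitted line on `points`."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in points) / len(points)

rng = random.Random(42)

def make_points(n):
    # Synthetic data (assumption): y = 1 + 0.5x + unit Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 1.0 + 0.5 * x + rng.gauss(0, 1.0)) for x in xs]

train = make_points(100_000)
holdout = make_points(10_000)

results = {}
for n in (100, 1_000, 10_000, 100_000):
    sample = rng.sample(train, n)          # random subsample, no replacement
    model = fit_line(sample)
    results[n] = (mse(model, sample), mse(model, holdout))
    print(n, results[n])
```

A large gap between the two errors at small sample sizes points to variance; two errors that are both high and close together point to bias, which more data will not fix.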
> On 22. Aug 2017, at 12:58, Sea aj wrote:
I have a large DataFrame of 1 billion rows of type LabeledPoint. I tried to
train a linear regression model on it, but training failed for lack of
memory, even though I'm using 9 slaves, each with 100 GB of RAM and 16 CPU
cores.
I decided to split my data into multiple chunks and train the model in
Hello Mahesh,
We have built one. You can download it from here:
https://www.sparkflows.io/download
Feel free to ping me for any questions, etc.
Best Regards,
Jayant
On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
mahesh_sawai...@persistent.com> wrote:
Hi,
1) Is anyone aware of a workbench-style tool for running ML jobs in Spark?
Specifically, the tool could be something like a web application that is
configured to connect to a Spark cluster.
The user would be able to select input training sets, probably from HDFS, train, and then
run predictions,