How many iterations are you doing on the data? Like Jörn said, you don't
necessarily need a billion samples for linear regression.
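To make the warm-start idea from the thread concrete, here is a minimal sketch in plain Python (not Spark, and not an actual Spark API): train with SGD on one chunk at a time, carrying the learned weights forward as the initial model for the next chunk. This is the behavior a hypothetical `setInitialModel` would give you; all names and numbers below are illustrative.

```python
import random

# Toy data: y = 3*x + 2 plus a little noise.
random.seed(0)
data = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.1))
        for x in [random.uniform(-1, 1) for _ in range(2000)]]

def sgd_chunk(chunk, weights, lr=0.01, epochs=5):
    """Run SGD over one chunk, starting from the given (w, b) weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in chunk:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Split the data into chunks and train incrementally: each chunk's
# training starts from the weights learned on the previous chunks,
# instead of overwriting them the way repeated fit() calls do.
weights = (0.0, 0.0)
chunk_size = 500
for i in range(0, len(data), chunk_size):
    weights = sgd_chunk(data[i:i + chunk_size], weights)

w, b = weights
print(w, b)  # should land near the true coefficients 3.0 and 2.0
```

Nothing here depends on all chunks being in memory at once; only the current chunk and the two running weights are held, which is the property the original poster is after.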

On Tue, Aug 22, 2017 at 6:28 PM, Sea aj <saj3...@gmail.com> wrote:

> Jörn,
>
> My question is not about the model type but rather about Spark's ability
> to reuse an already trained ML model when training a new one.
>
>
>
>
> On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Do you really need one billion samples for just linear regression? Your
>> model would probably do equally well with far fewer samples. Have you
>> checked bias and variance using a much smaller random sample?
>>
>> On 22. Aug 2017, at 12:58, Sea aj <saj3...@gmail.com> wrote:
>>
>> I have a large dataframe of 1 billion rows of type LabeledPoint. I tried
>> to train a linear regression model on it, but the job failed for lack of
>> memory, even though I'm using 9 slaves, each with 100 GB of RAM and 16
>> CPU cores.
>>
>> I decided to split my data into multiple chunks and train the model in
>> multiple phases, but I learned that the linear regression model in the ML
>> library has no "setInitialModel" function for passing the model trained
>> on one chunk along to the remaining chunks. In other words, each time I
>> call the fit function on a chunk of my data, it overwrites the previous
>> model.
>>
>> So far the only solution I have found is using Spark Streaming, which
>> lets me split the data into multiple dataframes and train on each one
>> individually, working around the memory issue.
>>
>> Do you know if there's any other solution?
>>
>>
>>
>>
>> On Mon, Jul 10, 2017 at 7:57 AM, Jayant Shekhar <jayantbaya...@gmail.com>
>> wrote:
>>
>>> Hello Mahesh,
>>>
>>> We have built one. You can download from here :
>>> https://www.sparkflows.io/download
>>>
>>> Feel free to ping me for any questions, etc.
>>>
>>> Best Regards,
>>> Jayant
>>>
>>>
>>> On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
>>> mahesh_sawai...@persistent.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> 1) Is anyone aware of a workbench-style tool for running ML jobs in
>>>> Spark? Specifically, the tool could be something like a web application
>>>> that is configured to connect to a Spark cluster.
>>>>
>>>>
>>>> The user would be able to select input training sets (probably from
>>>> HDFS), train, and then run predictions, without having to write any
>>>> Scala code.
>>>>
>>>>
>>>> 2) If there is no such tool, is there value in building one, and what
>>>> would the challenges be?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mahesh
>>>>
>>>>
>>>>
>>>
>>>
>>
>


-- 
Cheers!
