Hi Sea,

Could you let us know which ML algorithm you are using? What are the number
of instances and the dimensionality of your dataset?
AFAIK, Spark MLlib can train models with several million features if you
configure it correctly.
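
To illustrate the mini-batch idea raised further down the thread, here is a
minimal sketch in plain Python. It is not Spark's implementation and uses no
Spark API; the function name minibatch_sgd, its parameters, and the toy data
are all made up for illustration. It only shows how a linear model can be
updated from small subsets of the rows, one subset at a time.

```python
import random


def minibatch_sgd(rows, n_features, step=0.2, epochs=500, batch_size=4, seed=0):
    """Fit y ~ w.x + b by mini-batch SGD on squared error.

    `rows` is a list of (features, label) pairs. Each update looks at only
    `batch_size` rows, which is the point of the technique.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the rows in a new random order each epoch
        for start in range(0, len(order), batch_size):
            batch = [rows[i] for i in order[start:start + batch_size]]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, y in batch:
                # prediction error for this row
                err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # step along the gradient averaged over the batch
            for j in range(n_features):
                w[j] -= step * grad_w[j] / len(batch)
            b -= step * grad_b / len(batch)
    return w, b


# Hypothetical toy data generated from y = 2*x0 - 3*x1 + 1 (no noise).
data = [([x0, x1], 2.0 * x0 - 3.0 * x1 + 1.0)
        for x0 in (0.0, 0.5, 1.0, 1.5)
        for x1 in (0.0, 0.5, 1.0, 1.5)]

w, b = minibatch_sgd(data, n_features=2)
print(w, b)  # weights should end up close to [2, -3] and the bias close to 1
```

In Spark the same idea is applied to a distributed dataset rather than a
Python list, so the step size and batch fraction would need tuning for real
data.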

Thanks
Yanbo

On Thu, Aug 24, 2017 at 7:07 AM, Suzen, Mehmet <su...@acm.org> wrote:

> SGD is supported. I see, I had assumed you were using Scala. It looks like
> you can do streaming regression, though I am not sure about the PySpark API:
>
> https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
>
> On 23 August 2017 at 18:22, Sea aj <saj3...@gmail.com> wrote:
>
>> Thanks for the reply.
>>
>> As far as I understand, mini-batch training is not yet supported in the ML
>> library. As for MLlib mini-batch, I could not find any PySpark API.
>>
>>
>> On Wed, Aug 23, 2017 at 2:59 PM, Suzen, Mehmet <su...@acm.org> wrote:
>>
>>> It depends on which model you would like to train, but models that
>>> require optimisation could use SGD with mini-batches. See:
>>> https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
>>>
>>> On 23 August 2017 at 14:27, Sea aj <saj3...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to feed a huge DataFrame to an ML algorithm in Spark, but it
>>>> crashes because it runs out of memory.
>>>>
>>>> Is there a way to train the model on a subset of the data in multiple
>>>> steps?
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Mehmet Süzen, MSc, PhD
>>> <su...@acm.org>
>>>
>>> | PRIVILEGED AND CONFIDENTIAL COMMUNICATION This e-mail transmission,
>>> and any documents, files or previous e-mail messages attached to it, may
>>> contain confidential information that is legally privileged. If you are not
>>> the intended recipient or a person responsible for delivering it to the
>>> intended recipient, you are hereby notified that any disclosure, copying,
>>> distribution or use of any of the information contained in or attached to
>>> this transmission is STRICTLY PROHIBITED within the applicable law. If you
>>> have received this transmission in error, please: (1) immediately notify me
>>> by reply e-mail to su...@acm.org,  and (2) destroy the original
>>> transmission and its attachments without reading or saving in any manner. |
>>>
>>
>>
>
>
> --
>
> Mehmet Süzen, MSc, PhD
> <su...@acm.org>
>
