Re: Training A ML Model on a Huge Dataframe

2017-08-24 Thread Yanbo Liang
Hi Sea, Could you let us know which ML algorithm you use? What's the number of instances and the dimensionality of your dataset? AFAIK, Spark MLlib can train a model with several million features if you configure it correctly. Thanks Yanbo On Thu, Aug 24, 2017 at 7:07 AM, Suzen, Mehmet

Re: Training A ML Model on a Huge Dataframe

2017-08-23 Thread Suzen, Mehmet
SGD is supported. I see, I assumed you were using Scala. Looks like you can do streaming regression; not sure of the pyspark API though: https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression On 23 August 2017 at 18:22, Sea aj wrote: > Thanks

Re: Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
Thanks for the reply. As far as I understood, mini-batch is not yet supported in the ML library. As for MLlib mini-batch, I could not find any pyspark API. On Wed, Aug 23, 2017 at

Re: Training A ML Model on a Huge Dataframe

2017-08-23 Thread Suzen, Mehmet
It depends on what model you would like to train, but models requiring optimisation could use SGD with mini-batches. See: https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd On 23 August 2017 at 14:27, Sea aj wrote: > Hi, > > I am

Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
Hi, I am trying to feed a huge dataframe to an ML algorithm in Spark but it crashes due to a shortage of memory. Is there a way to train the model on a subset of the data in multiple steps? Thanks