Yep, I think that's what Gerald is saying and they are proposing to default miniBatchFraction = (1 / numInstances). Is that correct?
On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetr...@gmail.com> wrote:
> I think in the SGD algorithm, the mini-batch sample is drawn without
> replacement. So with fraction=1, all the rows will be sampled exactly
> once to form the mini batch, resulting in the deterministic/classical case.
>
> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fli...@databricks.com> wrote:
> > Sounds reasonable to me; feel free to create a JIRA (and a PR if you're
> > up for it) so we can see what others think!
> >
> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
> > <gerald.loeff...@googlemail.com> wrote:
> >> hi,
> >>
> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
> >> doesn't that make it a deterministic/classical gradient descent rather
> >> than an SGD?
> >>
> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
> >> all rows. In the spirit of SGD, shouldn't the default be the fraction
> >> that results in exactly one row of the data set?
> >>
> >> thank you
> >> gerald
> >>
> >> --
> >> Gerald Loeffler
> >> mailto:gerald.loeff...@googlemail.com
> >> http://www.gerald-loeffler.net
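
For anyone following along, here is a minimal illustrative sketch (plain Python, not Spark's actual implementation) of the sampling behavior being discussed: per-row Bernoulli sampling without replacement, where each row is kept independently with probability `miniBatchFraction`. With fraction=1.0 every row passes, so each step sees the full dataset (classical batch gradient descent); with fraction=1/numInstances each step sees roughly one row on average, in the spirit of SGD. The function name `sample_mini_batch` is hypothetical, chosen just for this example.

```python
import random

def sample_mini_batch(rows, fraction, seed=None):
    """Bernoulli sampling without replacement: keep each row
    independently with probability `fraction` (an approximation of
    sampling an RDD with withReplacement=False)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]

rows = list(range(100))

# fraction = 1.0: every row is kept, so the "mini batch" is the whole
# dataset and the gradient step is deterministic.
assert sample_mini_batch(rows, 1.0) == rows

# fraction = 1/numInstances: roughly one row per step on average,
# though the exact count varies from step to step.
small = sample_mini_batch(rows, 1.0 / len(rows), seed=42)
```

Note that with Bernoulli sampling the batch size at fraction=1/n is only one row *in expectation*; some steps may draw zero or several rows.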