Good point; I agree that defaulting to online SGD (single example per iteration) would be a poor UX due to performance.
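
For concreteness, here is a rough sketch in plain Python (not Spark) of the effect Meihua describes. It assumes the mini-batch is drawn by keeping each row independently with probability miniBatchFraction, i.e. sampling without replacement; the helper names and the sizes (1000 rows, 8 partitions, 2000 iterations) are made up for illustration and are not taken from MLlib.

```python
import random


def sample_minibatch(partitions, fraction, seed):
    """Keep each row independently with probability `fraction`.

    This is an assumed sampling scheme for illustration, applied
    partition by partition; it returns one list of kept rows per partition.
    """
    rng = random.Random(seed)
    return [[row for row in part if rng.random() < fraction] for part in partitions]


def count_rows(sampled):
    """Total number of rows kept across all partitions."""
    return sum(len(part) for part in sampled)


def count_busy_partitions(sampled):
    """Number of partitions that kept at least one row."""
    return sum(1 for part in sampled if part)


# Hypothetical sizes, chosen only to make the effect visible.
NUM_INSTANCES = 1000
NUM_PARTITIONS = 8
NUM_ITERATIONS = 2000

rows = list(range(NUM_INSTANCES))
# Round-robin split of the rows into partitions.
partitions = [rows[i::NUM_PARTITIONS] for i in range(NUM_PARTITIONS)]

# fraction = 1.0: every row is kept, so each iteration is a full-batch step.
full_batch = sample_minibatch(partitions, 1.0, seed=42)

# fraction = 1 / numInstances: average the batch size over many iterations.
total_rows = 0
total_busy = 0
for i in range(NUM_ITERATIONS):
    batch = sample_minibatch(partitions, 1.0 / NUM_INSTANCES, seed=42 + i)
    total_rows += count_rows(batch)
    total_busy += count_busy_partitions(batch)

avg_rows = total_rows / NUM_ITERATIONS
avg_busy = total_busy / NUM_ITERATIONS

print("fraction=1.0: rows =", count_rows(full_batch),
      "busy partitions =", count_busy_partitions(full_batch))
print("fraction=1/n: avg rows =", avg_rows, "avg busy partitions =", avg_busy)
```

Under these assumptions, a fraction of 1.0 gives every partition work on every iteration, while 1 / numInstances draws about one row on average, so the remaining partitions have nothing to compute.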

On Fri, Aug 7, 2015 at 12:44 PM, Meihua Wu <rotationsymmetr...@gmail.com> wrote:
> Feynman, thanks for clarifying.
>
> If we default miniBatchFraction = (1 / numInstances), then we will
> only hit one row for every iteration of SGD regardless of the number of
> partitions and executors. In other words the parallelism provided by
> the RDD is lost in this approach. I think this is something we need to
> consider for the default value of miniBatchFraction.
>
> On Fri, Aug 7, 2015 at 11:24 AM, Feynman Liang <fli...@databricks.com>
> wrote:
> > Yep, I think that's what Gerald is saying and they are proposing to
> > default miniBatchFraction = (1 / numInstances). Is that correct?
> >
> > On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetr...@gmail.com>
> > wrote:
> >>
> >> I think in the SGD algorithm, the mini batch sample is done without
> >> replacement. So with fraction=1, all the rows will be sampled
> >> exactly once to form the miniBatch, resulting in the
> >> deterministic/classical case.
> >>
> >> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fli...@databricks.com>
> >> wrote:
> >> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're
> >> > up for it) so we can see what others think!
> >> >
> >> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
> >> > <gerald.loeff...@googlemail.com> wrote:
> >> >>
> >> >> hi,
> >> >>
> >> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
> >> >> doesn’t that make it a deterministic/classical gradient descent
> >> >> rather than a SGD?
> >> >>
> >> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
> >> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
> >> >> that results in exactly one row of the data set?
> >> >>
> >> >> thank you
> >> >> gerald
> >> >>
> >> >> --
> >> >> Gerald Loeffler
> >> >> mailto:gerald.loeff...@googlemail.com
> >> >> http://www.gerald-loeffler.net