Feynman, thanks for clarifying.

If we default miniBatchFraction = (1 / numInstances), then each SGD
iteration will touch only one row, regardless of the number of
partitions and executors. In other words, the parallelism provided by
the RDD is lost under this approach. I think this is something we need
to consider when choosing the default value of miniBatchFraction.
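
To make the trade-off concrete, here is a rough Scala sketch of the two
defaults under discussion. It assumes an existing SparkContext sc; the toy
data and hyperparameters are purely illustrative, not a recommendation:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
    import org.apache.spark.rdd.RDD

    // Toy training set; in practice this would be a large, partitioned RDD.
    val data: RDD[LabeledPoint] = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.5)),
      LabeledPoint(2.0, Vectors.dense(1.0)),
      LabeledPoint(3.0, Vectors.dense(1.5))
    ))
    val numInstances = data.count()

    // Current default (miniBatchFraction = 1.0): every iteration uses the
    // full data set, i.e. deterministic/classical (batch) gradient descent.
    val batchModel = LinearRegressionWithSGD.train(
      data, 100 /* numIterations */, 0.1 /* stepSize */, 1.0 /* miniBatchFraction */)

    // Proposed default (miniBatchFraction = 1.0 / numInstances): each
    // iteration samples roughly one row, so the per-iteration gradient
    // computation no longer exploits the RDD's partition-level parallelism.
    val sgdModel = LinearRegressionWithSGD.train(data, 100, 0.1, 1.0 / numInstances)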

On Fri, Aug 7, 2015 at 11:24 AM, Feynman Liang <fli...@databricks.com> wrote:
> Yep, I think that's what Gerald is saying and they are proposing to default
> miniBatchFraction = (1 / numInstances). Is that correct?
>
> On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetr...@gmail.com>
> wrote:
>>
>> I think that in the SGD algorithm the mini-batch sample is drawn without
>> replacement, so with fraction = 1 all the rows are sampled exactly once
>> to form the mini-batch, resulting in the deterministic/classical case.
>>
>> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fli...@databricks.com>
>> wrote:
>> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're up
>> > for
>> > it) so we can see what others think!
>> >
>> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
>> > <gerald.loeff...@googlemail.com> wrote:
>> >>
>> >> hi,
>> >>
>> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
>> >> doesn’t that make it deterministic/classical gradient descent rather
>> >> than SGD?
>> >>
>> >> Specifically, miniBatchFraction = 1.0 means the entire data set, i.e.
>> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
>> >> that results in exactly one row of the data set?
>> >>
>> >> thank you
>> >> gerald
>> >>
>> >> --
>> >> Gerald Loeffler
>> >> mailto:gerald.loeff...@googlemail.com
>> >> http://www.gerald-loeffler.net
>> >>
>> >
>
>
