At the moment, the SGD implementation works as follows (modulo
regularization): newWeights = oldWeights - adaptedStepsize *
sumOfGradients / numberOfGradients, where adaptedStepsize =
initialStepsize / sqrt(iterationNumber) and sumOfGradients is the simple
sum of the gradients of all points in the batch.
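
In code, a single step of that scheme would look roughly like this (a
minimal Scala sketch with illustrative names, not the actual Flink
implementation):

  // One plain SGD step: sum the gradients of the batch, average them, and
  // scale the initial step size by 1/sqrt(iterationNumber).
  def sgdStep(
      oldWeights: Array[Double],
      batchGradients: Seq[Array[Double]],
      initialStepsize: Double,
      iterationNumber: Int): Array[Double] = {
    val adaptedStepsize = initialStepsize / math.sqrt(iterationNumber.toDouble)
    // simple sum of the gradients for all points in the batch
    val sumOfGradients = batchGradients.reduce { (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    }
    oldWeights.zip(sumOfGradients).map { case (w, g) =>
      w - adaptedStepsize * g / batchGradients.size
    }
  }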

Thanks for the pointer, Ted. These methods look really promising. We
definitely have to update our SGD implementation to use a better adaptive
learning rate strategy. I'll open a JIRA for that.
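
For reference, the core of AdaGrad (the method Ted mentions below) is just
a per-coordinate accumulator of squared gradients; here is a minimal
sketch under the same illustrative naming, where epsilon merely guards
against division by zero:

  // AdaGrad step: each coordinate gets its own effective learning rate,
  // stepsize / sqrt(sum of that coordinate's squared gradients so far).
  def adagradStep(
      weights: Array[Double],
      gradient: Array[Double],
      squaredGradSum: Array[Double], // running accumulator, updated in place
      stepsize: Double,
      epsilon: Double = 1e-8): Array[Double] = {
    val newWeights = new Array[Double](weights.length)
    var i = 0
    while (i < weights.length) {
      squaredGradSum(i) += gradient(i) * gradient(i)
      newWeights(i) = weights(i) -
        stepsize * gradient(i) / (math.sqrt(squaredGradSum(i)) + epsilon)
      i += 1
    }
    newWeights
  }

Coordinates that keep seeing large gradients get their effective step size
shrunk quickly, which is also why large initial learning rates are less
dangerous with such methods.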

Maybe the default learning rate of 0.1 is also set too high.

On Thu, Jun 4, 2015 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Any form of generalized linear regression should use adaptive learning
> rates rather than simple SGD.  One of the current best methods is
> AdaGrad, although there are variants such as RMSProp and AdaDelta.  All
> are pretty easy to implement.
>
> Here is a visualization of various methods that provides some insights:
> http://imgur.com/a/Hqolp
>
> Vowpal Wabbit has some tricks that allow very large initial learning rates
> to be used without divergence.  I don't know the details.
>
> On Wed, Jun 3, 2015 at 8:05 PM, Mikio Braun <mikiobr...@googlemail.com>
> wrote:
>
> > We should probably look into this nevertheless. Requiring a full grid
> > search for a simple algorithm like MLR sounds like overkill.
> >
> > Have you written down the math of your implementation somewhere?
> >
> > -M
> >
> > ----- Original Message -----
> > From: "Till Rohrmann" <till.rohrm...@gmail.com>
> > Sent: 02.06.2015 11:31
> > To: "dev@flink.apache.org" <dev@flink.apache.org>
> > Subject: Re: MultipleLinearRegression - Strange results
> >
> > Great to hear. This should no longer be a pain point once we support
> > proper cross validation.
> >
> > On Tue, Jun 2, 2015 at 11:11 AM, Felix Neutatz <neut...@googlemail.com>
> > wrote:
> >
> > > Yes, grid search solved the problem :)
> > >
> > > 2015-06-02 11:07 GMT+02:00 Till Rohrmann <till.rohrm...@gmail.com>:
> > >
> > > > The SGD algorithm adapts the learning rate accordingly. However,
> > > > this does not help if you choose the initial learning rate too
> > > > large, because then you calculate a weight vector in the first
> > > > iterations from which it takes really long to recover.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > > On Mon, Jun 1, 2015 at 7:15 PM, Sachin Goel
> > > > > <sachingoel0...@gmail.com> wrote:
> > > >
> > > > > You can set the learning rate to be 1/sqrt(iteration number).
> > > > > This usually works.
> > > > >
> > > > > Regards
> > > > > Sachin Goel
> > > > >
> > > > > On Mon, Jun 1, 2015 at 9:09 PM, Alexander Alexandrov <
> > > > > alexander.s.alexand...@gmail.com> wrote:
> > > > >
> > > > > > I've seen some work on adaptive learning rates in the past few days.
> > > > > >
> > > > > > Maybe we can think about extending the base algorithm and
> > > > > > comparing the use case setting for the IMPRO-3 project.
> > > > > >
> > > > > > @Felix you can discuss this with the others on Wednesday. Manu
> > > > > > will also be there and can give some feedback. I'll try to send
> > > > > > a link tomorrow morning...
> > > > > >
> > > > > >
> > > > > > 2015-06-01 20:33 GMT+10:00 Till Rohrmann <trohrm...@apache.org>:
> > > > > >
> > > > > > > Since MLR uses stochastic gradient descent, you probably have
> > > > > > > to configure the step size right. SGD is very sensitive to the
> > > > > > > right step size choice. If the step size is too high, then the
> > > > > > > SGD algorithm does not converge. You can find the parameter
> > > > > > > description here [1].
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > [1]
> > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html
> > > > > > >
> > > > > > > On Mon, Jun 1, 2015 at 11:48 AM, Felix Neutatz
> > > > > > > <neut...@googlemail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I want to use MultipleLinearRegression, but I got really
> > > > > > > > strange results. So I tested it with the housing price
> > > > > > > > dataset:
> > > > > > > >
> > > > > > > >
> > > > > > > > http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
> > > > > > > >
> > > > > > > > And here I get negative house prices - even when I use the
> > > > > > > > training set as dataset:
> > > > > > > > LabeledVector(-1.1901998613214253E78, DenseVector(1500.0,
> > > > > > > > 2197.0, 2978.0, 1369.0, 1451.0))
> > > > > > > > LabeledVector(-2.7411218018254747E78, DenseVector(4445.0,
> > > > > > > > 4522.0, 4038.0, 4223.0, 4868.0))
> > > > > > > > LabeledVector(-2.688526857613956E78, DenseVector(4522.0,
> > > > > > > > 4038.0, 4351.0, 4129.0, 4617.0))
> > > > > > > > LabeledVector(-1.3075960386971714E78, DenseVector(2001.0,
> > > > > > > > 2059.0, 1992.0, 2008.0, 2504.0))
> > > > > > > > LabeledVector(-1.476238770814297E78, DenseVector(1992.0,
> > > > > > > > 1965.0, 1983.0, 2300.0, 3811.0))
> > > > > > > > LabeledVector(-1.4298128754759792E78, DenseVector(2059.0,
> > > > > > > > 1992.0, 1965.0, 2425.0, 3178.0))
> > > > > > > > ...
> > > > > > > >
> > > > > > > > and a huge squared error:
> > > > > > > > Squared error: 4.799184832395361E159
> > > > > > > >
> > > > > > > > You can find my code here:
> > > > > > > >
> > > > > > > > https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/test/io/sanfran/wikiTrends/extraction/flink/Regression.scala
> > > > > > > >
> > > > > > > > Can you help me? What did I do wrong?
> > > > > > > >
> > > > > > > > Thank you for your help,
> > > > > > > > Felix
> > > > > > > >
