Any form of generalized linear regression should use adaptive learning
rates rather than plain SGD.  AdaGrad is one of the best current methods,
although there are variants such as RMSProp and AdaDelta.  All are pretty
easy to implement.
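
To back up the "easy to implement" claim, here is a minimal AdaGrad sketch
for least-squares linear regression.  It is plain Python rather than Flink
code, and the function name and the eta, epochs, eps and seed parameters are
made up for illustration.  Each weight keeps a running sum of its squared
gradients and divides its step by the square root of that sum, so the
effective rate shrinks on its own for coordinates that see large gradients.

```python
import math
import random


def adagrad_linear_regression(xs, ys, eta=0.5, epochs=100, eps=1e-8, seed=0):
    """Fit y ~ w.x + b with per-example AdaGrad on squared error.

    Illustrative sketch only; names and defaults are assumptions,
    not taken from Flink or any other library.
    """
    rng = random.Random(seed)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    g2_w = [0.0] * n_features  # running sum of squared gradients, per weight
    g2_b = 0.0                 # same accumulator for the intercept
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            x, y = xs[i], ys[i]
            # Prediction error, computed once with the current parameters.
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(n_features):
                g = err * x[j]  # gradient of 0.5 * err^2 w.r.t. w[j]
                g2_w[j] += g * g
                w[j] -= eta * g / (math.sqrt(g2_w[j]) + eps)
            g2_b += err * err
            b -= eta * err / (math.sqrt(g2_b) + eps)
    return w, b
```

Note that |g| / sqrt(sum of g^2) can never exceed 1, so no single update
moves a weight by more than eta, whatever the scale of the features.  That
is the property that keeps a generous initial rate from producing the kind
of runaway weights a fixed-rate SGD step can.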

Here are some visualizations of various methods that provide some insight:
http://imgur.com/a/Hqolp

Vowpal Wabbit has some tricks that allow very large initial learning rates
to be used without divergence.  I don't know the details.






On Wed, Jun 3, 2015 at 8:05 PM, Mikio Braun <mikiobr...@googlemail.com>
wrote:

> We should probably look into this nevertheless. Requiring full grid search
> for a simple algorithm like mlr sounds like overkill.
>
> Have you written down the math of your implementation somewhere?
>
> -M
>
> ----- Original Message -----
> From: "Till Rohrmann" <till.rohrm...@gmail.com>
> Sent: 2015-06-02 11:31
> To: "dev@flink.apache.org" <dev@flink.apache.org>
> Subject: Re: MultipleLinearRegression - Strange results
>
> Great to hear. This should no longer be a pain point once we support proper
> cross validation.
>
> On Tue, Jun 2, 2015 at 11:11 AM, Felix Neutatz <neut...@googlemail.com>
> wrote:
>
> > Yes, grid search solved the problem :)
> >
> > 2015-06-02 11:07 GMT+02:00 Till Rohrmann <till.rohrm...@gmail.com>:
> >
> > > The SGD algorithm adapts the learning rate accordingly. However, this
> > > does not help if you choose the initial learning rate too large because
> > > then you calculate a weight vector in the first iterations from which it
> > > takes really long to recover.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Jun 1, 2015 at 7:15 PM, Sachin Goel <sachingoel0...@gmail.com>
> > > wrote:
> > >
> > > > You can set the learning rate to be 1/sqrt(iteration number). This
> > > > usually works.
> > > >
> > > > Regards
> > > > Sachin Goel
> > > >
> > > > On Mon, Jun 1, 2015 at 9:09 PM, Alexander Alexandrov <
> > > > alexander.s.alexand...@gmail.com> wrote:
> > > >
> > > > > I've seen some work on adaptive learning rates in the past days.
> > > > >
> > > > > Maybe we can think about extending the base algorithm and comparing
> > > > > the use case setting for the IMPRO-3 project.
> > > > >
> > > > > @Felix you can discuss this with the others on Wednesday, Manu will
> > > > > be also there and can give some feedback, I'll try to send a link
> > > > > tomorrow morning...
> > > > >
> > > > >
> > > > > 2015-06-01 20:33 GMT+10:00 Till Rohrmann <trohrm...@apache.org>:
> > > > >
> > > > > > Since MLR uses stochastic gradient descent, you probably have to
> > > > > > configure the step size right. SGD is very sensitive to the right
> > > > > > step size choice. If the step size is too high, then the SGD
> > > > > > algorithm does not converge. You can find the parameter
> > > > > > description here [1].
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > [1]
> > > > > > http://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html
> > > > > >
> > > > > > On Mon, Jun 1, 2015 at 11:48 AM, Felix Neutatz <
> > > > > > neut...@googlemail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I want to use MultipleLinearRegression, but I got really strange
> > > > > > > results. So I tested it with the housing price dataset:
> > > > > > >
> > > > > > > http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
> > > > > > >
> > > > > > > And here I get negative house prices - even when I use the
> > > > > > > training set as dataset:
> > > > > > > LabeledVector(-1.1901998613214253E78, DenseVector(1500.0, 2197.0, 2978.0, 1369.0, 1451.0))
> > > > > > > LabeledVector(-2.7411218018254747E78, DenseVector(4445.0, 4522.0, 4038.0, 4223.0, 4868.0))
> > > > > > > LabeledVector(-2.688526857613956E78, DenseVector(4522.0, 4038.0, 4351.0, 4129.0, 4617.0))
> > > > > > > LabeledVector(-1.3075960386971714E78, DenseVector(2001.0, 2059.0, 1992.0, 2008.0, 2504.0))
> > > > > > > LabeledVector(-1.476238770814297E78, DenseVector(1992.0, 1965.0, 1983.0, 2300.0, 3811.0))
> > > > > > > LabeledVector(-1.4298128754759792E78, DenseVector(2059.0, 1992.0, 1965.0, 2425.0, 3178.0))
> > > > > > > ...
> > > > > > >
> > > > > > > and a huge squared error:
> > > > > > > Squared error: 4.799184832395361E159
> > > > > > >
> > > > > > > You can find my code here:
> > > > > > >
> > > > > > > https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/test/io/sanfran/wikiTrends/extraction/flink/Regression.scala
> > > > > > >
> > > > > > > Can you help me? What did I do wrong?
> > > > > > >
> > > > > > > Thank you for your help,
> > > > > > > Felix
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
