OK, thank you, Ted, for all your generous help. I can't overstate how much I appreciate it.
-Dmitriy

On Sun, Sep 5, 2010 at 3:24 PM, Ted Dunning <[email protected]> wrote:

> We don't have this algorithm in Mahout yet: http://arxiv.org/abs/1006.2156
>
> But it looks a lot like what you want.
>
> Short of that, you can definitely do recommendation-like things with
> logistic regression, and you don't have to worry much about the
> non-negative sort of constraints (in my experience).
>
> On Sun, Sep 5, 2010 at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Ted, thank you very much.
> >
> > I would like to discuss one more generalization here, if I may.
> >
> > Let's consider the Netflix prize problem for the moment. That is, the
> > regression parameters are non-quantitative (person and movie ids,
> > essentially), and the regressand is the user's score. I guess many are
> > familiar with Yehuda Koren's approach to this, where he basically used
> > SGD as a non-negative factorization; he also mentioned something about
> > applying a logistic function on top of it. I.e., the regression looks
> > exactly as it would for logistic regression (he also added biases),
> > except that it is more of a non-negative one (the factors are not
> > allowed to go negative).
> >
> > The problem I currently have on my hands is a hybrid of those. Imagine
> > that in addition to some non-quantitative features (person, movie), you
> > know some quantitative features about the movie (say, genre scores that
> > come out of some sort of encyclopedic database, i.e. a manually trained
> > taxonomy). (You might also know some quantitative features about the
> > person, but let's keep it simple for the purpose of this discussion.)
> >
> > It's very easy for me to go in and create an individual regression for
> > a user based on their reaction (liked / didn't like) and what I know of
> > the quantitative qualities of the movies.
> >
> > However, at some point I start feeling that movie genre ratings are not
> > enough. Some movies still have some pretty unique factors about them
> > that we don't really know or haven't rated as a feature.
> >
> > So what I really want is probably a non-negative factorization, but one
> > that takes into account the quantitative features that come from
> > different aspects of a given instance of (person, movie) interaction:
> > movie genre, time of day, weather outside, etc., whatever we think has
> > a good chance of being a good feature, without really going through a
> > PCA or feature selection process at the moment. So for quantitative
> > features we may search for regression parameters, but for
> > non-quantitative features (person, movie) I'd still prefer to have the
> > biggest non-negative factors learned from history.
> >
> > Is there a way to merge both those approaches into one, as they seem to
> > be really similar (i.e., regression with non-negative factorization)?
> >
> > Intuitively I feel that those approaches are really similar (the
> > difference is that in NNF we are essentially guessing the input to the
> > principal factors). And there must be a relatively simple way to morph
> > it all into a hybrid approach where some of the betas interact with
> > quantitative features x, while others interact with non-negative
> > factors associated with the non-quantitative input (such as a person
> > id) encountered in the sample.
> >
> > Does that make sense? Is there a way to do this in Mahout?
> >
> > Thank you very much.
> > -Dmitriy
> >
> > On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning <[email protected]> wrote:
> >
> > > I generally add the constant term into the feature vector if I want
> > > to use it. You are correct that it is usually critical to correct
> > > functioning, but I prefer not to have a special case for it. The one
> > > place where I think that is wrong is where you want the prior to
> > > treat it specially: it is common to have a very different prior on
> > > the intercept than on the coefficients. My only defense there is
> > > that common priors for the coefficients, like L1, allow plenty of
> > > latitude on the intercept, so as long as the data outweigh the
> > > prior, this doesn't matter. There is a similar distinction between
> > > interactions and main effects.
> > >
> > > One place it would matter a lot is in multi-level inference, where
> > > you wind up with a pretty strong prior from the higher-level
> > > regressions (since that is where most of the data actually is). In
> > > that case, I would probably rather separate the handling. In fact,
> > > at that point, I think I would probably go with a grouped prior to
> > > allow handling all of these cases in a coherent setting.
> > >
> > > On the second question, betas can definitely go negative. That is
> > > how the model expresses an effect that decreases the likelihood of
> > > success.
> > >
> > > On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > There's something I don't understand about your derivation.
> > > >
> > > > I think Bishop generally suggests that in linear regression
> > > > y = beta_0 + <beta, x> (so there's an intercept), and I think he
> > > > uses a similar approach when fitting the logistic function, where
> > > > he suggests using P([mu + <beta, x>] / s), which of course can be
> > > > thought of again as P(beta_0 + <beta, x>).
> > > >
> > > > But if there's no intercept beta_0, then y(x = (0, ..., 0)^T |
> > > > beta) is always 0, which of course is not true in most
> > > > situations. Does your method imply that a trivial input (all 0s)
> > > > would produce a 0 estimate?
> > > >
> > > > Second question: are the betas allowed to go negative?
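
For concreteness, here is one way the hybrid Dmitriy describes could be
wired up: logistic regression on the quantitative features, plus
Koren-style non-negative latent factors for the (person, movie) ids, all
trained in a single SGD loop. This is only a minimal sketch of the idea,
not existing Mahout code (per Ted's note, Mahout doesn't have this yet).
All class and variable names are hypothetical, and non-negativity is
enforced by the simplest possible device: clamping the factors at zero
after each update (projected gradient).

import java.util.Random;

/**
 * Hypothetical sketch: logistic SGD where the score is
 *   s = bias + beta . x + p_u . q_i
 * with the factors p_u, q_i kept non-negative by clamping.
 */
public class HybridFactorizedLogistic {
  private final int k;            // number of latent factors
  private final double[] beta;    // weights for quantitative features x
  private final double[][] p, q;  // user and item factors, kept >= 0
  private double bias;
  private final double rate = 0.01, lambda = 0.001;

  public HybridFactorizedLogistic(int numFeatures, int numUsers, int numItems, int k) {
    this.k = k;
    beta = new double[numFeatures];
    p = new double[numUsers][k];
    q = new double[numItems][k];
    Random rnd = new Random(42);  // small positive random starting factors
    for (double[] row : p) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
    for (double[] row : q) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
  }

  /** p(like) for user u, item i, quantitative features x. */
  public double predict(int u, int i, double[] x) {
    double s = bias;
    for (int j = 0; j < x.length; j++) s += beta[j] * x[j];
    for (int f = 0; f < k; f++) s += p[u][f] * q[i][f];
    return 1.0 / (1.0 + Math.exp(-s));  // logistic link
  }

  /** One SGD step on an observation (u, i, x, y) with y in {0, 1}. */
  public void train(int u, int i, double[] x, int y) {
    double err = y - predict(u, i, x);  // log-likelihood gradient w.r.t. score
    bias += rate * err;
    for (int j = 0; j < x.length; j++) {
      beta[j] += rate * (err * x[j] - lambda * beta[j]);  // betas may go negative
    }
    for (int f = 0; f < k; f++) {
      double pu = p[u][f], qi = q[i][f];
      p[u][f] = Math.max(0.0, pu + rate * (err * qi - lambda * pu));  // clamp at 0
      q[i][f] = Math.max(0.0, qi + rate * (err * pu - lambda * qi));  // clamp at 0
    }
  }
}

Note that, consistent with Ted's answer above, the betas on the
quantitative features are left unconstrained and can go negative; only
the latent factors attached to the non-quantitative ids are clamped.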
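And on the intercept discussion: the "constant term in the feature
vector" trick Ted describes looks roughly like this with Mahout's SGD
classifier. This is a rough sketch assuming the
org.apache.mahout.classifier.sgd API of that era (OnlineLogisticRegression
with an L1 prior); the feature index and hyperparameter values are made
up for illustration.

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class InterceptExample {
  public static void main(String[] args) {
    int numFeatures = 10;
    // 2 classes; one extra slot (index 0) is reserved for the constant term,
    // so its learned weight plays the role of beta_0 under the same prior.
    OnlineLogisticRegression lr =
        new OnlineLogisticRegression(2, numFeatures + 1, new L1())
            .learningRate(0.1)
            .lambda(1.0e-5);

    Vector v = new RandomAccessSparseVector(numFeatures + 1);
    v.set(0, 1.0);                        // the "intercept" feature, always 1
    v.set(3, 2.5);                        // an ordinary quantitative feature
    lr.train(1, v);                       // observed class 1 (e.g. "liked")
    double pLike = lr.classifyScalar(v);  // probability of class 1
    System.out.println("p(like) = " + pLike);
  }
}

With the constant slot in place, an all-zeros "real" input no longer
forces the score to 0, which answers the beta_0 concern in Dmitriy's
first question.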
