OK, thank you, Ted, for all your generous help. I can't overstate how much I appreciate it.
-Dmitriy

On Sun, Sep 5, 2010 at 3:24 PM, Ted Dunning <[email protected]> wrote:

> We don't have this algorithm in Mahout yet: http://arxiv.org/abs/1006.2156
>
> But it looks a lot like what you want.
>
> Short of that, you can definitely do recommendation-like things with
> logistic regression, and you don't have to worry much about the
> non-negative sort of constraints (in my experience).
>
> On Sun, Sep 5, 2010 at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Ted, thank you very much.
> >
> > I would like to discuss one more generalization here, if I may.
> >
> > Let's consider the Netflix prize problem for the moment. That is, the
> > regression parameters are non-quantitative (person and movie ids,
> > essentially), and the regressand is the user's score. I guess many are
> > familiar with Yehuda Koren's approach to this, where he basically used
> > SGD as a non-negative factorization; he also mentioned something about
> > applying a logistic function on top of it. I.e., the regression looks
> > exactly as it would for logistic regression (he also added biases),
> > except that it is more of a non-negative one (the factors are not
> > allowed to go negative).
> >
> > The problem I currently have on my hands is a hybrid of those. Imagine
> > that in addition to some non-quantitative features (person, movie), you
> > know some quantitative features about the movie (say, genre scores that
> > come out of some sort of encyclopedic database, i.e. a manually trained
> > taxonomy). (You might also know some quantitative features about the
> > person, but let's keep it simple for the purpose of this discussion.)
> >
> > It's very easy for me to go in and create an individual regression for
> > a user based on their reaction (liked / didn't like) and what I know of
> > the quantitative qualities of the movies.
> >
> > However, at some point I start feeling that movie genre ratings are not
> > enough. Some movies still have some pretty unique factors about them
> > that we don't really know or haven't rated as a feature.
> >
> > So what I really want is probably a non-negative factorization, but one
> > that takes into account the quantitative features that come from
> > different aspects of a given instance of (person, movie) interaction:
> > movie genre, time of day, weather outside, etc., whatever we think has
> > a good chance of being a good feature, without really going through a
> > PCA or feature selection process at the moment. So for quantitative
> > features we may search for regression parameters, but for
> > non-quantitative features (person, movie) I'd still prefer to have the
> > biggest non-negative factors learned from history.
> >
> > Is there a way to merge both those approaches into one, as they seem to
> > be really similar (i.e., regression with non-negative factorization)?
> >
> > Intuitively I feel that those approaches are really similar (the
> > difference is that in NNF we are essentially guessing the input to the
> > principal factors). And there must be a relatively simple way to morph
> > it all into a hybrid approach where some of the betas interact with
> > quantitative features x, while others interact with non-negative
> > factors associated with the non-quantitative input (such as a person
> > id) encountered in the sample.
> >
> > Does that make sense? Is there a way to do this in Mahout?
> >
> > Thank you very much.
> > -Dmitriy
> >
> > On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning <[email protected]> wrote:
> >
> > > I generally add the constant term into the feature vector if I want
> > > to use it. You are correct that it is usually critical to correct
> > > functioning, but I prefer not to have a special case for it. The one
> > > place where I think that is wrong is where you want the prior to
> > > treat it specially: it is common to have a very different prior on
> > > the intercept than on the coefficients. My only defense there is
> > > that common priors for the coefficients, like L1, allow plenty of
> > > latitude on the intercept, so as long as the data outweigh the
> > > prior, this doesn't matter. There is a similar distinction between
> > > interactions and main effects.
> > >
> > > One place it would matter a lot is in multi-level inference, where
> > > you wind up with a pretty strong prior from the higher-level
> > > regressions (since that is where most of the data actually is). In
> > > that case, I would probably rather separate the handling. In fact,
> > > at that point, I think I would probably go with a grouped prior to
> > > allow handling all of these cases in a coherent setting.
> > >
> > > On the second question, betas can definitely go negative. That is
> > > how the model expresses an effect that decreases the likelihood of
> > > success.
> > >
> > > On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > There's something I don't understand about your derivation.
> > > >
> > > > I think Bishop generally suggests that in linear regression
> > > > y = beta_0 + <beta, x> (so there's an intercept), and I think he
> > > > uses a similar approach when fitting the logistic function, where
> > > > he suggests using P([mu + <beta, x>] / s), which of course can be
> > > > thought of again as P(beta_0 + <beta, x>).
> > > >
> > > > But if there's no intercept beta_0, then y(x = (0, ..., 0)^T |
> > > > beta) is always 0, which of course is not true in most
> > > > situations. Does your method imply that a trivial input (all 0s)
> > > > would produce a 0 estimate?
> > > >
> > > > Second question: are the betas allowed to go negative?
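
For concreteness, here is one way the hybrid Dmitriy describes could be
wired up: logistic regression on the quantitative features, plus
Koren-style non-negative latent factors for the (person, movie) ids, all
trained in a single SGD loop. This is only a minimal sketch of the idea,
not existing Mahout code (per Ted's note, Mahout doesn't have this yet).
All class and variable names are hypothetical, and non-negativity is
enforced by the simplest possible device: clamping the factors at zero
after each update (projected gradient).

import java.util.Random;

/**
 * Hypothetical sketch: logistic SGD where the score is
 *   s = bias + beta . x + p_u . q_i
 * with the factors p_u, q_i kept non-negative by clamping.
 */
public class HybridFactorizedLogistic {
  private final int k;            // number of latent factors
  private final double[] beta;    // weights for quantitative features x
  private final double[][] p, q;  // user and item factors, kept >= 0
  private double bias;
  private final double rate = 0.01, lambda = 0.001;

  public HybridFactorizedLogistic(int numFeatures, int numUsers, int numItems, int k) {
    this.k = k;
    beta = new double[numFeatures];
    p = new double[numUsers][k];
    q = new double[numItems][k];
    Random rnd = new Random(42);  // small positive random starting factors
    for (double[] row : p) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
    for (double[] row : q) for (int f = 0; f < k; f++) row[f] = 0.1 * rnd.nextDouble();
  }

  /** p(like) for user u, item i, quantitative features x. */
  public double predict(int u, int i, double[] x) {
    double s = bias;
    for (int j = 0; j < x.length; j++) s += beta[j] * x[j];
    for (int f = 0; f < k; f++) s += p[u][f] * q[i][f];
    return 1.0 / (1.0 + Math.exp(-s));  // logistic link
  }

  /** One SGD step on an observation (u, i, x, y) with y in {0, 1}. */
  public void train(int u, int i, double[] x, int y) {
    double err = y - predict(u, i, x);  // log-likelihood gradient w.r.t. score
    bias += rate * err;
    for (int j = 0; j < x.length; j++) {
      beta[j] += rate * (err * x[j] - lambda * beta[j]);  // betas may go negative
    }
    for (int f = 0; f < k; f++) {
      double pu = p[u][f], qi = q[i][f];
      p[u][f] = Math.max(0.0, pu + rate * (err * qi - lambda * pu));  // clamp at 0
      q[i][f] = Math.max(0.0, qi + rate * (err * pu - lambda * qi));  // clamp at 0
    }
  }
}

Note that, consistent with Ted's answer above, the betas on the
quantitative features are left unconstrained and can go negative; only
the latent factors attached to the non-quantitative ids are clamped.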
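And on the intercept discussion: the "constant term in the feature
vector" trick Ted describes looks roughly like this with Mahout's SGD
classifier. This is a rough sketch assuming the
org.apache.mahout.classifier.sgd API of that era (OnlineLogisticRegression
with an L1 prior); the feature index and hyperparameter values are made
up for illustration.

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class InterceptExample {
  public static void main(String[] args) {
    int numFeatures = 10;
    // 2 classes; one extra slot (index 0) is reserved for the constant term,
    // so its learned weight plays the role of beta_0 under the same prior.
    OnlineLogisticRegression lr =
        new OnlineLogisticRegression(2, numFeatures + 1, new L1())
            .learningRate(0.1)
            .lambda(1.0e-5);

    Vector v = new RandomAccessSparseVector(numFeatures + 1);
    v.set(0, 1.0);                        // the "intercept" feature, always 1
    v.set(3, 2.5);                        // an ordinary quantitative feature
    lr.train(1, v);                       // observed class 1 (e.g. "liked")
    double pLike = lr.classifyScalar(v);  // probability of class 1
    System.out.println("p(like) = " + pLike);
  }
}

With the constant slot in place, an all-zeros "real" input no longer
forces the score to 0, which answers the beta_0 concern in Dmitriy's
first question.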
