On Tue, Jan 19, 2010 at 10:58 AM, Jeff Eastman <[email protected]> wrote:
> Looking in MAHOUT-228-3.patch, I don't see any sparse vectorizer. Did you
> have another patch in mind?

There should have been one.  Let me check to figure out the name.

> I'm trying to wrap my mind around "L-1 model distribution".

For the classifier learning, what we have is a prior distribution for
classifiers that has probability proportional to exp(-sum(abs(w_i))). The
log of this probability is -sum(abs(w_i)) = -L_1(w), which gives the name.
This log probability is what is used as a regularization term in the
optimization of the classifier.

It isn't obvious from this definition, but this prior/regularizer has the
effect of preferring sparse models (for classification). Where L_2 priors
prefer lots of small weights in ambiguous conditions because the penalty on
large coefficients is so large, L_1 priors prefer to focus the weight on
one or a few larger coefficients.

> .... Would an L-1 model vector only have integer-valued elements?

In the sense that 0 is an integer, yes. :-)  But what it prefers is
zero-valued coefficients.
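To make the sparsity point concrete, here is a small illustrative sketch (not Mahout code; the function names are mine) using the one-dimensional problem minimize (w - a)^2/2 + penalty(w), where the two penalties have closed-form solutions. The L_1 penalty lam*|w| gives soft-thresholding, which sets the coefficient exactly to zero when the data pull |a| is below lam; the L_2 penalty lam*w^2/2 only shrinks it proportionally and never reaches zero:

```python
def l1_solution(a, lam):
    # Soft threshold: argmin of (w - a)^2/2 + lam*|w|.
    # Returns exactly 0.0 whenever |a| <= lam -- this is the sparsity.
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def l2_solution(a, lam):
    # Proportional shrinkage: argmin of (w - a)^2/2 + lam*w^2/2.
    # Shrinks toward zero but is never exactly zero for a != 0.
    return a / (1.0 + lam)

# A weakly supported coefficient (a = 0.3) is zeroed by L1 but
# merely shrunk by L2; a strong one (a = 2.0) survives both.
for a in (0.3, 2.0):
    print(a, l1_solution(a, lam=0.5), l2_solution(a, lam=0.5))
```

Run over a whole weight vector, the L_1 rule zeroes every coefficient whose evidence is weaker than the penalty, which is why the learned models come out sparse.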
