--norm 2 and CosineDistanceMeasure are a good, fairly standard choice. The L1
norm is useful for some things too, but you can use any positive integer, or
"INF" for L_infinity normalization.
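
A minimal Python sketch of what "unit length under norm p" means. It assumes
the usual L_p definition (the p-th root of the sum of |x|^p) and takes the
largest absolute component for L_infinity. This is plain Python, not Mahout's
implementation: the helper name `normalize` is hypothetical, and `math.inf`
stands in for the "INF" value.

```python
import math


def normalize(v, p):
    """Scale v to unit length under the L_p norm (hypothetical helper).

    p is a positive integer, or math.inf for the L_infinity norm.
    """
    if p != math.inf and (p < 1 or p != int(p)):
        raise ValueError("p must be a positive integer or math.inf")
    if p == math.inf:
        # L_infinity norm: the largest absolute component
        norm = max(abs(x) for x in v)
    else:
        # L_p norm: (sum of |x|^p) ** (1/p)
        norm = sum(abs(x) ** p for x in v) ** (1.0 / p)
    if norm == 0:
        # A zero vector cannot be scaled to unit length; return a copy unchanged
        return list(v)
    return [x / norm for x in v]


v = [3.0, 4.0]
print(normalize(v, 1))         # absolute values of the components sum to 1
print(normalize(v, 2))         # [0.6, 0.8]
print(normalize(v, math.inf))  # [0.75, 1.0]

u = normalize(v, 2)
print(sum(x * x for x in u))   # u.dot(u): 1.0, up to floating-point rounding
```

With p = 2 the result's dot product with itself is 1.0 up to rounding, which is
the v.dot(v) == 1.0 property described further down the thread.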

  -jake

On Wed, Jan 13, 2010 at 4:32 PM, Bogdan Vatkov <[email protected]> wrote:

> Is it related to the distance calculation done by
> org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
> I am currently using --norm 2 in combination with
> org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK, and
> what other options do I have for the --norm value?
>
> On Thu, Jan 14, 2010 at 2:28 AM, Jake Mannix <[email protected]> wrote:
>
> > It makes sure your vectors are all unit length according to the norm you
> > choose (the L2 norm, for example, means each vector satisfies
> > v.dot(v) == 1.0).
> >
> > This means that when you want to compare vectors to each other, a nice
> > "distance" function is just distance(u, v) = 1 - u.dot(v).
> >
> >  -jake
> >
> > On Wed, Jan 13, 2010 at 4:22 PM, Bogdan Vatkov <[email protected]> wrote:
> >
> > > What is the practical meaning of the --norm parameter in the
> > > text-to-vector process
> > > (http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html)?
> > >
> > > Best regards,
> > > Bogdan
> > >
> >
>
>
>
> --
> Best regards,
> Bogdan
>
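
A small Python check of the quoted point that, once vectors are L2-normalized,
1 - u.dot(v) is the cosine distance. It only assumes cosine distance is defined
as 1 minus the cosine of the angle between the vectors; it does not use or
reproduce Mahout's CosineDistanceMeasure, and the helper names (`dot`,
`l2_normalize`, `cosine_distance`, `unit_distance`) are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale v so that dot(v, v) == 1.0 (up to rounding)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(u, v):
    """1 - cos(angle between u and v), computed from unnormalized vectors."""
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def unit_distance(u, v):
    """The shortcut 1 - u.dot(v); only a cosine distance if u and v are unit length."""
    return 1.0 - dot(u, v)


a = [1.0, 2.0, 0.0]
b = [2.0, 1.0, 1.0]
print(cosine_distance(a, b))                            # about 0.2697
print(unit_distance(l2_normalize(a), l2_normalize(b)))  # same value, up to rounding
print(unit_distance(a, b))                              # -3.0: not a usable distance
```

The last line shows why the normalization step matters: applied to raw,
unnormalized vectors, 1 - u.dot(v) can even go negative.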
