Re: producing vectors from composite documents

Ted Dunning Tue, 08 Jun 2010 14:56:41 -0700

Got it.

This really needs to be done before vectorization, but you can segregate the
output vector for different handling by passing in a view to different parts
of the vector.

My recommendation is that you apply IDF using the weight dictionary in the
vectorizer.  That will let you have multiple text fields with different
weighting schemes but still put all the results into a single result vector.
As a side effect, if you put everything into a vector of dimension 1, then
you get multi-field weighted inputs for free.
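
To make that concrete, here is a minimal sketch in plain Python rather than
against any real vectorizer API. The names idf_weights, slot and encode, the
record fields and the "bio:" prefix are all made up for illustration. It
assumes the weight dictionary is a plain term -> IDF map built in one pass
over the text field, and that numeric fields are added with their raw value.

```python
import math
import zlib
from collections import Counter


def idf_weights(texts):
    """Build a term -> IDF weight dictionary in one pass over the text field."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(text.split()))
    return {term: math.log(n / count) for term, count in df.items()}


def slot(name, dim):
    # Stable hash, so the same feature name always lands in the same slot.
    return zlib.crc32(name.encode("utf-8")) % dim


def encode(record, weights, dim=32):
    """Hash one record (hypothetical fields: bio, age, weight) into one vector."""
    vec = [0.0] * dim
    # Text field: each term is added with its IDF weight from the dictionary.
    for term in record["bio"].split():
        vec[slot("bio:" + term, dim)] += weights.get(term, 0.0)
    # Numeric fields: added with their raw value, no IDF applied.
    for field in ("age", "weight"):
        vec[slot(field, dim)] += float(record[field])
    return vec


users = [
    {"bio": "likes hiking and chess", "age": 31, "weight": 70},
    {"bio": "likes cooking", "age": 45, "weight": 82},
]
weights = idf_weights([u["bio"] for u in users])
vectors = [encode(u, weights) for u in users]
```

Because the weighting happens per field while encoding, age and weight never
pass through the IDF step. A term that occurs in every bio ("likes" here)
gets weight log(1) = 0, which is what would happen to dense fields if IDF
were applied to the whole vector afterwards. Calling encode with dim=1 sends
every feature to slot 0, so the single entry is the weighted sum of all
fields.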

On Tue, Jun 8, 2010 at 11:01 AM, Robin Anil <[email protected]> wrote:

> > I think that you misunderstand me a little bit, and I know that I am not
> > understanding what you are saying here.
> >
>
> Okay. Let's take an example. Say you have users with a text bio and the
> features age, weight, etc.
> The text is sparse and we need to apply TF-IDF to it, while we should not
> apply it to age and weight. So in this case, we need to hash the text into
> some range and do a one-pass or two-pass IDF calculation in that range. We
> need to leave the other features alone, right? Otherwise IDF will squash
> them to log(1) = 0.
