Yes, it would be too much to do in real time, but it is a good idea though.

I don't know whether a vector of term frequencies is stored with each document.
If it is, I could search the index to get the subset of documents and then
take the term frequencies from there.
In that case I could change MoreLikeThis to accept a set of term
frequencies instead of an IndexReader, and use that for the whole process.
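
To make the idea concrete, here is a rough sketch of term selection that
works only from plain maps. It is not the actual MoreLikeThis code: the
class TermFrequencySelector, the method topTerms and the tf * idf weighting
are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: selects query terms from plain maps
// instead of reading statistics from an IndexReader.
class TermFrequencySelector {

    // termFreqs: term -> number of occurrences in the source document
    // docFreqs:  term -> number of documents in the subset that contain the term
    // numDocs:   number of documents in the subset
    // maxTerms:  maximum number of terms to return
    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int maxTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // term does not occur in the subset, so it cannot match
            }
            // Assumed weighting: tf * (ln(numDocs / df) + 1).
            double idf = Math.log((double) numDocs / df) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }

        // Highest score first; ties are broken alphabetically for stable output.
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> scores.get(t))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }
}
```

The returned terms could then be turned into a query against the filtered
subset.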

Does anyone know whether a document stores the term frequencies for its fields?
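
On the cosine coefficient from the message quoted below: for a small subset,
comparing two documents by their term frequencies could look roughly like
this. It is only a sketch over plain maps; TermVectorCosine and its methods
are made-up names, not a Lucene API.

```java
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine coefficient between two
// documents represented as term -> frequency maps.
class TermVectorCosine {

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Dot product over the terms the two documents share.
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as similar to nothing
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }
}
```

Doing this for every pair of documents grows quadratically with the size of
the subset, which is why it only seems feasible when the subset is small.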

On Wed, Apr 23, 2008 at 7:46 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:

> Jonathan Ariel skrev:
>
> > Smart idea, but it won't help me. I have almost 50 categories and
> > eventually I would like to "filter" not just on category but maybe
> > also on language, etc.
> > Karl: what do you mean by measure the distance between the term
> > vectors and cluster them in real time?
> >
>
> I mean exactly what I say, that if your subsets are small enough you could
> evaluate the cosine coefficient and group documents accordingly.
>
> 2 million documents is however way too much data to do that in real time.
>
> I would probably create one index for each "filter" you want to use.
>
>
>        karl