On 10 Apr 2007, at 16:58, Sengly Heng wrote:
I wanted to do it this way as well, but I am a bit worried about the
computational time, as I have many documents and each document is
rather large. I am looking for more solutions.
We don't really know what your problem is. Explaining that, rather
than the solution you have thought of, might yield a couple of
alternative solutions. Perhaps something could be precalculated and
stored in the documents. Perhaps feature selection (reduction) of the
terms might do the trick for you. And so on.
Let me pull some questions out of nowhere that might help: How slow
is it, and how fast did you expect it to be? How many documents do
your queries normally yield? Can you limit the evaluation to the
top n documents?
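To make the last question concrete, here is a minimal sketch in plain
Java of restricting per-document work to the top n hits. It does not use
the Lucene API: the class TopNTermStats, the ranked list of document ids
and the map of term frequency vectors are all hypothetical stand-ins, and
the per-document work (summing term frequencies) is only a placeholder,
since we do not yet know what you actually need to compute.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class TopNTermStats {

    /**
     * Sums term frequencies over at most the first n hits.
     *
     * rankedDocIds: document ids in rank order (assumed search result)
     * termVectors:  docId -> (term -> frequency), an assumed stand-in
     *               for stored term frequency vectors
     */
    static Map<String, Integer> sumTermFrequencies(
            List<Integer> rankedDocIds,
            Map<Integer, Map<String, Integer>> termVectors,
            int n) {
        Map<String, Integer> totals = new HashMap<>();
        // Never look past the n best hits, however many matched.
        int limit = Math.max(0, Math.min(n, rankedDocIds.size()));
        for (int docId : rankedDocIds.subList(0, limit)) {
            Map<String, Integer> vector = termVectors.get(docId);
            if (vector == null) {
                continue; // no vector stored for this document
            }
            vector.forEach((term, freq) -> totals.merge(term, freq, Integer::sum));
        }
        return totals;
    }
}
```

The cost then grows with n and the size of those n vectors rather than
with the total number of matching documents.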
Please do contribute if you have any. Your help is highly
appreciated.
As Lucene is primarily an inverted index, the document vector space
model is not available in any other fashion than the term frequency
vectors, or by building them from scratch by enumerating the whole
index. The latter is of course horribly slow in most cases.
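To illustrate why building the vectors from scratch is expensive, here
is a small sketch in plain Java. It does not touch the Lucene API: the
inverted index is modelled as an assumed map of term -> (docId ->
frequency), and the class name DocumentVectors is made up for the
example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an inverted index, not Lucene code.
final class DocumentVectors {

    /**
     * Rebuilds docId -> (term -> frequency) from
     * term -> (docId -> frequency) by visiting every posting
     * of every term, i.e. by enumerating the whole index.
     */
    static Map<Integer, Map<String, Integer>> invert(
            Map<String, Map<Integer, Integer>> invertedIndex) {
        Map<Integer, Map<String, Integer>> vectors = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> termEntry
                : invertedIndex.entrySet()) {
            String term = termEntry.getKey();
            for (Map.Entry<Integer, Integer> posting
                    : termEntry.getValue().entrySet()) {
                // Append this term to the vector of the posting's document.
                vectors.computeIfAbsent(posting.getKey(), id -> new TreeMap<>())
                       .put(term, posting.getValue());
            }
        }
        return vectors;
    }
}
```

Even to obtain the vector of a single document, this has to walk every
term in the index, which is why stored term frequency vectors are the
more practical route when they are available.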
--
karl