On 1 Nov 2007, at 18:09, Cool Coder wrote:

prior to adding into index

The easiest way out would be to add the document to a temporary index and extract the term frequency vector from it. I would recommend using MemoryIndex for this.

You could also tokenize the document and pass the data to a TermVectorMapper. If you have the RAM to spare and don't want to waste CPU analyzing the document twice, you could consider replacing the fields of the document with CachedTokenStreams. I would welcome a TermVectorMappingCachedTokenStreamFactory. Even cooler would be to pass code down through IndexWriter.addDocument using a command pattern or something similar, allowing one to extend the document at analysis time.
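
To make the "tokenize it yourself" idea concrete, here is a minimal sketch in plain Java that counts term frequencies for a document before it goes anywhere near an index. It deliberately uses no Lucene classes: the class name TermFrequencySketch, the method termFrequencies, and the naive lowercase-and-split tokenizer are all assumptions made for illustration, standing in for whatever Analyzer you actually index with. The counts only match what ends up in the index if the tokenization is the same.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts how often each term
// occurs in a piece of text before the document is added to an index.
public class TermFrequencySketch {

    // Naive stand-in for an Analyzer: lowercases the text and splits on
    // anything that is not a letter or a digit. Terms are returned in the
    // order they first appear in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token on leading punctuation
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
                termFrequencies("The quick fox jumps over the lazy fox");
        System.out.println(freqs);
    }
}
```

With a real Analyzer you would walk its TokenStream instead of calling split(), and this is the point where caching the tokens saves you from analyzing the document a second time when it is finally added.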


--
karl
