Re: Best way to count tokens

Karl Wettin Thu, 01 Nov 2007 10:57:20 -0800


On 1 Nov 2007, at 18:09, Cool Coder wrote:

prior to adding into index

The easiest way out would be to add the document to a temporary index and extract the term frequency vector. I would recommend using MemoryIndex.
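
As a rough, library-free illustration of what such a term frequency vector holds, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name TokenCounter and the method countTokens are hypothetical, and the lowercase-and-split step is only an assumed stand-in for whatever Analyzer the real index would use, so the counts will differ from what a real analyzer produces.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TokenCounter {

    // Assumed stand-in for an Analyzer: lowercase, then split on anything
    // that is not a letter or a digit.
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            freq.merge(token, 1, Integer::sum); // term -> frequency
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = countTokens("The quick fox jumps over the lazy fox");
        // Total token count is the sum of all term frequencies.
        int total = freq.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(freq + " total=" + total);
    }
}
```

The total token count is simply the sum of the per-term frequencies, which is the same figure one would derive from a term frequency vector read back out of a temporary index.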

You could also tokenize the document and pass the data to a TermVectorMapper. You could consider replacing the fields of the document with CachedTokenStreams if you have the RAM to spare and don't want to waste CPU analyzing the document twice. I would welcome a TermVectorMappingCachedTokenStreamFactory. Even cooler would be to pass code down to IndexWriter.addDocument using a command pattern or something, allowing one to extend the document at the time of analysis.
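
The "analyze once, reuse the tokens" idea can be sketched without Lucene as follows. Everything here is hypothetical: CachedTokens, TermMapper, mapTerms and replay are illustrative names, not Lucene classes or methods, and the whitespace split is an assumed placeholder for real analysis. The point is only that the tokens are produced a single time, held in memory, and then handed both to a frequency callback and to whatever consumes them next.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not a Lucene API.
class CachedTokens {

    // Hypothetical callback receiving each distinct term and its frequency.
    interface TermMapper {
        void map(String term, int frequency);
    }

    private final List<String> tokens = new ArrayList<>();

    // Analyze exactly once; whitespace split + lowercase is an assumed
    // stand-in for a real analyzer.
    CachedTokens(String text) {
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
    }

    // Report term frequencies from the cache, without re-analyzing.
    void mapTerms(TermMapper mapper) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            mapper.map(e.getKey(), e.getValue());
        }
    }

    // Hand the same cached tokens to a second consumer (e.g. indexing).
    List<String> replay() {
        return Collections.unmodifiableList(tokens);
    }
}
```

The trade-off is the one mentioned above: the cached token list costs RAM proportional to the document size, in exchange for not spending CPU on a second analysis pass.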



--
karl

