Re: retrieve tokens

Erik Hatcher Wed, 22 Dec 2004 10:49:26 -0800

On Dec 22, 2004, at 12:43 PM, M. Smit wrote:

Erik Hatcher wrote: But for the other issue on 'store lucene' vs 'store db'. Does anyone can provide me with some field experience on size? The system I'm developing will provide searching through some 2000 pdf's, say some 200 pages each. I feed the plain text into Lucene on a Field.UnStored bases. I also store this plain text in the database for the sole purpose of presenting a context snippet.

If I were to use the Highlighter with a Field.Text, I will not use the database plain part altogether. But still I'm a little worried about speed/space issues. Or am I just seeing bears-on-the-road (Dutch saying, in plain English: making a fuzz about nothing)..

Consider that you're only highlighting 20 or so entries at one time. Getting the text from a Lucene index you're already navigating will be quite quick. But it shouldn't be too bad to pull 20 records from a database either.

There is one other consideration, and that is to use the new (CVS only) feature of capturing term vectors with position information. The author of the Highlighter, Mark Harwood, has posted in the not too distant past, an update to the Highlighter that can use this position information for highlighting rather than re-analyzing the original text. The re-analysis of the text may be the bottleneck, not the database access.

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: retrieve tokens

Reply via email to