Re: Lucene in the Humanities

Chris Hostetter Tue, 22 Feb 2005 17:50:24 -0800

: >>> Just curious: it would seem easier to use multiple fields for the
: >>> original case and lowercase searching. Is there any particular reason
: >>> you analyzed the documents to multiple indexes instead of multiple
: >>> fields?
: >>
: >> I considered that approach, however to expose QueryParser I'd have to
: >> get tricky.  If I have title_orig and title_lc fields, how would I
: >> allow freeform queries of title:something?


Why have separate fields?

Why not index the title into the "title" field twice, once with each term
lowercased and once with the case left alone? (That is, use an analyzer
that tokenizes "The Quick BrOwN fox" as "[the] [quick] [brown] [fox] [The]
[Quick] [BrOwN] [fox]".)
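
A minimal sketch of that index-time tokenization, written in plain Python
rather than against the Lucene API. The function name dual_case_tokens and
the whitespace splitting are assumptions made for illustration; in Lucene
this logic would live in a custom Analyzer.

```python
def dual_case_tokens(text):
    """Return the lowercased tokens followed by the original-case tokens."""
    # Assumption: plain whitespace tokenization stands in for a real tokenizer.
    words = text.split()
    return [w.lower() for w in words] + words


tokens = dual_case_tokens("The Quick BrOwN fox")
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'The', 'Quick', 'BrOwN', 'fox']
```

Note that "fox" appears twice in the output, since it is already lowercase
in the original text.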

Then at search time, depending on the value of the checkbox, construct
your QueryParser using the appropriate Analyzer.
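
Roughly, the search side could look like the sketch below. It is a toy
inverted index in plain Python, not Lucene code: build_index, search, and
the case_sensitive flag (standing in for the checkbox) are all hypothetical
names, and the query "analyzer" is just a choice between lowercasing and
leaving the term alone.

```python
from collections import defaultdict


def index_tokens(text):
    # Index every title twice in the same field: lowercased and as written.
    words = text.split()
    return [w.lower() for w in words] + words


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    postings = defaultdict(set)
    for doc_id, title in docs.items():
        for term in index_tokens(title):
            postings[term].add(doc_id)
    return postings


def search(postings, query, case_sensitive):
    # Pick the query-side analysis from the checkbox value.
    analyze = (lambda w: w) if case_sensitive else str.lower
    terms = [analyze(w) for w in query.split()]
    if not terms:
        return set()
    # Require every query term to match (AND semantics).
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result


docs = {1: "The Quick BrOwN fox", 2: "the quick brown dog"}
postings = build_index(docs)

print(search(postings, "BrOwN", case_sensitive=True))   # {1}
print(search(postings, "BrOwN", case_sensitive=False))  # {1, 2}
```

Both kinds of query go against the single "title" field, so freeform
title:something queries keep working without any field-name rewriting.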

The only problem I can think of would be inflated scores for terms that
are naturally lowercase, because they would wind up getting added to the
index twice. Based on what I've seen of the data you are working with, I
imagine that if you used UPPERCASE instead of lowercase for the
case-folded copy, you could drastically reduce the likelihood of any
problems with that.
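
To make the duplication concrete, here is a small plain-Python sketch (not
Lucene code) that counts how often each term would be indexed under the two
folding choices. The helper name dual_tokens and its fold parameter are
assumptions for illustration.

```python
from collections import Counter


def dual_tokens(text, fold):
    # Emit the case-folded tokens followed by the original tokens.
    words = text.split()
    return [fold(w) for w in words] + words


title = "The Quick BrOwN fox"
lower_counts = Counter(dual_tokens(title, str.lower))
upper_counts = Counter(dual_tokens(title, str.upper))

print(lower_counts["fox"])  # 2: the folded copy collides with the original
print(upper_counts["fox"])  # 1: the folded copy is indexed as "FOX" instead
```

With uppercase folding, the case-insensitive analyzer at search time would
have to uppercase query terms as well, and only terms that are already all
uppercase in the source text would be counted twice.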



-Hoss


