RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

Jong Kim Mon, 09 Jul 2007 12:45:44 -0700

Mark,

I understand your point.
However, we do not maintain a separate field for the lower-case version of
the words.
Instead, we index each word twice at the same position within the same field.
This gives us case-exact matching for search queries that contain upper-case
characters, and case-insensitive matching for search queries given entirely
in lower case.
So I'm afraid I can't use the technique you recommend.
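
To illustrate the scheme described above, here is a minimal sketch in plain
Java. It deliberately avoids Lucene's analysis API and only models the idea of
a position increment, where an increment of 0 places a token at the same
position as the previous one. The names PositionedToken and DualCaseTokens are
hypothetical, and the sketch assumes the lower-cased copy is added only when it
differs from the word as written; the actual analyzer may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an indexed token: the term text plus a position
// increment. An increment of 0 means "same position as the previous token".
final class PositionedToken {
    final String term;
    final int positionIncrement;

    PositionedToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return term + "(+" + positionIncrement + ")";
    }
}

final class DualCaseTokens {
    // Emits each whitespace-separated word as written. If the word contains
    // upper-case characters, its lower-cased form is also emitted at the same
    // position (assumption: identical copies are not duplicated).
    static List<PositionedToken> tokenize(String text) {
        List<PositionedToken> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            out.add(new PositionedToken(word, 1));
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.equals(word)) {
                out.add(new PositionedToken(lower, 0));
            }
        }
        return out;
    }
}
```

With tokens laid out this way, a query term such as "apache" matches documents
containing either "Apache" or "apache", while the query term "Apache" matches
only the exact-case form, provided the query side does not lower-case its
terms.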

/Jong

-----Original Message-----
From: markharw00d [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 09, 2007 3:13 PM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project

 >>the case matters only for those words that should be included.

Jong, just want to check we're on the same page - you do know MoreLikeThis
has a kind of automatic stop-wording built in, yes?
MoreLikeThis looks at the document frequency of all terms in the "this" 
text you provide and only selects a shortlist (up to maxQueryTerms) of the
rarer words. As such, users (admin or otherwise) surrender precise control
over what terms are used, hence my earlier question: does case really matter
in this 'inexact' scenario? And can you use the lower-case version of the
field you said you already create?
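
The shortlist selection described above can be sketched roughly as follows.
This is a simplified model in plain Java, not the MoreLikeThis source: the
class name RareTermShortlist, the method signature, and the exact weighting
(term frequency times a logarithmic inverse document frequency) are
assumptions made for illustration, and the real class has additional settings
that are not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks the terms of the "this" text so that rarer index
// terms score higher, then keeps at most maxQueryTerms of them.
final class RareTermShortlist {
    static List<String> select(List<String> textTerms,
                               Map<String, Integer> docFreq,
                               int numDocs,
                               int maxQueryTerms) {
        // Count how often each term occurs in the supplied text.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : textTerms) {
            termFreq.merge(term, 1, Integer::sum);
        }

        // Score = tf * idf (assumed weighting); common terms get a low idf.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // term is not in the index, so it cannot match
            }
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Highest score first; ties broken alphabetically for determinism.
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        int keep = Math.min(maxQueryTerms, ranked.size());
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

In this model, very common words such as "the" fall off the shortlist without
any explicit stop-word list, which is the automatic stop-wording effect
mentioned above.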

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

