Re: Case Sensitivity

Erick Erickson Thu, 14 Aug 2008 08:17:56 -0700

Be aware that StandardAnalyzer lowercases all the input,
both at index and query times. Field.Store.YES will store
the original text without any transformations, so doc.get(<field>)
will return the original text. However, no matter what the
Field.Store value, the *indexed* tokens (using
TOKENIZED as you Field.Index.TOKENIZED)
are passed through the analyzer.


For instance, indexing "MIXed CasE  TEXT" in a
field called "myfield" with Field.Store.YES,
Field.Index.TOKENIZED would index the
following tokens (with StandardAnalyzer).
mixed
case
text

and searches (with StandardAnalyzer) would match
any case in the query terms (e.g. MIXED would hit,
as would mixed as would CaSE).

However, doc.get("myfield") would return
"MIXed CasE  TEXT"

As Doron said, though, a few use cases would
help us provide better answers.

Best
Erick


On Thu, Aug 14, 2008 at 10:31 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]>wrote:

> Thanks for you  reply Erick.
>
>
>  About the only way to do this that I know of is to
>> index the data three times, once without any case
>> changing, once uppercased and once lowercased.
>> You'll have to watch your analyzer, probably making
>> up your own (easily done, see the synonym analyzer
>> in Lucene in Action).
>>
>> Your example doesn't tell us anything, since the critical
>> information is the *analyzer* you use, both at query and
>> at index times. The analyzer is responsible for any
>> transformations, like case folding, tokenizing, etc.
>>
>
>
> In example  I want to show what I  stored field as  Field.Index.NO_NORMS
>
> As I understand it means what field contains original string
> despite what analyzer I chose(StandardAnalyzer by default).
>
> All querys I made myself without using Parsers.
> For example new TermQuery(new Term("filed", "MaMa"));
>
>
> I agree with you about possible implementation,
> but it increase size of index at times.
>
> But are there other possibilities, such as using  custom query, possibly
> similar to  RegexQuery,RegexTermEnum that would compare terms
> at it's  own discretion?
>
>
>
>
>
>> But what is your use-case for needing both upper and
>> lower case comparisons? I have a hard time coming
>> up with a reason to do both that wouldn't be satisfied
>> by just a caseless search.
>>
>> Best
>> Erick
>>
>> On Thu, Aug 14, 2008 at 4:47 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]
>> >wrote:
>>
>>  Hello.
>>>
>>> I have the similar question.
>>>
>>> I need to implement
>>> 1. Case sensitive search.
>>> 2. Lower case search for concrete field.
>>> 3. Upper case search for concrete filed.
>>>
>>> For now I use
>>> new Field("PROPERTIES",
>>>                  content,
>>>                  Field.Store.NO,
>>>                  Field.Index.NO_NORMS,
>>>                  Field.TermVector.NO)
>>> for original string and make case sensitive search.
>>>
>>> But does anyone have an idea to how implement second and third type of
>>> search?
>>>
>>> Thanks
>>>
>>>
>>>
>>>  Hi All,
>>>
>>>> Once I index a bunch of documents with a StandardAnalyzer (and if the
>>>> effort
>>>> I need to put in to reindex the documents is not worth the effort), is
>>>> there
>>>> a way to search on the index without case sensitivity.
>>>> I do not use any sophisticated Analyzer that makes use of
>>>> LowerCaseTokenizer.
>>>> Please let me know if there is a solution to circumvent this case
>>>> sensitivity problem.
>>>> Many thanks
>>>> Dino
>>>>
>>>>
>>>>  --
>>> Sergey Kabashnyuk
>>> eXo Platform SAS
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>  --
> Sergey Kabashnyuk
> eXo Platform SAS
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Re: Case Sensitivity

Reply via email to