Re: Case Sensitivity

Sergey Kabashnyuk Fri, 15 Aug 2008 00:52:18 -0700

Hello

Here's my use case:

Document   Field    Content of the field
Doc1       "text"   "Field Without Norms"
Doc2       "text"   "field without norms"
Doc3       "text"   "FIELD WITHOUT NORMS"

Query                                          Expected result
1. new Term("text", "Field Without Norms")     doc1
2. new Term("text", "field without norms")     doc2
3. new Term("text", "FIELD WITHOUT NORMS")     doc3
4. lowercase("text", "field without norms")    doc1, doc2, doc3
5. uppercase("text", "FIELD WITHOUT NORMS")    doc1, doc2, doc3

I store the "text" field like this:
new Field("text", content, Field.Store.NO, Field.Index.NO_NORMS, Field.TermVector.NO)

Using StandardAnalyzer, queries 1-3 work exactly as I need. The question is
how to create queries 4 and 5.

Thanks
Sergey Kabashnyuk
eXo Platform SAS

Be aware that StandardAnalyzer lowercases all the input,
at both index and query time. Field.Store.YES will store
the original text without any transformations, so doc.get(<field>)
will return the original text. However, no matter what the
Field.Store value, the *indexed* tokens (when the field is
indexed with Field.Index.TOKENIZED) are passed through
the analyzer.

For instance, indexing "MIXed CasE  TEXT" in a
field called "myfield" with Field.Store.YES,
Field.Index.TOKENIZED would index the
following tokens (with StandardAnalyzer).
mixed
case
text

and searches (with StandardAnalyzer) would match
any case in the query terms (e.g. MIXED would hit,
as would mixed, as would CaSE).

However, doc.get("myfield") would return
"MIXed CasE  TEXT"
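To make the example concrete, the plain-Java sketch below (no Lucene classes) simulates what happens to "MIXed CasE  TEXT". The `analyze` method is a hypothetical, simplified stand-in for StandardAnalyzer: it only splits on characters that are not letters or digits and lowercases each token, while the real analyzer uses a grammar-based tokenizer and also removes English stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation, not Lucene code. analyze() is a hypothetical,
// simplified stand-in for StandardAnalyzer: split on non-letter/digit
// characters, then lowercase every token (no stop-word removal here).
class AnalysisSketch {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) { // split() can yield a leading empty string
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String original = "MIXed CasE  TEXT";

        // Field.Store.YES keeps the value untouched; this is what doc.get() returns.
        String stored = original;
        // Field.Index.TOKENIZED indexes the analyzed tokens instead.
        List<String> indexed = analyze(original);

        System.out.println("stored:  " + stored);   // stored:  MIXed CasE  TEXT
        System.out.println("indexed: " + indexed);  // indexed: [mixed, case, text]

        // Query text is analyzed the same way, so any casing matches.
        for (String query : new String[] {"MIXED", "mixed", "CaSE"}) {
            boolean hit = indexed.containsAll(analyze(query));
            System.out.println(query + " -> " + hit); // true for all three
        }
    }
}
```

Because the query text goes through the same `analyze` step as the indexed text, the casing of the query no longer matters; the stored string is never consulted during matching.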

As Doron said, though, a few use cases would
help us provide better answers.

Best
Erick

On Thu, Aug 14, 2008 at 10:31 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]> wrote:

Thanks for your reply, Erick.


About the only way to do this that I know of is to
index the data three times, once without any case
changing, once uppercased and once lowercased.
You'll have to watch your analyzer, probably making
up your own (easily done, see the synonym analyzer
in Lucene in Action).
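Erick's three-way indexing can be modelled in plain Java as below. It is a sketch under assumptions, not Lucene code: the index is a map from field to term to document ids, each value is indexed untokenized (as with Field.Index.NO_NORMS in the original post), and the field names "text_lower" and "text_upper" are hypothetical. In Lucene this would amount to adding two more fields per document whose values are `content.toLowerCase()` and `content.toUpperCase()`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory model (no Lucene) of indexing each value three times as a single
// untokenized term. Field names "text_lower"/"text_upper" are hypothetical.
class ThreeFieldSketch {
    // field -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<String>>> fields = new TreeMap<>();

    private void addTerm(String field, String term, String docId) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    void addDocument(String docId, String value) {
        addTerm("text", value, docId);                                // original case
        addTerm("text_lower", value.toLowerCase(Locale.ROOT), docId); // lowercased copy
        addTerm("text_upper", value.toUpperCase(Locale.ROOT), docId); // uppercased copy
    }

    // Exact-term lookup, like a hand-built TermQuery on an untokenized field.
    Set<String> termQuery(String field, String term) {
        Map<String, Set<String>> terms = fields.getOrDefault(field, new TreeMap<>());
        return terms.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ThreeFieldSketch index = new ThreeFieldSketch();
        index.addDocument("doc1", "Field Without Norms");
        index.addDocument("doc2", "field without norms");
        index.addDocument("doc3", "FIELD WITHOUT NORMS");

        System.out.println(index.termQuery("text", "Field Without Norms"));       // [doc1]
        System.out.println(index.termQuery("text", "field without norms"));       // [doc2]
        System.out.println(index.termQuery("text", "FIELD WITHOUT NORMS"));       // [doc3]
        System.out.println(index.termQuery("text_lower", "field without norms")); // [doc1, doc2, doc3]
        System.out.println(index.termQuery("text_upper", "FIELD WITHOUT NORMS")); // [doc1, doc2, doc3]
    }
}
```

Queries 4 and 5 become ordinary exact-term lookups on the extra fields, provided the caller lowercases or uppercases the query text first, since nothing analyzes a hand-built term. The index-size cost Sergey mentions is visible too: every value now produces three terms instead of one.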

Your example doesn't tell us anything, since the critical
information is the *analyzer* you use, both at query and
at index times. The analyzer is responsible for any
transformations, like case folding, tokenizing, etc.



In the example I wanted to show that I stored the field as Field.Index.NO_NORMS.

As I understand it, this means that the field contains the original string
regardless of which analyzer I chose (StandardAnalyzer by default).

I build all queries myself without using a parser.
For example: new TermQuery(new Term("field", "MaMa"));
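Sergey's reading matches how an untokenized field behaves: the analyzer is skipped and the whole value becomes one term. The plain-Java sketch below (hypothetical helper names, no Lucene classes) contrasts that with a tokenized, lowercasing field and shows when a query term built by hand, like the TermQuery above, finds a match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model, not Lucene code; all method names are hypothetical.
class HandBuiltTermSketch {
    // Untokenized (Field.Index.NO_NORMS in the post): one term, case preserved.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    // Tokenized with a lowercasing analyzer (simplified: whitespace split + lowercase).
    static List<String> lowercasingTokens(String value) {
        List<String> terms = new ArrayList<>();
        for (String raw : value.split("\\s+")) {
            if (!raw.isEmpty()) {
                terms.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A hand-built term bypasses any analyzer: it hits only on an exact match.
    static boolean termQueryHits(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String value = "MaMa";
        System.out.println(termQueryHits(untokenized(value), "MaMa"));       // true
        System.out.println(termQueryHits(untokenized(value), "mama"));       // false
        System.out.println(termQueryHits(lowercasingTokens(value), "MaMa")); // false: index holds "mama"
        System.out.println(termQueryHits(lowercasingTokens(value), "mama")); // true
    }
}
```

A hand-built term is compared verbatim, so it only hits when it has exactly the form the index holds: the original casing for the untokenized field, the lowercased form for the tokenized one.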


I agree with you about that possible implementation,
but it would increase the size of the index several times over.

But are there other possibilities, such as a custom query, possibly
similar to RegexQuery/RegexTermEnum, that would compare terms
at its own discretion?
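The custom-query idea can be sketched in plain Java as a walk over one field's sorted term dictionary that keeps the postings of every term a custom test accepts, here case-insensitive equality. The map-based dictionary and all names are hypothetical; in Lucene 2.x the comparable hook is, as far as I know, `termCompare(Term)` in `FilteredTermEnum`, the class `RegexTermEnum` builds on, so check that against the javadoc of your version.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of a term-enumerating query; not Lucene code.
class CaseInsensitiveScanSketch {
    // Hypothetical helper: a sorted set of document ids for the demo data.
    static Set<String> docs(String... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    // Visit every term of one field in sorted order, like a term enumeration,
    // and union the postings of each term the custom comparison accepts.
    static Set<String> scan(SortedMap<String, Set<String>> termDictionary, String queryText) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : termDictionary.entrySet()) {
            // The comparison "at its own discretion": case-insensitive equality.
            if (entry.getKey().equalsIgnoreCase(queryText)) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Term dictionary of the untokenized "text" field from the use case.
        SortedMap<String, Set<String>> text = new TreeMap<>();
        text.put("Field Without Norms", docs("doc1"));
        text.put("field without norms", docs("doc2"));
        text.put("FIELD WITHOUT NORMS", docs("doc3"));

        System.out.println(scan(text, "field without norms")); // [doc1, doc2, doc3]
        System.out.println(scan(text, "FIELD WITHOUT NORMS")); // [doc1, doc2, doc3]
    }
}
```

The loop touches every distinct term in the field, so its cost grows with the size of the term dictionary rather than with the number of hits. That is the trade-off against indexing three copies: no extra index space, but slower queries on fields with many unique values.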

But what is your use-case for needing both upper and
lower case comparisons? I have a hard time coming
up with a reason to do both that wouldn't be satisfied
by just a caseless search.

Best
Erick

On Thu, Aug 14, 2008 at 4:47 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]> wrote:

 Hello.


I have a similar question.

I need to implement:
1. Case-sensitive search.
2. Lowercase search for a specific field.
3. Uppercase search for a specific field.

For now I use
new Field("PROPERTIES",
                 content,
                 Field.Store.NO,
                 Field.Index.NO_NORMS,
                 Field.TermVector.NO)
for the original string and do case-sensitive search.

But does anyone have an idea how to implement the second and third types of
search?

Thanks



Hi All,

Once I index a bunch of documents with a StandardAnalyzer (and if the
effort I need to put in to reindex the documents is not worth it), is
there a way to search the index without case sensitivity?
I do not use any sophisticated Analyzer that makes use of
LowerCaseTokenizer.
Please let me know if there is a solution to circumvent this case
sensitivity problem.
Many thanks
Dino


 --

Sergey Kabashnyuk
eXo Platform SAS


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







