Re: Case Sensitivity

Otis Gospodnetic Wed, 27 Aug 2008 08:09:34 -0700

Nah, I think the names are fine; I simply forgot.  I looked at the javadocs, and
they clearly say NO_NORMS doesn't get passed through an Analyzer.  Maybe in 3.0
we can switch to NOT_ANALYZED, as suggested, to reflect reality more closely.
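
To make the naming point concrete, here is a minimal Java sketch of how the
three index modes named in this thread relate. The IndexMode enum below is a
hypothetical model for illustration only, not the Lucene Field.Index class. It
encodes what the thread states (NO_NORMS behaves like UN_TOKENIZED plus
setOmitNorms(true)) and assumes that TOKENIZED and UN_TOKENIZED keep norms by
default.

```java
// Hypothetical model, NOT the Lucene API: the constant names mirror the
// Field.Index values discussed in this thread.
class IndexModeSketch {

    enum IndexMode {
        TOKENIZED(true, true),      // passed through the Analyzer, norms kept (assumed default)
        UN_TOKENIZED(false, true),  // indexed verbatim as one term, norms kept (assumed default)
        NO_NORMS(false, false);     // indexed verbatim as one term, norms omitted

        final boolean analyzed;
        final boolean storesNorms;

        IndexMode(boolean analyzed, boolean storesNorms) {
            this.analyzed = analyzed;
            this.storesNorms = storesNorms;
        }
    }

    // One-line summary of the two independent properties of a mode.
    static String describe(IndexMode mode) {
        return mode + ": analyzed=" + mode.analyzed + ", norms=" + mode.storesNorms;
    }

    public static void main(String[] args) {
        for (IndexMode mode : IndexMode.values()) {
            System.out.println(describe(mode));
        }
    }
}
```

The sketch shows why the name is confusing: "analyzed or not" and "norms or not"
are two separate properties, and NO_NORMS changes both relative to TOKENIZED.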



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
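
The case-sensitivity question further down the thread (un-analyzed fields at
index time, an analyzer at query time) can be sketched without Lucene at all.
The code below is a simplified stand-in under stated assumptions: the
"analyzers" are plain functions, downcasingAnalyzer only splits on whitespace
and lowercases (far simpler than StandardAnalyzer), and PerFieldQuerySketch is
a hypothetical class that mimics the idea of a per-field analyzer at query
time. None of these names are Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of per-field query analysis; not Lucene code.
class PerFieldQuerySketch {

    // Stand-in for a downcasing analyzer: whitespace split plus lowercase.
    static List<String> downcasingAnalyzer(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for "not analyzed": the whole value is one verbatim term.
    static List<String> keywordAnalyzer(String text) {
        return Collections.singletonList(text);
    }

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldQuerySketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a field-specific analyzer that overrides the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Analyze query text with the analyzer registered for the field, if any.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // A document matches when every query term is among its indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // The address is indexed un-analyzed, so its case is preserved.
        List<String> indexed = keywordAnalyzer("Dino@Example.com");

        PerFieldQuerySketch query = new PerFieldQuerySketch(PerFieldQuerySketch::downcasingAnalyzer);
        System.out.println(matches(indexed, query.analyze("email", "Dino@Example.com")));

        // Treat the email field verbatim at query time as well.
        query.addAnalyzer("email", PerFieldQuerySketch::keywordAnalyzer);
        System.out.println(matches(indexed, query.analyze("email", "Dino@Example.com")));
    }
}
```

Two ways out follow from the sketch: downcase the value yourself before adding
the un-analyzed field, or make sure the query side treats that field the same
way it was indexed. The second option keeps matching case-sensitive.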



----- Original Message ----
> From: Michael McCandless <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, August 27, 2008 5:36:46 AM
> Subject: Re: Case Sensitivity
> 
> 
> Actually, as confusing as it is, Field.Index.NO_NORMS means  
> Field.Index.UN_TOKENIZED plus field.setOmitNorms(true).
> 
> Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS?
> 
> Mike
> 
> Otis Gospodnetic wrote:
> 
> > Dino, you lost me half-way through your email :(
> >
> > NO_NORMS does not mean the field is not tokenized.
> > UN_TOKENIZED does mean the field is not tokenized.
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: Dino Korah 
> >> To: [email protected]
> >> Sent: Tuesday, August 26, 2008 9:17:49 AM
> >> Subject: RE: Case Sensitivity
> >>
> >> I think I should rephrase my question.
> >>
> >> [ Context: Using the out-of-the-box StandardAnalyzer for indexing and
> >> searching. ]
> >>
> >> Is it right to say that a field, if it is either UN_TOKENIZED or
> >> NO_NORMS-ized ( field.setOmitNorms(true) ), doesn't get analyzed while
> >> indexing? Which means that when we search, the query goes through the
> >> analyzer, and we need to analyze those fields differently in the analyzer
> >> we use for searching? Doesn't it mean that a setOmitNorms(true) field also
> >> doesn't get tokenized?
> >>
> >> What is the best solution if one were to add a set of fields UN_TOKENIZED
> >> and others TOKENIZED, a few of the latter set with setOmitNorms(true)
> >> (the index writer is plain StandardAnalyzer based)? A per-field analyzer
> >> at query time?!
> >>
> >> Many thanks,
> >> Dino
> >>
> >>
> >> -----Original Message-----
> >> From: Dino Korah [mailto:[EMAIL PROTECTED]
> >> Sent: 26 August 2008 12:12
> >> To: '[email protected]'
> >> Subject: RE: Case Sensitivity
> >>
> >> A few more case sensitivity questions.
> >>
> >> Based on the discussion at http://markmail.org/message/q7dqr4r7o6t6dgo5
> >> and on this thread, is it right to say that a field, if it is either
> >> UN_TOKENIZED or NO_NORMS-ized, doesn't get analyzed while indexing? Which
> >> means we need to case-normalize (down-case) those fields beforehand?
> >>
> >> Does it mean that, if I can afford it, I should use norms?
> >>
> >> Many thanks,
> >> Dino
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Steven A Rowe [mailto:[EMAIL PROTECTED]
> >> Sent: 19 August 2008 17:43
> >> To: [email protected]
> >> Subject: RE: Case Sensitivity
> >>
> >> Hi Dino,
> >>
> >> I think you'd benefit from reading some FAQ answers, like:
> >>
> >> "Why is it important to use the same analyzer type during indexing and
> >> search?"
> >>
> >> 4472d10961ba63c>
> >>
> >> Also, have a look at the AnalysisParalysis wiki page for some hints:
> >>
> >>
> >> On 08/19/2008 at 8:57 AM, Dino Korah wrote:
> >>> From the discussion here what I could understand was, if I am using
> >>> StandardAnalyzer on TOKENIZED fields, for both Indexing and Querying,
> >>> I shouldn't have any problems with cases.
> >>
> >> If by "shouldn't have problems with cases" you mean "can match
> >> case-insensitively", then this is true.
> >>
> >>> But if I have any UN_TOKENIZED fields there will be problems if I do
> >>> not case-normalize them myself before adding them as a field to the
> >>> document.
> >>
> >> Again, assuming that by "case-normalize" you mean "downcase", and that
> >> you want case-insensitive matching, and that you use the StandardAnalyzer
> >> (or some other downcasing analyzer) at query-time, then this is true.
> >>
> >>> In my case I have a mixed scenario. I am indexing emails and the email
> >>> addresses are indexed UN_TOKENIZED. I do have a second set of custom
> >>> tokenized fields, which keep the tokens in individual fields with the
> >>> same name.
> >> [...]
> >>> Does it mean that wherever I use UN_TOKENIZED, they do not get
> >>> through the StandardAnalyzer before getting indexed, but they do when
> >>> they are searched on?
> >>
> >> This is true.
> >>
> >>> If that is the case, Do I need to normalise them before adding to
> >>> document?
> >>
> >> If you want case-insensitive matching, then yes, you do need to normalize
> >> them before adding them to the document.
> >>
> >>> I also would like to know if it is better to employ an EmailAnalyzer
> >>> that makes a TokenStream out of the given email address, rather than
> >>> using a simplistic function that gives me a list of string pieces and
> >>> adding them one by one. With searches, would both approaches give the
> >>> same result?
> >>
> >> Yes, both approaches give the same result.  When you add string pieces
> >> one-by-one, you are adding multiple same-named fields. By contrast, the
> >> EmailAnalyzer approach would add a single field, and would allow you to
> >> control positions (via Token.setPositionIncrement():
> >>
> >> ml#setPositionIncrement(int)>), e.g. to improve phrase handling.  Also,
> >> if you make up an EmailAnalyzer, you can use it to search against your
> >> tokenized email field, along with other analyzer(s) on other field(s),
> >> using the PerFieldAnalyzerWrapper
> >>
> >> AnalyzerWrapper.html>.
> >>
> >> Steve
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
