Nah, I think the names are fine, I simply forgot. I looked at the javadocs, it clearly says NO_NORMS doesn't get passed through an Analyzer. Maybe in 3.0 we can switch to NOT_ANALYZED, as suggested, to reflect reality more closely.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- > From: Michael McCandless <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Wednesday, August 27, 2008 5:36:46 AM > Subject: Re: Case Sensitivity > > > Actually, as confusing as it is, Field.Index.NO_NORMS means > Field.Index.UN_TOKENIZED plus field.setOmitNorms(true). > > Probably we should rename it to Field.Index.UN_TOKENiZED_NO_NORMS? > > Mike > > Otis Gospodnetic wrote: > > > Dino, you lost me half-way through your email :( > > > > NO_NORMS does not mean the field is not tokenized. > > UN_TOKENIZED does mean the field is not tokenized. > > > > > > Otis-- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > ----- Original Message ---- > >> From: Dino Korah > >> To: java-user@lucene.apache.org > >> Sent: Tuesday, August 26, 2008 9:17:49 AM > >> Subject: RE: Case Sensitivity > >> > >> I think I should rephrase my question. > >> > >> [ Context: Using out of the box StandardAnalyzer for indexing and > >> searching. > >> ] > >> > >> Is it right to say that a field, if either UN_TOKENIZED or NO_NORMS- > >> ized ( > >> field.setOmitNorms(true) ), it doesn't get analyzed while indexing? > >> Which means that when we search, it gets thru the analyzer and we > >> need to > >> analyze them differently in the analyzer we use for searching? > >> Doesn't it mean that a setOmitNorms(true) field also doesn't get > >> tokenized? > >> > >> What is the best solution if one was to add a set of fields > >> UN_TOKENIZED and > >> others TOKENIZED, of the later set a few with setOmitNorms(true) > >> (the index > >> writer is plain StandardAnalyzer based)? A per field analyzer at > >> query time > >> ?! > >> > >> Many thanks, > >> Dino > >> > >> > >> -----Original Message----- > >> From: Dino Korah [mailto:[EMAIL PROTECTED] > >> Sent: 26 August 2008 12:12 > >> To: 'java-user@lucene.apache.org' > >> Subject: RE: Case Sensitivity > >> > >> A little more case sensitivity questions. > >> > >> Based on the discussion on http://markmail.org/message/q7dqr4r7o6t6dgo5 > >> and > >> on this thread, is it right to say that a field, if either > >> UN_TOKENIZED or > >> NO_NORMS-ized, it doesn't get analyzed while indexing? Which means > >> we need > >> to case-normalize (down-case) those fields before hand? > >> > >> Doest it mean that if I can afford, I should use norms. > >> > >> Many thanks, > >> Dino > >> > >> > >> > >> -----Original Message----- > >> From: Steven A Rowe [mailto:[EMAIL PROTECTED] > >> Sent: 19 August 2008 17:43 > >> To: java-user@lucene.apache.org > >> Subject: RE: Case Sensitivity > >> > >> Hi Dino, > >> > >> I think you'd benefit from reading some FAQ answers, like: > >> > >> "Why is it important to use the same analyzer type during indexing > >> and > >> search?" > >> > >> 4472d10961ba63c> > >> > >> Also, have a look at the AnalysisParalysis wiki page for some hints: > >> > >> > >> On 08/19/2008 at 8:57 AM, Dino Korah wrote: > >>> From the discussion here what I could understand was, if I am using > >>> StandardAnalyzer on TOKENIZED fields, for both Indexing and > >>> Querying, > >>> I shouldn't have any problems with cases. > >> > >> If by "shouldn't have problems with cases" you mean "can match > >> case-insensitively", then this is true. > >> > >>> But if I have any UN_TOKENIZED fields there will be problems if I do > >>> not case-normalize them myself before adding them as a field to the > >>> document. > >> > >> Again, assuming that by "case-normalize" you mean "downcase", and > >> that you > >> want case-insensitive matching, and that you use the > >> StandardAnalyzer (or > >> some other downcasing analyzer) at query-time, then this is true. > >> > >>> In my case I have a mixed scenario. I am indexing emails and the > >>> email > >>> addresses are indexed UN_TOKENIZED. I do have a second set of custom > >>> tokenized field, which keep the tokens in individual fields with > >>> same > >>> name. > >> [...] > >>> Does it mean that where ever I use UN_TOKENIZED, they do not get > >>> through the StandardAnalyzer before getting Indexed, but they do > >>> when > >>> they are searched on? > >> > >> This is true. > >> > >>> If that is the case, Do I need to normalise them before adding to > >>> document? > >> > >> If you want case-insensitive matching, then yes, you do need to > >> normalize > >> them before adding them to the document. > >> > >>> I also would like to know if it is better to employ an EmailAnalyzer > >>> that makes a TokenStream out of the given email address, rather than > >>> using a simplistic function that gives me a list of string pieces > >>> and > >>> adding them one by one. With searches, would both the approaches > >>> give > >>> same result? > >> > >> Yes, both approaches give the same result. When you add string > >> pieces > >> one-by-one, you are adding multiple same-named fields. By contrast, > >> the > >> EmailAnalyzer approach would add a single field, and would allow > >> you to > >> control positions (via Token.setPositionIncrement(): > >> > >> ml#setPositionIncrement(int)>), e.g. to improve phrase handling. > >> Also, if > >> you make up an EmailAnalyzer, you can use it to search against your > >> tokenized email field, along with other analyzer(s) on other > >> field(s), using > >> the PerFieldAnalyzerWrapper > >> > >> AnalyzerWrapper.html>. > >> > >> Steve > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]