| < #LETTER:                     // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff"
      ]

becomes

| < #LETTER:                     // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff",
       "\u002d"
      ]

On 7/21/06, Ngo, Anh (ISS Southfield) <[EMAIL PROTECTED]> wrote:
Hello Mark,

Please show me how to add "-" to the #LETTER definition.

Thanks,

Anh Ngo

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, July 21, 2006 3:51 PM
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question

I do not believe so. If you look above you will see that #P is only used
when looking for a num: a host ip, a phone number, etc. You would be
removing the ability to recognize a "_" while rooting those tokens out.
It will still be parsed when tokenizing an EMAIL as well. I don't think
this is the behavior you want.

- Mark

On 7/21/06, Ngo, Anh (ISS Southfield) <[EMAIL PROTECTED]> wrote:
>
> What is the #LETTER definition in StandardTokenizer.jj?
>
> I saw:
>
> | <#P: ("_"|"-"|"/"|"."|",")
>
> | <#HAS_DIGIT:                  // at least one digit
>     (<LETTER>|<DIGIT>)*
>     <DIGIT>
>     (<LETTER>|<DIGIT>)*
>
> Should I remove "_" and recompile the source code?
>
> Sincerely,
>
> Anh Ngo
>
> -----Original Message-----
> From: Daniel Naber [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 21, 2006 2:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: StandardAnalyzer question
>
> On Freitag 21 Juli 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
>
> > The lucene 2.0.0 StandardAnalyzer does treat the "_" (underscore) as a
> > token. Is there a way I can make StandardAnalyzer not tokenize on
> > "_" or any given characters?
>
> You need to add "_" to the #LETTER definition in StandardTokenizer.jj,
> then rebuild StandardTokenizer.java using the appropriate ant task.
>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
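
To make the effect of the #LETTER change at the top of this message easier to see, here is a rough standalone Java sketch. It is not Lucene code and does not use the generated StandardTokenizer: the class name LetterClassSketch and both methods are made up for illustration. It only copies the character ranges quoted above and splits text into maximal runs of "letters", ignoring digits and the other token types (NUM, EMAIL, HOST and so on) that the real grammar handles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class LetterClassSketch {

    // Mirrors the ranges quoted from #LETTER. When hyphenIsLetter is true,
    // U+002D (hyphen-minus) is accepted too, as in the modified definition.
    static boolean isLetter(char c, boolean hyphenIsLetter) {
        return (c >= 0x0041 && c <= 0x005a)
            || (c >= 0x0061 && c <= 0x007a)
            || (c >= 0x00c0 && c <= 0x00d6)
            || (c >= 0x00d8 && c <= 0x00f6)
            || (c >= 0x00f8 && c <= 0x00ff)
            || (c >= 0x0100 && c <= 0x1fff)
            || (hyphenIsLetter && c == 0x002d);
    }

    // Splits text into maximal runs of characters accepted by isLetter.
    // This is a simplification of what a tokenizer does with LETTER.
    static List<String> letterRuns(String text, boolean hyphenIsLetter) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter(c, hyphenIsLetter)) {
                current.append(c);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        String text = "well-known term";
        System.out.println(letterRuns(text, false)); // [well, known, term]
        System.out.println(letterRuns(text, true));  // [well-known, term]
    }
}
```

One thing the sketch makes visible: once "-" counts as a letter, a hyphen standing on its own (as in "a - b") also forms a run by itself here, so it is worth checking whether that side effect is acceptable before rebuilding the tokenizer.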