| < #LETTER:                      // unicode letters
     [
      "\u0041"-"\u005a",
      "\u0061"-"\u007a",
      "\u00c0"-"\u00d6",
      "\u00d8"-"\u00f6",
      "\u00f8"-"\u00ff",
      "\u0100"-"\u1fff"
     ]
  >

becomes

| < #LETTER:                      // unicode letters
     [
      "\u0041"-"\u005a",
      "\u0061"-"\u007a",
      "\u00c0"-"\u00d6",
      "\u00d8"-"\u00f6",
      "\u00f8"-"\u00ff",
      "\u0100"-"\u1fff",
      "\u002d"                  // hyphen-minus "-", the added character
     ]
  >
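
To see what the extra entry changes, here is a rough sketch in plain Java. It is not the generated StandardTokenizer code; LetterSketch, isLetter and letterRuns are made-up names for illustration only. It copies the ranges from the definition above and splits text into maximal runs of "letter" characters, with and without "\u002d" counted as a letter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written stand-in for the #LETTER character class,
// not the code JavaCC generates from StandardTokenizer.jj.
final class LetterSketch {

    // True if c falls in one of the ranges listed in #LETTER. The hyphen
    // ("\u002d") counts only when hyphenIsLetter is set, which corresponds
    // to the modified definition.
    static boolean isLetter(char c, boolean hyphenIsLetter) {
        return (c >= 0x0041 && c <= 0x005a)
            || (c >= 0x0061 && c <= 0x007a)
            || (c >= 0x00c0 && c <= 0x00d6)
            || (c >= 0x00d8 && c <= 0x00f6)
            || (c >= 0x00f8 && c <= 0x00ff)
            || (c >= 0x0100 && c <= 0x1fff)
            || (hyphenIsLetter && c == 0x002d);
    }

    // Splits text into maximal runs of letter characters. Everything else
    // (spaces, digits, punctuation) simply ends the current run.
    static List<String> letterRuns(String text, boolean hyphenIsLetter) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter(c, hyphenIsLetter)) {
                current.append(c);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }
}
```

Calling LetterSketch.letterRuns("well-known fact", false) splits at the hyphen, while passing true keeps "well-known" together as one run. The real tokenizer has more rules than this (digits, NUM, EMAIL and so on), so treat it only as a picture of the character class.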

On 7/21/06, Ngo, Anh (ISS Southfield) <[EMAIL PROTECTED]> wrote:


Hello Mark,


Please show me how to add "-" to the #LETTER definition.


Thanks,


Anh Ngo

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, July 21, 2006 3:51 PM
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question

I do not believe so. If you look above, you will see that #P is only used
when looking for a NUM: a host IP, a phone number, etc. Removing "_" from
#P would remove the ability to recognize a "_" while rooting those tokens
out. A "_" would still be parsed when tokenizing an EMAIL as well. I don't
think this is the behavior you want.
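
As a rough illustration of the point about #P, here is a simplified stand-in in plain Java. It is not the grammar's actual NUM rule, and PSketch and looksLikeNum are made-up names. It accepts alphanumeric runs joined by single separator characters, with at least one digit somewhere, and takes the separator set as a parameter so you can compare the set with and without "_".

```java
// Illustration only: a simplified approximation of a NUM-style token,
// not the rule defined in StandardTokenizer.jj.
final class PSketch {

    // True if s consists of alphanumeric runs joined by single characters
    // from pChars, contains at least one separator and at least one digit,
    // and neither starts nor ends with a separator.
    static boolean looksLikeNum(String s, String pChars) {
        boolean sawDigit = false;
        boolean sawSeparator = false;
        boolean prevWasSeparator = true; // disallows a leading separator
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                if (Character.isDigit(c)) {
                    sawDigit = true;
                }
                prevWasSeparator = false;
            } else if (pChars.indexOf(c) >= 0 && !prevWasSeparator) {
                sawSeparator = true;
                prevWasSeparator = true;
            } else {
                return false; // unknown character, or two separators in a row
            }
        }
        return sawDigit && sawSeparator && !prevWasSeparator;
    }
}
```

With the full separator set "_-/.,", looksLikeNum("abc_123", "_-/.,") accepts the string as one unit. With "_" removed from the set, the same input is rejected, which is the ability you would be giving up.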

- Mark

On 7/21/06, Ngo, Anh (ISS Southfield) <[EMAIL PROTECTED]> wrote:
>
>
> What is the #LETTER definition in StandardTokenizer.jj?
>
>
> I saw:
>
> | <#P: ("_"|"-"|"/"|"."|",") >
> | <#HAS_DIGIT:                                    // at least one digit
>     (<LETTER>|<DIGIT>)*
>     <DIGIT>
>     (<LETTER>|<DIGIT>)*
>   >
>
>
> Should I remove "_" and recompile the source code?
>
> Sincerely,
>
>
> Anh Ngo
>
> -----Original Message-----
> From: Daniel Naber [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 21, 2006 2:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: StandardAnalyzer question
>
> On Freitag 21 Juli 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
>
> > The Lucene 2.0.0 StandardAnalyzer does treat the "_" (underscore) as a
> > token. Is there a way I can make StandardAnalyzer not tokenize on
> > "_" or any given characters?
>
> You need to add "_" to the #LETTER definition in StandardTokenizer.jj,
> then rebuild StandardTokenizer.java using the appropriate ant task.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

