Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 16/10/2006 14:32:13:
> Hi Ryan,
>
> StandardAnalyzer should already be smart about keeping email
> addresses as a single token:
>
>   // email addresses
> | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+ >
>
> (this is from StandardTokenizer.jj)
>
> As for changing the text you feed to Lucene, that's all up to you.
> Changing the String seems like the simplest approach.  If you want
> to wrap that in StringReader, you can, but you can also just work
> with Strings.
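
To make the EMAIL rule quoted above easier to experiment with, here is a rough sketch that transcribes it into a java.util.regex pattern. It is not Lucene code: the class name EmailRuleSketch and the method isSingleEmailToken are made up for illustration, and the <ALPHANUM> part is only approximated as "one or more letters or digits", so results may differ from the real tokenizer in edge cases.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: a regex approximation of the
// EMAIL rule from the grammar quoted above.
final class EmailRuleSketch {
    // Assumption: <ALPHANUM> is approximated as one or more letters or digits.
    private static final String ALPHANUM = "[\\p{L}\\p{Nd}]+";

    // <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+
    private static final Pattern EMAIL = Pattern.compile(
            ALPHANUM + "(?:[._-]" + ALPHANUM + ")*"
            + "@" + ALPHANUM + "(?:[.-]" + ALPHANUM + ")+");

    // True when the whole string matches the approximated EMAIL rule,
    // i.e. it would be a candidate for a single EMAIL token.
    static boolean isSingleEmailToken(String text) {
        return EMAIL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSingleEmailToken("ryan.ohara@example.com")); // true
        System.out.println(isSingleEmailToken("ryan@localhost"));         // false
    }
}
```

The second address is rejected because the rule requires at least one "." or "-" separated part after the "@". Running your own sample strings through a check like this is a quick way to see whether they already fit the rule before changing any text.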

Also, if you would like to modify the tokens generated by the
[Standard]Analyzer, you could write your own TokenFilter, e.g. like the
SynonymFilter in the Lucene in Action (LIA) book.
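
The filter idea can be sketched without Lucene on the classpath. The code below is only a stand-in for the pattern: it wraps a plain Iterator<String> instead of a Lucene TokenStream, and the names TokenFilterSketch, ReplacingFilter and drain are hypothetical. A real filter would extend Lucene's TokenFilter and rewrite tokens as it pulls them from the wrapped stream; the replacement map here is just one assumed example of a data-specific rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stand-in for the filter pattern; none of these names are Lucene API.
final class TokenFilterSketch {

    // Wraps an upstream token source and rewrites tokens as they pass through.
    static final class ReplacingFilter implements Iterator<String> {
        private final Iterator<String> input;            // upstream tokens
        private final Map<String, String> replacements;  // assumed rule table

        ReplacingFilter(Iterator<String> input, Map<String, String> replacements) {
            this.input = input;
            this.replacements = replacements;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            // Pass the token through unchanged unless a rule applies to it.
            String replacement = replacements.get(token);
            return replacement != null ? replacement : token;
        }
    }

    // Collects every remaining token into a list, for inspection.
    static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<String>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> upstream = Arrays.asList("quick", "fox").iterator();
        Map<String, String> rules = Collections.singletonMap("quick", "fast");
        System.out.println(drain(new ReplacingFilter(upstream, rules))); // [fast, fox]
    }
}
```

The point of the design is that the tokenizer stays untouched and the data-specific rules live in a separate wrapper, so they can be chained after whatever StandardAnalyzer already produces.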

>
> Otis
>
> ----- Original Message ----
> From: Ryan O'Hara <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, October 16, 2006 4:28:35 PM
> Subject: Help with Custom Analyzer
>
> I have a few questions regarding writing a custom analyzer.
>
> My situation is that I would like to use the StandardAnalyzer but
> with some data-specific rules.  I was wondering if there was a way of
> telling the StandardAnalyzer to treat a string of text, that would
> normally be tokenized into more than one token, as only one token
> (maybe by inserting quotes around the text).  For example, say the
> StandardAnalyzer normally splits the string of text
> [EMAIL PROTECTED] into 4 tokens, but I want it to split the
> string into only 1 token.  Could I accomplish this by surrounding the
> string with quotes or by using some other type of flag?
>
> Another question I have is how do I modify the text being analyzed?
> From how I interpreted what I have read (which could easily be off),
> it looks like in order to accomplish what I have previously
> described, I am going to have to add some code to my custom
> analyzer's tokenStream method.  I see that tokenStream() has a Field
> and a Reader as parameters.  Would the way I go about adding rules be
> to edit the reader text?  If so, would manipulation of the text be
> easier if I were to convert the reader into a string?
>
> Any help is greatly appreciated.  Thanks.
>
> -Ryan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

