Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 16/10/2006 14:32:13:

> Hi Ryan,
>
> StandardAnalyzer should already be smart about keeping email
> addresses as a single token:
>
> // email addresses
> | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM>
> (("."|"-") <ALPHANUM>)+ >
>
> (this is from StandardAnalyzer.jj)
>
> As for changing the text you feed to Lucene, that's all up to you.
> Changing the String seems like the simplest approach. If you want
> to wrap that in StringReader, you can, but you can also just work
> with Strings.

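To check quickly which strings that EMAIL rule would cover, here is a minimal sketch that mirrors the quoted grammar line as a plain java.util.regex pattern. It does not call Lucene at all. The class and method names are made up for illustration, and treating <ALPHANUM> as "one or more letters or digits" is my assumption about the grammar. The real tokenizer also chooses between several competing rules, so this is only a rough test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: approximates the quoted EMAIL rule.
class EmailRuleSketch {
    // Assumption: <ALPHANUM> is one or more letters or digits.
    private static final String ALPHANUM = "[\\p{L}\\p{Nd}]+";

    // <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+
    private static final Pattern EMAIL = Pattern.compile(
        ALPHANUM + "(?:[._-]" + ALPHANUM + ")*@"
        + ALPHANUM + "(?:[.-]" + ALPHANUM + ")+");

    // True if the whole string has the shape the EMAIL rule describes.
    static boolean isSingleEmailToken(String text) {
        return EMAIL.matcher(text).matches();
    }

    // Collects every substring of the text that has the EMAIL shape.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isSingleEmailToken("ryan.ohara@example.org")); // true
        System.out.println(isSingleEmailToken("ryan@localhost"));         // false: no "." or "-" part after the host
        System.out.println(findEmails("mail jane_doe@mail.example.com today"));
    }
}
```

If a string in your data fails this check, that is a hint it falls outside the EMAIL rule, and changing the String before indexing (as Otis suggests) may be the simpler route.
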
Also, if you would like to modify the tokens generated by the [Standard]Analyzer, you could write your own TokenFilter - e.g. like the SynonymFilter in the LIA book.

>
> Otis
>
> ----- Original Message ----
> From: Ryan O'Hara <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, October 16, 2006 4:28:35 PM
> Subject: Help with Custom Analyzer
>
> I have a few questions regarding writing a custom analyzer.
>
> My situation is that I would like to use the StandardAnalyzer but
> with some data-specific rules. I was wondering if there was a way of
> telling the StandardAnalyzer to treat a string of text, that would
> normally be tokenized into more than one token, as only one token
> (maybe by inserting quotes around the text). For example, say the
> StandardAnalyzer normally splits the string of text
> [EMAIL PROTECTED] into 4 tokens, but I want it to split the
> string into only 1 token. Could I accomplish this by surrounding the
> string with quotes or by using some other type of flag?
>
> Another question I have is how do I modify the text being analyzed?
> From how I interpreted what I have read (which could easily be off),
> it looks like in order to accomplish what I have previously
> described, I am going to have to add some code to my custom
> analyzer's tokenStream method. I see that tokenStream() has a Field
> and a Reader as parameters. Would the way I go about adding rules be
> to edit the reader text? If so, would manipulation of the text be
> easier if I were to convert the reader into a string?
>
> Any help is greatly appreciated. Thanks.
>
> -Ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
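
P.S. To make the TokenFilter idea a bit more concrete, here is a minimal sketch of the wrapping pattern in plain Java. This is not the Lucene TokenFilter API and not the code from the LIA book. It only shows the shape of the technique: a filter wraps an upstream token source and rewrites or adds tokens as they pass through. The class name, the use of Iterator<String> as the token stream, and the synonym map are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a token filter: wraps a token source and
// emits each token followed by any synonyms registered for it.
class SynonymFilterSketch implements Iterator<String> {
    private final Iterator<String> input;               // upstream tokens
    private final Map<String, List<String>> synonyms;   // token -> extra tokens
    private final Deque<String> pending = new ArrayDeque<>();

    SynonymFilterSketch(Iterator<String> input, Map<String, List<String>> synonyms) {
        this.input = input;
        this.synonyms = synonyms;
    }

    @Override
    public boolean hasNext() {
        return !pending.isEmpty() || input.hasNext();
    }

    @Override
    public String next() {
        // Drain queued synonyms before pulling the next upstream token.
        if (!pending.isEmpty()) {
            return pending.poll();
        }
        String token = input.next();
        List<String> extra = synonyms.get(token);
        if (extra != null) {
            pending.addAll(extra);
        }
        return token;
    }

    // Convenience helper: runs the filter over a whole token list.
    static List<String> filterAll(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new SynonymFilterSketch(tokens.iterator(), synonyms);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

A real Lucene filter would work on Token objects rather than bare Strings, so it could also carry offsets and position information for the tokens it injects.
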