Re: Custom Analyzer Help please

Grant Ingersoll Tue, 27 Mar 2007 07:32:10 -0800

Hi Tim,

From the StandardAnalyzer code, the TokenStream looks like:


/** Constructs a [EMAIL PROTECTED] StandardTokenizer} filtered by a [EMAIL 
PROTECTED]

StandardFilter}, a [EMAIL PROTECTED] LowerCaseFilter} and a [EMAIL PROTECTED]StopFilter}. */

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    return result;
  }

Whereas StopAnalyzer looks like:
/** Filters LowerCaseTokenizer with StopFilter. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new StopFilter(new LowerCaseTokenizer(reader), stopWords);
  }

So, I think the answer is that StandardAnalyzer already has what youstate you want. Is it, perhaps, that certain stopwords that you areinterested in are not currently being stopped?Also, there is a whole section of examples on how to write Analyzersin the contrib/analyzers section of the source code.


-Grant

On Mar 26, 2007, at 6:12 PM, TimF wrote:

I would like to be able to get terms from my data that are acombination of
two existing analyzers.
I would like this for both posting and searching of various fields.
An example of the data might be as follows:
   Hello XY&Z Corporation - [EMAIL PROTECTED]
I would like the following terms to come out of the analyzer:
 [hello]  [xy&z]  [corporation] [EMAIL PROTECTED] [com]  //this is the
StandardAnalyzer output
as well as
  [xy] [z]  [abc] [example]
Essentially, I want the StandardAnalyzer output, but then I want torun theStopAnalyzer on the terms that come out of the StandardAnalyzer.Basically Iwould like to be able to search against part of the "special" wordor thewhole "special" word, where special word contains tokens for thingslike
email and part numbers, etc...
I know the answer is that I have to create a custom analyzer thatcombines
the standard and stop analyzers, and I have tried... but I just cannot
figure out how to do this.
I have read through the LIA book and looked through the samples forkeyword
and perfield analyzers, but they just dont do it.

Anyone have any samples that do this kind of thing?
Thanks,
Tim
--
View this message in context: http://www.nabble.com/Custom-Analyzer-Help-please-tf3469904.html#a9682794
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Custom Analyzer Help please

Reply via email to