Thanks Erik and Incze.
Sorry for this lengthy post.

Here is the class:
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardFilter;

import java.io.Reader;
import java.util.Hashtable;

public class KeywordAnalyzer extends Analyzer {
    public static final String[] STOP_WORDS =
            StopAnalyzer.ENGLISH_STOP_WORDS;

    private Hashtable stopTable;

    public KeywordAnalyzer() {
        this(STOP_WORDS);
    }

    public KeywordAnalyzer(String[] stopWords) {
        stopTable = StopFilter.makeStopTable(stopWords);
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Emit the entire input as one token (Incze's NonTokenizingTokenizer).
        TokenStream result = new NonTokenizingTokenizer(reader);
        result = new StandardFilter(result);
        // Note: LowerCaseFilter alters the token, so it will no longer match
        // a Field.Keyword value that was indexed with its original case.
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopTable);

        return result;
    }
}
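The NonTokenizingTokenizer that the class above relies on isn't shown here; as Erik describes below, all it does is return the whole Reader input as a single token. A Lucene-free sketch of that behavior (the class name and readAll helper are my own, for illustration only):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SingleTokenDemo {
    // Mimics what a keyword-style tokenizer does:
    // the entire input becomes one token, nothing is split.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The trailing digit survives intact instead of being tokenized away.
        System.out.println(readAll(new StringReader("http://example.com/page2")));
    }
}
```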


I have retried everything with the new KeywordAnalyzer class,
PerFieldAnalyzerWrapper, and with Field.Keyword. I don't get results for
any searches; it doesn't even matter whether there is a number at the
end or not.

Using query.toString("url"):

Query query = QueryParser.parse(terms, "contents", analyzer);
logger.info("search method: query.toString for url= " +
            query.toString("url"));

I can see what the analyzer is searching for.

How do I determine the value that Field.Keyword actually stores in the
index?

I've tried:

            doc.add(Field.Keyword("url", url));
            System.out.println("url: doc toString method= " +
                    doc.toString());

But I don't know whether this is the actual value that gets compared
against what the analyzer sends in.
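One thing worth checking: Field.Keyword indexes the value exactly as given, while the analyzer above runs the query term through LowerCaseFilter. Term matching in Lucene is an exact string comparison, so if the two sides differ by even one character of case, you get zero hits. A toy illustration (the URL is made up):

```java
public class TermMatchDemo {
    public static void main(String[] args) {
        String stored  = "http://Example.com/Doc1";  // indexed verbatim by Field.Keyword
        String queried = stored.toLowerCase();       // what a LowerCaseFilter would emit
        // Exact comparison, just like a term lookup in the index:
        System.out.println(stored.equals(queried));  // case differs -> prints false
    }
}
```

If this is the cause, either drop LowerCaseFilter from the keyword field's analyzer or lowercase the value before calling Field.Keyword, so both sides agree.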

Thanks for the help.

            Morris




-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 24, 2004 4:45 PM
To: Lucene Users List
Subject: Re: Zero hits for queries ending with a number

On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote:
> I think the custom analyzer I created is not properly doing what a
> KeywordAnalyzer would do.
>
> Erik, could you please post what KeywordAnalyzer should look like?

It should simply "tokenize" the entire input as a single token.  Incze 
Lajos posted a NonTokenizingTokenizer earlier today, in fact, that does 
the trick.

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

