Re: exact match ..

Robert Watkins Tue, 21 Feb 2006 07:16:53 -0800

The way I have solved the problem of allowing exact matches is, for each
field in which it is possible for an exact match to be requested, a
parallel field is created at index time that is unstemmed and has a
specific prefix:


        if (fieldData.isSearched() && tokenize && usingStemmingAnalyzer) {
                doc.add(new Field(UNSTEMMED_FIELD_PREFIX + fieldName,
                        fieldValueStr, false, true, true));
        }

Also, I use a custom Analyzer for both indexing and searching that
understands this:

        public TokenStream tokenStream(String fieldName, Reader reader)
        {
                TokenStream result = new WISTokenizer(reader);
                result = new StandardFilter(result);
                result = new LowerCaseFilter(result);
                if (stoptable != null) {
                        result = new StopFilter(result, stoptable);
                }
                if (!fieldName.startsWith(UNSTEMMED_FIELD_PREFIX)) {
                        result = new SpellFilter(result);
                        result = new PorterStemFilter(result);
                }
                return result;
        }

For searching, I've written a custom parser using JavaCC (I need to
support more operators than Lucene does OOTB), as well as a
QueryBuilder class that constructs the queries "manually" for each node
type. For a quoted string (i.e. requiring an exact match):

        case JJTQUOTED:
                if (node.hasWildcard()) {
                        Node phraseNode = 
SimpleNode.getPhraseNode(node.getName());
                        query = getSpanQuery(new Node[]{phraseNode}, 
currentField, 0);
                }
                else {
                        // match quoted strings "exactly", i.e. without stemming
                        // NB: matches are case insensitive
                        String fieldToSearch = usingStemmingAnalyzer ?
                                UNSTEMMED_FIELD_PREFIX + currentField : 
currentField;
                        query = getTerminalQuery(node.getName(), fieldToSearch);
                }
                break;

and:

        protected Query getTerminalQuery(String term, String currentField)
                throws QueryBuildingException
        {
                Query q;
                try {
                        q = 
org.apache.lucene.queryParser.QueryParser.parse(term,
                                currentField, analyzer);
                }
                catch (org.apache.lucene.queryParser.ParseException e) {
                        throw new QueryBuildingException(e);
                }
                return q;
        }

There is, obviously, a fair amount of work involved, but the level of
control is the payoff.

-- Robert

--------------------
Robert Watkins
[EMAIL PROTECTED]
--------------------

On Mon, 20 Feb 2006, Erik Hatcher wrote:

Yes, this is what PerFieldAnalyzerWrapper provides for you, as described indetail in several sections of Lucene in Action:
        http://www.lucenebook.com/search?query=PerFieldAnalyzerWrapper

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: exact match ..

Reply via email to