Erik Hatcher wrote:

Rest assured that human-readable query expressions aren't going away at all. I don't think Mark even implied that.


That's right. The proposal is *not* to replace what is already there - QueryParser will always have a useful role to play supporting the "Google-like" query syntax familiar to millions. I'd just like to see another full-featured query representation for the reasons already outlined.

Picking up on some points raised:

Re: MoreLikeThis queries.
Yes, they can be usefully wrapped as queries (see attached simple example). In fact it was my attempts at bastardising QueryParser to support them that brought home its limitations. I ended up with a subclass hack that (mis)used the field name to parse a query string "like:123" where 123 was a doc id. With the QueryParser syntax I was not able to pass other parameters which MoreLikeThis could usefully use to control the behaviour of this query type, e.g. choice of fieldname(s) used, max number of terms generated, minimum number of terms that should match, etc. This is not unusual: each query type has potentially multiple optional parameters that tweak its behaviour. If I don't have a query language that names the parameters explicitly (say, XML) I end up having to define what looks like a function with a long list of positional parameters: "like (123,,,4,,,)". Ack.

Here's a pseudo-code example that throws together some of the more obscure parts of Lucene not represented in the existing QueryParser, as an illustration of how this could look in a more wide-reaching parser. Imagine the user has selected an example doc #44 as something they are interested in, on the subject of "hockey", but they prefer to see documents that don't talk about ice hockey:

<BoostingQuery>
    <MatchQuery>
        <MoreLikeThisQuery percentTermsToMatch="0.25f" docId="44">
            <CompareField name="contents"/>
            <CompareField name="title"/>
        </MoreLikeThisQuery>
    </MatchQuery>
    <DowngradeQuery demoteValue="0.5">
        <SimpleQuery defaultField="contents">
            <queryText>"ice hockey" OR puck OR rink</queryText>
        </SimpleQuery>
    </DowngradeQuery>
</BoostingQuery>
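
To illustrate why named parameters are easier to work with than a positional "like (123,,,4,,,)" form, here is a minimal sketch that reads the MoreLikeThisQuery element with the JDK's DOM parser. The class and method names (NamedParamsSketch, parse, compareFields) are hypothetical and only for illustration; this is not a proposed parser implementation, and the element and attribute names are taken from the pseudo-code above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of Lucene.
public class NamedParamsSketch {

    /** Parses an XML string and returns its root element. */
    public static Element parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    /** Collects the "name" attribute of every CompareField element below the given element. */
    public static List<String> compareFields(Element mlt) {
        List<String> names = new ArrayList<>();
        NodeList fields = mlt.getElementsByTagName("CompareField");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MoreLikeThisQuery percentTermsToMatch=\"0.25\" docId=\"44\">"
                   + "<CompareField name=\"contents\"/>"
                   + "<CompareField name=\"title\"/>"
                   + "</MoreLikeThisQuery>";
        Element mlt = parse(xml);
        // Each optional parameter is looked up by name, so order and omissions do not matter.
        int docId = Integer.parseInt(mlt.getAttribute("docId"));
        float percent = Float.parseFloat(mlt.getAttribute("percentTermsToMatch"));
        System.out.println(docId + " " + percent + " " + compareFields(mlt));
    }
}
```

Because every value is addressed by name, a new optional parameter can be added later without changing how existing query documents are read.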

BoostingQuery is a class that can use a second query to demote the results of a first query if it matches (see here: http://wiki.apache.org/jakarta-lucene/CommunityContributions). For this and other forms of query to be able to plug into a new parser, the Query objects just need to adhere to bean conventions so they can be wired in automatically, in an Ant/Spring-like way, using reflection. For example, the implementation of BoostingQuery would need to have getter/setter properties for "matchQuery" and "downgradeQuery". Note in this example that the existing QueryParser syntax is usefully reused in "SimpleQuery" to avoid making the XML too verbose.
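
As a rough sketch of the bean-style wiring described above, the following uses plain java.lang.reflect to map named attributes onto setters. Everything here is an assumption for illustration: BeanWiringSketch, DemoQuery, applyAttributes and convert are hypothetical names, DemoQuery is a stand-in rather than a Lucene Query, and a real parser would also need to handle nested query elements and more parameter types.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
public class BeanWiringSketch {

    /** Stand-in bean with the kind of properties a query class might expose. */
    public static class DemoQuery {
        private int docId;
        private float percentTermsToMatch = 0.5f;

        public int getDocId() { return docId; }
        public void setDocId(int docId) { this.docId = docId; }
        public float getPercentTermsToMatch() { return percentTermsToMatch; }
        public void setPercentTermsToMatch(float p) { this.percentTermsToMatch = p; }
    }

    /** Calls the one-argument setter matching each attribute name, e.g. docId -> setDocId. */
    public static void applyAttributes(Object bean, Map<String, String> attrs) throws Exception {
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            boolean applied = false;
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterTypes().length == 1) {
                    m.invoke(bean, convert(e.getValue(), m.getParameterTypes()[0]));
                    applied = true;
                    break;
                }
            }
            if (!applied) {
                throw new IllegalArgumentException("No setter for attribute: " + name);
            }
        }
    }

    /** Converts attribute text to the setter's parameter type (only a few types in this sketch). */
    static Object convert(String text, Class<?> type) {
        if (type == int.class || type == Integer.class) return Integer.valueOf(text);
        if (type == float.class || type == Float.class) return Float.valueOf(text);
        if (type == String.class) return text;
        throw new IllegalArgumentException("Unsupported parameter type: " + type);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("docId", "44");
        attrs.put("percentTermsToMatch", "0.25");
        DemoQuery query = new DemoQuery();
        applyAttributes(query, attrs);
        System.out.println(query.getDocId() + " " + query.getPercentTermsToMatch());
    }
}
```

The point of this approach is that the parser itself needs no knowledge of individual query classes: any class that follows the getter/setter convention could be configured from named attributes.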

There's much detail to be added in how this would work in practice but I thought I'd post it here to show the general shape of one possible direction.






package com.inperspective.lucene.query;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

// MoreLikeThis has no import here, so it is expected to live in this package.

/**
 * A simple wrapper for MoreLikeThis for use in scenarios where a Query object
 * is required, e.g. in custom QueryParser extensions. At query.rewrite() time
 * the reader is used to construct the actual MoreLikeThis object and obtain
 * the real Query object.
 * TODO write JUnit Test
 * @author maharwood
 */
public class MoreLikeThisQuery extends Query
{
    private int docId;
    private String[] moreLikeFields;
    private Analyzer analyzer;
    // Fraction of the generated terms that a document must match
    private float percentTermsToMatch = 0.5f;

    /**
     * @param docId id of the example document
     * @param moreLikeFields names of the fields to compare
     * @param analyzer analyzer passed on to MoreLikeThis
     */
    public MoreLikeThisQuery(int docId, String[] moreLikeFields, Analyzer analyzer)
    {
        this.docId = docId;
        this.moreLikeFields = moreLikeFields;
        this.analyzer = analyzer;
    }

    public Query rewrite(IndexReader reader) throws IOException
    {
        // Build the real query only now that an IndexReader is available
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        BooleanQuery bq = (BooleanQuery) mlt.like(docId);
        BooleanClause[] clauses = bq.getClauses();
        // Require the configured fraction of the generated clauses to match
        bq.setMinimumNumberShouldMatch((int) (clauses.length * percentTermsToMatch));
        return bq;
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.search.Query#toString(java.lang.String)
     */
    public String toString(String field)
    {
        return "like:" + docId;
    }

    public float getPercentTermsToMatch()
    {
        return percentTermsToMatch;
    }

    public void setPercentTermsToMatch(float percentTermsToMatch)
    {
        this.percentTermsToMatch = percentTermsToMatch;
    }
}
