Re: "Advanced" query language

markharw00d Sat, 03 Dec 2005 10:01:21 -0800

Erik Hatcher wrote:

Rest assured that human-readable query expressions aren't going away at all. I don't think Mark even implied that.

That's right. The proposal is *not* to replace what is already there: QueryParser will always have a useful role to play supporting the "Google-like" query syntax familiar to millions. I'd just like to see another full-featured query representation for the reasons already outlined.


Picking up on some points raised:

Re: MoreLikeThis queries.

Yes, they can be usefully wrapped as queries (see the attached simple example). In fact it was my attempts at bastardising QueryParser to support them that brought home its limitations. I ended up with a subclass hack that (mis)used the field name to parse a query string "like:123", where 123 was a doc id. With the QueryParser syntax I was not able to pass other parameters which MoreLikeThis could usefully use to control the behaviour of this query type, e.g. choice of fieldname(s) used, max number of terms generated, the minimum number of terms that should match, etc. This is not unusual: each query type has potentially multiple optional parameters that tweak its behaviour. If I don't have a query language that names the parameters explicitly (say, XML), I end up having to define what looks like a function with a long list of parameters: "like(123,,,4,,,)". Ack.
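To make the contrast concrete, here is a minimal sketch in plain Java (no Lucene dependency) of a parser filling MoreLikeThis-style options from named attributes. Anything left out simply keeps its default, so no empty positional slots are needed. The MltOptions class, its attribute names and its default values are hypothetical illustrations, not MoreLikeThis's actual API or defaults.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for MoreLikeThis-style options; names and defaults
// are illustrative only, not taken from Lucene.
final class MltOptions {
    int docId;
    String[] fieldNames = {"contents"};
    int maxQueryTerms = 25;
    float percentTermsToMatch = 0.5f;

    // Builds options from named attributes (e.g. XML attributes). Absent
    // attributes keep their defaults, so callers never need placeholder
    // slots like "like(123,,,4,,,)".
    static MltOptions fromAttributes(Map<String, String> attrs) {
        MltOptions o = new MltOptions();
        String id = attrs.get("docId");
        if (id == null) {
            throw new IllegalArgumentException("docId is required");
        }
        o.docId = Integer.parseInt(id);
        if (attrs.containsKey("fieldNames")) {
            o.fieldNames = attrs.get("fieldNames").split(",");
        }
        if (attrs.containsKey("maxQueryTerms")) {
            o.maxQueryTerms = Integer.parseInt(attrs.get("maxQueryTerms"));
        }
        if (attrs.containsKey("percentTermsToMatch")) {
            o.percentTermsToMatch = Float.parseFloat(attrs.get("percentTermsToMatch"));
        }
        return o;
    }

    public static void main(String[] args) {
        // Only docId and maxQueryTerms are given; everything else defaults.
        Map<String, String> attrs = new HashMap<>();
        attrs.put("docId", "123");
        attrs.put("maxQueryTerms", "4");
        MltOptions o = fromAttributes(attrs);
        System.out.println(o.docId + " " + Arrays.toString(o.fieldNames) + " "
                + o.maxQueryTerms + " " + o.percentTermsToMatch);
    }
}
```

With a positional syntax every option's slot has to be known and kept in order; with named attributes, adding a new option later doesn't break existing queries.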

Here's a pseudo-code example that throws together some of the more obscure parts of Lucene not represented in the existing QueryParser, as an illustration of how this could look in a more wide-reaching parser. Imagine the user has selected an example doc #44 as something they are interested in, on the subject of "hockey", but they prefer to see documents that don't talk about ice hockey:


<BoostingQuery>
    <MatchQuery>
        <MoreLikeThisQuery percentTermsToMatch="0.25f" docId="44">
            <CompareField name="contents"/>
            <CompareField name="title"/>
        </MoreLikeThisQuery>
    </MatchQuery>
    <DowngradeQuery demoteValue="0.5">
        <SimpleQuery defaultField="contents">
            <queryText>"ice hockey" OR puck OR rink</queryText>
        </SimpleQuery>
    </DowngradeQuery>
</BoostingQuery>

BoostingQuery is a class that uses a second query to demote those results of a first query that the second query also matches (see here: http://wiki.apache.org/jakarta-lucene/CommunityContributions).

For this and other forms of query to be able to plug into the new parser, the Query objects just need to adhere to bean conventions so they can be automatically wired in an ANT/Spring-like way using reflection. For example, the implementation of BoostingQuery would need to have getter/setter properties for "matchQuery" and "downgradeQuery". Note in this example that the existing QueryParser syntax is usefully used in "SimpleQuery" to avoid making the XML too verbose.
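The demoting behaviour can be illustrated without Lucene as plain score arithmetic. This is a hypothetical sketch of the effect described for BoostingQuery, not the contributed class's implementation: documents matched by the first query keep their score unless the downgrade query also matches them, in which case the score is multiplied by the demote value. The class name, document ids and scores are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of BoostingQuery-style demotion on precomputed scores.
final class DemotionSketch {
    // matchScores: docId -> score from the "match" query (matching docs only).
    // downgradeHits: docs that the "downgrade" query matches.
    // demoteValue: factor applied to the scores of demoted docs.
    static Map<Integer, Float> demote(Map<Integer, Float> matchScores,
                                      Set<Integer> downgradeHits,
                                      float demoteValue) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Float> e : matchScores.entrySet()) {
            float score = e.getValue();
            // The downgrade query never adds documents; it only lowers scores.
            if (downgradeHits.contains(e.getKey())) {
                score *= demoteValue;
            }
            result.put(e.getKey(), score);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Float> match = new LinkedHashMap<>();
        match.put(7, 0.8f);   // a doc about field hockey
        match.put(12, 0.9f);  // a doc that also mentions "ice hockey"
        // doc 99 matches only the downgrade query, so it is ignored
        Set<Integer> downgrade = new HashSet<>(Arrays.asList(12, 99));
        System.out.println(demote(match, downgrade, 0.5f));
    }
}
```

The downgrade query can therefore only reorder what the match query already found; it never brings in documents such as 99 above.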

There's much detail to be added in how this would work in practice, but I thought I'd post it here to show the general shape of one possible direction.
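As one possible shape for the bean-convention wiring mentioned above, here is a minimal sketch using only the JDK's DOM parser and reflection. Each registered element name maps to a class, each attribute to a setter of the same name, and each child element (like MatchQuery in the example) names a setter that receives the object built from its first child element. TermBean, BoostingBean, BeanWiring and the tag names are hypothetical stand-ins, not Lucene classes, and unlike the example the demoteValue attribute sits on the outer element.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical bean standing in for a term query.
class TermBean {
    private String field;
    private String text;
    public void setField(String field) { this.field = field; }
    public void setText(String text) { this.text = text; }
    @Override public String toString() { return field + ":" + text; }
}

// Hypothetical bean standing in for BoostingQuery.
class BoostingBean {
    private Object matchQuery;
    private Object downgradeQuery;
    private float demoteValue = 1.0f;
    public void setMatchQuery(Object q) { this.matchQuery = q; }
    public void setDowngradeQuery(Object q) { this.downgradeQuery = q; }
    public void setDemoteValue(float v) { this.demoteValue = v; }
    @Override public String toString() {
        return "boost(" + matchQuery + ", " + downgradeQuery + ", " + demoteValue + ")";
    }
}

final class BeanWiring {
    // Maps element names to the classes they instantiate (hypothetical registry).
    private final Map<String, Class<?>> registry = new HashMap<>();

    void register(String tag, Class<?> type) { registry.put(tag, type); }

    Object parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return build(root);
    }

    private Object build(Element e) throws Exception {
        Class<?> type = registry.get(e.getTagName());
        if (type == null) {
            throw new IllegalArgumentException("Unknown element: " + e.getTagName());
        }
        Object bean = type.getDeclaredConstructor().newInstance();
        // Attributes become simple properties: demoteValue="0.5" -> setDemoteValue(0.5f).
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            setProperty(bean, a.getNodeName(), a.getNodeValue());
        }
        // Child elements name a property; the object built from their first
        // child element is passed to it: <MatchQuery>..</MatchQuery> -> setMatchQuery(..).
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                Element wrapper = (Element) kids.item(i);
                Element value = firstChildElement(wrapper);
                if (value != null) {
                    setProperty(bean, wrapper.getTagName(), build(value));
                }
            }
        }
        return bean;
    }

    private static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) return (Element) n;
        }
        return null;
    }

    // Finds a public one-argument setter named "set" + Name and invokes it.
    private static void setProperty(Object bean, String name, Object value) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterTypes().length == 1) {
                m.invoke(bean, convert(value, m.getParameterTypes()[0]));
                return;
            }
        }
        throw new IllegalArgumentException("No " + setter + " on " + bean.getClass().getName());
    }

    // Converts attribute strings to the setter's parameter type; built beans pass through.
    private static Object convert(Object value, Class<?> target) {
        if (!(value instanceof String)) return value;
        String s = (String) value;
        if (target == float.class || target == Float.class) return Float.valueOf(s);
        if (target == int.class || target == Integer.class) return Integer.valueOf(s);
        return s;
    }
}
```

For example, after registering `Boosting` and `Term`, parsing `<Boosting demoteValue="0.5"><MatchQuery><Term field="contents" text="hockey"/></MatchQuery>...</Boosting>` yields a BoostingBean whose properties were all set by reflection. A real implementation would also need richer type conversion and better error reporting.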

package com.inperspective.lucene.query;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
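import org.apache.lucene.search.Query;
// Assumes MoreLikeThis from the Lucene contrib "similarity" module
import org.apache.lucene.search.similar.MoreLikeThis;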

/**
 * A simple wrapper for MoreLikeThis for use in scenarios where a Query object
 * is required, e.g. in custom QueryParser extensions. At query.rewrite() time
 * the reader is used to construct the actual MoreLikeThis object and obtain
 * the real Query object.
 * TODO write JUnit Test
 * @author maharwood
 */
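public class MoreLikeThisQuery extends Query
{

    private int docId;
    private String[] moreLikeFields;
    private Analyzer analyzer;
    // Fraction of the generated terms that must match; applied in rewrite()
    private float percentTermsToMatch = 0.5f;

    /**
     * @param docId id of the example document to find similar documents for
     * @param moreLikeFields fields whose terms are used to find similar documents
     * @param analyzer analyzer passed to MoreLikeThis for tokenizing field text
     */
    public MoreLikeThisQuery(int docId, String[] moreLikeFields, Analyzer analyzer)
    {
        this.docId = docId;
        this.moreLikeFields = moreLikeFields;
        this.analyzer = analyzer;
    }
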
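    public Query rewrite(IndexReader reader) throws IOException
    {
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setFieldNames(moreLikeFields);
        mlt.setAnalyzer(analyzer);
        // like() builds a BooleanQuery made of optional term clauses
        BooleanQuery bq = (BooleanQuery) mlt.like(docId);
        BooleanClause[] clauses = bq.getClauses();
        // Require the given fraction of generated terms to match (the cast truncates)
        bq.setMinimumNumberShouldMatch((int) (clauses.length * percentTermsToMatch));
        // Keep any boost set on this wrapper, e.g. by an enclosing query
        bq.setBoost(getBoost());
        return bq;
    }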

    /* (non-Javadoc)
     * @see org.apache.lucene.search.Query#toString(java.lang.String)
     */
    public String toString(String field)
    {
        return "like:" + docId;
    }

    public float getPercentTermsToMatch()
    {
        return percentTermsToMatch;
    }

    public void setPercentTermsToMatch(float percentTermsToMatch)
    {
        this.percentTermsToMatch = percentTermsToMatch;
    }
}
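The rewrite() method turns percentTermsToMatch into a clause count with a plain int cast, which truncates. This stand-alone helper (a hypothetical name, not part of Lucene) repeats the same arithmetic so the rounding is easy to check: 25 generated terms at 0.25 gives 6, while a short query of 3 terms gives 0, i.e. no minimum is enforced at all.

```java
// Hypothetical helper repeating the arithmetic used in MoreLikeThisQuery.rewrite().
final class MinShouldMatch {
    // Same expression as rewrite(): the (int) cast truncates toward zero.
    static int minimumShouldMatch(int clauseCount, float percentTermsToMatch) {
        return (int) (clauseCount * percentTermsToMatch);
    }

    public static void main(String[] args) {
        int[] counts = {25, 10, 3};
        for (int n : counts) {
            System.out.println(n + " clauses at 0.25 -> " + minimumShouldMatch(n, 0.25f));
        }
    }
}
```

If at least one term should always be required, wrapping the result in Math.max(1, ...) would be one option.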
