On Friday 02 December 2005 16:03, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).
> 
> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck 
> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"
> 
> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax. 
> 
> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically." This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client
> 
> Currently we cannot easily do the above for any
> "complex" queries  because they are not easily
> persisted (yes, we could serialize Query objects but
> that seems messy and does not solve points 2 and 3).
> 
> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features. A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
>    <MoreLikeThis minNumberShouldMatch="3" maxQueryTerms="30">
>       <text>
>     Lorem ipsum dolor sit amet, consectetuer adipiscing
>     elit. Morbi eget ante blandit quam faucibus posuere. Vivamus
>     porta, elit fringilla venenatis consequat, neque lectus
>     gravida dolor, sed cursus nunc elit non lorem. Nullam congue
>     orci id eros. Nunc aliquet posuere enim.
>       </text>
>    </MoreLikeThis>
> </BooleanClause>

Quidquid id est ...
Do we have a Latin analyzer?

> 
> Do people feel this would be a worthwhile endeavour?
> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

There are at least two more issues:

Some queries can be nested inside others, but some
nesting combinations cannot be searched. For example, it is
not possible to have a BooleanQuery inside a PhraseQuery.
How should these be dealt with?
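
One possible answer, as a rough sketch only: an XML interpreter could check
each parent/child pair against a table of allowed nestings before it
instantiates any Query objects. The element names and the table contents
below are assumptions made up for illustration, not an existing Lucene
format, and the check uses nothing but the JDK's DOM parser:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NestingCheck {
    // Hypothetical table: element name -> element names allowed directly inside it.
    static final Map<String, Set<String>> ALLOWED = Map.of(
        "BooleanQuery", Set.of("BooleanQuery", "PhraseQuery", "TermQuery"),
        "PhraseQuery", Set.of(),
        "TermQuery", Set.of());

    /** Returns null if the nesting is acceptable, else a description of the first problem. */
    static String firstProblem(Element parent) {
        Set<String> allowed = ALLOWED.get(parent.getTagName());
        if (allowed == null) {
            return "unknown element <" + parent.getTagName() + ">";
        }
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text and comments
            }
            Element child = (Element) n;
            if (!allowed.contains(child.getTagName())) {
                return "<" + child.getTagName() + "> not allowed inside <"
                    + parent.getTagName() + ">";
            }
            String nested = firstProblem(child); // check deeper levels too
            if (nested != null) {
                return nested;
            }
        }
        return null;
    }

    /** Parses the XML text and checks the nesting from the root element down. */
    static String check(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return firstProblem(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse query XML", e);
        }
    }
}
```

With this table, a PhraseQuery element containing a BooleanQuery element is
rejected with a message naming both elements, so the user gets an error at
parse time instead of a query that cannot be searched.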

XML is not readable/writable by most of the humans who could
make good use of the extra power in the gap left open
by the default query language. See also this:
http://ciir.cs.umass.edu/irdemo/inqinfo/inqueryhelp.html
Do you want to decouple (as above) at the human interface?


There is also the contrib/surround query language.
This language avoids using special characters by using prefix
operators. Adding prefix operators like this is straightforward:

moreLikeThis(3,  30,  termList(Lorem ipsum dolor sit amet))

For practical use, this could be simplified to:

mlt(3,  30,  (Lorem ipsum dolor sit amet))

Such additions are a bit of work, but the query possibilities of Lucene
do not change that fast.
Adding infix operators, which sit in between their arguments,
is a bit more involved.
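
To give an idea of the amount of work for the prefix case, here is a minimal
sketch of a hand-written parser for the two forms above. It is not the
surround parser. The class name, the mlt alias, the treatment of a bare
parenthesised group as termList, and the choice to accept commas and
whitespace alike as separators are all assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PrefixQueryParser {
    // Hypothetical aliases; "" stands for a bare parenthesised group.
    static final Map<String, String> ALIASES =
        Map.of("mlt", "moreLikeThis", "", "termList");

    /** A parsed node: an operator with arguments, or a leaf when args is null. */
    static final class Node {
        final String name;
        final List<Node> args; // null for a leaf (word or number)

        Node(String name, List<Node> args) {
            this.name = name;
            this.args = args;
        }

        @Override
        public String toString() {
            if (args == null) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final String in;
    private int pos = 0;

    private PrefixQueryParser(String in) {
        this.in = in;
    }

    /** Parses one expression and rejects trailing input. */
    static Node parse(String text) {
        PrefixQueryParser p = new PrefixQueryParser(text);
        Node n = p.expr();
        p.skipSeparators();
        if (p.pos != p.in.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return n;
    }

    private void skipSeparators() {
        while (pos < in.length()
                && (Character.isWhitespace(in.charAt(pos)) || in.charAt(pos) == ',')) {
            pos++;
        }
    }

    // expr := word | [word] '(' expr* ')'
    private Node expr() {
        skipSeparators();
        int start = pos;
        while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) {
            pos++;
        }
        String word = in.substring(start, pos);
        if (pos < in.length() && in.charAt(pos) == '(') {
            pos++; // consume '('
            List<Node> args = new ArrayList<>();
            while (true) {
                skipSeparators();
                if (pos >= in.length()) {
                    throw new IllegalArgumentException("missing ')'");
                }
                if (in.charAt(pos) == ')') {
                    pos++; // consume ')'
                    break;
                }
                args.add(expr());
            }
            return new Node(ALIASES.getOrDefault(word, word), args);
        }
        if (word.isEmpty()) {
            throw new IllegalArgumentException("unexpected character at " + pos);
        }
        return new Node(word, null);
    }
}
```

Both the long and the short form parse to the same tree here, so a new
prefix operator costs one alias entry plus whatever code turns the resulting
node into a Lucene Query. An infix operator would additionally need
precedence handling in expr().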

Regards,
Paul Elschot

