Re: Nested Queries

Steven Rowe Thu, 28 Dec 2006 08:22:55 -0800

Hi Kapil,

Kapil Chhabra wrote:
> Hi Steve,
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
> 
> In BNF, the query grammar is:
> 
>   Query  ::= ( Clause )*
>   Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
> 
> Which means that FIELD2:(FIELD2:3) is a correct query. Correct me if I
> am wrong.
> 
> What will this query translate into? Will it be the same as FIELD2: 1 OR
> FIELD2: 2


"FIELD2:(FIELD2:3)" translates to "FIELD2:3".  This is because the
FieldX in "FieldX:(TermA OR TermB)" is interpreted distributively: this
query is equivalent to "FieldX:TermA OR FieldX:TermB".  A field
specifier on a nested query term or clause overrides the containing
field specifier, so "FIELD1:(FIELD2:3)" also translates to "FIELD2:3".

A more complicated example:

    "Field2:(Field3:TermA OR (Field4:TermB AND TermC))"

translates to:

    "Field3:TermA OR (Field4:TermB AND Field2:TermC)"

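To make the distribution rule concrete, here is a toy Java model of it. It is not Lucene's QueryParser: it only handles terms, optional "field:" prefixes, parentheses and explicit AND/OR, does no analysis, and prints its result in the AND/OR notation used above rather than the form Lucene's own Query.toString() produces. The names FieldDistribution and expand are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Lucene's QueryParser) of how a "field:" prefix
 * distributes over a parenthesized group and is overridden by inner prefixes.
 * Supported input: terms, "field:term", "field:( ... )", "( ... )", AND, OR.
 */
class FieldDistribution {
    private final List<String> toks;
    private int pos = 0;

    private FieldDistribution(List<String> toks) { this.toks = toks; }

    /** Rewrites a query so that every bare term carries its effective field. */
    static String expand(String query) {
        FieldDistribution p = new FieldDistribution(tokenize(query));
        List<String> parts = p.parseQuery(null);
        if (p.pos != p.toks.size()) throw new IllegalArgumentException("unbalanced ')'");
        String out = String.join(" ", parts);
        // A single top-level group does not need its outer parentheses.
        if (parts.size() == 1 && out.startsWith("(")) out = out.substring(1, out.length() - 1);
        return out;
    }

    // Splits on whitespace and treats '(' and ')' as separate tokens.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
                if (c == '(' || c == ')') out.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    // Query ::= Clause (("AND" | "OR") Clause)*, stopping at ')' or end of input.
    private List<String> parseQuery(String field) {
        List<String> parts = new ArrayList<>();
        parts.add(parseClause(field));
        while (pos < toks.size() && !toks.get(pos).equals(")")) {
            String op = toks.get(pos++);
            if (!op.equals("AND") && !op.equals("OR"))
                throw new IllegalArgumentException("expected AND or OR, got " + op);
            parts.add(op);
            parts.add(parseClause(field));
        }
        return parts;
    }

    // Clause ::= TERM | FIELD ":" TERM | FIELD ":" "(" Query ")" | "(" Query ")"
    private String parseClause(String field) {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of query");
        String tok = toks.get(pos++);
        if (tok.equals("(")) {
            // Unfielded group: the enclosing field (if any) distributes into it.
            String inner = String.join(" ", parseQuery(field));
            expect(")");
            return "(" + inner + ")";
        }
        if (tok.endsWith(":")) {
            // "field:(...)": the new field overrides the enclosing one inside the group.
            String newField = tok.substring(0, tok.length() - 1);
            expect("(");
            List<String> inner = parseQuery(newField);
            expect(")");
            return inner.size() == 1 ? inner.get(0) : "(" + String.join(" ", inner) + ")";
        }
        if (tok.contains(":")) return tok;               // explicit field wins
        return field == null ? tok : field + ":" + tok;  // bare term inherits the field
    }

    private void expect(String t) {
        if (pos >= toks.size() || !toks.get(pos).equals(t))
            throw new IllegalArgumentException("expected '" + t + "'");
        pos++;
    }
}
```

With this sketch, expand("FIELD1:(FIELD2:3)") returns "FIELD2:3", and expand("Field2:(Field3:TermA OR (Field4:TermB AND TermC))") returns "Field3:TermA OR (Field4:TermB AND Field2:TermC)", matching the translations above.
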
Lucene does have nested queries, but these are not the same thing as SQL
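nested queries.  Unlike SQL nested queries, in which the nested query is
evaluated and the *results* of the nested query are used as input to the
containing query, a Lucene query is evaluated all at once: all of its
clauses, nested or not, are matched in a single search, and no clause's
results are fed into another clause.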

Of course, you could achieve (self) joins with Lucene manually, by
submitting two queries in sequence: first the nested query, and then the
containing query, constructed from the results returned by the nested
query.  But I know of no built-in Lucene functionality that will invoke
the search machinery for you in this fashion[1].
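
A minimal sketch of that two-pass approach, using a toy in-memory document list instead of a real index. Everything here (ManualJoin, search, join, one exact term per field) is a hypothetical stand-in; with Lucene itself each pass would be a real search, and the second query would be assembled programmatically, for example as a BooleanQuery with one TermQuery clause per value collected in the first pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy two-pass "self join" over an in-memory document list.
 * Hypothetical data model: each document maps a field name to one exact term;
 * this is a stand-in for a real index, not Lucene's API.
 */
class ManualJoin {
    // Returns the documents whose value for `field` is one of `anyOf` (an OR of terms).
    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            String field, Set<String> anyOf) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            String v = d.get(field);
            if (v != null && anyOf.contains(v)) hits.add(d);
        }
        return hits;
    }

    // Pass 1 runs the nested query innerField:innerTerm; pass 2 runs the containing
    // query joinField:(v1 OR v2 OR ...) built from the values pass 1 returned.
    static List<Map<String, String>> join(List<Map<String, String>> docs,
                                          String innerField, String innerTerm,
                                          String joinField) {
        List<Map<String, String>> nested = search(docs, innerField, Set.of(innerTerm));
        Set<String> values = new LinkedHashSet<>();
        for (Map<String, String> d : nested) {
            String v = d.get(joinField);
            if (v != null) values.add(v);
        }
        return search(docs, joinField, values);
    }
}
```

For example, with documents {author=alice, topic=lucene}, {author=bob, topic=lucene} and {author=carol, topic=sql}, join(docs, "author", "alice", "topic") first finds alice's document, collects topic=lucene, and then returns both lucene documents.  One practical caveat with the real thing: a large first-pass result set produces a very large second query, and Lucene limits the number of clauses in a BooleanQuery (1024 by default).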

From <http://lucene.apache.org/java/docs/scoring.html>:

    Lucene scoring uses a combination of the Vector Space
    Model (VSM) of Information Retrieval[2] and the
    Boolean model[3] to determine how relevant a given
    Document is to a User's query. In general, the idea
    behind the VSM is the more times a query term appears
    in a document relative to the number of times the
    term appears in all the documents in the collection,
    the more relevant that document is to the query. It
    uses the Boolean model to first narrow down the
    documents that need to be scored based on the use of
    boolean logic in the Query specification.
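
The two stages in that quote (Boolean filtering, then vector-space ranking) can be illustrated with a toy Java example. The scoring is plain tf times ln(N/df), summed over the query terms: a deliberate simplification, since Lucene's actual formula includes further factors such as length normalization and boosts. The class, method and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration: Boolean filtering followed by simplified tf-idf scoring (not Lucene's formula). */
class BooleanThenVsm {
    // Returns document index -> score for documents that contain every `must` term.
    static Map<Integer, Double> rank(List<List<String>> docs, List<String> must, List<String> should) {
        int n = docs.size();
        List<String> queryTerms = new ArrayList<>(must);
        queryTerms.addAll(should);
        // Document frequency of each query term across the whole collection.
        Map<String, Integer> df = new HashMap<>();
        for (String t : queryTerms) {
            int c = 0;
            for (List<String> d : docs) if (d.contains(t)) c++;
            df.put(t, c);
        }
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            List<String> doc = docs.get(i);
            // Boolean model: documents missing a required term are never scored.
            if (!doc.containsAll(must)) continue;
            // Vector-space-style score: term frequency times inverse document frequency.
            double s = 0;
            for (String t : queryTerms) {
                int tf = Collections.frequency(doc, t);
                if (tf > 0) s += tf * Math.log((double) n / df.get(t));
            }
            scores.put(i, s);
        }
        return scores;
    }
}
```

For instance, with must = [lucene] and should = [query], a document lacking "lucene" is dropped before any scoring happens, and among the remaining documents one containing "query" twice outranks one containing it once.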

Hope it helps,
Steve

[1] There is a tradition of using something like joins in Information
Retrieval: (Pseudo-)Relevance Feedback, in which a subset of the terms
found in a subset of the documents of an initial query's result set is
combined with the initial query's terms to produce an augmented query.
See Grant Ingersoll's ApacheCon 2005 presentation and code at
<http://www.cnlp.org/apachecon2005/> for an implementation of
Pseudo-Relevance Feedback using Lucene.
[2] <http://en.wikipedia.org/wiki/Vector_Space_Model>
[3] <http://en.wikipedia.org/wiki/Standard_Boolean_model>
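
The Pseudo-Relevance Feedback loop in footnote [1] can be sketched as below. This is one simple variant chosen for illustration (count terms in the top-ranked documents and add the most frequent ones not already in the query); it is not taken from Grant Ingersoll's code, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy pseudo-relevance feedback (hypothetical helper, not a Lucene API). */
class PseudoRelevanceFeedback {
    // rankedDocs: tokenized results of the initial query, best first (assumed).
    static List<String> expand(List<String> query, List<List<String>> rankedDocs,
                               int topDocs, int extraTerms) {
        Map<String, Integer> counts = new HashMap<>();
        // Treat the first topDocs results as (pseudo-)relevant and count their new terms.
        for (List<String> doc : rankedDocs.subList(0, Math.min(topDocs, rankedDocs.size())))
            for (String t : doc)
                if (!query.contains(t)) counts.merge(t, 1, Integer::sum);
        List<String> candidates = new ArrayList<>(counts.keySet());
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // most frequent first
            return byCount != 0 ? byCount : a.compareTo(b);              // then alphabetical
        });
        // Augmented query = original terms plus the top extraTerms feedback terms.
        List<String> augmented = new ArrayList<>(query);
        augmented.addAll(candidates.subList(0, Math.min(extraTerms, candidates.size())));
        return augmented;
    }
}
```

With query [lucene], top results [lucene, query, parser] and [lucene, query, syntax], topDocs = 2 and extraTerms = 2, this returns [lucene, query, parser]: "query" occurs in both feedback documents, and "parser" wins its tie with "syntax" alphabetically.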

