Hi Kapil,

Kapil Chhabra wrote:
> Hi Steve,
>
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
>
> In BNF, the query grammar is:
>
>   Query  ::= ( Clause )*
>   Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
>
> Which means that FIELD2:(FIELD2:3) is a correct query. Correct me if I
> am wrong.
>
> What will this query translate into? Will it be same as FIELD2: 1 OR
> FIELD2: 2

"FIELD2:(FIELD2:3)" translates to "FIELD2:3". This is because the FieldX in "FieldX:(TermA OR TermB)" is interpreted distributively - this query is equivalent to "FieldX:TermA OR FieldX:TermB". A field specifier on a nested query term or clause overrides the containing field specifier, so "FIELD1:(FIELD2:3)" translates to "FIELD2:3". A more complicated example: "Field2:(Field3:TermA OR (Field4:TermB AND TermC))" translates to: "Field3:TermA OR (Field4:TermB AND Field2:TermC)" Lucene does have nested queries, but these are not the same thing as SQL nested queries. Unlike SQL nested queries, in which the nested query is evaluated and the *results* of the nested query are used as input to the containing query, Lucene's queries are evaluated all at once. Of course, you could achieve (self) joins with Lucene manually, by submitting two queries serially, first the nested query, and then the containing query, constructed with results returned from the nested query. But I know of no built-in Lucene functionality that will invoke the search machinery for you in this fashion[1]. >From <http://lucene.apache.org/java/docs/scoring.html>: Lucene scoring uses a combination of the Vector Space Model (VSM) of Information Retrieval[2] and the Boolean model[3] to determine how relevant a given Document is to a User's query. In general, the idea behind the VSM is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. It uses the Boolean model to first narrow down the documents that need to be scored based on the use of boolean logic in the Query specification. 
Hope it helps,
Steve

[1] There is a tradition of using something like joins in Information
    Retrieval: (Pseudo-)Relevance Feedback, in which a subset of the terms
    found in a subset of the documents of an initial query's result set are
    combined with the initial query's terms to produce an augmented query.
    See Grant Ingersoll's ApacheCon 2005 presentation and code at
    <http://www.cnlp.org/apachecon2005/> for an implementation of
    Pseudo-Relevance Feedback using Lucene.

[2] <http://en.wikipedia.org/wiki/Vector_Space_Model>

[3] <http://en.wikipedia.org/wiki/Standard_Boolean_model>