[ 
https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507473
 ] 

Doug Cutting commented on NUTCH-479:
------------------------------------

Neither.  It would end up as the Lucene query:

+"search phrase" +category:cat1 category:cat2

where category:cat2 is a non-required clause that just impacts ranking, not the 
set of documents returned.
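
To illustrate the difference between required and non-required clauses, here 
is a toy Python model. It is only a sketch: the documents, the helper names 
(has_phrase, in_category, search) and the "count the matching optional 
clauses" score are assumptions made for illustration, not Lucene's actual 
scoring.

```python
# Toy model of required ("+") vs. optional clauses; not Lucene's real scoring.
DOCS = [
    {"id": 1, "content": "a search phrase here", "category": {"cat1"}},
    {"id": 2, "content": "a search phrase here", "category": {"cat1", "cat2"}},
    {"id": 3, "content": "a search phrase here", "category": {"cat2"}},
    {"id": 4, "content": "unrelated text", "category": {"cat1", "cat2"}},
]


def has_phrase(doc):
    """Stand-in for the required clause +"search phrase"."""
    return "search phrase" in doc["content"]


def in_category(name):
    """Build a predicate standing in for a category:<name> clause."""
    return lambda doc: name in doc["category"]


def search(docs, required, optional=()):
    """Required clauses pick the hit set; optional clauses only reorder it."""
    hits = [d for d in docs if all(pred(d) for pred in required)]
    # Hypothetical score: the number of optional clauses a hit matches.
    return sorted(hits, key=lambda d: sum(pred(d) for pred in optional),
                  reverse=True)


required = [has_phrase, in_category("cat1")]
with_boost = [d["id"] for d in search(DOCS, required, [in_category("cat2")])]
without_boost = [d["id"] for d in search(DOCS, required)]
print(with_boost, without_boost)
```

Both searches return the same two documents. The optional clause only moves 
the document that is also in cat2 to the front.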

As for nested queries, parsing is only half the problem.  The query filter 
plugins would need to be extended to handle such things, as they presently 
expect flat queries.

The query "foo bar" currently expands to a Lucene query that looks something 
like:

+(anchor:foo title:foo content:foo)
+(anchor:bar title:bar content:bar)
anchor:"foo bar"~10
title:"foo bar"~1000
content:"foo bar"~1000

(The last three clauses boost scores when terms are nearer.  Anchor proximity 
is limited, to keep from matching anchors from other documents.)
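
The flat expansion above can be sketched in a few lines of Python. This is 
not the query-basic plugin's code, only an illustration that builds the same 
clause strings. The names FIELDS, PHRASE_SLOP and expand_flat are 
hypothetical, and the slop values are simply copied from the example above.

```python
# Sketch of the flat expansion; names are hypothetical, not Nutch's API.
FIELDS = ("anchor", "title", "content")
# Slop values copied from the example in this message.
PHRASE_SLOP = {"anchor": 10, "title": 1000, "content": 1000}


def expand_flat(terms):
    """Return the Lucene-style clauses for a flat list of query terms."""
    clauses = []
    # One required clause per term, matching the term in any field.
    for term in terms:
        per_field = " ".join(f"{field}:{term}" for field in FIELDS)
        clauses.append(f"+({per_field})")
    # Optional sloppy-phrase clauses that boost documents with nearby terms.
    if len(terms) > 1:
        phrase = " ".join(terms)
        for field in FIELDS:
            clauses.append(f'{field}:"{phrase}"~{PHRASE_SLOP[field]}')
    return clauses


for clause in expand_flat(["foo", "bar"]):
    print(clause)
```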

So, how should (foo AND (bar OR baz)) expand?  Probably something like:

+(anchor:foo title:foo content:foo)
+((anchor:bar title:bar content:bar)
    (anchor:baz title:baz content:baz))
... proximity boosting clauses?...

And (foo OR (bar AND baz)) might expand to:

(anchor:foo title:foo content:foo)
(+(anchor:bar title:bar content:bar)
 +(anchor:baz title:baz content:baz))
... proximity boosting clauses?...

This expansion is done by the query-basic plugin.
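
As a rough sketch of how the two nested expansions above could be produced 
recursively, here is a small Python example. It is not the query-basic 
plugin's code. The tuple-based query tree and the names expand and 
expand_top are assumptions made for illustration, and the proximity boosting 
clauses are left out because their form for nested queries is still open.

```python
# Sketch of a recursive expansion; the tree format is a made-up convention:
# a node is either a term (str) or a tuple ("AND" | "OR", child, child, ...).
FIELDS = ("anchor", "title", "content")


def expand(node, required):
    """Expand one node into a single clause, prefixed with '+' if required."""
    prefix = "+" if required else ""
    if isinstance(node, str):
        body = " ".join(f"{field}:{node}" for field in FIELDS)
        return f"{prefix}({body})"
    op, children = node[0], node[1:]
    # Children of AND are required; children of OR are optional.
    parts = [expand(child, op == "AND") for child in children]
    return f"{prefix}({' '.join(parts)})"


def expand_top(node):
    """Expand the root node into a list of top-level clauses."""
    if isinstance(node, str):
        return [expand(node, True)]
    op, children = node[0], node[1:]
    return [expand(child, op == "AND") for child in children]


for query in (("AND", "foo", ("OR", "bar", "baz")),
              ("OR", "foo", ("AND", "bar", "baz"))):
    print("\n".join(expand_top(query)))
    print()
```

Apart from line breaks inside the nested clause, the output matches the two 
expansions shown above.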


> Support for OR queries
> ----------------------
>
>                 Key: NUTCH-479
>                 URL: https://issues.apache.org/jira/browse/NUTCH-479
>             Project: Nutch
>          Issue Type: Improvement
>          Components: searcher
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: or.patch
>
>
> There have been many requests from users to extend Nutch query syntax to add 
> support for OR queries, in addition to the implicit AND and NOT queries 
> supported now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.