[ 
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542389
 ] 

Enis Soztutar commented on NUTCH-573:
-------------------------------------

bq. Using commas is IMHO not intuitive

With respect, I have to disagree. We cannot expect search users to type 
queries of the form +(site:www.somesite.com site:www.foo.com). Last time I 
checked, Google used comma syntax. I think that supporting only a subset of the 
query syntax that Lucene supports was the original reason for implementing 
a separate query parser for Nutch, so that ordinary search users do not get 
confused and can use the de facto syntax.

bq. Also, I'm not sure if the original reporter asked for a generic solution 
that would work with every field - if the issue at hand is just the site: 
field, then we can use "raw field" and a RawQueryFilter to parse multiple terms 
within the SiteQueryFilter implementation, without changing the global query 
syntax.

The original intention was to allow this only in site queries; however, I cannot 
see a reason not to enable this for other fields.
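
To make the comparison concrete, here is a rough sketch of the expansion I have in mind. It is not taken from the attached patch: the class name MultiTermExpander, the expand() helper, and the exact comma form (site:www.somesite.com,www.foo.com) are assumptions for illustration only. It builds the Lucene-style clause as a plain string and takes the field name as a parameter, so nothing in it is specific to site:.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Nutch or of multiTermQuery_v1.patch.
public class MultiTermExpander {

    // Expands a comma-separated value list for one field into a required
    // group of optional clauses, e.g. field "site" and "a.com,b.com"
    // become "+(site:a.com site:b.com)".
    public static String expand(String field, String commaSeparatedTerms) {
        List<String> clauses = new ArrayList<>();
        for (String term : commaSeparatedTerms.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                clauses.add(field + ":" + trimmed);
            }
        }
        if (clauses.isEmpty()) {
            throw new IllegalArgumentException("no terms given for field " + field);
        }
        if (clauses.size() == 1) {
            // A single term needs no grouping.
            return "+" + clauses.get(0);
        }
        return "+(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        // prints "+(site:www.somesite.com site:www.foo.com)"
        System.out.println(expand("site", "www.somesite.com,www.foo.com"));
    }
}
```

The user types only the short comma form; the longer boolean form is what the parser would have to produce internally, under the assumptions above.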




> Multiple Domains - Query Search
> -------------------------------
>
>                 Key: NUTCH-573
>                 URL: https://issues.apache.org/jira/browse/NUTCH-573
>             Project: Nutch
>          Issue Type: Improvement
>          Components: searcher
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Rajasekar Karthik
>            Assignee: Enis Soztutar
>             Fix For: 1.0.0
>
>         Attachments: multiTermQuery_v1.patch
>
>
> Searching multiple domains can be done on Lucene - but not that efficiently 
> on Nutch.
> Query:
> +content:"abc" +(site:"www.aaa.com" site:"www.bbb.com")
> works on Lucene but the same concept does not work on Nutch.
> In Lucene, it works with 
> org.apache.lucene.analysis.KeywordAnalyzer
> org.apache.lucene.analysis.standard.StandardAnalyzer 
> but NOT on
> org.apache.lucene.analysis.SimpleAnalyzer 
> Is Nutch analyzer based on SimpleAnalyzer? In this case, is there a 
> workaround to make this work? Is there an option to change what analyzer 
> nutch is using? 
> Just FYI, another solution (inefficient I believe) which seems to be working 
> on nutch
> <query> -site:"ccc.com" -site:"ddd.com" 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
