[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542414 ]
Andrzej Bialecki commented on NUTCH-573:
-----------------------------------------

I hope I didn't come across as arguing - your patch looks good from the technical point of view; I'm just trying to figure out the long-term impact of this patch.

I agree, the full Lucene syntax is too complex - but even the Google syntax falls into the "advanced" category, i.e. you need to learn how to construct such a query.

As far as I could determine, Google indeed treats an infix comma as a list operator, but only for some fields, such as inurl:. Try the following queries:

{code}
site:www.apache.org server
site:www.cnn.com server
site:www.apache.org,www.cnn.com server
{code}

For other fields, such as intitle: and inanchor:, it gives inconsistent results (maybe I discovered a Google bug :) ).

Regarding the question whether to enable it for any field: I think one important exception would be "raw fields", where a QueryFilter implementation wants to interpret the input token differently; in such cases an infix comma may be a valid token character. Perhaps we could add support for an escape character, which turns a comma into a regular token character?

> Multiple Domains - Query Search
> -------------------------------
>
>                 Key: NUTCH-573
>                 URL: https://issues.apache.org/jira/browse/NUTCH-573
>             Project: Nutch
>          Issue Type: Improvement
>          Components: searcher
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Rajasekar Karthik
>            Assignee: Enis Soztutar
>             Fix For: 1.0.0
>
>         Attachments: multiTermQuery_v1.patch
>
>
> Searching multiple domains can be done in Lucene, but not that efficiently
> in Nutch.
> Query:
> +content:"abc" +(site:"www.aaa.com" site:"www.bbb.com")
> works in Lucene, but the same concept does not work in Nutch.
> In Lucene, it works with
> org.apache.lucene.analysis.KeywordAnalyzer
> org.apache.lucene.analysis.standard.StandardAnalyzer
> but NOT with
> org.apache.lucene.analysis.SimpleAnalyzer
> Is the Nutch analyzer based on SimpleAnalyzer?
> In this case, is there a workaround to make this work? Is there an option
> to change which analyzer Nutch uses?
> Just FYI, another solution (inefficient, I believe) which seems to work
> in Nutch:
> <query> -site:"ccc.com" -site:"ddd.com"
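A minimal sketch of the infix-comma list operator discussed in the comment, under stated assumptions: a backslash is used as the escape character (the comment only suggests "an escape character"), and the caller supplies a set of "raw" field names whose values are never split. The class and method names (CommaListExpander, splitUnescaped, expand) are hypothetical and are not part of Nutch's QueryFilter API. It rewrites field:v1,v2 into the required OR group shown in the issue description, e.g. +(site:"a" site:"b").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of "infix comma as a list operator".
 * Not Nutch's query parser; the backslash escape and raw-field set are assumptions.
 */
public class CommaListExpander {

    /** Splits on commas that are not escaped with a backslash (assumed escape char). */
    public static List<String> splitUnescaped(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                // Escaped character: keep the next char literally (so "\," is a plain comma).
                current.append(value.charAt(++i));
            } else if (c == ',') {
                // Unescaped comma: closes one list element.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        // Drop empty elements produced by "a,,b" or a trailing comma.
        parts.removeIf(String::isEmpty);
        return parts;
    }

    /**
     * Expands field:v1,v2 into a required OR group in Lucene query syntax.
     * Fields listed in rawFields are passed through unsplit.
     */
    public static String expand(String field, String value, Set<String> rawFields) {
        List<String> values = rawFields.contains(field)
                ? List.of(value)
                : splitUnescaped(value);
        if (values.isEmpty()) {
            return "";
        }
        if (values.size() == 1) {
            return "+" + field + ":\"" + values.get(0) + "\"";
        }
        StringBuilder sb = new StringBuilder("+(");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(":\"").append(values.get(i)).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Set<String> raw = Set.of("rawfield");
        // prints +(site:"www.apache.org" site:"www.cnn.com")
        System.out.println(expand("site", "www.apache.org,www.cnn.com", raw));
        // prints +rawfield:"a,b"  (raw field: comma kept as a token character)
        System.out.println(expand("rawfield", "a,b", raw));
        // prints +title:"foo,bar"  (escaped comma is not a list separator)
        System.out.println(expand("title", "foo\\,bar", raw));
    }
}
```

Producing a Lucene query string only illustrates the intended semantics; a real implementation would probably operate on parsed query clauses instead, and values containing double quotes would need escaping that this sketch omits.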
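The analyzer question in the issue description can be illustrated without Lucene on the classpath. Lucene's SimpleAnalyzer is a LetterTokenizer followed by a LowerCaseFilter, so it breaks text at every non-letter, while KeywordAnalyzer emits the whole input as a single token. The sketch below approximates both in plain Java (char-based, ignoring supplementary code points); AnalyzerComparison and its methods are hypothetical names. If the site field stores each hostname as one untokenized term (an assumption about the index, not something stated in the issue), a SimpleAnalyzer-style query yields the terms www, aaa, com, none of which equals that stored term.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Dependency-free approximation of how two Lucene analyzers tokenize a hostname.
 * It mimics, but is not, Lucene's code.
 */
public class AnalyzerComparison {

    /** Like SimpleAnalyzer: split at every non-letter, then lowercase. */
    public static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ('.', ':', digits, ...) ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Like KeywordAnalyzer: the whole input is one unmodified token. */
    public static List<String> keywordTokens(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String host = "www.aaa.com";
        // If "site" is indexed as one untokenized term (assumption),
        // only the keyword form below can match it exactly.
        System.out.println(simpleTokens(host));  // prints [www, aaa, com]
        System.out.println(keywordTokens(host)); // prints [www.aaa.com]
    }
}
```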