[ https://issues.apache.org/jira/browse/NUTCH-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398836#comment-13398836 ]
Markus Jelsma commented on NUTCH-1407:
--------------------------------------

We usually filter subscribers by host or by a small group of hosts. This is, however, not feasible for subscribers with millions of subdomains. In Solr it is possible to achieve this with copyFields and some regular expressions, or with a custom update processor, but that is cumbersome. Doing it in Nutch with URLUtil also has the advantage that it understands domains with more than one extension/suffix.

> BasicIndexingFilter to optionally add domain field
> --------------------------------------------------
>
>                 Key: NUTCH-1407
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1407
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.6
>
>         Attachments: NUTCH-1407-1.6-1.patch
>
>
> The basic indexing filter already adds the host field to a NutchDocument but
> no domain field. In Solr you can copyField a host field and obtain a domain
> field but this is a bit cumbersome and not very user friendly.
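
To illustrate the point about suffixes with more than one part: the sketch below is not the patch and does not call Nutch's URLUtil. It is a minimal, self-contained approximation that assumes a tiny hard-coded suffix set, whereas a real implementation would consult a full suffix list. The class name `DomainSketch`, the method `domainOf`, and the example hosts are hypothetical. It shows why simply keeping the last two labels of a host is not enough: for `a.b.example.co.uk` that would yield `co.uk` rather than `example.co.uk`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DomainSketch {
    // Hypothetical, deliberately tiny suffix set for illustration only.
    private static final Set<String> SUFFIXES =
        new HashSet<>(Arrays.asList("com", "org", "uk", "co.uk"));

    /**
     * Returns the longest known suffix plus the one label to its left,
     * or the lower-cased host itself if no known suffix matches.
     */
    static String domainOf(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        // Try candidate suffixes from longest to shortest;
        // i is the index of the first label of the candidate suffix.
        for (int i = 1; i < labels.length; i++) {
            String candidate =
                String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                return labels[i - 1] + "." + candidate;
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(domainOf("a.b.example.co.uk"));
        System.out.println(domainOf("www.example.com"));
    }
}
```

With a domain value like this indexed per document, all of a subscriber's subdomains can be matched with a single filter on that one field instead of enumerating hosts.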