[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-445:
--------------------------------

    Attachment: TranslatingRawFieldQueryFilter_v1.0.patch

This patch complements index_query_domain_v1.0.patch. However, the class TranslatingRawFieldQueryFilter can be used independently, so I have put it in a separate file.

The javadoc reads:

 * Similar to {@link RawFieldQueryFilter} except that the index
 * and query field names can be different.
 * <br>
 * This class can be extended by <code>QueryFilter</code>s to allow
 * searching a field in the index, but using another field name in the
 * search.
 * <br>
 * For example index field names can be kept in English such as "content",
 * "lang", "title", ..., however query filters can be built in other languages

> Domain İndexing / Query Filter
> ------------------------------
>
>                 Key: NUTCH-445
>                 URL: https://issues.apache.org/jira/browse/NUTCH-445
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, searcher
>    Affects Versions: 0.9.0
>            Reporter: Enis Soztutar
>         Attachments: index_query_domain_v1.0.patch, TranslatingRawFieldQueryFilter_v1.0.patch
>
>
> Hostnames contain information about the domain of the host, and all of the
> subdomains. Indexing and searching the domains is important for intuitive
> behavior.
> From the DomainIndexingFilter javadoc:
> Adds the domain (hostname) and all super domains to the index.
> * <br> For http://lucene.apache.org/nutch/ the
> * following will be added to the index : <br>
> * <ul>
> * <li>lucene.apache.org </li>
> * <li>apache</li>
> * <li>org </li>
> * </ul>
> * All hostnames are domain names, but not all the domain names are
> * hostnames. In the above example hostname lucene is a
> * subdomain of apache.org, which is itself a subdomain of
> * org <br>
> *
>
> Currently the basic indexing filter indexes the hostname in the site field, and
> the query-site plugin allows searching in the site field.
> However, site:apache.org will not return http://lucene.apache.org
> By indexing the domain, we are able to search domains. Unlike
> a search on the site field (indexed by BasicIndexingFilter), searching the
> domain field allows us to retrieve lucene.apache.org for the query
> apache.org.
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
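
For illustration only, the two ideas in this issue can be sketched in plain Java without any Nutch or Lucene classes. This is not the code in the attached patches. The class name DomainSketch, its method names, and the "dominio" alias are hypothetical. The quoted javadoc lists "apache" as the middle entry; the sketch emits "apache.org" instead, on the assumption that full super domain names are what the query apache.org needs to match.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class DomainSketch {

    // Hypothetical mapping from a query-side field name to the index field
    // name; "dominio" is an illustrative non-English alias, not from the patch.
    static final Map<String, String> QUERY_TO_INDEX_FIELD = Map.of("dominio", "domain");

    /** Returns the URL's hostname followed by each of its dot-separated suffixes. */
    static List<String> domainsOf(String url) {
        List<String> domains = new ArrayList<>();
        String host = URI.create(url).getHost();
        if (host == null) {
            return domains; // no host component, nothing to index
        }
        String rest = host.toLowerCase(Locale.ROOT);
        while (!rest.isEmpty()) {
            domains.add(rest);
            int dot = rest.indexOf('.');
            if (dot < 0) {
                break; // reached the top-level label
            }
            rest = rest.substring(dot + 1);
        }
        return domains;
    }

    /** Translates a query field name to its index field name, if one is mapped. */
    static String toIndexField(String queryField) {
        return QUERY_TO_INDEX_FIELD.getOrDefault(queryField, queryField);
    }

    /** True if the query value equals one of the values indexed for the URL. */
    static boolean domainMatches(String url, String queryValue) {
        return domainsOf(url).contains(queryValue.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String url = "http://lucene.apache.org/nutch/";
        System.out.println(domainsOf(url));                   // [lucene.apache.org, apache.org, org]
        System.out.println(domainMatches(url, "apache.org")); // true
        System.out.println(toIndexField("dominio"));          // domain
    }
}
```

An exact match on the hostname alone, which is how the site field behaves as described above, would reject apache.org for this URL; matching against the expanded list accepts it.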