[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated NUTCH-445:

    Attachment: index_query_domain_v1.2.patch

This patch is an update of the previous three patches. The patch:
1. Contains TranslatingRawFieldQueryFilter, an abstract implementation for searching certain fields in the index under a different query field name.
2. Makes index-basic index the domain and all super domains in the domain field.
3. Changes query-site so that site:site_name searches domain:site_name.

With this plugin we can search site:apache.org and get results from http://issues.apache.org, etc., or we can search site:com to retrieve all .com domains.

Domain İndexing / Query Filter
------------------------------

    Key: NUTCH-445
    URL: https://issues.apache.org/jira/browse/NUTCH-445
    Project: Nutch
    Issue Type: New Feature
    Components: indexer, searcher
    Affects Versions: 0.9.0
    Reporter: Enis Soztutar
    Attachments: index_query_domain_v1.0.patch, index_query_domain_v1.1.patch, index_query_domain_v1.2.patch, TranslatingRawFieldQueryFilter_v1.0.patch

Hostnames contain information about the domain of the host and all of the subdomains. Indexing and searching domains is important for intuitive behavior.

From the DomainIndexingFilter javadoc:

    Adds the domain (hostname) and all super domains to the index.
    For http://lucene.apache.org/nutch/ the following will be added to the index:
      * lucene.apache.org
      * apache.org
      * org
    All hostnames are domain names, but not all domain names are hostnames. In the above example, the hostname lucene is a subdomain of apache.org, which is itself a subdomain of org.

Currently the basic indexing filter indexes the hostname in the site field, and the query-site plugin allows searching the site field. However, site:apache.org will not return http://lucene.apache.org. By indexing the domain, we can search by domain. Unlike a search on the site field (indexed by BasicIndexingFilter), a search on the domain field retrieves lucene.apache.org for the query apache.org.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
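To make the javadoc example above concrete, here is a minimal sketch of the super-domain expansion in plain Java. It is not the code from the attached patch: the class name SuperDomains and the methods expand and expandUrl are made up for illustration, and the sketch does not touch any Nutch or Lucene API. It only assumes that each super domain is obtained by dropping the leftmost label of the hostname.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch.
class SuperDomains {

    /** Returns the host followed by each of its super domains, most specific first. */
    static List<String> expand(String host) {
        List<String> domains = new ArrayList<>();
        // Locale.ROOT keeps lowercasing independent of the JVM's default locale.
        String current = host.toLowerCase(Locale.ROOT);
        while (!current.isEmpty()) {
            domains.add(current);
            int dot = current.indexOf('.');
            if (dot < 0) {
                break; // reached the top-level label, e.g. "org"
            }
            current = current.substring(dot + 1); // drop the leftmost label
        }
        return domains;
    }

    /** Extracts the host from a URL and expands it; returns an empty list if there is no host. */
    static List<String> expandUrl(String url) {
        String host = URI.create(url).getHost();
        return host == null ? new ArrayList<>() : expand(host);
    }

    public static void main(String[] args) {
        // prints [lucene.apache.org, apache.org, org]
        System.out.println(expandUrl("http://lucene.apache.org/nutch/"));
    }
}
```

If each returned value were stored as a separate term of a domain field, an exact-match query for apache.org or org would also hit the lucene.apache.org page, which is the behavior the issue asks for.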
[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter
Enis Soztutar updated NUTCH-445:

    Attachment: index_query_domain_v1.0.patch

Patch for index-domain and query-domain plugins.
[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter
Enis Soztutar updated NUTCH-445:

    Attachment: TranslatingRawFieldQueryFilter_v1.0.patch

This patch complements index_query_domain_v1.0.patch. However, the class TranslatingRawFieldQueryFilter can be used independently, so I have put it in a separate file.

The javadoc reads:

    Similar to {@link RawFieldQueryFilter} except that the index and query field names can be different.
    This class can be extended by QueryFilters to allow searching a field in the index, but using another field name in the search.
    For example, index field names can be kept in English, such as content, lang, title, ..., while query filters can be built in other languages.
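The field-name translation described in the TranslatingRawFieldQueryFilter javadoc can be sketched as a small string rewrite. This is only an illustration under stated assumptions, not the class from the patch: FieldTranslation, QUERY_TO_INDEX, and translate are hypothetical names, the query clause is treated as a plain field:value string, and the single site-to-domain mapping is taken from the v1.2 patch description.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch or Lucene.
class FieldTranslation {

    // Query field name -> index field name (assumed mapping, for illustration only).
    static final Map<String, String> QUERY_TO_INDEX = Map.of("site", "domain");

    /** Rewrites "queryField:value" to "indexField:value"; other clauses pass through unchanged. */
    static String translate(String clause) {
        int colon = clause.indexOf(':');
        if (colon <= 0) {
            return clause; // no field prefix, nothing to translate
        }
        String queryField = clause.substring(0, colon);
        String indexField = QUERY_TO_INDEX.getOrDefault(queryField, queryField);
        return indexField + clause.substring(colon);
    }

    public static void main(String[] args) {
        System.out.println(translate("site:apache.org")); // prints domain:apache.org
        System.out.println(translate("title:nutch"));     // prints title:nutch
    }
}
```

Keeping the mapping in one table means the same mechanism could serve the javadoc's other use case, where users type a localized field name and the index keeps the English one.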