[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter

2007-02-28 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated NUTCH-445:


Attachment: index_query_domain_v1.2.patch

This patch updates the previous three patches. 
The patch 
1. contains TranslatingRawFieldQueryFilter, an abstract implementation for 
searching certain fields in the index under a different query field name. 
2. changes index-basic so that it indexes the domain and all super domains in 
the domain field. 
3. changes query-site so that site:site_name searches domain:site_name. 

With this plugin we can search site:apache.org and get results from 
http://issues.apache.org, etc., or we can search site:com to retrieve all .com 
domains. 
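
The intended matching behavior can be sketched as a simple predicate. This is only an illustration of the semantics described above, not code from the patch: the class name DomainMatch and the method matchesDomain are hypothetical, and lowercasing both sides is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration, not part of the attached patch.
class DomainMatch {

    /**
     * Returns true when queryDomain is the host itself or one of its
     * super domains, i.e. a suffix of the host that starts at a label
     * boundary. Case-insensitive comparison is an assumption.
     */
    static boolean matchesDomain(String host, String queryDomain) {
        String h = host.toLowerCase(Locale.ROOT);
        String q = queryDomain.toLowerCase(Locale.ROOT);
        return h.equals(q) || h.endsWith("." + q);
    }

    public static void main(String[] args) {
        System.out.println(matchesDomain("issues.apache.org", "apache.org")); // true
        System.out.println(matchesDomain("www.example.com", "com"));          // true
        System.out.println(matchesDomain("notapache.org", "apache.org"));     // false
    }
}
```

Requiring the leading dot keeps notapache.org from matching apache.org, which a plain suffix test would get wrong.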


 Domain İndexing / Query Filter
 --

 Key: NUTCH-445
 URL: https://issues.apache.org/jira/browse/NUTCH-445
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Affects Versions: 0.9.0
Reporter: Enis Soztutar
 Attachments: index_query_domain_v1.0.patch, 
 index_query_domain_v1.1.patch, index_query_domain_v1.2.patch, 
 TranslatingRawFieldQueryFilter_v1.0.patch


 Hostnames contain information about the domain of the host, and all of the 
 subdomains. Indexing and searching the domains is important for intuitive 
 behavior. 
 From the DomainIndexingFilter javadoc: 
  * Adds the domain (hostname) and all super domains to the index. 
  * <br> For http://lucene.apache.org/nutch/ the 
  * following will be added to the index: <br> 
  * <ul>
  * <li>lucene.apache.org</li>
  * <li>apache.org</li>
  * <li>org</li>
  * </ul>
  * All hostnames are domain names, but not all domain names are 
  * hostnames. In the above example, hostname lucene is a 
  * subdomain of apache.org, which is itself a subdomain of 
  * org. <br>
  * 
  
 Currently the basic indexing filter indexes the hostname in the site field, and 
 the query-site plugin allows searching the site field. However, 
 site:apache.org will not return http://lucene.apache.org. 
 By indexing the domain, we can search by domain. Unlike a search on the site 
 field (indexed by BasicIndexingFilter), searching the domain field retrieves 
 lucene.apache.org for the query apache.org. 
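
 A minimal sketch of how the indexed terms could be derived from a URL is shown below. It is not the DomainIndexingFilter from the patch: the class DomainTerms and the method superDomains are hypothetical names, and lowercasing the host is an assumption. Only java.net.URI from the standard library is used to extract the host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the class from the attached patch.
class DomainTerms {

    /** Returns the host itself followed by each of its super domains. */
    static List<String> superDomains(String host) {
        List<String> terms = new ArrayList<>();
        String current = host.toLowerCase(Locale.ROOT); // assumption: terms are lowercased
        while (!current.isEmpty()) {
            terms.add(current);
            int dot = current.indexOf('.');
            if (dot < 0) {
                break; // reached the top-level label
            }
            current = current.substring(dot + 1); // drop the leftmost label
        }
        return terms;
    }

    public static void main(String[] args) {
        String host = URI.create("http://lucene.apache.org/nutch/").getHost();
        System.out.println(superDomains(host)); // [lucene.apache.org, apache.org, org]
    }
}
```

 Adding every one of these terms to the domain field is what lets a query for apache.org or org find a page hosted on lucene.apache.org.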
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter

2007-02-15 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated NUTCH-445:


Attachment: index_query_domain_v1.0.patch

Patch for index-domain and query-domain plugins. 

 Domain İndexing / Query Filter
 --

 Key: NUTCH-445
 URL: https://issues.apache.org/jira/browse/NUTCH-445
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Affects Versions: 0.9.0
Reporter: Enis Soztutar
 Attachments: index_query_domain_v1.0.patch





[jira] Updated: (NUTCH-445) Domain İndexing / Query Filter

2007-02-15 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated NUTCH-445:


Attachment: TranslatingRawFieldQueryFilter_v1.0.patch

This patch complements index_query_domain_v1.0.patch. 

However, the class TranslatingRawFieldQueryFilter can be used independently, so 
I have put it in a separate file. The javadoc reads: 

 * Similar to {@link RawFieldQueryFilter} except that the index 
 * and query field names can be different. 
 * <br>
 * This class can be extended by <code>QueryFilter</code>s to allow 
 * searching a field in the index, but using another field name in the 
 * search. 
 * <br>
 * For example index field names can be kept in English such as content, 
 * lang, title, ..., however query filters can be built in other languages 
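
The idea of separating the query field name from the index field name can be sketched without any Nutch classes. The FieldTranslator class below is hypothetical and only rewrites a "field:value" string; the real TranslatingRawFieldQueryFilter operates on Nutch query objects, whose API is not reproduced here.

```java
// Hypothetical, string-based illustration of field name translation.
// It does not use or mirror the Nutch QueryFilter API.
class FieldTranslator {
    private final String queryField; // name the user types, e.g. "site"
    private final String indexField; // name stored in the index, e.g. "domain"

    FieldTranslator(String queryField, String indexField) {
        this.queryField = queryField;
        this.indexField = indexField;
    }

    /** Rewrites "queryField:value" to "indexField:value"; other clauses pass through unchanged. */
    String translate(String clause) {
        String prefix = queryField + ":";
        if (clause.startsWith(prefix)) {
            return indexField + ":" + clause.substring(prefix.length());
        }
        return clause;
    }

    public static void main(String[] args) {
        FieldTranslator siteToDomain = new FieldTranslator("site", "domain");
        System.out.println(siteToDomain.translate("site:apache.org")); // domain:apache.org
        System.out.println(siteToDomain.translate("title:nutch"));     // title:nutch
    }
}
```

The same mechanism would cover the localization case from the javadoc: a translated query field name maps onto an index field whose name stays in English.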

 Domain İndexing / Query Filter
 --

 Key: NUTCH-445
 URL: https://issues.apache.org/jira/browse/NUTCH-445
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Affects Versions: 0.9.0
Reporter: Enis Soztutar
 Attachments: index_query_domain_v1.0.patch, 
 TranslatingRawFieldQueryFilter_v1.0.patch

