[jira] Commented: (NUTCH-386) Plugin to index categories by url rules

2009-01-22 Thread Beaucarnea (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666122#action_12666122
 ] 

Beaucarnea commented on NUTCH-386:
--

Did you activate the plugin on the searcher side as well as on the crawler 
side? That is, did you include the plugin in the nutch-site.xml of your Nutch 
web application in Tomcat?
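
Nutch selects the plugins to load through the plugin.includes property, a 
regular expression that is matched against plugin ids. The following is a 
minimal sketch for checking whether a given plugin id would be picked up by 
the crawler configuration and by the searcher configuration. It emulates the 
match with java.util.regex instead of calling Nutch, and the plugin id 
"index-url-category" and both property values are assumptions made up for 
illustration, not values taken from the attachment.

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {
    // Returns true when the plugin id fully matches the plugin.includes expression.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Assumed plugin id; check plugin.xml in the attachment for the real one.
        String pluginId = "index-url-category";

        // Hypothetical plugin.includes values from the two nutch-site.xml files.
        String crawlerIncludes =
            "protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|url-category)";
        String searcherIncludes = "query-(basic|site|url)|summary-basic";

        System.out.println("crawler side:  " + isIncluded(crawlerIncludes, pluginId));
        System.out.println("searcher side: " + isIncluded(searcherIncludes, pluginId));
    }
}
```

If the second check reports false, the web application's nutch-site.xml would 
need the plugin id added to its own plugin.includes value.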

 Plugin to index categories by url rules
 ---

 Key: NUTCH-386
 URL: https://issues.apache.org/jira/browse/NUTCH-386
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Reporter: Ernesto De Santis
Priority: Minor
 Attachments: index-url-category-0.1.zip, index-url-category.jar


 The compressed zip has an install_notes.txt file with instructions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.

2008-12-14 Thread Beaucarnea (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656380#action_12656380
 ] 

Beaucarnea commented on NUTCH-185:
--

Is there an update of this plugin available for the current trunk? Or is this 
kind of functionality implemented elsewhere?

Thanks,
Beaucarnea

 XMLParser is configurable xml parser plugin.
 

 Key: NUTCH-185
 URL: https://issues.apache.org/jira/browse/NUTCH-185
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher, indexer
Affects Versions: 0.7.2, 0.8, 0.8.1
 Environment: OS Independent
Reporter: Rida Benjelloun
Assignee: Chris A. Mattmann
 Attachments: parse-xml.patch, parse-xml.zip, parse-xml.zip


 The XML parser is a configurable plugin. It uses XPath and namespaces to map 
 XML elements to Lucene fields. 
 Information:
 1- Copy xmlparser-conf.xml to the nutch/conf dir.
 2- To index your custom XML file, you have to modify 
 xmlparser-conf.xml. 
 This parser uses namespaces and XPath to parse XML content.
 The config file maps the XML nodes (using XPath) to Lucene 
 fields. 
 Example: <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" /> 
 3- An xmlIndexerProperties element encapsulates a set of fields associated 
 with a namespace. 
 If the namespace is found in the XML document, the fields represented by the 
 namespace will be indexed.
 Example: 
 <xmlIndexerProperties type="filePerDocument" 
     namespace="http://purl.org/dc/elements/1.1/">
   <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4" /> 
   <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0" /> 
 </xmlIndexerProperties>
 4- It is possible to define a default namespace that is applied when the 
 parser does not find any namespace in the document, or when the namespace 
 found in the XML document does not match the namespace defined in 
 xmlIndexerProperties. 
 Example:
 <xmlIndexerProperties type="filePerDocument" namespace="default">
   <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0" /> 
 </xmlIndexerProperties>
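
 The XPath-to-field mapping described above can be illustrated with the 
 standard javax.xml.xpath API. This is only a sketch of the general technique, 
 not the plugin's actual code: the class name, the sample record, and the way 
 the mapping is passed in as a Map are assumptions made for the example. Only 
 the "dc" prefix and the Dublin Core namespace URI come from the description.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFieldMapping {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Binds only the "dc" prefix so that expressions such as //dc:title resolve.
    static final NamespaceContext DC_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return DC_NS.equals(uri) ? "dc" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                ? Collections.<String>emptyList().iterator()
                : Collections.singletonList(prefix).iterator();
        }
    };

    // Evaluates each XPath against the document and returns field name -> text value.
    static Map<String, String> extract(String xml, Map<String, String> fieldToXPath) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(DC_CONTEXT);

            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : fieldToXPath.entrySet()) {
                // evaluate(String, Object) returns the string value of the first match
                fields.put(e.getKey(), xpath.evaluate(e.getValue(), doc));
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not evaluate XPath mapping", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample record for the demonstration.
        String xml = "<record xmlns:dc=\"" + DC_NS + "\">"
            + "<dc:title>Sample title</dc:title>"
            + "<dc:creator>Sample author</dc:creator>"
            + "</record>";

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("dctitle", "//dc:title");
        mapping.put("dccreator", "//dc:creator");

        System.out.println(extract(xml, mapping));
    }
}
```

 In the plugin, the resulting values would presumably be added to the Lucene 
 document with the configured type and boost; that step is left out here.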




[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-12-11 Thread Beaucarnea (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655679#action_12655679
 ] 

Beaucarnea commented on NUTCH-563:
--

I applied the patch to my dev-1.0 version, but had to change one line in the 
method findAdditionalFields(Configuration conf):
    Iterator confEntriesIterator = conf.entries();
changed to
    Iterator confEntriesIterator = conf.iterator();

Then, it worked great for me.
Thanks!
Martina


 Include custom fields in BasicQueryFilter
 -

 Key: NUTCH-563
 URL: https://issues.apache.org/jira/browse/NUTCH-563
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Reporter: julien nioche
Priority: Minor
 Fix For: 1.0.0

 Attachments: diff.BasicQueryFilter.dynamicFields.txt


 This patch allows additional fields to be included in the BasicQueryFilter by 
 specifying runtime parameters. Any parameter matching the regular expression 
 (query\\.basic\\.(.+)\\.boost) will be added to the list of fields 
 used by the BQF, and the specified float value will be used as the boost.
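
 The parameter matching can be sketched in plain Java as follows. This is an 
 illustration of the idea rather than the patch itself: it takes any iterable 
 of key/value entries (a Hadoop Configuration is iterable in this way, but a 
 plain Map is used here to keep the example self-contained), the class name 
 and sample parameters are hypothetical, and the outer parentheses of the 
 expression are dropped so that group 1 is the field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoostFieldFinder {
    // Same expression as in the description, without the outer group.
    static final Pattern BOOST_KEY = Pattern.compile("query\\.basic\\.(.+)\\.boost");

    // Collects field name -> boost for every entry whose key matches BOOST_KEY.
    static Map<String, Float> findAdditionalFields(
            Iterable<Map.Entry<String, String>> conf) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf) {
            Matcher m = BOOST_KEY.matcher(entry.getKey());
            if (m.matches()) {
                boosts.put(m.group(1), Float.parseFloat(entry.getValue()));
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Hypothetical runtime parameters; the field names are made up.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("query.basic.category.boost", "2.0");
        conf.put("query.basic.tag.boost", "1.5");
        conf.put("http.agent.name", "demo");

        System.out.println(findAdditionalFields(conf.entrySet()));
    }
}
```

 Entries that do not match the expression, such as http.agent.name in the 
 sample, are ignored.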




[jira] Updated: (NUTCH-386) Plugin to index categories by url rules

2008-12-03 Thread Beaucarnea (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beaucarnea updated NUTCH-386:
-

Attachment: index-url-category.jar

This plugin uses the deprecated org.apache.hadoop.io.UTF8, which caused an 
IOException.
I replaced it with org.apache.hadoop.io.Text, and now the plugin works fine 
again.
The attached jar file contains the update.

 Plugin to index categories by url rules
 ---

 Key: NUTCH-386
 URL: https://issues.apache.org/jira/browse/NUTCH-386
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Reporter: Ernesto De Santis
Priority: Minor
 Attachments: index-url-category-0.1.zip, index-url-category.jar


 The compressed zip has an install_notes.txt file with instructions.
