[jira] Commented: (NUTCH-386) Plugin to index categories by url rules
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666122#action_12666122 ]

Beaucarnea commented on NUTCH-386:

Did you activate the plugin not only on the crawler side but also on the searcher side? That is, did you include the plugin in the nutch-site.xml of your Nutch web application in Tomcat?

Plugin to index categories by url rules
Key: NUTCH-386
URL: https://issues.apache.org/jira/browse/NUTCH-386
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Reporter: Ernesto De Santis
Priority: Minor
Attachments: index-url-category-0.1.zip, index-url-category.jar

The compressed zip contains an install_notes.txt file with instructions.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
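This relates to the question above about activating the plugin on the searcher side. Nutch selects active plugins with the plugin.includes property, which is a regular expression matched against plugin ids. The crawler and the Tomcat web application each read their own nutch-site.xml, so a plugin can be active in one and missing from the other. The following plain-Java sketch has no Nutch dependency. It assumes the whole plugin id must match the regex. The plugin id "index-url-category" and both property values are hypothetical examples.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Returns true if pluginId is selected by a plugin.includes value,
    // assuming the value is a regex that must match the entire id.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Hypothetical values: the crawler's nutch-site.xml lists the plugin,
        // the web application's copy does not.
        String crawler = "protocol-http|parse-(text|html)|index-(basic|url-category)|query-(basic|site|url)";
        String searcher = "protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";
        String id = "index-url-category"; // assumed plugin id

        System.out.println("crawler:  " + isIncluded(crawler, id));  // true
        System.out.println("searcher: " + isIncluded(searcher, id)); // false
    }
}
```

If the second line prints false for your real values, the searcher side never loads the plugin. In that case, its query-time part has no effect.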
[jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.
[ https://issues.apache.org/jira/browse/NUTCH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656380#action_12656380 ]

Beaucarnea commented on NUTCH-185:

Is there an update of this plugin available for the current trunk? Or is this kind of functionality implemented elsewhere?

Thanks,
Beaucarnea

XMLParser is configurable xml parser plugin.
Key: NUTCH-185
URL: https://issues.apache.org/jira/browse/NUTCH-185
Project: Nutch
Issue Type: New Feature
Components: fetcher, indexer
Affects Versions: 0.7.2, 0.8, 0.8.1
Environment: OS Independent
Reporter: Rida Benjelloun
Assignee: Chris A. Mattmann
Attachments: parse-xml.patch, parse-xml.zip, parse-xml.zip

The XML parser is a configurable plugin. It uses XPath and namespaces to map XML elements to Lucene fields.

Instructions:
1- Copy xmlparser-conf.xml to the nutch/conf directory.
2- To index your custom XML files, modify xmlparser-conf.xml. The parser uses namespaces and XPath to parse XML content, and the config file maps XML nodes (selected with XPath) to Lucene fields. Example:
<field name="dctitle" xpath="//dc:title" type="Text" boost="1.4"/>
3- An xmlIndexerProperties element encapsulates a set of fields associated with a namespace. If the namespace is found in the XML document, the fields associated with that namespace are indexed. Example:
<xmlIndexerProperties type="filePerDocument" namespace="http://purl.org/dc/elements/1.1/">
  <field name="dctitle" xpath="//dc:title" type="Text" boost="1.4"/>
  <field name="dccreator" xpath="//dc:creator" type="keyword" boost="1.0"/>
</xmlIndexerProperties>
4- You can define a default namespace. It is applied when the parser finds no namespace in the document, or when the namespace found in the XML document does not match the namespace defined in xmlIndexerProperties. Example:
<xmlIndexerProperties type="filePerDocument" namespace="default">
  <field name="xmlcontent" xpath="//*" type="Unstored" boost="1.0"/>
</xmlIndexerProperties>
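The following sketch illustrates the mapping idea described in steps 2 to 4 using only the JDK's javax.xml APIs; it is not the plugin's code. Each Lucene field name is paired with an XPath expression and evaluated against a namespace-aware DOM. The Dublin Core mapping is used when the document contains elements in that namespace; otherwise a default mapping applies. The class and method names are hypothetical, and field types and boosts are omitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XmlFieldMappingSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical stand-in for the Dublin Core xmlIndexerProperties block:
    // field name -> XPath expression.
    static Map<String, String> dcFields() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("dctitle", "//dc:title");
        m.put("dccreator", "//dc:creator");
        return m;
    }

    // Hypothetical stand-in for the namespace="default" block.
    static Map<String, String> defaultFields() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("xmlcontent", "//*");
        return m;
    }

    // Binds the "dc" prefix used in the XPath expressions to its namespace URI.
    static final NamespaceContext NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return DC_NS.equals(uri) ? "dc" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return DC_NS.equals(uri)
                    ? Collections.singletonList("dc").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    static Map<String, String> index(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so dc: in XPath matches element namespaces
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NS);

            // Use the Dublin Core mapping only if some element is in that namespace.
            boolean hasDc = doc.getElementsByTagNameNS(DC_NS, "*").getLength() > 0;
            Map<String, String> mapping = hasDc ? dcFields() : defaultFields();

            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : mapping.entrySet()) {
                // evaluate() returns the string value of the first matching node, or "".
                fields.put(e.getKey(), xpath.evaluate(e.getValue(), doc).trim());
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot index XML", e);
        }
    }

    public static void main(String[] args) {
        String dc = "<record xmlns:dc=\"" + DC_NS + "\">"
                + "<dc:title>Nutch</dc:title><dc:creator>Apache</dc:creator></record>";
        System.out.println(index(dc));    // {dctitle=Nutch, dccreator=Apache}
        String plain = "<note><to>Tove</to><body>Hi</body></note>";
        System.out.println(index(plain)); // {xmlcontent=ToveHi}
    }
}
```

The plugin itself may choose a mapping differently, for example by inspecting only the root element's namespace. The element scan used here is just one simple option.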
[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter
[ https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655679#action_12655679 ]

Beaucarnea commented on NUTCH-563:

I applied the patch to my dev-1.0 version, but had to change one line in the method findAdditionalFields(Configuration conf):

Iterator confEntriesIterator = conf.entries();

changed to

Iterator confEntriesIterator = conf.iterator();

After that, it worked great for me. Thanks!
Martina

Include custom fields in BasicQueryFilter
Key: NUTCH-563
URL: https://issues.apache.org/jira/browse/NUTCH-563
Project: Nutch
Issue Type: New Feature
Components: searcher
Reporter: julien nioche
Priority: Minor
Fix For: 1.0.0
Attachments: diff.BasicQueryFilter.dynamicFields.txt

This patch makes it possible to include additional fields in the BasicQueryFilter (BQF) by specifying runtime parameters. Any parameter matching the regular expression (query\\.basic\\.(.+)\\.boost) is added to the list of fields used by the BQF, and its float value is used as the field's boost.
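The following sketch illustrates the lookup described in the issue; it is not the patch's code. It scans configuration entries for keys matching query.basic.<field>.boost and collects each field with its boost. To stay runnable without Hadoop, it accepts any Iterable<Map.Entry<String, String>>. Hadoop's Configuration class implements that interface, which is why conf.iterator() works. The for-each loop calls iterator() implicitly. The method name is borrowed from the comment, but its signature and its field-to-boost return map are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdditionalFieldsSketch {

    // Same expression as in the issue description; group(1) is the field name.
    private static final Pattern FIELD_BOOST =
            Pattern.compile("query\\.basic\\.(.+)\\.boost");

    // Hypothetical signature: returns field name -> boost for every matching key.
    static Map<String, Float> findAdditionalFields(Iterable<Map.Entry<String, String>> conf) {
        Map<String, Float> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf) { // uses conf.iterator()
            Matcher m = FIELD_BOOST.matcher(entry.getKey());
            if (m.matches()) {
                fields.put(m.group(1), Float.parseFloat(entry.getValue()));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>(); // sorted for predictable output
        props.put("query.basic.category.boost", "2.0");
        props.put("query.basic.author.boost", "0.5");
        props.put("query.url.boost", "4.0"); // not of the form query.basic.<field>.boost
        System.out.println(findAdditionalFields(props.entrySet())); // {author=0.5, category=2.0}
    }
}
```

With a real Hadoop Configuration, you would pass the Configuration object itself instead of props.entrySet().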
[jira] Updated: (NUTCH-386) Plugin to index categories by url rules
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beaucarnea updated NUTCH-386:

Attachment: index-url-category.jar

This plugin used the deprecated org.apache.hadoop.io.UTF8 class, which caused an IOException. I replaced it with org.apache.hadoop.io.Text, and now the plugin works again. The jar file contains the update.