Mailing list is broken.
-----------------------

                 Key: NUTCH-826
                 URL: https://issues.apache.org/jira/browse/NUTCH-826
             Project: Nutch
          Issue Type: Bug
            Reporter: John Sherwood
            Priority: Blocker

All of the following addresses are failing:

  nutch-u...@nutch.apache.org
  nutch-user-subscr...@nutch.apache.org
  nutch-user-subscr...@lucene.apache.org

For the last one, the mailer daemon said "This mailing list has moved to user at nutch.apache.org."

Below is the message I tried to send:

Hi people,

I've been banging my head against this problem for two days now. Simply, I want to add a field with the value of a given meta tag. I've been trying the parse-xml plugin, but it doesn't seem to work with version 1.0. I've tried the code at http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html and it hasn't worked. I don't even know why. I don't even know if my plugin is being used... or even looked for! Nutch seems to have an infuriating "fail silently" policy for plugins. I put a System.exit(1) in my filters just to see if my code is even being reached. It is not, in spite of my config telling Nutch to use it.

Here's my config:

nutch-site.xml

  ...
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
  </property>
  ...

parse-plugins.xml

  ...
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html" />
    <plugin id="metadata" />
  </mimeType>
  <mimeType name="text/html">
    <plugin id="parse-html" />
    <plugin id="metadata" />
  </mimeType>
  <mimeType name="text/sgml">
    <plugin id="parse-html" />
    <plugin id="metadata" />
  </mimeType>
  <mimeType name="text/xml">
    <plugin id="parse-html" />
    <plugin id="parse-rss" />
    <plugin id="metadata" />
    <plugin id="feed" />
  </mimeType>
  ...
  <alias name="metadata" extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter" />
  ...

I've also copied the plugin.xml and jar from my build/metadata to the plugins root dir. Nonetheless, Nutch runs and puts data in Solr for me. AFAIK, Nutch is completely unaware of my plugin despite my config options.

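One thing that can be checked in isolation is whether the plugin id is actually selected by the plugin.includes value. The sketch below assumes that Nutch treats plugin.includes as a regular expression that must match the whole plugin id (the id attribute of the plugin element in plugin.xml); that matching rule is an assumption here, and the class and method names are made up for illustration, not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests plugin ids against plugin.includes.
public class PluginIncludesCheck {

    // Value of plugin.includes copied from the nutch-site.xml in this message.
    static final String PLUGIN_INCLUDES =
        "protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic"
        + "|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata";

    // Assumption: a plugin is activated only if its id matches the whole expression.
    static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"metadata", "parse-html", "index-basic", "parse-xml"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(PLUGIN_INCLUDES, id));
        }
    }
}
```

Under that assumption, "metadata" is selected by this expression and "parse-xml" is not, so the id declared in the copied plugin.xml would need to be exactly "metadata" for the include to take effect.
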
Is there some other place where I need to tell Nutch to use my plugin? Is there some other approach to do this without having to write a plugin? This does seem like a lot of work to simply get a meta tag into a field.

Any help would be appreciated.

Sincerely,
John Sherwood

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.