[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870528#action_12870528 ]
Julien Nioche commented on NUTCH-826:
-------------------------------------

Nutch has recently become a TLP and some of the info on the website needs updating.

To subscribe to the list, send a message to:
   <user-subscr...@nutch.apache.org>

To remove your address from the list, send a message to:
   <user-unsubscr...@nutch.apache.org>

Send mail to the following for info and FAQ for this list:
   <user-i...@nutch.apache.org>
   <user-...@nutch.apache.org>

PS: this is hardly a blocker

> Mailing list is broken.
> -----------------------
>
>                 Key: NUTCH-826
>                 URL: https://issues.apache.org/jira/browse/NUTCH-826
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: John Sherwood
>            Priority: Blocker
>
> All of the following addresses are failing:
> nutch-u...@nutch.apache.org
> nutch-user-subscr...@nutch.apache.org
> nutch-user-subscr...@lucene.apache.org
> For the last one, the mailer daemon said
> "This mailing list has moved to user at nutch.apache.org."
> Below is the message I tried to send:
> Hi people,
> I've been banging my head against this problem for two days now.
> Simply, I want to add a field with the value of a given meta tag.
> I've been trying the parse-xml plugin, but it seems that it doesn't
> work with version 1.0. I've tried the code at
> http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
> and it hasn't worked. I don't even know why. I don't even know if my
> plugin is being used... or even looked for! Nutch seems to have an
> infuriating "Fail silently" policy for plugins. I put a
> System.exit(1) in my filters just to see if my code is even being
> encountered. It has not, in spite of my config telling it to.
> Here's my config:
> nutch-site.xml
> ...
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
> </property>
> ...
> parse-plugins.xml
> ...
> <mimeType name="application/xhtml+xml">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/html">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/sgml">
>   <plugin id="parse-html" />
>   <plugin id="metadata" />
> </mimeType>
> <mimeType name="text/xml">
>   <plugin id="parse-html" />
>   <plugin id="parse-rss" />
>   <plugin id="metadata" />
>   <plugin id="feed" />
> </mimeType>
> ...
> <alias name="metadata"
>   extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter"
> />
> ...
> I've also copied the plugin.xml and jar from my build/metadata to the
> plugins root dir.
> Nonetheless, Nutch runs and puts data in solr for me. Afaik, Nutch is
> completely unaware of my plugin despite my config options. Is there
> some other place I need to tell Nutch to use my plugin? Is there some
> other approach to do this without having to write a plugin? This does
> seem like a lot of work to simply get a meta tag into a field. Any
> help would be appreciated.
> Sincerely,
> John Sherwood

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.