OK, so of course you're right that it's not a nutch-default.xml or
nutch-site.xml issue. The first problem was an extension id mismatch
between parse-plugins.xml and my plugin.xml, which is why I didn't see
the issue with my urlfilter. However, both my plugin.xml and the default
parse-rss/plugin.xml contain the contentType nodes below, yet during the
crawl the log shows this warning for both of them:
[but its plugin.xml file does not claim to support contentType: application/rdf+xml]
This is my implementation node. I can't find any examples or docs in the
wiki that explain the right way to map multiple contentTypes in one
plugin.xml:
<implementation id="com.clipblast.plugin.videofilter.StoringRssParser"
                class="com.clipblast.plugin.videofilter.StoringRssParser">
  <parameter name="contentType" value="application/rdf+xml"/>
  <parameter name="contentType" value="application/rss+xml"/>
  <parameter name="contentType" value="application/atom+xml"/>
  <parameter name="contentType" value="application/xml"/>
  <parameter name="contentType" value="text/xml"/>
</implementation>
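For reference, this is roughly the parse-plugins.xml side that now lines up with it: the alias's extension-id has to equal the implementation id above. This is a sketch, not my actual file; "parse-videofilter" is a hypothetical plugin id standing in for the real one, and only two of the mime types are shown.

```xml
<parse-plugins>
  <!-- Each mimeType lists the plugin ids to try for that content type. -->
  <mimeType name="application/rdf+xml">
    <!-- "parse-videofilter" is a hypothetical placeholder plugin id. -->
    <plugin id="parse-videofilter" />
  </mimeType>
  <mimeType name="application/rss+xml">
    <plugin id="parse-videofilter" />
  </mimeType>
  <aliases>
    <!-- extension-id must match the implementation id in plugin.xml;
         a mismatch here was my first problem. -->
    <alias name="parse-videofilter"
           extension-id="com.clipblast.plugin.videofilter.StoringRssParser" />
  </aliases>
</parse-plugins>
```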
pb
-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 27, 2007 10:05 AM
To: [EMAIL PROTECTED]
Subject: Re: nutch-site.xml vs. nutch-default.xml
patrik wrote:
> Odd, I'm running 0.8 on FC5, and only noticed this last night. The
> even odder thing in my case was it only applied to parse plugins. A
> urlfilter only specified in the nutch-site.xml was fine.
What you all describe is quite unlikely. Config resources are loaded
in a specific order, and if this modified nutch-site.xml was read at all,
the properties specified there will always override the values in
nutch-default.xml.
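For illustration, a minimal nutch-site.xml override might look like this. It is a sketch assuming the Hadoop-style <configuration> root used by 0.8; the plugin.includes value is only an example that adds parse-rss, not a recommended plugin list.

```xml
<?xml version="1.0"?>
<!-- Site-specific overrides: a property set here wins over the same
     property in nutch-default.xml, provided this copy is the one loaded. -->
<configuration>
  <property>
    <!-- The name must match nutch-default.xml exactly; a misspelled
         name is simply stored as a separate property nobody reads. -->
    <name>plugin.includes</name>
    <!-- Illustrative value only. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  </property>
</configuration>
```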
The only explanations that come to mind are these:
* you misspelled a property name in nutch-site.xml; it happens.
* you changed the file in WEB-INF/classes and then reloaded the webapp,
and your servlet container re-deployed the app from the original war
file, overwriting your modified file.
* you have other (unchanged) copies of nutch-site.xml on your classpath,
or inside jars loaded on your classpath, with an identical name (basically
"./nutch-site.xml"), which take precedence over your modified
nutch-site.xml.
You can change your log4j.properties to enable the DEBUG level (or use
-Dhadoop.root.logger=DEBUG,console -Dnutch.root.logger=DEBUG on the
command line) and check the log; it should show which config files are
loaded, in what order, and from which locations.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general