OK, you're right, of course: it's not a nutch-default.xml or
nutch-site.xml issue. The first problem was an extension id mismatch
between parse-plugins.xml and my plugin.xml, which is why I didn't see
the issue with my urlfilter. However, both my plugin.xml and the
default parse-rss/plugin.xml contain the contentType nodes shown in my
implementation node, yet during the crawl the log shows this warning
for both of them:

[but its plugin.xml file does not claim to support contentType:
application/rdf+xml]

This is my implementation node. I can't find any examples or docs in
the wiki that explain the appropriate way to map multiple contentTypes
in one plugin.xml.

<implementation id="com.clipblast.plugin.videofilter.StoringRssParser"
                class="com.clipblast.plugin.videofilter.StoringRssParser">
  <parameter name="contentType" value="application/rdf+xml"/>
  <parameter name="contentType" value="application/rss+xml"/>
  <parameter name="contentType" value="application/atom+xml"/>
  <parameter name="contentType" value="application/xml"/>
  <parameter name="contentType" value="text/xml"/>
</implementation>

pb

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Sunday, May 27, 2007 10:05 AM
To: [EMAIL PROTECTED]
Subject: Re: nutch-site.xml vs. nutch-default.xml


patrik wrote:
> Odd, I'm running 0.8 on FC5, and only noticed this last night. The 
> even odder thing in my case was it only applied to parse plugins. A 
> urlfilter only specified in the nutch-site.xml was fine.

What you all describe is quite unlikely ... Config resources are loaded
in a specific order, and if this modified nutch-site.xml was read at
all, the properties specified there will always override values
specified in nutch-default.xml.
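
As a rough illustration of that override rule, here is a minimal sketch
using plain java.util.Properties. It is not how Nutch or Hadoop actually
loads its configuration, and the class and method names are made up for
the example. It only shows the principle: a value loaded later replaces
a value loaded earlier, but only when the key matches exactly.

```java
import java.util.Properties;

// Hypothetical illustration only; not Nutch's real config loader.
class OverrideOrder {
    // Resources loaded later win: a site value replaces the default
    // value stored under the same key.
    static Properties layer(Properties defaults, Properties site) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(site);
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("plugin.includes", "default-value");

        Properties site = new Properties();
        site.setProperty("plugin.includes", "site-value");

        // Prints the effective value after layering site over defaults.
        System.out.println(layer(defaults, site).getProperty("plugin.includes"));
    }
}
```

A key that differs by even one character is stored as a separate
property, so the default under the correct key stays in effect.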

The only possible explanations that come to my mind are these:

* you misspelled a property name in nutch-site.xml - it may happen.

* you changed the file in WEB-INF/classes, and then reloaded the webapp,
and your servlet container re-deployed the app from the original war
file, thus overwriting your modified file.

* you have other copies of (unchanged) nutch-site.xml on your classpath,
or inside jars loaded on your classpath, with an identical name
(basically, "./nutch-site.xml"), which take precedence over your
modified nutch-site.xml.
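
One way to check the last possibility is to ask the class loader for
every copy of the file it can see. This is a minimal sketch, assuming
it runs with the same classpath as your Nutch job or webapp; the
ConfigCopies class and its find method are hypothetical names, and only
the standard ClassLoader.getResources call is used.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Nutch.
class ConfigCopies {
    // Returns every location where the class loader can see the named resource.
    static List<URL> find(ClassLoader loader, String resourceName) throws IOException {
        return new ArrayList<>(Collections.list(loader.getResources(resourceName)));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Print one line per copy of nutch-site.xml visible on the classpath.
        for (URL url : find(loader, "nutch-site.xml")) {
            System.out.println(url);
        }
    }
}
```

If it prints more than one URL, there are duplicate copies, and the
one you edited may not be the one that is actually read.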

You can change your log4j.properties to activate DEBUG level (or use
-Dhadoop.root.logger=DEBUG,console -Dnutch.root.logger=DEBUG on the
command-line), and check the log - it should show which config files
are loaded, in what order, and from what locations.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
