Lincoln Ritter wrote:
> Just to clarify: Andrzej, the resolution you speak of in 0.19 - is
> that resolution independent of Michael's patch?

Yes, this is something that will be submitted in a separate Hadoop JIRA issue.


> I think any solution with less code is preferable, so a configuration
> change seems like a great way to go.  (I didn't realize one could
> change hadoop parameters from the nutch config!)

Nutch configuration files are loaded after the Hadoop config files, so a property set there overrides the Hadoop value, unless that property is already declared "final" on the Hadoop side. Usually you don't notice this, because Nutch uses property names that don't collide with Hadoop property names. This mechanism was also a bit different in older versions of Hadoop, where whole resources were declared "final" instead of individual properties.
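
To make the loading order concrete, here is a rough sketch of what happens when Nutch builds its configuration (illustrative only, not the exact Nutch bootstrap code, though the resource names are the standard ones):

import org.apache.hadoop.conf.Configuration;

public class ConfOrderSketch {
  public static void main(String[] args) {
    // Defaults first: hadoop-default.xml and hadoop-site.xml.
    Configuration conf = new Configuration();
    // Nutch resources are added afterwards, so for any property that the
    // Hadoop resources did not mark <final>true</final>, the Nutch value wins.
    conf.addResource("nutch-default.xml");
    conf.addResource("nutch-site.xml");
    // A Hadoop property name set in nutch-site.xml - e.g. mapred.reduce.tasks -
    // is therefore the value the job actually sees:
    System.out.println(conf.get("mapred.reduce.tasks"));
  }
}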

> That being said, well
> defined Hadoop behavior shouldn't break Nutch,

But that's the problem - this Hadoop feature is ill-defined, and it even breaks internal Hadoop classes such as MapFileOutputFormat.getReaders().

> so exposing a public
> interface for "special" files (like hidden files) I think is a good
> idea.  Nutch mysteriously breaking because it can't determine its
> input properly seems much more confusing (to a user anyway) than an
> additional few lines of code.

Well, generally speaking I agree - but in this particular case it's a Hadoop mis-feature that needs to be avoided for the time being. We can't fix this bug in Hadoop 0.17 or 0.18, only in 0.19 (and then perhaps it can be backported to 0.17.1 or 0.18.1).
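
For anyone who needs to avoid it before 0.19 ships, the usual workaround is to filter the "hidden" entries (names starting with "_", such as _logs, or with ".") out of a job's output listing before handing that directory to the next job. A rough sketch of such a filter - just the general idea, not Michael's patch or the eventual Hadoop fix:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class VisibleFiles {

  // Treat names starting with "_" (e.g. _logs) or "." as hidden.
  public static final PathFilter VISIBLE = new PathFilter() {
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  // List only the visible entries of a directory, so that code which opens
  // every part file (in the spirit of MapFileOutputFormat.getReaders())
  // does not trip over _logs and friends.
  public static Path[] list(FileSystem fs, Path dir) throws IOException {
    FileStatus[] all = fs.listStatus(dir);
    List<Path> visible = new ArrayList<Path>();
    for (FileStatus status : all) {
      if (VISIBLE.accept(status.getPath())) {
        visible.add(status.getPath());
      }
    }
    return visible.toArray(new Path[visible.size()]);
  }
}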


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
