Lincoln Ritter wrote:
> Just to clarify: Andrzej, the resolution you speak of in 0.19 - is
> that resolution independent of Michael's patch?

Yes, this is something that will be submitted in a separate Hadoop JIRA issue.


> I think any solution with less code is preferable, so a configuration
> change seems like a great way to go.  (I didn't realize one could
> change hadoop parameters from the nutch config!)

Nutch configuration files are loaded after the Hadoop config files, so a property set there overrides the Hadoop value, unless that property is already declared "final" on the Hadoop side. Usually you don't notice this, because Nutch uses property names that don't collide with Hadoop property names. This mechanism was also a bit different in older versions of Hadoop, where whole resources were declared "final" instead of individual properties.
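
To make the loading order concrete, here is a rough sketch of what happens when Nutch builds its configuration (illustrative only, not the exact Nutch bootstrap code, though the resource names are the standard ones):

import org.apache.hadoop.conf.Configuration;

public class ConfOrderSketch {
  public static void main(String[] args) {
    // Defaults first: hadoop-default.xml and hadoop-site.xml.
    Configuration conf = new Configuration();
    // Nutch resources are added afterwards, so for any property that the
    // Hadoop resources did not mark <final>true</final>, the Nutch value wins.
    conf.addResource("nutch-default.xml");
    conf.addResource("nutch-site.xml");
    // A Hadoop property name set in nutch-site.xml - e.g. mapred.reduce.tasks -
    // is therefore the value the job actually sees:
    System.out.println(conf.get("mapred.reduce.tasks"));
  }
}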

> That being said, well
> defined Hadoop behavior shouldn't break Nutch,

But that's the problem - this Hadoop feature is ill-defined, and it even breaks internal Hadoop classes such as MapFileOutputFormat.getReaders().

> so exposing a public
> interface for "special" files (like hidden files) I think is a good
> idea.  Nutch mysteriously breaking because it can't determine its
> input properly seems much more confusing (to a user anyway) than an
> additional few lines of code.

Well, generally speaking I agree - but in this particular case it's a Hadoop mis-feature that needs to be avoided for the time being. We can't fix this bug in Hadoop 0.17 or 0.18, only in 0.19 (and then perhaps it can be backported to 0.17.1 or 0.18.1).
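
For anyone who needs to avoid it before 0.19 ships, the usual workaround is to filter the "hidden" entries (names starting with "_", such as _logs, or with ".") out of a job's output listing before handing that directory to the next job. A rough sketch of such a filter - just the general idea, not Michael's patch or the eventual Hadoop fix:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class VisibleFiles {

  // Treat names starting with "_" (e.g. _logs) or "." as hidden.
  public static final PathFilter VISIBLE = new PathFilter() {
    public boolean accept(Path p) {
      String name = p.getName();
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

  // List only the visible entries of a directory, so that code which opens
  // every part file (in the spirit of MapFileOutputFormat.getReaders())
  // does not trip over _logs and friends.
  public static Path[] list(FileSystem fs, Path dir) throws IOException {
    FileStatus[] all = fs.listStatus(dir);
    List<Path> visible = new ArrayList<Path>();
    for (FileStatus status : all) {
      if (VISIBLE.accept(status.getPath())) {
        visible.add(status.getPath());
      }
    }
    return visible.toArray(new Path[visible.size()]);
  }
}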


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
