On Sunday, 2012-09-23, Josh Wills wrote:
> On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich <[email protected]> wrote:
>> On Saturday, 2012-09-22, Josh Wills wrote:
>> [...]
> Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
> a hadoop job fails on the cluster, we have those logs available on the
> JobTracker webpage (at least, I do in CDH, I assume it works the same
> way in Hadoop 1.0.3), so enableDebug doesn't do anything for us
> (besides altering the Configuration to force Crunch to put try-catch
> blocks around the DoNode tasks, which I assume still works fine). I
> use enableDebug to force the logging of Hadoop WARN statements on my
> machine when I'm testing out pipelines, so in that case, it's only
> affecting LocalJobRunner.

Yep. I think we could remove log4j.properties, the log4j setup code in
enableDebug(), and the log4j dependency from Crunch and the behavior on
the cluster should still be the same. The same holds for LocalJobRunner
when running via "hadoop jar".

Running the LocalJobRunner from the IDE is the problem because then we
need a logging backend on the classpath. If we don't have log4j, then
java.util.logging is used, which logs everything on INFO level. As soon
as log4j is on the classpath, however, the user really needs a
log4j.properties or log4j will complain that it doesn't have
configuration (and logs nothing).

> Given that, what's the best approach here? Javadoc statement on the
> function indicating its intended use, or is there a better option?

I'd say let's remove log4j.properties from Crunch, because users can't
defend themselves against it. We have local applications at work that
run some parts locally, without anything Hadoop-specific; shipping a
log4j.properties with Crunch would cause problems for us.

We could then add a log4j.properties to src/main/resources in the
archetype with an explanation of when exactly this configuration is
used (only when running from the IDE). We would keep enableDebug() with
its setting of "crunch.debug", but remove the log4j code, and add a
"provided" log4j dependency to the archetype (because log4j is missing
from hadoop-core).

Does this make sense? Will this give you the logging/debugging output
that you need?

[...]
>> Ah, that reminds me: We haven't decided yet if we want an archetype in
>> Crunch.

> I want one. I thought you created it? I remember seeing an email-- if
> I didn't reply, it was b/c I was in the midst of that crazy travel
> week and my sleep schedule was off (honestly, I'm just now
> recovering.)

No worries, I'm a bit sleep-deprived myself so I can relate. With
Gabriel we're +3 pro archetype, so I'll make a patch this weekend.

Regards,
  Matthias
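
For illustration only, the log4j.properties proposed above for the
archetype's src/main/resources might look roughly like the sketch
below. It assumes log4j 1.x and a plain console appender; the appender
name "console", the WARN threshold, and the pattern are placeholders
chosen for this example and are not taken from any actual Crunch patch.

```
# Hypothetical sketch: only picked up when running LocalJobRunner from
# the IDE. On the cluster and under "hadoop jar", Hadoop supplies its
# own logging configuration.
log4j.rootLogger=WARN, console

# Write log events to the console (stdout is log4j 1.x's default target).
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```

Because such a file would live in the generated project rather than in
the Crunch jar, users could edit or delete it freely, which is the
point of moving it out of Crunch.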
