On Sat, Sep 22, 2012 at 9:05 AM, Matthias Friedrich <[email protected]> wrote: > On Saturday, 2012-09-22, Josh Wills wrote: >> On Sat, Sep 22, 2012 at 12:52 AM, Matthias Friedrich <[email protected]> wrote: > [...] >>> 1) Crunch currently ships a log4j.properties which can have precedence >>> over users' log4j.properties, depending on classpath order. Libraries >>> should never ship logging config as it forces users to repackage >>> Crunch if they want to use their own. Our Nexus at work has a nice >>> collection of repackaged libs. > >> I'm on-board for this, with the caveat that we'll need to switch over >> the logging we do now (e.g., which stage of the pipeline is currently >> running, the URLs for job tracking, etc.) under a non-log4j based >> control scheme. > > Sorry, I don't get what you mean. Crunch logs via commons-logging > already. What do we need to change?
Perhaps I'm being dense-- once we remove log4j.properties from the core, won't we stop logging the Crunch status information unless the developer explicitly configures it in their own log4j.properties? That may of course be what the developer desires, but my thought was that we typically do want that information logged, and that repeating it in every log4j.properties file for every project would be tedious. Assuming that is the case, perhaps we could add a default log4j.properties file to the maven archetype that generates Crunch projects? > >>> 2) Discussion about Pipeline.enableDebug() came up in CRUNCH-70. I >>> believe it really shouldn't mess with logging configuration. Right now >>> it bypasses the commons-logging facade and directly accesses log4j, >>> causing a compile time dependency on log4j. It changes VM-wide state >>> beyond Crunch as other Hadoop-related code executed afterwards will >>> get changed logging config, too. And, most importantly, it's the >>> responsibility of the operations team, not the developer to configure >>> logging. Admins are used to log4j.properties, we shouldn't invent >>> another non-standard way of doing things that overrides the usual >>> way. > >> enableDebug is intended to be used by developers who are, well, >> debugging their MR pipelines, which is something that every MR >> developer spends a fair amount of time doing. I will argue against any >> changes that make debugging more difficult-- if anything, I would like >> to make it even easier. I don't draw a distinction between ops and >> development when it comes to creating pipelines. > > Hmm. One of the things I firmly believe in is that libraries (any > system actually) should cause the least amount of surprise in anything > it does. enableDebug() changes VM-wide logging settings and overrides > the well-understood configuration mechanism of log4j.properties. To me > this is an unexpected side effect, at the very least we'll have to put > a big fat warning in the javadocs. I'm okay with a big fat warning (all caps, etc.) in the javadocs. But I think the functionality is warranted, and I think it's reasonable for developers to expect that a lot more logging will occur when they put any library into debug mode. It is far more surprising, IMO, to put a library into debug mode and then not actually provide any of the information that will help debug the problem. -- Director of Data Science Cloudera Twitter: @josh_wills
