The fix for this was just merged into branch 0.9 (will be in 0.9.1+) and master.
On Sun, Feb 9, 2014 at 11:44 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Thanks Paul - it isn't meant to be a "full solution," just a fix for
> the 0.9 branch - for the full solution there is another PR by Sean Owen.
>
> On Sun, Feb 9, 2014 at 11:35 PM, Paul Brown <p...@mult.ifario.us> wrote:
>> Hi, Patrick --
>>
>> I gave that a go locally, and it works as desired.
>>
>> Best.
>> -- Paul
>>
>> --
>> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>
>> On Fri, Feb 7, 2014 at 6:10 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>>> Ah okay, sounds good. This is what I meant earlier by "You have some
>>> other application that directly calls log4j" -- i.e., you have, for
>>> historical reasons, installed log4j-over-slf4j.
>>>
>>> Would you mind trying out this fix and seeing if it works? This is
>>> designed to be a hotfix for 0.9, not a general solution where we rip
>>> out log4j from our published dependencies:
>>>
>>> https://github.com/apache/incubator-spark/pull/560/files
>>>
>>> - Patrick
>>>
>>> On Fri, Feb 7, 2014 at 5:57 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>> > Hi, Patrick --
>>> >
>>> > I forget which other component is responsible, but we're using
>>> > log4j-over-slf4j as part of an overall requirement to centralize
>>> > logging, i.e., *someone* else is logging over log4j and we're
>>> > pulling that in. (There's also some jul logging from Jersey, etc.)
>>> >
>>> > Goals:
>>> >
>>> > - Fully control/capture all possible logging. (God forbid we have
>>> >   to grab System.out/err, but we'd do it if needed.)
>>> > - Use the backend we like best at the moment. (Happens to be logback.)
>>> >
>>> > Possible cases:
>>> >
>>> > - If Spark used log4j at all, we would pull in that logging via
>>> >   log4j-over-slf4j.
>>> > - If Spark used only slf4j and referenced no backend, we would use
>>> >   it as-is, although we'd still have log4j-over-slf4j because of
>>> >   other libraries.
>>> > - If Spark used only slf4j and referenced the slf4j-log4j12
>>> >   backend, we would exclude that one dependency (via our POM).
>>> >
>>> > Best.
>>> > -- Paul
>>> >
>>> > --
>>> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >
>>> > On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >
>>> >> Hey Paul,
>>> >>
>>> >> So if your goal is ultimately to output to logback, then why don't
>>> >> you just use slf4j and logback-classic.jar as described here [1]?
>>> >> Why involve log4j-over-slf4j at all?
>>> >>
>>> >> Let's say we refactored the Spark build so it didn't advertise
>>> >> slf4j-log4j12 as a dependency. Would you still be using
>>> >> log4j-over-slf4j... or is this just a "fix" to deal with the fact
>>> >> that Spark is somewhat log4j-dependent at this point?
>>> >>
>>> >> [1] http://www.slf4j.org/manual.html
>>> >>
>>> >> - Patrick
>>> >>
>>> >> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>> >> > Hi, Patrick --
>>> >> >
>>> >> > That's close but not quite it.
>>> >> >
>>> >> > The issue that occurs is not the delegation loop mentioned in
>>> >> > the slf4j documentation. The stack overflow is entirely within
>>> >> > the code in the Spark trait:
>>> >> >
>>> >> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
>>> >> > at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>>> >> > at org.apache.spark.Logging$class.log(Logging.scala:36)
>>> >> > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
>>> >> >
>>> >> > And then that repeats.
>>> >> >
>>> >> > As for our situation, we exclude the slf4j-log4j12 dependency
>>> >> > when we import the Spark library (because we don't want to use
>>> >> > log4j) and have log4j-over-slf4j already in place to ensure that
>>> >> > all of the logging in the overall application runs through slf4j
>>> >> > and then out through logback.
>>> >> > (We also, as another poster mentioned, force jcl and jul
>>> >> > through slf4j.)
>>> >> >
>>> >> > The zen of slf4j for libraries is that the library uses the
>>> >> > slf4j API and the enclosing application can route logging as it
>>> >> > sees fit. The Spark master CLI would log via slf4j and include
>>> >> > the slf4j-log4j12 backend; same for the Spark worker CLI. Spark
>>> >> > as a library (versus as a container) would not include any
>>> >> > backend to the slf4j API and would leave this up to the
>>> >> > application. (FWIW, this would also avoid your log4j warning
>>> >> > message.)
>>> >> >
>>> >> > But as I was saying before, I'd be happy with a situation where
>>> >> > I can avoid log4j being enabled or configured, and I think
>>> >> > you'll find an existing choice of logging framework to be a
>>> >> > common scenario for those embedding Spark in other systems.
>>> >> >
>>> >> > Best.
>>> >> > -- Paul
>>> >> >
>>> >> > --
>>> >> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >> >
>>> >> > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >> >
>>> >> >> Paul,
>>> >> >>
>>> >> >> Looking back at your problem, I think it's the one here:
>>> >> >> http://www.slf4j.org/codes.html#log4jDelegationLoop
>>> >> >>
>>> >> >> So let me just be clear about what you are doing so I
>>> >> >> understand. You have some other application that directly calls
>>> >> >> log4j, so you have to include log4j-over-slf4j to route those
>>> >> >> logs through slf4j to logback.
>>> >> >>
>>> >> >> At the same time you embed Spark in this application. In the
>>> >> >> past it was fine, but now that Spark programmatically
>>> >> >> initializes log4j, it screws up your application, because
>>> >> >> log4j-over-slf4j doesn't work with applications that do this
>>> >> >> explicitly, as discussed here: http://www.slf4j.org/legacy.html
>>> >> >>
>>> >> >> Correct?
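For readers reproducing Paul's setup, the dependency exclusion he mentions (dropping Spark's slf4j-log4j12 so the embedding application can supply its own slf4j backend) would look roughly like this in the consumer's POM. The coordinates below are for the Spark 0.9 line and are illustrative, not quoted from the thread:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.0-incubating</version>
  <exclusions>
    <!-- Keep Spark's slf4j API usage, but drop its log4j backend so the
         application can bind logback (or any other backend) instead. -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With this in place, logback-classic on the classpath becomes the single slf4j binding, and log4j-over-slf4j bridges any direct log4j callers.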
>>> >> >>
>>> >> >> - Patrick
>>> >> >>
>>> >> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>> >> >> > got it. that sounds reasonable
>>> >> >> >
>>> >> >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> Koert - my suggestion was this. We let users use any slf4j
>>> >> >> >> backend they want. If we detect that they are using the
>>> >> >> >> log4j backend and *also* they didn't configure any log4j
>>> >> >> >> appenders, we set up some nice defaults for them. If they
>>> >> >> >> are using another backend, Spark doesn't try to modify the
>>> >> >> >> configuration at all.
>>> >> >> >>
>>> >> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>> >> >> >> > well, "static binding" is probably the wrong terminology,
>>> >> >> >> > but you get the idea. multiple backends are not allowed
>>> >> >> >> > and cause an even uglier warning...
>>> >> >> >> >
>>> >> >> >> > see also here:
>>> >> >> >> > https://github.com/twitter/scalding/pull/636
>>> >> >> >> > and here:
>>> >> >> >> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
>>> >> >> >> > all me being annoying and complaining about slf4j-log4j12
>>> >> >> >> > dependencies (which did get removed).
>>> >> >> >> >
>>> >> >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>> >> >> >> >
>>> >> >> >> >> the issue is that slf4j uses static binding: you can put
>>> >> >> >> >> only one slf4j backend on the classpath, and that's what
>>> >> >> >> >> it uses. more than one is not allowed.
>>> >> >> >> >>
>>> >> >> >> >> so you either keep the slf4j-log4j12 dependency for
>>> >> >> >> >> spark, and then you took away people's choice of slf4j
>>> >> >> >> >> backend, which is considered bad form for a library, or
>>> >> >> >> >> you do not include it and then people will always get the
>>> >> >> >> >> big fat ugly warning and slf4j logging will not flow to
>>> >> >> >> >> log4j.
>>> >> >> >> >>
>>> >> >> >> >> including log4j itself is not necessarily a problem, i think?
>>> >> >> >> >>
>>> >> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >> >> >> >>
>>> >> >> >>> This also seems relevant - but not my area of expertise
>>> >> >> >>> (whether this is a valid way to check this):
>>> >> >> >>>
>>> >> >> >>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
>>> >> >> >>>
>>> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >> >> >>> > Hey Guys,
>>> >> >> >>> >
>>> >> >> >>> > Thanks for explaining. Ya, this is a problem - we didn't
>>> >> >> >>> > really know that people are using other slf4j backends.
>>> >> >> >>> > slf4j is in there for historical reasons, but I think we
>>> >> >> >>> > may assume in a few places that log4j is being used, and
>>> >> >> >>> > we should minimize those.
>>> >> >> >>> >
>>> >> >> >>> > We should patch this and get a fix into 0.9.1. Some
>>> >> >> >>> > solutions I see are:
>>> >> >> >>> >
>>> >> >> >>> > (a) Add a SparkConf option to disable this. I'm fine with
>>> >> >> >>> > this one.
>>> >> >> >> >>> > >>> >> >> >> >>> > (b) Ask slf4j which backend is active and only try to >>> enforce >>> >> this >>> >> >> >> >>> > default if we know slf4j is using log4j. Do either of you >>> know >>> >> if >>> >> >> >> this >>> >> >> >> >>> > is possible? Not sure if slf4j exposes this. >>> >> >> >> >>> > >>> >> >> >> >>> > (c) Just remove this default stuff. We'd rather not do this. >>> >> The >>> >> >> goal >>> >> >> >> >>> > of this thing is to provide good usability for people who >>> have >>> >> >> linked >>> >> >> >> >>> > against Spark and haven't done anything to configure >>> logging. >>> >> For >>> >> >> >> >>> > beginners we try to minimize the assumptions about what else >>> >> they >>> >> >> >> know >>> >> >> >> >>> > about, and I've found log4j configuration is a huge mental >>> >> barrier >>> >> >> >> for >>> >> >> >> >>> > people who are getting started. >>> >> >> >> >>> > >>> >> >> >> >>> > Paul if you submit a patch doing (a) we can merge it in. If >>> you >>> >> >> have >>> >> >> >> >>> > any idea if (b) is possible I prefer that one, but it may >>> not >>> >> be >>> >> >> >> >>> > possible or might be brittle. >>> >> >> >> >>> > >>> >> >> >> >>> > - Patrick >>> >> >> >> >>> > >>> >> >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers < >>> >> ko...@tresata.com> >>> >> >> >> >>> wrote: >>> >> >> >> >>> >> Totally agree with Paul: a library should not pick the >>> slf4j >>> >> >> >> backend. >>> >> >> >> >>> It >>> >> >> >> >>> >> defeats the purpose of slf4j. That big ugly warning is >>> there >>> >> to >>> >> >> >> alert >>> >> >> >> >>> >> people that its their responsibility to pick the back >>> end... 
>>> >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <p...@mult.ifario.us> wrote:
>>> >> >> >>> >>
>>> >> >> >>> >>> Hi, Patrick --
>>> >> >> >>> >>>
>>> >> >> >>> >>> With slf4j, you can either back it with log4j (which is
>>> >> >> >>> >>> the way Spark is shipped) or you can route log4j
>>> >> >> >>> >>> through slf4j and then on to a different backend (e.g.,
>>> >> >> >>> >>> logback). We're doing the latter and manipulating the
>>> >> >> >>> >>> dependencies in the build because that's the way the
>>> >> >> >>> >>> enclosing application is set up.
>>> >> >> >>> >>>
>>> >> >> >>> >>> The issue with the current situation is that there's no
>>> >> >> >>> >>> way for an end user to choose *not* to use the log4j
>>> >> >> >>> >>> backend. (My short-term solution was to use the Maven
>>> >> >> >>> >>> shade plugin to swap in a version of the Logging trait
>>> >> >> >>> >>> with the body of that method commented out.) In
>>> >> >> >>> >>> addition to the situation with log4j-over-slf4j and the
>>> >> >> >>> >>> empty enumeration of ROOT appenders, you might also run
>>> >> >> >>> >>> afoul of someone who intentionally configured log4j
>>> >> >> >>> >>> with an empty set of appenders at the time that Spark
>>> >> >> >>> >>> is initializing.
>>> >> >> >>> >>>
>>> >> >> >>> >>> I'd be happy with any implementation that lets me
>>> >> >> >>> >>> choose my logging backend: override default behavior
>>> >> >> >>> >>> via system property, plug-in architecture, etc.
>>> >> >> >>> >>> I do think it's reasonable to expect someone digesting
>>> >> >> >>> >>> a substantial JDK-based system like Spark to understand
>>> >> >> >>> >>> how to initialize logging -- surely they're using
>>> >> >> >>> >>> logging of some kind elsewhere in their application --
>>> >> >> >>> >>> but if you want the default behavior there as a
>>> >> >> >>> >>> courtesy, it might be worth putting an INFO (versus the
>>> >> >> >>> >>> glaring log4j WARN) message on the output that says
>>> >> >> >>> >>> something like "Initialized default logging via Log4J;
>>> >> >> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable
>>> >> >> >>> >>> this behavior." so that it's both convenient and
>>> >> >> >>> >>> explicit.
>>> >> >> >>> >>>
>>> >> >> >>> >>> Cheers.
>>> >> >> >>> >>> -- Paul
>>> >> >> >>> >>>
>>> >> >> >>> >>> --
>>> >> >> >>> >>> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>>> >> >> >>> >>>
>>> >> >> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> >> >> >>> >>>
>>> >> >> >>> >>> > A config option, e.g., could just be to add:
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > spark.logging.loadDefaultLogger (default true)
>>> >> >> >>> >>> > If set to true, Spark will try to initialize a log4j
>>> >> >> >>> >>> > logger if none is detected. Otherwise Spark will not
>>> >> >> >>> >>> > modify logging behavior.
>>> >> >> >>> >>> >
>>> >> >> >>> >>> > Then users could just set this to false if they have
>>> >> >> >>> >>> > a logging set-up that conflicts with this.
>>> >> >> >> >>> >>> > >>> >> >> >> >>> >>> > Maybe there is a nicer fix... >>> >> >> >> >>> >>> > >>> >> >> >> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell < >>> >> >> >> >>> pwend...@gmail.com> >>> >> >> >> >>> >>> > wrote: >>> >> >> >> >>> >>> > > Hey Paul, >>> >> >> >> >>> >>> > > >>> >> >> >> >>> >>> > > Thanks for digging this up. I worked on this feature >>> and >>> >> the >>> >> >> >> >>> intent >>> >> >> >> >>> >>> > > was to give users good default behavior if they didn't >>> >> >> include >>> >> >> >> any >>> >> >> >> >>> >>> > > logging configuration on the classpath. >>> >> >> >> >>> >>> > > >>> >> >> >> >>> >>> > > The problem with assuming that CL tooling is going to >>> fix >>> >> >> the >>> >> >> >> job >>> >> >> >> >>> is >>> >> >> >> >>> >>> > > that many people link against spark as a library and >>> run >>> >> >> their >>> >> >> >> >>> >>> > > application using their own scripts. In this case the >>> >> first >>> >> >> >> thing >>> >> >> >> >>> >>> > > people see when they run an application that links >>> >> against >>> >> >> >> Spark >>> >> >> >> >>> was a >>> >> >> >> >>> >>> > > big ugly logging warning. >>> >> >> >> >>> >>> > > >>> >> >> >> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this >>> >> >> >> behavior of >>> >> >> >> >>> >>> > > returning null for the appenders seems a little weird. >>> >> What >>> >> >> is >>> >> >> >> >>> the use >>> >> >> >> >>> >>> > > case for using this and not just directly use >>> >> slf4j-log4j12 >>> >> >> >> like >>> >> >> >> >>> Spark >>> >> >> >> >>> >>> > > itself does? >>> >> >> >> >>> >>> > > >>> >> >> >> >>> >>> > > Did you have a more general fix for this in mind? Or >>> was >>> >> >> your >>> >> >> >> >>> plan to >>> >> >> >> >>> >>> > > just revert the existing behavior... We might be able >>> to >>> >> >> add a >>> >> >> >> >>> >>> > > configuration option to disable this logging default >>> >> stuff. 
>>> >> >> >>> >>> > > Or we could just rip it out - but I'd like to
>>> >> >> >>> >>> > > avoid that if possible.
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > - Patrick
>>> >> >> >>> >>> > >
>>> >> >> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <p...@mult.ifario.us> wrote:
>>> >> >> >>> >>> > >> We have a few applications that embed Spark, and
>>> >> >> >>> >>> > >> in 0.8.0 and 0.8.1 we were able to use slf4j, but
>>> >> >> >>> >>> > >> 0.9.0 broke that and unintentionally forces direct
>>> >> >> >>> >>> > >> use of log4j as the logging backend.
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> log4j-over-slf4j *always* returns an empty
>>> >> >> >>> >>> > >> enumeration of appenders for the ROOT logger:
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> And this causes an infinite loop and an eventual
>>> >> >> >>> >>> > >> stack overflow.
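The stack trace quoted earlier in the thread (log → initializeIfNecessary → initializeLogging → log, repeating) can be reduced to a small self-contained model. This is a deliberate simplification of the 0.9.0 Logging trait, not Spark's actual code: lazy initialization is triggered by log(), and the initializer logs before the "initialized" flag is set, so whether the cycle terminates depends entirely on the appender check that log4j-over-slf4j defeats:

```java
public class LoggingLoopRepro {
    private static boolean initialized = false;

    // Under slf4j-log4j12 a default appender exists (or gets installed) and
    // breaks the cycle; under log4j-over-slf4j getAllAppenders() is always
    // empty, modeled here as a constant false.
    private static final boolean ROOT_HAS_APPENDERS = false;

    static void log() {
        if (!initialized) {
            initializeLogging();
        }
    }

    static void initializeLogging() {
        if (!ROOT_HAS_APPENDERS) {
            // "Install defaults and log a message" -- the logging call
            // re-enters log() before 'initialized' is set below.
            log();
        }
        initialized = true;
    }

    /** Returns true if the pattern overflows the stack, as in the reported trace. */
    static boolean reproduces() {
        try {
            log();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + reproduces());
    }
}
```

Because initialization is only ever marked complete after the recursive call returns, any path on which the appender check always fails recurses without bound, which is exactly why the empty enumeration from log4j-over-slf4j is fatal here.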
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> I'm happy to submit a Jira and a patch, but it
>>> >> >> >>> >>> > >> would be a significant enough reversal of recent
>>> >> >> >>> >>> > >> changes that it's probably worth discussing before
>>> >> >> >>> >>> > >> I sink a half hour into it. My suggestion would be
>>> >> >> >>> >>> > >> that initialization (or not) should be left to the
>>> >> >> >>> >>> > >> user, with reasonable default behavior supplied by
>>> >> >> >>> >>> > >> the Spark command-line tooling and not forced on
>>> >> >> >>> >>> > >> applications that incorporate Spark.
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> Thoughts/opinions?
>>> >> >> >>> >>> > >>
>>> >> >> >>> >>> > >> -- Paul
>>> >> >> >>> >>> > >> --
>>> >> >> >>> >>> > >> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/