Hey Paul, So if your goal is ultimately to output to logback. Then why don't you just use slf4j and logback-classic.jar as described here [1]. Why involve log4j-over-slf4j at all?
Let's say we refactored the spark build so it didn't advertise slf4j-log4j12 as a dependency. Would you still be using log4j-over-slf4j... or is this just a "fix" to deal with the fact that Spark is somewhat log4j dependent at this point. [1] http://www.slf4j.org/manual.html - Patrick On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <[email protected]> wrote: > Hi, Patrick -- > > That's close but not quite it. > > The issue that occurs is not the delegation loop mentioned in slf4j > documentation. The stack overflow is entirely within the code in the Spark > trait: > > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112) > at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97) > at org.apache.spark.Logging$class.log(Logging.scala:36) > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94) > > > And then that repeats. > > As for our situation, we exclude the slf4j-log4j12 dependency when we > import the Spark library (because we don't want to use log4j) and have > log4j-over-slf4j already in place to ensure that all of the logging in the > overall application runs through slf4j and then out through logback. (We > also, as another poster already mentioned, also force jcl and jul through > slf4j.) > > The zen of slf4j for libraries is that the library uses the slf4j API and > then the enclosing application can route logging as it sees fit. Spark > master CLI would log via slf4j and include the slf4j-log4j12 backend; same > for Spark worker CLI. Spark as a library (versus as a container) would not > include any backend to the slf4j API and leave this up to the application. > (FWIW, this would also avoid your log4j warning message.) > > But as I was saying before, I'd be happy with a situation where I can avoid > log4j being enabled or configured, and I think you'll find an existing > choice of logging framework to be a common scenario for those embedding > Spark in other systems. > > Best. > -- Paul > > -- > [email protected] | Multifarious, Inc. | http://mult.ifario.us/ > > > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <[email protected]> wrote: > >> Paul, >> >> Looking back at your problem. I think it's the one here: >> http://www.slf4j.org/codes.html#log4jDelegationLoop >> >> So let me just be clear what you are doing so I understand. You have >> some other application that directly calls log4j. So you have to >> include log4j-over-slf4j to route those logs through slf4j to logback. >> >> At the same time you embed Spark in this application. In the past it >> was fine, but now that Spark programmatic ally initializes log4j, it >> screws up your application because log4j-over-slf4j doesn't work with >> applications that do this explicilty as discussed here: >> http://www.slf4j.org/legacy.html >> >> Correct? >> >> - Patrick >> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <[email protected]> wrote: >> > got it. that sounds reasonable >> > >> > >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <[email protected]> >> wrote: >> > >> >> Koert - my suggestion was this. We let users use any slf4j backend >> >> they want. If we detect that they are using the log4j backend and >> >> *also* they didn't configure any log4j appenders, we set up some nice >> >> defaults for them. If they are using another backend, Spark doesn't >> >> try to modify the configuration at all. >> >> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <[email protected]> >> wrote: >> >> > well "static binding" is probably the wrong terminology but you get >> the >> >> > idea. multiple backends are not allowed and cause an even uglier >> >> warning... >> >> > >> >> > see also here: >> >> > https://github.com/twitter/scalding/pull/636 >> >> > and here: >> >> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls >> >> > all me being annoying and complaining about slf4j-log4j12 dependencies >> >> > (which did get removed). >> >> > >> >> > >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <[email protected]> >> wrote: >> >> > >> >> >> the issue is that slf4j uses static binding. you can put only one >> slf4j >> >> >> backend on the classpath, and that's what it uses. more than one is >> not >> >> >> allowed. >> >> >> >> >> >> so you either keep the slf4j-log4j12 dependency for spark, and then >> you >> >> >> took away people's choice of slf4j backend which is considered bad >> form >> >> for >> >> >> a library, or you do not include it and then people will always get >> the >> >> big >> >> >> fat ugly warning and slf4j logging will not flow to log4j. >> >> >> >> >> >> including log4j itself is not necessary a problem i think? >> >> >> >> >> >> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <[email protected] >> >> >wrote: >> >> >> >> >> >>> This also seems relevant - but not my area of expertise (whether >> this >> >> >>> is a valid way to check this). >> >> >>> >> >> >>> >> >> >>> >> >> >> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to >> >> >>> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell < >> [email protected]> >> >> >>> wrote: >> >> >>> > Hey Guys, >> >> >>> > >> >> >>> > Thanks for explainning. Ya this is a problem - we didn't really >> know >> >> >>> > that people are using other slf4j backends, slf4j is in there for >> >> >>> > historical reasons but I think we may assume in a few places that >> >> >>> > log4j is being used and we should minimize those. >> >> >>> > >> >> >>> > We should patch this and get a fix into 0.9.1. So some solutions I >> >> see >> >> >>> are: >> >> >>> > >> >> >>> > (a) Add SparkConf option to disable this. I'm fine with this one. >> >> >>> > >> >> >>> > (b) Ask slf4j which backend is active and only try to enforce this >> >> >>> > default if we know slf4j is using log4j. Do either of you know if >> >> this >> >> >>> > is possible? Not sure if slf4j exposes this. >> >> >>> > >> >> >>> > (c) Just remove this default stuff. We'd rather not do this. The >> goal >> >> >>> > of this thing is to provide good usability for people who have >> linked >> >> >>> > against Spark and haven't done anything to configure logging. For >> >> >>> > beginners we try to minimize the assumptions about what else they >> >> know >> >> >>> > about, and I've found log4j configuration is a huge mental barrier >> >> for >> >> >>> > people who are getting started. >> >> >>> > >> >> >>> > Paul if you submit a patch doing (a) we can merge it in. If you >> have >> >> >>> > any idea if (b) is possible I prefer that one, but it may not be >> >> >>> > possible or might be brittle. >> >> >>> > >> >> >>> > - Patrick >> >> >>> > >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <[email protected]> >> >> >>> wrote: >> >> >>> >> Totally agree with Paul: a library should not pick the slf4j >> >> backend. >> >> >>> It >> >> >>> >> defeats the purpose of slf4j. That big ugly warning is there to >> >> alert >> >> >>> >> people that its their responsibility to pick the back end... >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <[email protected]> wrote: >> >> >>> >> >> >> >>> >>> Hi, Patrick -- >> >> >>> >>> >> >> >>> >>> From slf4j, you can either backend it into log4j (which is the >> way >> >> >>> that >> >> >>> >>> Spark is shipped) or you can route log4j through slf4j and then >> on >> >> to >> >> >>> a >> >> >>> >>> different backend (e.g., logback). We're doing the latter and >> >> >>> manipulating >> >> >>> >>> the dependencies in the build because that's the way the >> enclosing >> >> >>> >>> application is set up. >> >> >>> >>> >> >> >>> >>> The issue with the current situation is that there's no way for >> an >> >> >>> end user >> >> >>> >>> to choose to *not* use the log4j backend. (My short-term >> solution >> >> >>> was to >> >> >>> >>> use the Maven shade plugin to swap in a version of the Logging >> >> trait >> >> >>> with >> >> >>> >>> the body of that method commented out.) In addition to the >> >> situation >> >> >>> with >> >> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders, >> you >> >> >>> might >> >> >>> >>> also run afoul of someone who intentionally configured log4j >> with >> >> an >> >> >>> empty >> >> >>> >>> set of appenders at the time that Spark is initializing. >> >> >>> >>> >> >> >>> >>> I'd be happy with any implementation that lets me choose my >> logging >> >> >>> >>> backend: override default behavior via system property, plug-in >> >> >>> >>> architecture, etc. I do think it's reasonable to expect someone >> >> >>> digesting >> >> >>> >>> a substantial JDK-based system like Spark to understand how to >> >> >>> initialize >> >> >>> >>> logging -- surely they're using logging of some kind elsewhere >> in >> >> >>> their >> >> >>> >>> application -- but if you want the default behavior there as a >> >> >>> courtesy, it >> >> >>> >>> might be worth putting an INFO (versus a the glaring log4j WARN) >> >> >>> message on >> >> >>> >>> the output that says something like "Initialized default logging >> >> via >> >> >>> Log4J; >> >> >>> >>> pass -Dspark.logging.loadDefaultLogger=false to disable this >> >> >>> behavior." so >> >> >>> >>> that it's both convenient and explicit. >> >> >>> >>> >> >> >>> >>> Cheers. >> >> >>> >>> -- Paul >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> -- >> >> >>> >>> [email protected] | Multifarious, Inc. | >> http://mult.ifario.us/ >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell < >> >> [email protected]> >> >> >>> >>> wrote: >> >> >>> >>> >> >> >>> >>> > A config option e.g. could just be to add: >> >> >>> >>> > >> >> >>> >>> > spark.logging.loadDefaultLogger (default true) >> >> >>> >>> > If set to true, Spark will try to initialize a log4j logger if >> >> none >> >> >>> is >> >> >>> >>> > detected. Otherwise Spark will not modify logging behavior. >> >> >>> >>> > >> >> >>> >>> > Then users could just set this to false if they have a logging >> >> >>> set-up >> >> >>> >>> > that conflicts with this. >> >> >>> >>> > >> >> >>> >>> > Maybe there is a nicer fix... >> >> >>> >>> > >> >> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell < >> >> >>> [email protected]> >> >> >>> >>> > wrote: >> >> >>> >>> > > Hey Paul, >> >> >>> >>> > > >> >> >>> >>> > > Thanks for digging this up. I worked on this feature and the >> >> >>> intent >> >> >>> >>> > > was to give users good default behavior if they didn't >> include >> >> any >> >> >>> >>> > > logging configuration on the classpath. >> >> >>> >>> > > >> >> >>> >>> > > The problem with assuming that CL tooling is going to fix >> the >> >> job >> >> >>> is >> >> >>> >>> > > that many people link against spark as a library and run >> their >> >> >>> >>> > > application using their own scripts. In this case the first >> >> thing >> >> >>> >>> > > people see when they run an application that links against >> >> Spark >> >> >>> was a >> >> >>> >>> > > big ugly logging warning. >> >> >>> >>> > > >> >> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this >> >> behavior of >> >> >>> >>> > > returning null for the appenders seems a little weird. What >> is >> >> >>> the use >> >> >>> >>> > > case for using this and not just directly use slf4j-log4j12 >> >> like >> >> >>> Spark >> >> >>> >>> > > itself does? >> >> >>> >>> > > >> >> >>> >>> > > Did you have a more general fix for this in mind? Or was >> your >> >> >>> plan to >> >> >>> >>> > > just revert the existing behavior... We might be able to >> add a >> >> >>> >>> > > configuration option to disable this logging default stuff. >> Or >> >> we >> >> >>> >>> > > could just rip it out - but I'd like to avoid that if >> possible. >> >> >>> >>> > > >> >> >>> >>> > > - Patrick >> >> >>> >>> > > >> >> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown < >> >> [email protected]> >> >> >>> >>> wrote: >> >> >>> >>> > >> We have a few applications that embed Spark, and in 0.8.0 >> and >> >> >>> 0.8.1, >> >> >>> >>> we >> >> >>> >>> > >> were able to use slf4j, but 0.9.0 broke that and >> >> unintentionally >> >> >>> >>> forces >> >> >>> >>> > >> direct use of log4j as the logging backend. >> >> >>> >>> > >> >> >> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait: >> >> >>> >>> > >> >> >> >>> >>> > >> >> >> >>> >>> > >> >> >>> >>> >> >> >>> >> >> >> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107 >> >> >>> >>> > >> >> >> >>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration for >> >> >>> appenders >> >> >>> >>> to >> >> >>> >>> > the >> >> >>> >>> > >> ROOT logger: >> >> >>> >>> > >> >> >> >>> >>> > >> >> >> >>> >>> > >> >> >>> >>> >> >> >>> >> >> >> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81 >> >> >>> >>> > >> >> >> >>> >>> > >> And this causes an infinite loop and an eventual stack >> >> overflow. >> >> >>> >>> > >> >> >> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would be >> >> >>> significant >> >> >>> >>> > enough >> >> >>> >>> > >> reversal of recent changes that it's probably worth >> discussing >> >> >>> before >> >> >>> >>> I >> >> >>> >>> > >> sink a half hour into it. My suggestion would be that >> >> >>> initialization >> >> >>> >>> > (or >> >> >>> >>> > >> not) should be left to the user with reasonable default >> >> behavior >> >> >>> >>> > supplied >> >> >>> >>> > >> by the spark commandline tooling and not forced on >> >> applications >> >> >>> that >> >> >>> >>> > >> incorporate Spark. >> >> >>> >>> > >> >> >> >>> >>> > >> Thoughts/opinions? >> >> >>> >>> > >> >> >> >>> >>> > >> -- Paul >> >> >>> >>> > >> -- >> >> >>> >>> > >> [email protected] | Multifarious, Inc. | >> >> http://mult.ifario.us/ >> >> >>> >>> > >> >> >>> >>> >> >> >>> >> >> >> >> >> >> >> >> >>
