[ https://issues.apache.org/jira/browse/SPARK-40489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Garret Wilson updated SPARK-40489:
----------------------------------
    Description: 
Spark breaks fundamentally with SLF4J 2.x because it uses 
{{StaticLoggerBinder}}.

SLF4J is the logging facade that is meant to shield the application from the 
implementation, whether it be Log4J or Logback or whatever. Historically SLF4J 
1.x used a bad approach to binding an implementation: it relied on 
{{StaticLoggerBinder}} (a global static singleton) rather than the Java 
{{ServiceLoader}} mechanism.
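
For illustration, here is a minimal sketch of the difference (assuming slf4j-api 2.x on the classpath; the probe itself is illustrative and not Spark code):

{code:scala}
import java.util.ServiceLoader
import org.slf4j.spi.SLF4JServiceProvider
import scala.jdk.CollectionConverters._

object Slf4jBindingProbe {
  def main(args: Array[String]): Unit = {
    // SLF4J 2.x: the backend is discovered via the standard ServiceLoader mechanism.
    val providers = ServiceLoader.load(classOf[SLF4JServiceProvider]).asScala.toList
    providers.foreach(p => println(s"Found SLF4J provider: ${p.getClass.getName}"))

    // SLF4J 1.x instead hard-wired the lookup to a class supplied by the backend:
    //   org.slf4j.impl.StaticLoggerBinder.getSingleton().getLoggerFactoryClassStr()
    // That class no longer exists in SLF4J 2.x, hence the NoClassDefFoundError below.
  }
}
{code}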

SLF4J 2.x, which has been in development for years, has now been released. It 
finally switches to the {{ServiceLoader}} mechanism. As [described in the 
FAQ|https://www.slf4j.org/faq.html#changesInVersion200], the API remains 
compatible; an application just needs to use the latest Log4J/Logback 
implementation, which registers itself via the service loader.
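
For example, the whole setup the FAQ describes amounts to something like the following sbt dependencies (version numbers here are illustrative; any Logback 1.4.x release ships an SLF4J 2 provider):

{code:scala}
// build.sbt -- a minimal SLF4J 2.x + Logback setup (versions are illustrative)
libraryDependencies ++= Seq(
  "org.slf4j"      % "slf4j-api"       % "2.0.3",
  "ch.qos.logback" % "logback-classic" % "1.4.1"
)
{code}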

*Above all, the application must _not_ rely on the low-level {{StaticLoggerBinder}} 
mechanism, because that class has been removed in SLF4J 2.x!*

Unfortunately 
[{{org.apache.spark.internal.Logging}}|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala] 
uses {{StaticLoggerBinder}} and completely breaks any environment using SLF4J 
2.x. For example, in my application I have pulled in the SLF4J 2.x API and the 
Logback 1.4.x libraries (I'm not even using Log4J). Spark breaks completely just 
trying to get a Spark session:

{noformat}
Caused by: java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
        at org.apache.spark.internal.Logging$.org$apache$spark$internal$Logging$$isLog4j2(Logging.scala:232)
        at org.apache.spark.internal.Logging.initializeLogging(Logging.scala:129)
        at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:115)
        at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:109)
        at org.apache.spark.SparkContext.initializeLogIfNecessary(SparkContext.scala:84)
        at org.apache.spark.internal.Logging.initializeLogIfNecessary(Logging.scala:106)
        at org.apache.spark.internal.Logging.initializeLogIfNecessary$(Logging.scala:105)
        at org.apache.spark.SparkContext.initializeLogIfNecessary(SparkContext.scala:84)
        at org.apache.spark.internal.Logging.log(Logging.scala:53)
        at org.apache.spark.internal.Logging.log$(Logging.scala:51)
        at org.apache.spark.SparkContext.log(SparkContext.scala:84)
        at org.apache.spark.internal.Logging.logInfo(Logging.scala:61)
        at org.apache.spark.internal.Logging.logInfo$(Logging.scala:60)
        at org.apache.spark.SparkContext.logInfo(SparkContext.scala:84)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:195)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
        at scala.Option.getOrElse(Option.scala:201)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
{noformat}
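
For reference, the failure above is triggered by nothing more exotic than building a session; a minimal reproduction sketch (the local master and app name are just placeholders):

{code:scala}
import org.apache.spark.sql.SparkSession

object Slf4j2Repro {
  def main(args: Array[String]): Unit = {
    // With slf4j-api 2.x and logback-classic 1.4.x on the classpath, Spark 3.3.0
    // fails here with NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("slf4j2-repro")
      .getOrCreate()
    spark.stop()
  }
}
{code}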

This is because Spark is playing low-level tricks to find out whether the logging 
backend is Log4J 2, and relying on {{StaticLoggerBinder}} to do it.

Whatever the wisdom of Spark's relying on Log4J-specific functionality, Spark 
should not be using {{StaticLoggerBinder}} for that detection. There are many 
other approaches. (The code itself suggests one: it already calls 
{{LogManager.getRootLogger.asInstanceOf[Log4jLogger]}}, so it could simply check 
whether the root logger actually _is_ a {{Log4jLogger}} instead of casting 
blindly. There may be even better approaches.)
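
As a rough illustration (not a patch), detection could inspect what the SLF4J root logger actually is instead of touching {{StaticLoggerBinder}}; the package name checked below is my assumption about the Log4J 2 SLF4J adapter and would need verifying:

{code:scala}
import org.slf4j.{Logger, LoggerFactory}

object Log4j2Detection {
  // Sketch: ask SLF4J for the root logger and inspect its runtime class instead of
  // relying on org.slf4j.impl.StaticLoggerBinder, which no longer exists in SLF4J 2.x.
  def isLog4j2: Boolean = {
    val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME)
    // Assumption: the SLF4J adapter shipped with Log4J 2 lives under org.apache.logging.slf4j.
    rootLogger.getClass.getName.startsWith("org.apache.logging.slf4j")
  }
}
{code}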

The other big problem is relying on the Log4J classes themselves. By referencing 
those classes, you force me to bring in Log4J as a dependency, which in its 
latest versions registers itself with the service loader mechanism, causing 
conflicting SLF4J providers.

It is paramount that you:

* Remove all reliance on {{StaticLoggerBinder}}. If you absolutely must use it, 
please check for it using reflection!
* Remove all static references to the Log4J classes. (In an ideal world you 
wouldn't even be doing Log4J-specific things anyway.) If you absolutely must do 
Log4J-specific things, access the classes via reflection, as sketched after this 
list; don't statically link them in the code.
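
A sketch of what the reflective variant could look like (the class name is the standard Log4J 2 core one, but treat it as an assumption to verify):

{code:scala}
import scala.util.Try

object Log4jProbe {
  // Probe for Log4J 2 reflectively so that no Log4J class is statically linked and a
  // Log4J-free classpath (e.g. Logback only) cannot blow up with NoClassDefFoundError.
  def log4j2CorePresent: Boolean =
    Try(Class.forName("org.apache.logging.log4j.core.Logger")).isSuccess
}
{code}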

The current situation absolutely (and unnecessarily) breaks the use of 
SLF4J 2.x.



> Spark 3.3.0 breaks SLF4J 2.
> ---------------------------
>
>                 Key: SPARK-40489
>                 URL: https://issues.apache.org/jira/browse/SPARK-40489
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Garret Wilson
>            Priority: Critical
>



