This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 8443672b1ab1 [SPARK-48131][CORE] Unify MDC key `mdc.taskName` and 
`task_name`
8443672b1ab1 is described below

commit 8443672b1ab1195278a73a9ec487af8e02e3a8de
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Sat May 4 17:33:02 2024 -0700

    [SPARK-48131][CORE] Unify MDC key `mdc.taskName` and `task_name`
    
    ### What changes were proposed in this pull request?
    
    Currently there are two MDC keys for task name:
    * `mdc.taskName`, which was introduced in https://github.com/apache/spark/pull/28801. Before that change, it was `taskName`.
    * `task_name`, which was introduced by the structured logging framework project.
    
    To unify the MDC keys, this PR renames `mdc.taskName` to `task_name`. This MDC key shows up frequently in logs when running Spark applications.
    Before the change:
    ```
    "context":{"mdc.taskName":"task 19.0 in stage 0.0 (TID 19)"}
    ```
    After the change:
    ```
    "context":{"task_name":"task 19.0 in stage 0.0 (TID 19)"}
    ```
    
    ### Why are the changes needed?
    
    1. Make the MDC key names consistent.
    2. Minor upside: this allows users to query task names with `SELECT * FROM logs WHERE context.task_name = ...`, as sketched below. Otherwise, querying with `context.mdc.taskName` results in an analysis exception, and users have to query with `context['mdc.taskName']` instead.
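    
    A minimal Scala sketch of such a query, for illustration only: the log path is hypothetical, and the logs are assumed to be written with the JSON structured-logging layout so that schema inference turns `context` into a struct.
    ```scala
    // Hypothetical path; logs are assumed to come from the structured JSON layout.
    val logDf = spark.read.json("path/to/structured/logs")
    logDf.createOrReplaceTempView("logs")
    
    // With the unified key, the task name is addressable with plain dot notation.
    spark.sql(
      "SELECT * FROM logs WHERE context.task_name = 'task 19.0 in stage 0.0 (TID 19)'"
    ).show(false)
    
    // Before this change, the dot inside `mdc.taskName` forced bracket-style access:
    // spark.sql("SELECT * FROM logs WHERE context['mdc.taskName'] IS NOT NULL")
    ```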
    
    ### Does this PR introduce _any_ user-facing change?
    
    Not really. The MDC key is used by developers for debugging purposes.
    
    ### How was this patch tested?
    
    Manual test
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #46386 from gengliangwang/unify.
    
    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 6 +++---
 docs/configuration.md                                        | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala 
b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index fd6c02c07789..3edba45ef89f 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -40,7 +40,7 @@ import org.slf4j.MDC
 
 import org.apache.spark._
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.internal.{Logging, MDC => LogMDC}
+import org.apache.spark.internal.{Logging, LogKeys, MDC => LogMDC}
 import org.apache.spark.internal.LogKeys._
 import org.apache.spark.internal.config._
 import org.apache.spark.internal.plugin.PluginContainer
@@ -914,7 +914,7 @@ private[spark] class Executor(
     try {
       mdc.foreach { case (key, value) => MDC.put(key, value) }
      // avoid overriding the taskName by the user
-      MDC.put("mdc.taskName", taskName)
+      MDC.put(LogKeys.TASK_NAME.name, taskName)
     } catch {
       case _: NoSuchFieldError => logInfo("MDC is not supported.")
     }
@@ -923,7 +923,7 @@ private[spark] class Executor(
   private def cleanMDCForTask(taskName: String, mdc: Seq[(String, String)]): 
Unit = {
     try {
       mdc.foreach { case (key, _) => MDC.remove(key) }
-      MDC.remove("mdc.taskName")
+      MDC.remove(LogKeys.TASK_NAME.name)
     } catch {
       case _: NoSuchFieldError => logInfo("MDC is not supported.")
     }
diff --git a/docs/configuration.md b/docs/configuration.md
index a55ce89c096b..fb14af6d55b8 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3693,7 +3693,7 @@ val logDf = 
spark.read.schema(LOG_SCHEMA).json("path/to/logs")
 ```
 
 ## Plain Text Logging
-If you prefer plain text logging, you can use the 
`log4j2.properties.pattern-layout-template` file as a starting point. This is 
the default configuration used by Spark before the 4.0.0 release. This 
configuration uses the 
[PatternLayout](https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout)
 to log all the logs in plain text. MDC information is not included by default. 
In order to print it in the logs, you can update the patternLayout in the file. 
For example, you can ad [...]
+If you prefer plain text logging, you can use the 
`log4j2.properties.pattern-layout-template` file as a starting point. This is 
the default configuration used by Spark before the 4.0.0 release. This 
configuration uses the 
[PatternLayout](https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout)
 to log all the logs in plain text. MDC information is not included by default. 
In order to print it in the logs, you can update the patternLayout in the file. 
For example, you can ad [...]
 Moreover, you can use `spark.sparkContext.setLocalProperty(s"mdc.$name", 
"value")` to add user specific data into MDC.
 The key in MDC will be the string of `mdc.$name`.
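
As a rough illustration of the user-level MDC hook described in the docs above (the property name `mdc.jobGroup` and its value are made-up examples, not Spark-defined keys):

```scala
// Illustrative sketch: "jobGroup" is an arbitrary name chosen for this example.
// setLocalProperty stores a thread-local property that executors copy into the
// task's MDC under the key `mdc.jobGroup`.
spark.sparkContext.setLocalProperty("mdc.jobGroup", "nightly-etl")

// Task logs for jobs submitted from this thread can now include the value,
// provided the patternLayout is updated to print %X{mdc.jobGroup}.
spark.range(100).count()

// Unset it so later jobs submitted from this thread do not inherit the value.
spark.sparkContext.setLocalProperty("mdc.jobGroup", null)
```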
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
