This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new fbe6b1dba2ab [SPARK-47721][DOC] Guidelines for the Structured Logging Framework
fbe6b1dba2ab is described below

commit fbe6b1dba2abab03fef2fbbac4640c4c41153e71
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Thu Apr 4 08:46:21 2024 +0900

    [SPARK-47721][DOC] Guidelines for the Structured Logging Framework

    ### What changes were proposed in this pull request?

    As suggested in https://github.com/apache/spark/pull/45834/files#r1549565157, I am creating initial guidelines for the structured logging framework.

    ### Why are the changes needed?

    We need guidelines to align the logging migration work in the community.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    It's a doc-only change.

    ### Was this patch authored or co-authored using generative AI tooling?

    Yes.
    Generated-by: GitHub Copilot 1.2.17.2887

    Closes #45862 from gengliangwang/logREADME.

    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../src/main/scala/org/apache/spark/internal/README.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/README.md b/common/utils/src/main/scala/org/apache/spark/internal/README.md
new file mode 100644
index 000000000000..ed3d77333806
--- /dev/null
+++ b/common/utils/src/main/scala/org/apache/spark/internal/README.md
@@ -0,0 +1,13 @@
+# Guidelines for the Structured Logging Framework
+
+## LogKey
+
+LogKeys serve as identifiers for mapped diagnostic contexts (MDC) within logs. Follow these guidelines when adding new LogKeys:
+* Define all structured logging keys in `LogKey.scala`, and sort them alphabetically for ease of search.
+* Use `UPPER_SNAKE_CASE` for key names.
+* Key names should be both simple and broad, yet include specific identifiers like `STAGE_ID`, `TASK_ID`, and `JOB_ID` when needed for clarity.
For instance, use `MAX_ATTEMPTS` as a general key instead of creating separate keys for each scenario, such as `EXECUTOR_STATE_SYNC_MAX_ATTEMPTS` and `MAX_TASK_FAILURES`. This balances simplicity with the detail needed for effective logging.
+* Use abbreviations in names if they are widely understood, such as `APP_ID` for `APPLICATION_ID` and `K8S` for `KUBERNETES`.
+
+## Exceptions
+
+To ensure logs are compatible with Spark SQL and log analysis tools, avoid `Exception.printStackTrace()`. Use the `logError`, `logWarning`, and `logInfo` methods from the `Logging` trait to log exceptions, maintaining structured and parsable logs.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
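The LogKey naming guidelines in the README above can be illustrated with a small, self-contained Scala sketch. To stay runnable on its own, it does not use Spark's actual `LogKey`/`MDC` classes; the `LogKeys` enumeration, `MDC` case class, and `structured` helper here are hypothetical stand-ins that only mirror the concepts described (alphabetically sorted `UPPER_SNAKE_CASE` keys, a general `MAX_ATTEMPTS` key reused across scenarios).

```scala
// Sketch of the LogKey pattern from the README. NOT Spark's implementation:
// LogKeys, MDC, and structured() are illustrative names invented here.

// Keys are defined in one place, UPPER_SNAKE_CASE, sorted alphabetically.
object LogKeys extends Enumeration {
  val APP_ID, JOB_ID, MAX_ATTEMPTS, STAGE_ID, TASK_ID = Value
}

// A mapped diagnostic context (MDC) entry pairs a key with a value.
final case class MDC(key: LogKeys.Value, value: Any)

object Demo {
  // Render a message with its MDC fields as one structured, parsable line.
  def structured(msg: String, mdcs: MDC*): String = {
    val context = mdcs.map(m => s"${m.key}=${m.value}").mkString(" ")
    s"$msg [$context]"
  }

  def main(args: Array[String]): Unit = {
    // One general MAX_ATTEMPTS key serves every retry scenario, instead of
    // scenario-specific keys like EXECUTOR_STATE_SYNC_MAX_ATTEMPTS.
    println(structured("Task retry limit reached",
      MDC(LogKeys.TASK_ID, 42), MDC(LogKeys.MAX_ATTEMPTS, 4)))
  }
}
```

Keeping the keys in one sorted enumeration makes collisions and near-duplicates easy to spot in review, which is the point of the "define all keys in `LogKey.scala`" rule.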
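The Exceptions guideline can likewise be sketched. The tiny `Logging` trait below only imitates the shape of Spark's `org.apache.spark.internal.Logging` so the example compiles standalone; the trait body and `ExceptionDemo` object are assumptions for illustration, not the real API.

```scala
// Contrast of discouraged vs. preferred exception logging, per the README.
// This Logging trait is a mock stand-in, not Spark's actual trait.
trait Logging {
  def logError(msg: => String, throwable: Throwable): Unit =
    Console.err.println(s"ERROR $msg: ${throwable.getMessage}")
}

object ExceptionDemo extends Logging {
  def main(args: Array[String]): Unit = {
    try {
      throw new IllegalStateException("executor lost")
    } catch {
      case e: IllegalStateException =>
        // Discouraged: e.printStackTrace() dumps unstructured text to stderr,
        // which log analysis tools cannot reliably parse.
        // Preferred: route the throwable through the logging framework so the
        // message stays on a structured, parsable line.
        logError("Task failed", e)
    }
  }
}
```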