Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]
panbingkun commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1614383361

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: Okay, I have adopted the plan of deleting `README.md`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
mridulm commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1614353283

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: IIRC @panbingkun included the contents in the existing code, which is also a nice addition to the code documentation. IMO that is the better overall approach, given this is internal to Spark.
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1613860819

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: I will wait for @mridulm's response until this weekend.
panbingkun commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1612809775

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: I'm fine with it. WDYT, @mridulm? We can just put the new description in the `scala/java` doc.
gengliangwang commented on PR #46634: URL: https://github.com/apache/spark/pull/46634#issuecomment-2127700362

LGTM except for one comment.
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1612074023

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: @mridulm @panbingkun This README is for Spark developers. As long as it doesn't mention third-party support, can we just keep it?
panbingkun commented on PR #46634: URL: https://github.com/apache/spark/pull/46634#issuecomment-2126374603

> Thanks for the changes @panbingkun. Looks good to me - I will let @gengliangwang review and merge; he is much more knowledgeable about this in general!

Yeah, thank you very much! ❤️
panbingkun commented on PR #46634: URL: https://github.com/apache/spark/pull/46634#issuecomment-2126146078

> Please replace all references to `external` and `external system` - we don't want to set expectations that external system integrations are encouraged. Keep it as custom integrations instead - if power users and frameworks want to leverage the integration, this would be the path for them, as it is not explicitly `private [spark]`.
>
> I added a few cases below, but I would expect @gengliangwang to be better aware and give much better guidance than what I can of what else might need to be changed - as he is more actively reviewing the PRs :-)

Thank you very much for your patient and careful review. ❤️ I have updated it again.
mridulm commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1610164692

## common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java:

@@ -28,6 +28,50 @@

```java
 import org.slf4j.Logger;

 // checkstyle.on: RegexpSinglelineJava

+// checkstyle.off: RegexpSinglelineJava
+/**
+ * Guidelines for the Structured Logging Framework - Java Logging
+ *
+ * Use the `org.apache.spark.internal.SparkLoggerFactory` to get the logger instance in Java code:
+ *
+ * Getting Logger Instance:
+ * Instead of using `org.slf4j.LoggerFactory`, use `org.apache.spark.internal.SparkLoggerFactory`
+ * to ensure structured logging.
+ *
+ *   import org.apache.spark.internal.SparkLogger;
+ *   import org.apache.spark.internal.SparkLoggerFactory;
+ *   private static final SparkLogger logger = SparkLoggerFactory.getLogger(JavaUtils.class);
+ *
+ * Logging Messages with Variables:
+ * When logging messages with variables, wrap all the variables with `MDC`s and they will be
+ * automatically added to the Mapped Diagnostic Context (MDC).
+ *
+ *   import org.apache.spark.internal.LogKeys;
+ *   import org.apache.spark.internal.MDC;
+ *   logger.error("Unable to delete file for partition {}", MDC.of(LogKeys.PARTITION_ID$.MODULE$, i));
+ *
+ * Constant String Messages:
+ * For logging constant string messages, use the standard logging methods.
+ *
+ *   logger.error("Failed to abort the writer after failing to write map output.", e);
+ *
+ * External third-party ecosystem access:
+ * If you want to output logs in `java code` through the structured log framework,
+ * you can define `custom LogKey` and use it in `java` code as follows:
+ *
+ *   // External third-party ecosystem `custom LogKey` must be `implements LogKey`
```

Review Comment (on `+ * External third-party ecosystem access:`):
```suggestion
```

Review Comment (on `+ *   // External third-party ecosystem ...`):
```suggestion
 * // To add a `custom LogKey`, implement `LogKey`
```

## common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java:

@@ -90,6 +94,9 @@

```java
   private String basicMsg() {

   // test for external system custom LogKey
   abstract String expectedPatternForExternalSystemCustomLogKey(Level level);

+  // test for external system java custom LogKey
+  abstract String expectedPatternForExternalSystemJavaCustomLogKey(Level level);
```

Review Comment:
```suggestion
  abstract String expectedPatternForJavaCustomLogKey(Level level);
```

## common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java:

@@ -161,5 +161,19 @@

```java
   String expectedPatternForExternalSystemCustomLogKey(Level level) {
         "logger": ""
       }""");
   }
+
+  @Override
+  String expectedPatternForExternalSystemJavaCustomLogKey(Level level) {
```

Review Comment:
```suggestion
  String expectedPatternForJavaCustomLogKey(Level level) {
```

## common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java:

@@ -87,4 +87,10 @@

```java
   String expectedPatternForMsgWithMDCValueIsNull(Level level) {

   String expectedPatternForExternalSystemCustomLogKey(Level level) {
     return toRegexPattern(level, ".* : External system custom log message.\n");
   }
+
+  @Override
+  String expectedPatternForExternalSystemJavaCustomLogKey(Level level) {
+    return toRegexPattern(level,
+      ".* : External system custom log message.\n");
```

Review Comment:
```suggestion
      ".* : custom log message.\n");
```

## common/utils/src/test/java/org/apa
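The custom `LogKey` pattern this review discusses can be tried out in shape without a Spark build. The sketch below is a minimal standalone mock: `LogKey`, `MDC`, and the `format` helper here are simplified stand-ins invented for illustration, not the real `org.apache.spark.internal` classes. It only demonstrates the call-site shape of `MDC.of(customKey, value)` that the guide describes.

```java
// Standalone sketch (NOT the real Spark API): minimal stand-ins for `LogKey`
// and `MDC` to illustrate the "custom LogKey must be `implements LogKey`" pattern.
public class CustomLogKeyDemo {

    // Stand-in for org.apache.spark.internal.LogKey: a marker interface whose
    // implementing class name becomes the structured-log field name.
    interface LogKey {
        default String name() {
            return getClass().getSimpleName().toLowerCase();
        }
    }

    // Stand-in for org.apache.spark.internal.MDC: pairs a LogKey with a value.
    static final class MDC {
        final LogKey key;
        final Object value;

        private MDC(LogKey key, Object value) {
            this.key = key;
            this.value = value;
        }

        static MDC of(LogKey key, Object value) {
            return new MDC(key, value);
        }
    }

    // A third-party key, mirroring `implements LogKey` from the guide.
    static final class CUSTOM_LOG_KEY implements LogKey { }

    // Toy formatter: substitutes the {} placeholder and appends the key/value
    // pair, roughly how a structured layout would surface the MDC field.
    static String format(String template, MDC mdc) {
        return template.replace("{}", String.valueOf(mdc.value))
            + " {" + mdc.key.name() + "=" + mdc.value + "}";
    }

    public static void main(String[] args) {
        MDC mdc = MDC.of(new CUSTOM_LOG_KEY(), "k1");
        System.out.println(format("Unable to delete key {} for cache", mdc));
        // prints: Unable to delete key k1 for cache {custom_log_key=k1}
    }
}
```

In the real framework the key/value lands in the Mapped Diagnostic Context and is emitted by the configured layout; the mock only makes the key visible in the formatted string so the pattern is observable.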
panbingkun commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1609203297

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

@@ -45,3 +45,29 @@

````diff
 logger.error("Failed to abort the writer after failing to write map output.", e)

 ## Exceptions

 To ensure logs are compatible with Spark SQL and log analysis tools, avoid `Exception.printStackTrace()`. Use `logError`, `logWarning`, and `logInfo` methods from the `Logging` trait to log exceptions, maintaining structured and parsable logs.
+
+## External third-party ecosystem access
+
+* If you want to output logs in `scala code` through the structured log framework, you can define `custom LogKey` and use it in `scala` code as follows:
+
+```scala
+// External third-party ecosystem `custom LogKey` must be `extends LogKey`
+case object CUSTOM_LOG_KEY extends LogKey
+```
+```scala
+import org.apache.spark.internal.MDC
+
+logInfo(log"${MDC(CUSTOM_LOG_KEY, "key")}")
+```
+
+* If you want to output logs in `java code` through the structured log framework, you can define `custom LogKey` and use it in `java` code as follows:
+
+```java
+// External third-party ecosystem `custom LogKey` must be `implements LogKey`
+public static class CUSTOM_LOG_KEY implements LogKey { }
+```
+```java
+import org.apache.spark.internal.MDC;
+
+logger.error("Unable to delete key {} for cache", MDC.of(CUSTOM_LOG_KEY, "key"));
+```
````

Review Comment: @mridulm @gengliangwang The `common/utils/src/main/scala/org/apache/spark/internal/README.md` file has been dropped, and the content of the original document has been distributed into the `java`/`scala` docs.