(spark) branch master updated: [SPARK-47579][CORE][PART3][FOLLOWUP] Fix KubernetesSuite

2024-05-26 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 541158fe0352 [SPARK-47579][CORE][PART3][FOLLOWUP] Fix KubernetesSuite
541158fe0352 is described below

commit 541158fe03529d5a28eaeb61d801d065ff4ef664
Author: panbingkun 
AuthorDate: Sun May 26 08:35:45 2024 -0700

[SPARK-47579][CORE][PART3][FOLLOWUP] Fix KubernetesSuite

### What changes were proposed in this pull request?
The PR follows up https://github.com/apache/spark/pull/46739 and aims to fix `KubernetesSuite`.

1. Unfortunately, after correcting the typo from `decommision` to `decommission`, GA has been broken.
Screenshot: https://github.com/apache/spark/assets/15246973/6212debb-0ff6-4d22-8999-e37aa2cb2c10

2. Failing run: https://github.com/panbingkun/spark/actions/runs/9232744348/job/25406127982
Screenshot: https://github.com/apache/spark/assets/15246973/4e71598c-22f3-4fd2-afba-fd91ddce5f55

### Why are the changes needed?
Only fix `KubernetesSuite`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46746 from panbingkun/fix_KubernetesSuite.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
index 1b9b5310c2ee..ae5f037c6b7d 100644
--- 
a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
+++ 
b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
@@ -175,7 +175,7 @@ private[spark] trait DecommissionSuite { k8sSuite: 
KubernetesSuite =>
 expectedDriverLogOnCompletion = Seq(
   "Finished waiting, stopping Spark",
   "Decommission executors",
-  "Remove reason statistics: (gracefully decommissioned: 1, 
decommision unfinished: 0, " +
+  "Remove reason statistics: (gracefully decommissioned: 1, 
decommission unfinished: 0, " +
 "driver killed: 0, unexpectedly exited: 0)."),
 appArgs = Array.empty[String],
 driverPodChecker = doBasicDriverPyPodCheck,





(spark) branch master updated: [SPARK-48320][CORE][DOCS] Add structured logging guide to the scala and java doc

2024-05-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 468aa842c643 [SPARK-48320][CORE][DOCS] Add structured logging guide to 
the scala and java doc
468aa842c643 is described below

commit 468aa842c6435b3c3ff49df30e8958d08ec2edb0
Author: panbingkun 
AuthorDate: Sat May 25 09:43:01 2024 -0700

[SPARK-48320][CORE][DOCS] Add structured logging guide to the scala and 
java doc

### What changes were proposed in this pull request?
The PR aims to add an `external third-party ecosystem access` guide to the `scala/java` doc.

The external third-party ecosystem is very extensive. Currently, the document covers two scenarios (see the sketch after this list):
- Pure Java (for example, an application that only uses the Java language; many of our internal production applications are like this)
- Java + Scala
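
A minimal sketch of the pure-Java scenario, following the guide added in this commit (the application class name and logged value are illustrative):

```java
import org.apache.spark.internal.LogKeys;
import org.apache.spark.internal.MDC;
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

public class MyJavaApp {
  // Use SparkLoggerFactory instead of org.slf4j.LoggerFactory to get structured logging.
  private static final SparkLogger logger = SparkLoggerFactory.getLogger(MyJavaApp.class);

  public static void main(String[] args) {
    // Wrap variables in MDC so they are added to the Mapped Diagnostic Context.
    logger.error("Unable to delete file for partition {}",
        MDC.of(LogKeys.PARTITION_ID$.MODULE$, 3));
  }
}
```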

### Why are the changes needed?
Provide instructions for external third-party ecosystems to access the structured logging framework.

### Does this PR introduce _any_ user-facing change?
Yes. When an external third-party ecosystem wants to access the structured logging framework, developers can get help from this document.

### How was this patch tested?
- Add new UT.
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46634 from panbingkun/SPARK-48320.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/internal/SparkLogger.java | 43 +
 .../scala/org/apache/spark/internal/LogKey.scala   | 30 +++-
 .../scala/org/apache/spark/internal/Logging.scala  | 38 +++
 .../main/scala/org/apache/spark/internal/README.md | 47 --
 .../apache/spark/util/PatternSparkLoggerSuite.java |  9 +++-
 .../apache/spark/util/SparkLoggerSuiteBase.java| 55 +-
 .../spark/util/StructuredSparkLoggerSuite.java | 20 ++--
 .../apache/spark/util/PatternLoggingSuite.scala|  4 +-
 .../apache/spark/util/StructuredLoggingSuite.scala | 29 ++--
 9 files changed, 193 insertions(+), 82 deletions(-)

diff --git 
a/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java 
b/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java
index bf8adb70637e..32dd8f1f26b5 100644
--- a/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java
+++ b/common/utils/src/main/java/org/apache/spark/internal/SparkLogger.java
@@ -28,6 +28,49 @@ import 
org.apache.logging.log4j.message.ParameterizedMessageFactory;
 import org.slf4j.Logger;
 // checkstyle.on: RegexpSinglelineJava
 
+// checkstyle.off: RegexpSinglelineJava
+/**
+ * Guidelines for the Structured Logging Framework - Java Logging
+ * 
+ *
+ * Use the `org.apache.spark.internal.SparkLoggerFactory` to get the logger 
instance in Java code:
+ * Getting Logger Instance:
+ *   Instead of using `org.slf4j.LoggerFactory`, use 
`org.apache.spark.internal.SparkLoggerFactory`
+ *   to ensure structured logging.
+ * 
+ *
+ * import org.apache.spark.internal.SparkLogger;
+ * import org.apache.spark.internal.SparkLoggerFactory;
+ * private static final SparkLogger logger = 
SparkLoggerFactory.getLogger(JavaUtils.class);
+ * 
+ *
+ * Logging Messages with Variables:
+ *   When logging messages with variables, wrap all the variables with `MDC`s 
and they will be
+ *   automatically added to the Mapped Diagnostic Context (MDC).
+ * 
+ *
+ * import org.apache.spark.internal.LogKeys;
+ * import org.apache.spark.internal.MDC;
+ * logger.error("Unable to delete file for partition {}", 
MDC.of(LogKeys.PARTITION_ID$.MODULE$, i));
+ * 
+ *
+ * Constant String Messages:
+ *   For logging constant string messages, use the standard logging methods.
+ * 
+ *
+ * logger.error("Failed to abort the writer after failing to write map 
output.", e);
+ * 
+ *
+ * If you want to output logs in `java code` through the structured log 
framework,
+ * you can define `custom LogKey` and use it in `java` code as follows:
+ * 
+ *
+ * // To add a `custom LogKey`, implement `LogKey`
+ * public static class CUSTOM_LOG_KEY implements LogKey { }
+ * import org.apache.spark.internal.MDC;
+ * logger.error("Unable to delete key {} for cache", MDC.of(CUSTOM_LOG_KEY, 
"key"));
+ */
+// checkstyle.on: RegexpSinglelineJava
 public class SparkLogger {
 
   private static final MessageFactory MESSAGE_FACTORY = 
ParameterizedMessageFactory.INSTANCE;
diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 534f00911922..1366277827f7 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.sca

(spark) branch master updated: [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell

2024-05-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3cb30c2366b2 [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print 
format of spark sql shell
3cb30c2366b2 is described below

commit 3cb30c2366b27c5a65ec02121c30bd1a4eb20584
Author: Kent Yao 
AuthorDate: Fri May 24 09:43:03 2024 -0700

[SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql 
shell

### What changes were proposed in this pull request?

Restore the print format of spark sql shell

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

manually


![image](https://github.com/apache/spark/assets/8326978/17b9d009-5d93-4d84-9367-7308b4cda426)


![image](https://github.com/apache/spark/assets/8326978/a5e333bd-0e22-4d5a-83f1-843767f6d5f5)

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #46735 from yaooqinn/SPARK-47579.

Authored-by: Kent Yao 
Signed-off-by: Gengliang Wang 
---
 common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala | 1 -
 core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 1f67a211c01f..99fc58b03503 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -585,7 +585,6 @@ object LogKeys {
   case object SESSION_KEY extends LogKey
   case object SET_CLIENT_INFO_REQUEST extends LogKey
   case object SHARD_ID extends LogKey
-  case object SHELL_OPTIONS extends LogKey
   case object SHORT_USER_NAME extends LogKey
   case object SHUFFLE_BLOCK_INFO extends LogKey
   case object SHUFFLE_DB_BACKEND_KEY extends LogKey
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
index 61235a701907..e47596a6ae43 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -588,7 +588,8 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 )
 
 if (SparkSubmit.isSqlShell(mainClass)) {
-  logInfo(log"CLI options:\n${MDC(SHELL_OPTIONS, getSqlShellOptions())}")
+  logInfo("CLI options:")
+  logInfo(getSqlShellOptions())
 }
 
 throw SparkUserAppException(exitCode)





(spark) branch master updated (4a471cceebed -> 2516fd8439df)

2024-05-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4a471cceebed [MINOR][TESTS] Add a helper function for `spark.table` in 
dsl
 add 2516fd8439df [SPARK-45009][SQL][FOLLOW UP] Add error class and tests 
for decorrelation of predicate subqueries in join condition which reference 
both join child

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-conditions.json |  6 +++
 .../exists-in-join-condition.sql.out   | 44 ++
 .../in-subquery-in-join-condition.sql.out  | 44 ++
 .../exists-subquery/exists-in-join-condition.sql   |  4 ++
 .../in-subquery/in-subquery-in-join-condition.sql  |  4 ++
 .../exists-in-join-condition.sql.out   | 30 +++
 .../in-subquery-in-join-condition.sql.out  | 30 +++
 7 files changed, 162 insertions(+)





(spark) branch master updated (6b3a88195e30 -> febdbf56fb22)

2024-05-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6b3a88195e30 [SPARK-48329][SQL] Enable 
`spark.sql.sources.v2.bucketing.pushPartValues.enabled` by default
 add febdbf56fb22 [SPARK-48031] Grandfather legacy views to SCHEMA BINDING

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/catalog/interface.scala |   4 +-
 .../sql/catalyst/catalog/SessionCatalogSuite.scala |   5 +-
 .../view-schema-binding-config.sql.out | 136 +++
 .../inputs/view-schema-binding-config.sql  |  29 +++
 .../sql-tests/results/show-tables.sql.out  |   2 +-
 .../results/view-schema-binding-config.sql.out | 256 +
 .../execution/command/ShowTablesSuiteBase.scala|   6 +-
 7 files changed, 429 insertions(+), 9 deletions(-)





(spark) branch branch-3.5 updated (c1dd4a5df693 -> 1a454287c01e)

2024-05-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from c1dd4a5df693 [SPARK-48297][SQL] Fix a regression TRANSFORM clause with 
char/varchar
 add 1a454287c01e [SPARK-48294][SQL][3.5] Handle lowercase in 
nestedTypeMissingElementTypeError

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/errors/QueryParsingErrors.scala  |  2 +-
 .../spark/sql/errors/QueryParsingErrorsSuite.scala| 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-48303][CORE] Reorganize `LogKeys`

2024-05-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5643cfb71d34 [SPARK-48303][CORE] Reorganize `LogKeys`
5643cfb71d34 is described below

commit 5643cfb71d343133a185aa257f137074f41abfb3
Author: panbingkun 
AuthorDate: Thu May 16 23:20:23 2024 -0700

[SPARK-48303][CORE] Reorganize `LogKeys`

### What changes were proposed in this pull request?
The PR aims to reorganize `LogKeys`, which includes:
- remove some unused `LogKeys`
  ACTUAL_BROADCAST_OUTPUT_STATUS_SIZE
  DEFAULT_COMPACTION_INTERVAL
  DRIVER_LIBRARY_PATH_KEY
  EXISTING_JARS
  EXPECTED_ANSWER
  FILTERS
  HAS_R_PACKAGE
  JAR_ENTRY
  LOG_KEY_FILE
  NUM_ADDED_MASTERS
  NUM_ADDED_WORKERS
  NUM_PARTITION_VALUES
  OUTPUT_LINE
  OUTPUT_LINE_NUMBER
  PARTITIONS_SIZE
  RULE_BATCH_NAME
  SERIALIZE_OUTPUT_LENGTH
  SHELL_COMMAND
  STREAM_SOURCE

- merge `PARAMETER` into `PARAM` (because some are fully spelled and some are abbreviations, which is not unified)
  ESTIMATOR_PARAMETER_MAP -> ESTIMATOR_PARAM_MAP
  FUNCTION_PARAMETER -> FUNCTION_PARAM
  METHOD_PARAMETER_TYPES -> METHOD_PARAM_TYPES

- merge `NUMBER` into `NUM` (abbreviations)
  MIN_VERSION_NUMBER -> MIN_VERSION_NUM
  RULE_NUMBER_OF_RUNS -> NUM_RULE_OF_RUNS
  VERSION_NUMBER -> VERSION_NUM

- merge `TOTAL` into `NUM`
  TOTAL_RECORDS_READ -> NUM_RECORDS_READ
  TRAIN_WORD_COUNT -> NUM_TRAIN_WORD

- `NUM` as prefix
  CHECKSUM_FILE_NUM -> NUM_CHECKSUM_FILE
  DATA_FILE_NUM -> NUM_DATA_FILE
  INDEX_FILE_NUM -> NUM_INDEX_FILE

- COUNT -> NUM
  EXECUTOR_DESIRED_COUNT -> NUM_EXECUTOR_DESIRED
  EXECUTOR_LAUNCH_COUNT -> NUM_EXECUTOR_LAUNCH
  EXECUTOR_TARGET_COUNT -> NUM_EXECUTOR_TARGET
  KAFKA_PULLS_COUNT -> NUM_KAFKA_PULLS
  KAFKA_RECORDS_PULLED_COUNT -> NUM_KAFKA_RECORDS_PULLED
  MIN_FREQUENT_PATTERN_COUNT -> MIN_NUM_FREQUENT_PATTERN
  POD_COUNT -> NUM_POD
  POD_SHARED_SLOT_COUNT -> NUM_POD_SHARED_SLOT
  POD_TARGET_COUNT -> NUM_POD_TARGET
  RETRY_COUNT -> NUM_RETRY

- fix some typos
  MALFORMATTED_STIRNG -> MALFORMATTED_STRING

- other
  MAX_LOG_NUM_POLICY -> MAX_NUM_LOG_POLICY
  WEIGHTED_NUM -> NUM_WEIGHTED_EXAMPLES

Changes in other code are additional changes caused by the above 
adjustments.

### Why are the changes needed?
Let's make `LogKeys` easier to understand and more consistent.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46612 from panbingkun/reorganize_logkey.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../network/shuffle/RetryingBlockTransferor.java   |  6 +-
 .../scala/org/apache/spark/internal/LogKey.scala   | 68 --
 .../sql/connect/client/GrpcRetryHandler.scala  |  8 +--
 .../sql/kafka010/KafkaOffsetReaderAdmin.scala  |  4 +-
 .../sql/kafka010/KafkaOffsetReaderConsumer.scala   |  4 +-
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  |  6 +-
 .../streaming/kinesis/KinesisBackedBlockRDD.scala  |  4 +-
 .../org/apache/spark/api/r/RBackendHandler.scala   |  4 +-
 .../spark/deploy/history/FsHistoryProvider.scala   |  2 +-
 .../org/apache/spark/deploy/master/Master.scala|  2 +-
 .../apache/spark/ml/tree/impl/RandomForest.scala   |  4 +-
 .../apache/spark/ml/tuning/CrossValidator.scala|  4 +-
 .../spark/ml/tuning/TrainValidationSplit.scala |  4 +-
 .../org/apache/spark/mllib/feature/Word2Vec.scala  |  4 +-
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala|  4 +-
 .../apache/spark/mllib/linalg/VectorsSuite.scala   |  4 +-
 .../cluster/k8s/ExecutorPodsAllocator.scala|  6 +-
 ...ernetesLocalDiskShuffleExecutorComponents.scala |  6 +-
 .../apache/spark/deploy/yarn/YarnAllocator.scala   |  6 +-
 .../catalyst/expressions/V2ExpressionUtils.scala   |  4 +-
 .../spark/sql/catalyst/rules/RuleExecutor.scala|  6 +-
 .../sql/execution/streaming/state/RocksDB.scala| 18 +++---
 .../streaming/state/RocksDBFileManager.scala   | 22 +++
 .../state/RocksDBStateStoreProvider.scala  |  6 +-
 .../apache/hive/service/server/HiveServer2.java|  2 +-
 .../spark/sql/hive/client/HiveClientImpl.scala |  2 +-
 .../org/apache/spark/streaming/Checkpoint.scala|  4 +-
 .../streaming/util/FileBasedWriteAheadLog.scala|  4 +-
 28 files changed, 101 insertions(+), 117 deletions(-)

diff --git 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.jav

(spark) branch master updated: [SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError

2024-05-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 59f88c372522 [SPARK-48294][SQL] Handle lowercase in 
nestedTypeMissingElementTypeError
59f88c372522 is described below

commit 59f88c3725222b84b2d0b51ba40a769d99866b56
Author: Michael Zhang 
AuthorDate: Thu May 16 14:58:25 2024 -0700

[SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError

### What changes were proposed in this pull request?

Handle lowercase values inside of nestedTypeMissingElementTypeError to prevent match errors.

### Why are the changes needed?

The previous match error was not user-friendly. Now it gives an actionable 
`INCOMPLETE_TYPE_DEFINITION` error.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Newly added tests pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46623 from michaelzhan-db/SPARK-48294.

Authored-by: Michael Zhang 
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/sql/errors/QueryParsingErrors.scala  |  2 +-
 .../spark/sql/errors/QueryParsingErrorsSuite.scala| 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
index 5eafd4d915a4..816fa546a138 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
@@ -289,7 +289,7 @@ private[sql] object QueryParsingErrors extends 
DataTypeErrorsBase {
 
   def nestedTypeMissingElementTypeError(
   dataType: String, ctx: PrimitiveDataTypeContext): Throwable = {
-dataType match {
+dataType.toUpperCase(Locale.ROOT) match {
   case "ARRAY" =>
 new ParseException(
   errorClass = "INCOMPLETE_TYPE_DEFINITION.ARRAY",
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
index 29ab6e994e42..b7fb65091ef7 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala
@@ -647,6 +647,13 @@ class QueryParsingErrorsSuite extends QueryTest with 
SharedSparkSession with SQL
   sqlState = "42K01",
   parameters = Map("elementType" -> ""),
   context = ExpectedContext(fragment = "ARRAY", start = 30, stop = 34))
+// Create column of array type without specifying element type in lowercase
+checkError(
+  exception = parseException("CREATE TABLE tbl_120691 (col1 array)"),
+  errorClass = "INCOMPLETE_TYPE_DEFINITION.ARRAY",
+  sqlState = "42K01",
+  parameters = Map("elementType" -> ""),
+  context = ExpectedContext(fragment = "array", start = 30, stop = 34))
   }
 
   test("INCOMPLETE_TYPE_DEFINITION: struct type definition is incomplete") {
@@ -674,6 +681,12 @@ class QueryParsingErrorsSuite extends QueryTest with 
SharedSparkSession with SQL
   errorClass = "PARSE_SYNTAX_ERROR",
   sqlState = "42601",
   parameters = Map("error" -> "'<'", "hint" -> ": missing ')'"))
+// Create column of struct type without specifying field type in lowercase
+checkError(
+  exception = parseException("CREATE TABLE tbl_120691 (col1 struct)"),
+  errorClass = "INCOMPLETE_TYPE_DEFINITION.STRUCT",
+  sqlState = "42K01",
+  context = ExpectedContext(fragment = "struct", start = 30, stop = 35))
   }
 
   test("INCOMPLETE_TYPE_DEFINITION: map type definition is incomplete") {
@@ -695,6 +708,12 @@ class QueryParsingErrorsSuite extends QueryTest with 
SharedSparkSession with SQL
   errorClass = "PARSE_SYNTAX_ERROR",
   sqlState = "42601",
   parameters = Map("error" -> "'<'", "hint" -> ": missing ')'"))
+// Create column of map type without specifying key/value types in 
lowercase
+checkError(
+  exception = parseException("SELECT CAST(map('1',2) AS map)"),
+  errorClass = "INCOMPLETE_TYPE_DEFINITION.MAP",
+  sqlState = "42K01",
+  context = ExpectedContext(fragment = "map", start = 26, stop = 28))
   }
 
   test("INVALID_ESC: Escape string must contain only one character") {





(spark) branch master updated: [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as *SparkLoggerSuite*

2024-05-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 283b2ff42221 [SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* 
as *SparkLoggerSuite*
283b2ff42221 is described below

commit 283b2ff422218b025e7b0170e4b7ed31a1294a80
Author: panbingkun 
AuthorDate: Thu May 16 11:55:20 2024 -0700

[SPARK-48291][CORE][FOLLOWUP] Rename Java *LoggerSuite* as 
*SparkLoggerSuite*

### What changes were proposed in this pull request?
The PR follows up https://github.com/apache/spark/pull/46600, which renamed `Logger` to `SparkLogger` and `LoggerFactory` to `SparkLoggerFactory`. Similarly, to maintain consistency, the related test suites should be renamed: `LoggerSuiteBase` to `SparkLoggerSuiteBase`, `PatternLoggerSuite` to `PatternSparkLoggerSuite`, and `StructuredLoggerSuite` to `StructuredSparkLoggerSuite`.

### Why are the changes needed?
After `org.apache.spark.internal.Logger` is renamed to `org.apache.spark.internal.SparkLogger` and `org.apache.spark.internal.LoggerFactory` is renamed to `org.apache.spark.internal.SparkLoggerFactory`, the related UTs should also be renamed accordingly, so that developers can easily locate them.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46615 from panbingkun/SPARK-48291_follow_up.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../util/{PatternLoggerSuite.java => PatternSparkLoggerSuite.java} | 7 ---
 .../spark/util/{LoggerSuiteBase.java => SparkLoggerSuiteBase.java} | 2 +-
 ...{StructuredLoggerSuite.java => StructuredSparkLoggerSuite.java} | 6 +++---
 common/utils/src/test/resources/log4j2.properties  | 4 ++--
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git 
a/common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java 
b/common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
similarity index 91%
rename from 
common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java
rename to 
common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
index 33de91697efa..2d370bad4cc8 100644
--- a/common/utils/src/test/java/org/apache/spark/util/PatternLoggerSuite.java
+++ 
b/common/utils/src/test/java/org/apache/spark/util/PatternSparkLoggerSuite.java
@@ -22,9 +22,10 @@ import org.apache.logging.log4j.Level;
 import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.SparkLoggerFactory;
 
-public class PatternLoggerSuite extends LoggerSuiteBase {
+public class PatternSparkLoggerSuite extends SparkLoggerSuiteBase {
 
-  private static final SparkLogger LOGGER = 
SparkLoggerFactory.getLogger(PatternLoggerSuite.class);
+  private static final SparkLogger LOGGER =
+SparkLoggerFactory.getLogger(PatternSparkLoggerSuite.class);
 
   private String toRegexPattern(Level level, String msg) {
 return msg
@@ -39,7 +40,7 @@ public class PatternLoggerSuite extends LoggerSuiteBase {
 
   @Override
   String className() {
-return PatternLoggerSuite.class.getSimpleName();
+return PatternSparkLoggerSuite.class.getSimpleName();
   }
 
   @Override
diff --git 
a/common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java 
b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
similarity index 99%
rename from 
common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java
rename to 
common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
index ecc0a75070c7..46bfe3415080 100644
--- a/common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java
+++ b/common/utils/src/test/java/org/apache/spark/util/SparkLoggerSuiteBase.java
@@ -30,7 +30,7 @@ import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.LogKeys;
 import org.apache.spark.internal.MDC;
 
-public abstract class LoggerSuiteBase {
+public abstract class SparkLoggerSuiteBase {
 
   abstract SparkLogger logger();
   abstract String className();
diff --git 
a/common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java 
b/common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
similarity index 95%
rename from 
common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java
rename to 
common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
index 110e7cc7794e..416f0b6172c0 100644
--- 
a/common/utils/src/test/java/org/apache/spark/util/StructuredLoggerSuite.java
+++ 
b/common/utils/src/test/java/org/apache/spark/util/StructuredSparkLoggerSuite.java
@@ -24,10 +24,10 @@ import org.apache.logging.log4j.Level;
 import org.apache.spark.internal.SparkLogger;
 import org.apache.spark.internal.SparkLoggerFactory;
 
-public class StructuredLoggerSuite extends LoggerSuiteBase {
+public class StructuredSparkLoggerSuite extends SparkLoggerSuiteBase {
 
   private sta

(spark) branch master updated: [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`

2024-05-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dec910ba3c36 [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & 
`org.slf4j.LoggerFactory`
dec910ba3c36 is described below

commit dec910ba3c36e27b9cff5b5e139be82af6c799ab
Author: panbingkun 
AuthorDate: Wed May 15 21:50:48 2024 -0700

[SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & 
`org.slf4j.LoggerFactory`

### What changes were proposed in this pull request?
The PR aims to ban importing `org.slf4j.Logger` & `org.slf4j.LoggerFactory`.

### Why are the changes needed?
After the migration to structured logging on the Java side is completed, we need to ban importing `org.slf4j.Logger` & `org.slf4j.LoggerFactory` in the code, so that future new Java code does not produce logs in a format that ignores the structured logging requirements. A minimal before/after sketch is shown below.
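
A minimal before/after sketch of the replacement this enforces (the class name comes from the diff below; its body here is illustrative):

```java
// Before (now rejected by the checkstyle rule):
//   import org.slf4j.Logger;
//   import org.slf4j.LoggerFactory;
//   private static final Logger LOG = LoggerFactory.getLogger(ResourceCleaner.class);

// After: use the Spark wrappers instead.
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

class ResourceCleaner implements Runnable {
  private static final SparkLogger LOG =
      SparkLoggerFactory.getLogger(ResourceCleaner.class);

  @Override
  public void run() {
    // Constant string messages keep using the standard logging methods.
    LOG.info("Cleaning up leaked database iterator");
  }
}
```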

### Does this PR introduce _any_ user-facing change?
Yes, only for spark developers.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46502 from panbingkun/ban_import_slf4j.

Authored-by: panbingkun 
    Signed-off-by: Gengliang Wang 
---
 common/kvstore/pom.xml |  6 ++
 .../java/org/apache/spark/util/kvstore/LevelDBIterator.java|  7 ---
 .../java/org/apache/spark/util/kvstore/DBIteratorSuite.java|  2 ++
 .../java/org/apache/spark/util/kvstore/LevelDBBenchmark.java   |  2 ++
 .../java/org/apache/spark/util/kvstore/RocksDBBenchmark.java   |  2 ++
 .../apache/spark/network/util/TransportFrameDecoderSuite.java  |  2 ++
 .../spark/network/shuffle/RemoteBlockPushResolverSuite.java|  2 ++
 .../apache/spark/network/shuffle/TestShuffleDataContext.java   |  2 ++
 .../src/main/java/org/apache/spark/internal/SparkLogger.java   |  9 ++---
 .../java/org/apache/spark/internal/SparkLoggerFactory.java | 10 ++
 dev/checkstyle.xml |  5 +
 11 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 3820d1b8e395..046648e9c2ae 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -40,6 +40,12 @@
   spark-tags_${scala.binary.version}
 
 
+
+  org.apache.spark
+  spark-common-utils_${scala.binary.version}
+  ${project.version}
+
+
 
   com.google.guava
   guava
diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
index b830e6afc617..69757fdc65d6 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
@@ -29,8 +29,9 @@ import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
 import com.google.common.base.Throwables;
 import org.iq80.leveldb.DBIterator;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
+
+import org.apache.spark.internal.SparkLogger;
+import org.apache.spark.internal.SparkLoggerFactory;
 
 class LevelDBIterator implements KVStoreIterator {
 
@@ -302,7 +303,7 @@ class LevelDBIterator implements KVStoreIterator {
   }
 
   static class ResourceCleaner implements Runnable {
-private static final Logger LOG = 
LoggerFactory.getLogger(ResourceCleaner.class);
+private static final SparkLogger LOG = 
SparkLoggerFactory.getLogger(ResourceCleaner.class);
 
 private final DBIterator dbIterator;
 
diff --git 
a/common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java
 
b/common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java
index daedd56890a6..72c3690d1a18 100644
--- 
a/common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java
+++ 
b/common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java
@@ -32,8 +32,10 @@ import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;
+// checkstyle.off: RegexpSinglelineJava
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+// checkstyle.on: RegexpSinglelineJava
 import static org.junit.jupiter.api.Assertions.*;
 
 public abstract class DBIteratorSuite {
diff --git 
a/common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBBenchmark.java
 
b/common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBBenchmark.java
index 3158c18f9e1d..ff6db8fc34c9 100644
--- 
a/common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBBenchmark.java
+++ 
b/common/kvstore/src/test/java/org/a

(spark) branch master updated: [SPARK-48291][CORE] Rename Java Logger as SparkLogger

2024-05-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a252cbd5ca13 [SPARK-48291][CORE] Rename Java Logger as SparkLogger
a252cbd5ca13 is described below

commit a252cbd5ca13fb7b758c839edc92b50336747d82
Author: Gengliang Wang 
AuthorDate: Wed May 15 16:43:45 2024 -0700

[SPARK-48291][CORE] Rename Java Logger as SparkLogger

### What changes were proposed in this pull request?

Two new classes, `org.apache.spark.internal.Logger` and `org.apache.spark.internal.LoggerFactory`, were introduced in https://github.com/apache/spark/pull/46301.
Given that Logger is a widely recognized **interface** in Log4j, it may 
lead to confusion to have a class with the same name. To avoid this and clarify 
its purpose within the Spark framework, I propose renaming 
`org.apache.spark.internal.Logger` to `org.apache.spark.internal.SparkLogger`. 
Similarly, to maintain consistency, `org.apache.spark.internal.LoggerFactory` 
should be renamed to `org.apache.spark.internal.SparkLoggerFactory`.

### Why are the changes needed?

To avoid naming confusion and to clarify the purpose of the Java Spark logger within the logging framework.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46600 from gengliangwang/refactorLogger.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../java/org/apache/spark/network/TransportContext.java |  6 +++---
 .../org/apache/spark/network/client/TransportClient.java|  6 +++---
 .../apache/spark/network/client/TransportClientFactory.java |  7 ---
 .../spark/network/client/TransportResponseHandler.java  |  7 ---
 .../apache/spark/network/crypto/AuthClientBootstrap.java|  6 +++---
 .../org/apache/spark/network/crypto/AuthRpcHandler.java |  6 +++---
 .../org/apache/spark/network/protocol/MessageDecoder.java   |  6 +++---
 .../org/apache/spark/network/protocol/MessageEncoder.java   |  6 +++---
 .../apache/spark/network/protocol/SslMessageEncoder.java|  6 +++---
 .../org/apache/spark/network/sasl/SaslClientBootstrap.java  |  6 +++---
 .../java/org/apache/spark/network/sasl/SaslRpcHandler.java  |  6 +++---
 .../java/org/apache/spark/network/sasl/SparkSaslClient.java |  6 +++---
 .../java/org/apache/spark/network/sasl/SparkSaslServer.java |  6 +++---
 .../spark/network/server/ChunkFetchRequestHandler.java  |  7 ---
 .../apache/spark/network/server/OneForOneStreamManager.java |  7 ---
 .../java/org/apache/spark/network/server/RpcHandler.java|  6 +++---
 .../spark/network/server/TransportChannelHandler.java   |  7 ---
 .../spark/network/server/TransportRequestHandler.java   |  7 ---
 .../org/apache/spark/network/server/TransportServer.java|  6 +++---
 .../apache/spark/network/ssl/ReloadingX509TrustManager.java |  7 ---
 .../main/java/org/apache/spark/network/ssl/SSLFactory.java  |  6 +++---
 .../main/java/org/apache/spark/network/util/DBProvider.java |  6 +++---
 .../java/org/apache/spark/network/util/LevelDBProvider.java |  8 
 .../java/org/apache/spark/network/util/NettyLogger.java |  6 +++---
 .../java/org/apache/spark/network/util/RocksDBProvider.java |  8 
 .../org/apache/spark/network/sasl/ShuffleSecretManager.java |  7 ---
 .../org/apache/spark/network/shuffle/BlockStoreClient.java  |  6 +++---
 .../apache/spark/network/shuffle/ExternalBlockHandler.java  |  7 ---
 .../spark/network/shuffle/ExternalShuffleBlockResolver.java |  7 ---
 .../apache/spark/network/shuffle/OneForOneBlockFetcher.java |  7 ---
 .../apache/spark/network/shuffle/OneForOneBlockPusher.java  |  7 ---
 .../spark/network/shuffle/RemoteBlockPushResolver.java  |  7 ---
 .../spark/network/shuffle/RetryingBlockTransferor.java  |  7 ---
 .../spark/network/shuffle/ShuffleTransportContext.java  |  9 +
 .../network/shuffle/checksum/ShuffleChecksumHelper.java |  8 
 .../org/apache/spark/network/yarn/YarnShuffleService.java   | 13 +++--
 .../apache/spark/internal/{Logger.java => SparkLogger.java} |  4 ++--
 .../{LoggerFactory.java => SparkLoggerFactory.java} | 10 +-
 .../main/java/org/apache/spark/network/util/JavaUtils.java  |  6 +++---
 .../test/java/org/apache/spark/util/LoggerSuiteBase.java|  4 ++--
 .../test/java/org/apache/spark/util/PatternLoggerSuite.java |  8 
 .../java/org/apache/spark/util/StructuredLoggerSuite.java   |  9 +
 .../java/com/codahale/metrics/ganglia/GangliaReporter.java  |  6 +++---
 .../main/java/org/apache/spark/io/ReadAheadInputStream.java |  7 ---
 .../java/org/apache/spark/

(spark) branch master updated: [SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured logging framework

2024-05-14 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ae78c4c39a7 [SPARK-47599][MLLIB] MLLib: Migrate logWarn with 
variables to structured logging framework
3ae78c4c39a7 is described below

commit 3ae78c4c39a7084a321f2e01b4745cb6c442b7a5
Author: panbingkun 
AuthorDate: Tue May 14 17:23:44 2024 -0700

[SPARK-47599][MLLIB] MLLib: Migrate logWarn with variables to structured 
logging framework

### What changes were proposed in this pull request?
The PR aims to migrate `logWarn` calls with variables in the `MLLib` module to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46527 from panbingkun/SPARK-47599.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 18 +++
 .../main/scala/org/apache/spark/ml/Predictor.scala |  5 +++--
 .../spark/ml/classification/Classifier.scala   |  5 +++--
 .../apache/spark/ml/classification/LinearSVC.scala |  4 ++--
 .../ml/classification/LogisticRegression.scala | 10 +
 .../apache/spark/ml/classification/OneVsRest.scala |  8 ---
 .../classification/ProbabilisticClassifier.scala   |  5 +++--
 .../spark/ml/clustering/GaussianMixture.scala  |  5 +++--
 .../org/apache/spark/ml/clustering/KMeans.scala|  4 ++--
 .../org/apache/spark/ml/feature/Binarizer.scala|  6 +++--
 .../apache/spark/ml/feature/StopWordsRemover.scala |  7 +++---
 .../apache/spark/ml/feature/StringIndexer.scala|  5 +++--
 .../spark/ml/optim/WeightedLeastSquares.scala  | 16 ++---
 .../org/apache/spark/ml/recommendation/ALS.scala   |  5 +++--
 .../ml/regression/AFTSurvivalRegression.scala  | 10 -
 .../ml/regression/DecisionTreeRegressor.scala  |  5 +++--
 .../apache/spark/ml/regression/GBTRegressor.scala  |  6 ++---
 .../regression/GeneralizedLinearRegression.scala   |  6 ++---
 .../spark/ml/regression/LinearRegression.scala | 18 +++
 .../ml/regression/RandomForestRegressor.scala  |  5 +++--
 .../spark/ml/tree/impl/DecisionTreeMetadata.scala  |  8 ---
 .../apache/spark/ml/tree/impl/RandomForest.scala   |  9 
 .../spark/mllib/clustering/LocalKMeans.scala   |  6 ++---
 .../mllib/linalg/distributed/BlockMatrix.scala |  7 --
 .../spark/mllib/linalg/distributed/RowMatrix.scala | 26 +-
 .../spark/mllib/optimization/GradientDescent.scala | 11 +
 .../recommendation/MatrixFactorizationModel.scala  |  9 
 .../apache/spark/mllib/stat/test/ChiSqTest.scala   |  7 +++---
 .../spark/mllib/tree/model/DecisionTreeModel.scala | 18 +--
 .../mllib/tree/model/treeEnsembleModels.scala  | 18 +--
 30 files changed, 164 insertions(+), 108 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index bf5b7daab705..e03987933306 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -82,6 +82,7 @@ object LogKeys {
   case object CHECKPOINT_TIME extends LogKey
   case object CHECKSUM_FILE_NUM extends LogKey
   case object CHOSEN_WATERMARK extends LogKey
+  case object CLASSIFIER extends LogKey
   case object CLASS_LOADER extends LogKey
   case object CLASS_NAME extends LogKey
   case object CLASS_PATH extends LogKey
@@ -157,12 +158,14 @@ object LogKeys {
   case object DEPRECATED_KEY extends LogKey
   case object DESCRIPTION extends LogKey
   case object DESIRED_NUM_PARTITIONS extends LogKey
+  case object DESIRED_TREE_DEPTH extends LogKey
   case object DESTINATION_PATH extends LogKey
   case object DFS_FILE extends LogKey
   case object DIFF_DELTA extends LogKey
   case object DIVISIBLE_CLUSTER_INDICES_SIZE extends LogKey
   case object DRIVER_ID extends LogKey
   case object DRIVER_LIBRARY_PATH_KEY extends LogKey
+  case object DRIVER_MEMORY_SIZE extends LogKey
   case object DRIVER_STATE extends LogKey
   case object DROPPED_PARTITIONS extends LogKey
   case object DURATION extends LogKey
@@ -196,6 +199,7 @@ object LogKeys {
   case object EXECUTOR_IDS extends LogKey
   case object EXECUTOR_LAUNCH_COMMANDS extends LogKey
   case object EXECUTOR_LAUNCH_COUNT extends LogKey
+  case object EXECUTOR_MEMORY_SIZE extends LogKey
   case object EXECUTOR_RESOURCES extends LogKey
   case object EXECUTOR_SHUFFLE_INFO extends LogKey
   case object EXECUTOR_STATE

(spark) branch master updated: [SPARK-47579][CORE][PART2] Migrate logInfo with variables to structured logging framework

2024-05-14 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 79aeae1a9aaa [SPARK-47579][CORE][PART2] Migrate logInfo with variables 
to structured logging framework
79aeae1a9aaa is described below

commit 79aeae1a9aaa2e9cfaf03a1f5d88e1447a3f9b19
Author: Tuan Pham 
AuthorDate: Tue May 14 13:07:08 2024 -0700

[SPARK-47579][CORE][PART2] Migrate logInfo with variables to structured 
logging framework

The PR aims to migrate `logInfo` calls with variables in the Core module to the structured logging framework.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46494 from zeotuan/coreInfo2.

Lead-authored-by: Tuan Pham 
Co-authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  33 ++
 .../org/apache/spark/api/python/PythonUtils.scala  |   5 +-
 .../apache/spark/broadcast/TorrentBroadcast.scala  |   9 +-
 .../org/apache/spark/deploy/SparkSubmit.scala  |  35 ---
 .../apache/spark/deploy/SparkSubmitArguments.scala |  12 +--
 .../spark/deploy/history/ApplicationCache.scala|   8 +-
 .../deploy/history/EventLogFileCompactor.scala |   3 +-
 .../spark/deploy/history/EventLogFileWriters.scala |   2 +-
 .../spark/deploy/history/HistoryServer.scala   |   7 +-
 .../deploy/history/HistoryServerDiskManager.scala  |  19 ++--
 .../history/HistoryServerMemoryManager.scala   |  17 ++--
 .../org/apache/spark/deploy/master/Master.scala| 111 -
 .../spark/deploy/master/ui/MasterWebUI.scala   |   6 +-
 .../spark/deploy/rest/RestSubmissionServer.scala   |   6 +-
 .../security/HBaseDelegationTokenProvider.scala|   4 +-
 .../security/HadoopDelegationTokenManager.scala|  12 ++-
 .../security/HadoopFSDelegationTokenProvider.scala |   8 +-
 .../apache/spark/deploy/worker/DriverRunner.scala  |  10 +-
 .../apache/spark/deploy/worker/DriverWrapper.scala |   5 +-
 .../spark/deploy/worker/ExecutorRunner.scala   |   2 +-
 .../apache/spark/deploy/worker/WorkerWatcher.scala |   4 +-
 .../spark/mapred/SparkHadoopMapRedUtil.scala   |  15 +--
 .../apache/spark/memory/ExecutionMemoryPool.scala  |   3 +-
 .../apache/spark/memory/UnifiedMemoryManager.scala |   8 +-
 .../org/apache/spark/metrics/sink/StatsdSink.scala |   5 +-
 .../main/scala/org/apache/spark/rdd/JdbcRDD.scala  |   5 +-
 .../spark/shuffle/ShuffleWriteProcessor.scala  |   9 +-
 .../apache/spark/storage/DiskBlockManager.scala|   4 +-
 .../org/apache/spark/storage/FallbackStorage.scala |   4 +-
 .../src/main/scala/org/apache/spark/ui/WebUI.scala |   5 +-
 .../org/apache/spark/util/HadoopFSUtils.scala  |   7 +-
 .../main/scala/org/apache/spark/util/Utils.scala   |   9 +-
 .../spark/util/collection/ExternalSorter.scala |   8 +-
 .../apache/spark/util/logging/DriverLogger.scala   |   4 +-
 .../apache/spark/util/logging/FileAppender.scala   |  14 ++-
 .../spark/util/logging/RollingFileAppender.scala   |   5 +-
 36 files changed, 260 insertions(+), 163 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index a3c93a4b9f5e..bf5b7daab705 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -85,6 +85,7 @@ object LogKeys {
   case object CLASS_LOADER extends LogKey
   case object CLASS_NAME extends LogKey
   case object CLASS_PATH extends LogKey
+  case object CLASS_PATHS extends LogKey
   case object CLAUSES extends LogKey
   case object CLEANUP_LOCAL_DIRS extends LogKey
   case object CLUSTER_CENTROIDS extends LogKey
@@ -122,6 +123,7 @@ object LogKeys {
   case object COST extends LogKey
   case object COUNT extends LogKey
   case object CREATED_POOL_NAME extends LogKey
+  case object CREDENTIALS_RENEWAL_INTERVAL_RATIO extends LogKey
   case object CROSS_VALIDATION_METRIC extends LogKey
   case object CROSS_VALIDATION_METRICS extends LogKey
   case object CSV_HEADER_COLUMN_NAME extends LogKey
@@ -215,6 +217,7 @@ object LogKeys {
   case object FALLBACK_VERSION extends LogKey
   case object FEATURE_COLUMN extends LogKey
   case object FEATURE_DIMENSION extends LogKey
+  case object FETCH_SIZE extends LogKey
   case object FIELD_NAME extends LogKey
   case object FILE_ABSOLUTE_PATH extends LogKey
   case object FILE_END_OFFSET extends LogKey
@@ -226,6 +229,7 @@ object LogKeys {
   case object FILE_NAME2 extends LogKey

(spark) branch master updated: [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 78d2a86a927f [SPARK-48209][CORE] Common (java side): Migrate 
`error/warn/info` with variables to structured logging framework
78d2a86a927f is described below

commit 78d2a86a927f64403e485b14715a119e282cbdc8
Author: panbingkun 
AuthorDate: Mon May 13 21:04:11 2024 -0700

[SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with 
variables to structured logging framework

### What changes were proposed in this pull request?
The PR aims to:
1. Migrate `error/warn/info` calls with variables in the `common` module (Java side) to the structured logging framework (a small sketch follows this list).
2. Convert all dependencies on `org.slf4j.Logger` & `org.slf4j.LoggerFactory` to `org.apache.spark.internal.Logger` & `org.apache.spark.internal.LoggerFactory`, in order to completely prohibit importing `org.slf4j.Logger` & `org.slf4j.LoggerFactory` in Java code later.
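
A small sketch of the call-site migration described in item 1 (the class, message, keys, and values are illustrative, not taken from the diff):

```java
import org.apache.spark.internal.LogKeys;
import org.apache.spark.internal.MDC;
import org.apache.spark.internal.SparkLogger;
import org.apache.spark.internal.SparkLoggerFactory;

class BlockFetchLogging {
  private static final SparkLogger logger =
      SparkLoggerFactory.getLogger(BlockFetchLogging.class);

  void onGiveUp(String blockId, int retryCount) {
    // Before (plain SLF4J): logger.warn("Giving up on block {} after {} retries", blockId, retryCount);
    // After: each variable is wrapped in an MDC so it is emitted as a structured field.
    logger.warn("Giving up on block {} after {} retries",
        MDC.of(LogKeys.BLOCK_ID$.MODULE$, blockId),
        MDC.of(LogKeys.NUM_RETRY$.MODULE$, retryCount));
  }
}
```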

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46493 from panbingkun/common_java_sl.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/network/TransportContext.java |   4 +-
 .../spark/network/client/TransportClient.java  |  14 ++-
 .../network/client/TransportClientFactory.java |  24 ++--
 .../network/client/TransportResponseHandler.java   |  38 +--
 .../spark/network/crypto/AuthClientBootstrap.java  |   4 +-
 .../spark/network/crypto/AuthRpcHandler.java   |   8 +-
 .../spark/network/protocol/MessageDecoder.java |   5 +-
 .../spark/network/protocol/MessageEncoder.java |  12 +-
 .../spark/network/protocol/SslMessageEncoder.java  |  11 +-
 .../spark/network/sasl/SaslClientBootstrap.java|   4 +-
 .../apache/spark/network/sasl/SaslRpcHandler.java  |   4 +-
 .../apache/spark/network/sasl/SparkSaslClient.java |   5 +-
 .../apache/spark/network/sasl/SparkSaslServer.java |   5 +-
 .../network/server/ChunkFetchRequestHandler.java   |  23 ++--
 .../network/server/OneForOneStreamManager.java |   4 +-
 .../apache/spark/network/server/RpcHandler.java|   5 +-
 .../network/server/TransportChannelHandler.java|  14 ++-
 .../network/server/TransportRequestHandler.java|  30 +++--
 .../spark/network/server/TransportServer.java  |   4 +-
 .../network/ssl/ReloadingX509TrustManager.java |   9 +-
 .../org/apache/spark/network/ssl/SSLFactory.java   |   7 +-
 .../org/apache/spark/network/util/DBProvider.java  |   4 +-
 .../apache/spark/network/util/LevelDBProvider.java |  16 +--
 .../org/apache/spark/network/util/NettyLogger.java |   5 +-
 .../apache/spark/network/util/RocksDBProvider.java |  14 ++-
 .../spark/network/sasl/ShuffleSecretManager.java   |  12 +-
 .../spark/network/shuffle/BlockStoreClient.java|  14 ++-
 .../network/shuffle/ExternalBlockHandler.java  |  10 +-
 .../network/shuffle/ExternalBlockStoreClient.java  |  21 ++--
 .../shuffle/ExternalShuffleBlockResolver.java  |  46 +---
 .../network/shuffle/OneForOneBlockFetcher.java |   4 +-
 .../network/shuffle/OneForOneBlockPusher.java  |   4 +-
 .../network/shuffle/RemoteBlockPushResolver.java   | 123 ++---
 .../network/shuffle/RetryingBlockTransferor.java   |  34 --
 .../network/shuffle/ShuffleTransportContext.java   |   5 +-
 .../shuffle/checksum/ShuffleChecksumHelper.java|  13 ++-
 .../spark/network/yarn/YarnShuffleService.java |  44 +---
 .../org/apache/spark/internal/LoggerFactory.java   |   5 +
 .../org/apache/spark/network/util/JavaUtils.java   |  11 +-
 .../scala/org/apache/spark/internal/LogKey.scala   |  31 +-
 .../sql/connect/client/GrpcRetryHandler.scala  |   4 +-
 .../codahale/metrics/ganglia/GangliaReporter.java  |  23 ++--
 .../network/netty/NettyBlockTransferService.scala  |  14 ++-
 43 files changed, 452 insertions(+), 239 deletions(-)

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
 
b/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
index 9f3b9c59256b..815f4dc6e6cd 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/TransportContext.java
@@ -34,9 +34,9 @@ import io.netty.handler.ssl.SslHandler;
 import io.netty.handler.stream.ChunkedWriteHandler;
 import io.netty.handler.timeout.IdleStateHandler;
 import io.netty.handler.codec.MessageToMessageEncoder;
-import org.slf4j.Logger;
-import org.slf4j.L

(spark) branch master updated (a101c48dd965 -> d9ff78e2e341)

2024-05-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a101c48dd965 [SPARK-44953][CORE] Log a warning when shuffle tracking 
is enabled along side another DA supported mechanism
 add d9ff78e2e341 [SPARK-48260][SQL] Disable output committer coordination 
in one test of ParquetIOSuite

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetIOSuite.scala   | 89 +-
 1 file changed, 51 insertions(+), 38 deletions(-)





(spark) branch master updated (b14abb3a2ed0 -> 8d8cc623085e)

2024-05-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b14abb3a2ed0 [SPARK-48241][SQL] CSV parsing failure with char/varchar 
type columns
 add 8d8cc623085e [SPARK-41794][SQL] Add `try_remainder` function and 
re-enable column tests

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/functions.scala |  8 
 .../sql/streaming/StreamingQueryListenerBus.scala  |  6 ++-
 .../CheckConnectJvmClientCompatibility.scala   |  3 +-
 docs/sql-ref-ansi-compliance.md|  1 +
 python/pyspark/sql/connect/functions/builtin.py|  7 +++
 python/pyspark/sql/functions/builtin.py| 52 +-
 .../sql/tests/connect/test_connect_column.py   | 16 +++
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  1 +
 .../spark/sql/catalyst/expressions/TryEval.scala   | 37 +++
 .../sql/catalyst/expressions/arithmetic.scala  |  4 ++
 .../sql/catalyst/expressions/TryEvalSuite.scala| 13 ++
 .../scala/org/apache/spark/sql/functions.scala |  9 
 .../sql-functions/sql-expression-schema.md |  1 +
 .../org/apache/spark/sql/MathFunctionsSuite.scala  | 11 +
 14 files changed, 156 insertions(+), 13 deletions(-)





(spark) branch master updated (5891b20ef492 -> 85a6e35d834e)

2024-05-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5891b20ef492 [SPARK-47186][TESTS][FOLLOWUP] Correct the name of 
spark.test.docker.connectionTimeout
 add 85a6e35d834e [SPARK-48182][SQL] SQL (java side): Migrate 
`error/warn/info` with variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/spark/internal/Logger.java |  4 ++
 .../scala/org/apache/spark/internal/LogKey.scala   | 13 ++
 .../expressions/RowBasedKeyValueBatch.java | 11 ++---
 .../spark/sql/util/CaseInsensitiveStringMap.java   | 18 
 .../org/apache/hive/service/AbstractService.java   | 13 +++---
 .../org/apache/hive/service/CompositeService.java  | 14 ---
 .../java/org/apache/hive/service/CookieSigner.java |  5 ++-
 .../org/apache/hive/service/ServiceOperations.java | 12 +++---
 .../java/org/apache/hive/service/ServiceUtils.java |  2 +-
 .../apache/hive/service/auth/HiveAuthFactory.java  | 15 ---
 .../apache/hive/service/auth/HttpAuthUtils.java| 12 --
 .../hive/service/auth/TSetIpAddressProcessor.java  |  7 ++--
 .../org/apache/hive/service/cli/CLIService.java| 21 ++
 .../apache/hive/service/cli/ColumnBasedSet.java|  9 ++--
 .../cli/operation/ClassicTableTypeMapping.java | 13 --
 .../hive/service/cli/operation/Operation.java  | 28 -
 .../service/cli/operation/OperationManager.java| 10 +++--
 .../hive/service/cli/session/HiveSessionImpl.java  | 49 +-
 .../hive/service/cli/session/SessionManager.java   | 49 +-
 .../hive/service/cli/thrift/ThriftCLIService.java  | 16 ---
 .../hive/service/cli/thrift/ThriftHttpServlet.java | 14 ---
 .../apache/hive/service/server/HiveServer2.java| 12 +++---
 .../service/server/ThreadWithGarbageCleanup.java   |  5 ++-
 .../sql/hive/thriftserver/SparkSQLCLIService.scala |  2 +-
 24 files changed, 222 insertions(+), 132 deletions(-)





(spark) branch master updated: [SPARK-48126][CORE] Make `spark.log.structuredLogging.enabled` effective

2024-05-07 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bbf6b1eff2c [SPARK-48126][CORE] Make 
`spark.log.structuredLogging.enabled` effective
6bbf6b1eff2c is described below

commit 6bbf6b1eff2cffe8d116ebba0194fac233b42348
Author: Gengliang Wang 
AuthorDate: Tue May 7 19:10:27 2024 -0700

[SPARK-48126][CORE] Make `spark.log.structuredLogging.enabled` effective

### What changes were proposed in this pull request?

Currently, the spark conf `spark.log.structuredLogging.enabled` is not 
taking effect. The current code base checks this config in the method 
`prepareSubmitEnvironment`. However, Log4j is already initialized before that.

This PR is to fix it by checking the config 
`spark.log.structuredLogging.enabled` before the initialization of Log4j.
Also, this PR enhances the doc for this configuration.
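For reference, a hedged sketch of how a non-shell application would opt out of structured logging once this configuration takes effect (the application class and jar names are illustrative):

```
./bin/spark-submit \
  --conf spark.log.structuredLogging.enabled=false \
  --class org.example.MyApp \
  myapp.jar
```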

### Why are the changes needed?

Bug fix. After the fix, the Spark conf 
`spark.log.structuredLogging.enabled` takes effect.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual test.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: GPT-4
I used GPT-4 to improve the documents.

Closes #46452 from gengliangwang/makeConfEffective.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/deploy/SparkSubmit.scala  | 33 --
 .../org/apache/spark/internal/config/package.scala |  9 +++---
 docs/configuration.md  |  6 +++-
 docs/core-migration-guide.md   |  4 ++-
 4 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 076aa8387dc5..5a7e5542cbd0 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -69,10 +69,20 @@ private[spark] class SparkSubmit extends Logging {
 
   def doSubmit(args: Array[String]): Unit = {
 val appArgs = parseArguments(args)
+val sparkConf = appArgs.toSparkConf()
+
 // For interpreters, structured logging is disabled by default to avoid 
generating mixed
 // plain text and structured logs on the same console.
 if (isShell(appArgs.primaryResource) || isSqlShell(appArgs.mainClass)) {
   Logging.disableStructuredLogging()
+} else {
+  // For non-shell applications, enable structured logging if it's not 
explicitly disabled
+  // via the configuration `spark.log.structuredLogging.enabled`.
+  if (sparkConf.getBoolean(STRUCTURED_LOGGING_ENABLED.key, defaultValue = 
true)) {
+Logging.enableStructuredLogging()
+  } else {
+Logging.disableStructuredLogging()
+  }
 }
 // Initialize logging if it hasn't been done yet. Keep track of whether 
logging needs to
 // be reset before the application starts.
@@ -82,9 +92,9 @@ private[spark] class SparkSubmit extends Logging {
   logInfo(appArgs.toString)
 }
 appArgs.action match {
-  case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
-  case SparkSubmitAction.KILL => kill(appArgs)
-  case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
+  case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog, sparkConf)
+  case SparkSubmitAction.KILL => kill(appArgs, sparkConf)
+  case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs, 
sparkConf)
   case SparkSubmitAction.PRINT_VERSION => printVersion()
 }
   }
@@ -96,12 +106,11 @@ private[spark] class SparkSubmit extends Logging {
   /**
* Kill an existing submission.
*/
-  private def kill(args: SparkSubmitArguments): Unit = {
+  private def kill(args: SparkSubmitArguments, sparkConf: SparkConf): Unit = {
 if (RestSubmissionClient.supportsRestClient(args.master)) {
   new RestSubmissionClient(args.master)
 .killSubmission(args.submissionToKill)
 } else {
-  val sparkConf = args.toSparkConf()
   sparkConf.set("spark.master", args.master)
   SparkSubmitUtils
 .getSubmitOperations(args.master)
@@ -112,12 +121,11 @@ private[spark] class SparkSubmit extends Logging {
   /**
* Request the status of an existing submission.
*/
-  private def requestStatus(args: SparkSubmitArguments): Unit = {
+  private def requestStatus(args: SparkSubmitArguments, sparkConf: SparkConf): 
Unit = {
 if (RestSubmissionClient.supportsRestClient(args.master)) {
   new RestSubmissionClient(args.master)
 .requestSubmissionStatus(args.submissionToRequestStatusFor)
   

(spark) branch master updated: [SPARK-47240][CORE][PART1] Migrate logInfo with variables to structured logging framework

2024-05-07 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a15adeb3a215 [SPARK-47240][CORE][PART1] Migrate logInfo with variables 
to structured logging framework
a15adeb3a215 is described below

commit a15adeb3a215ad2ef7222e18112d23cdffa8569a
Author: Tuan Pham 
AuthorDate: Tue May 7 17:35:35 2024 -0700

[SPARK-47240][CORE][PART1] Migrate logInfo with variables to structured 
logging framework

The PR aims to migrate `logInfo` calls with variables in the Core module to the structured logging framework.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46362 from zeotuan/coreInfo.

Lead-authored-by: Tuan Pham 
Co-authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 48 ++-
 .../org/apache/spark/BarrierCoordinator.scala  | 17 ---
 .../org/apache/spark/BarrierTaskContext.scala  | 35 +++---
 .../apache/spark/ExecutorAllocationManager.scala   | 13 --
 .../scala/org/apache/spark/MapOutputTracker.scala  | 48 ---
 .../main/scala/org/apache/spark/SparkContext.scala | 54 ++
 .../apache/spark/api/python/PythonHadoopUtil.scala |  2 +-
 .../org/apache/spark/api/python/PythonRDD.scala|  6 ++-
 .../org/apache/spark/api/python/PythonRunner.scala |  2 +-
 .../org/apache/spark/api/python/PythonUtils.scala  |  7 +--
 .../spark/api/python/StreamingPythonRunner.scala   | 12 +++--
 .../scala/org/apache/spark/deploy/Client.scala | 18 
 .../spark/deploy/ExternalShuffleService.scala  | 10 ++--
 .../apache/spark/rdd/ReliableCheckpointRDD.scala   |  9 ++--
 .../spark/rdd/ReliableRDDCheckpointData.scala  |  6 ++-
 .../scala/org/apache/spark/ui/JettyUtils.scala |  7 ++-
 .../main/scala/org/apache/spark/ui/SparkUI.scala   |  4 +-
 .../scala/org/apache/spark/util/ListenerBus.scala  |  7 +--
 .../scala/org/apache/spark/util/SignalUtils.scala  |  2 +-
 19 files changed, 205 insertions(+), 102 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index c127f9c3d1f9..14e822c6349f 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -26,13 +26,15 @@ trait LogKey {
 }
 
 /**
- * Various keys used for mapped diagnostic contexts(MDC) in logging.
- * All structured logging keys should be defined here for standardization.
+ * Various keys used for mapped diagnostic contexts(MDC) in logging. All 
structured logging keys
+ * should be defined here for standardization.
  */
 object LogKeys {
   case object ACCUMULATOR_ID extends LogKey
+  case object ACTUAL_BROADCAST_OUTPUT_STATUS_SIZE extends LogKey
   case object ACTUAL_NUM_FILES extends LogKey
   case object ACTUAL_PARTITION_COLUMN extends LogKey
+  case object ADDED_JARS extends LogKey
   case object AGGREGATE_FUNCTIONS extends LogKey
   case object ALPHA extends LogKey
   case object ANALYSIS_ERROR extends LogKey
@@ -42,7 +44,10 @@ object LogKeys {
   case object APP_NAME extends LogKey
   case object APP_STATE extends LogKey
   case object ARGS extends LogKey
+  case object AUTH_ENABLED extends LogKey
   case object BACKUP_FILE extends LogKey
+  case object BARRIER_EPOCH extends LogKey
+  case object BARRIER_ID extends LogKey
   case object BATCH_ID extends LogKey
   case object BATCH_NAME extends LogKey
   case object BATCH_TIMESTAMP extends LogKey
@@ -55,6 +60,7 @@ object LogKeys {
   case object BOOT extends LogKey
   case object BROADCAST extends LogKey
   case object BROADCAST_ID extends LogKey
+  case object BROADCAST_OUTPUT_STATUS_SIZE extends LogKey
   case object BUCKET extends LogKey
   case object BYTECODE_SIZE extends LogKey
   case object CACHED_TABLE_PARTITION_METADATA_SIZE extends LogKey
@@ -62,6 +68,7 @@ object LogKeys {
   case object CACHE_UNTIL_HIGHEST_CONSUMED_SIZE extends LogKey
   case object CACHE_UNTIL_LAST_PRODUCED_SIZE extends LogKey
   case object CALL_SITE_LONG_FORM extends LogKey
+  case object CALL_SITE_SHORT_FORM extends LogKey
   case object CATALOG_NAME extends LogKey
   case object CATEGORICAL_FEATURES extends LogKey
   case object CHECKPOINT_FILE extends LogKey
@@ -142,11 +149,13 @@ object LogKeys {
   case object DEPRECATED_KEY extends LogKey
   case object DESCRIPTION extends LogKey
   case object DESIRED_NUM_PARTITIONS extends LogKey
+  case object

(spark) branch master updated: [SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-07 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3d9d1f3dc05a [SPARK-48134][CORE] Spark core (java side): Migrate 
`error/warn/info` with variables to structured logging framework
3d9d1f3dc05a is described below

commit 3d9d1f3dc05a2825bf315c68fc4e4232354dbd00
Author: panbingkun 
AuthorDate: Tue May 7 13:08:00 2024 -0700

[SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with 
variables to structured logging framework

### What changes were proposed in this pull request?
The pr aims to:
1. Migrate `error/warn/info` calls with variables in the `core` module (java side) to the `structured logging framework`.
2. Convert all dependencies on `org.slf4j.Logger & org.slf4j.LoggerFactory` to `org.apache.spark.internal.Logger & org.apache.spark.internal.LoggerFactory`, in order to completely `prohibit` importing `org.slf4j.Logger & org.slf4j.LoggerFactory` in java code later.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46390 from panbingkun/core_java_sl.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../java/org/apache/spark/internal/Logger.java | 21 +++-
 .../scala/org/apache/spark/internal/LogKey.scala   |  9 +++
 .../org/apache/spark/io/ReadAheadInputStream.java  | 19 ---
 .../org/apache/spark/memory/TaskMemoryManager.java | 28 +++---
 .../shuffle/sort/BypassMergeSortShuffleWriter.java |  9 ---
 .../spark/shuffle/sort/ShuffleExternalSorter.java  | 25 ++-
 .../spark/shuffle/sort/UnsafeShuffleWriter.java|  9 ---
 .../sort/io/LocalDiskShuffleMapOutputWriter.java   | 10 
 .../apache/spark/unsafe/map/BytesToBytesMap.java   | 12 ++
 .../unsafe/sort/UnsafeExternalSorter.java  | 21 +---
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  4 ++--
 11 files changed, 113 insertions(+), 54 deletions(-)

diff --git a/common/utils/src/main/java/org/apache/spark/internal/Logger.java 
b/common/utils/src/main/java/org/apache/spark/internal/Logger.java
index 2b4dd3bb45bc..d8ab26424bae 100644
--- a/common/utils/src/main/java/org/apache/spark/internal/Logger.java
+++ b/common/utils/src/main/java/org/apache/spark/internal/Logger.java
@@ -34,6 +34,10 @@ public class Logger {
 this.slf4jLogger = slf4jLogger;
   }
 
+  public boolean isErrorEnabled() {
+return slf4jLogger.isErrorEnabled();
+  }
+
   public void error(String msg) {
 slf4jLogger.error(msg);
   }
@@ -58,6 +62,10 @@ public class Logger {
 }
   }
 
+  public boolean isWarnEnabled() {
+return slf4jLogger.isWarnEnabled();
+  }
+
   public void warn(String msg) {
 slf4jLogger.warn(msg);
   }
@@ -82,6 +90,10 @@ public class Logger {
 }
   }
 
+  public boolean isInfoEnabled() {
+return slf4jLogger.isInfoEnabled();
+  }
+
   public void info(String msg) {
 slf4jLogger.info(msg);
   }
@@ -106,6 +118,10 @@ public class Logger {
 }
   }
 
+  public boolean isDebugEnabled() {
+return slf4jLogger.isDebugEnabled();
+  }
+
   public void debug(String msg) {
 slf4jLogger.debug(msg);
   }
@@ -126,6 +142,10 @@ public class Logger {
 slf4jLogger.debug(msg, throwable);
   }
 
+  public boolean isTraceEnabled() {
+return slf4jLogger.isTraceEnabled();
+  }
+
   public void trace(String msg) {
 slf4jLogger.trace(msg);
   }
@@ -146,7 +166,6 @@ public class Logger {
 slf4jLogger.trace(msg, throwable);
   }
 
-
   private void withLogContext(
   String pattern,
   MDC[] mdcs,
diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index d4e1d9f535af..c127f9c3d1f9 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -168,6 +168,7 @@ object LogKeys {
   case object EXCEPTION extends LogKey
   case object EXECUTE_INFO extends LogKey
   case object EXECUTE_KEY extends LogKey
+  case object EXECUTION_MEMORY_SIZE extends LogKey
   case object EXECUTION_PLAN_LEAVES extends LogKey
   case object EXECUTOR_BACKEND extends LogKey
   case object EXECUTOR_DESIRED_COUNT extends LogKey
@@ -302,6 +303,7 @@ object LogKeys {
   case object MAX_SLOTS extends LogKey
   case object MAX_SPLIT_BYTES extends LogKey
   case object MAX_TABLE_PARTITION_METADATA_SIZE extends LogKey
+  case object MEMORY_CONSUMER extends LogKey
   case object MEMORY_POOL_NAME extends Lo

(spark) branch master updated: [SPARK-48124][CORE] Disable structured logging for Connect-Repl by default

2024-05-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b924e689942d [SPARK-48124][CORE] Disable structured logging for 
Connect-Repl by default
b924e689942d is described below

commit b924e689942d735f165d31660d26efad057f4827
Author: panbingkun 
AuthorDate: Sat May 4 22:47:24 2024 -0700

[SPARK-48124][CORE] Disable structured logging for Connect-Repl by default

### What changes were proposed in this pull request?
The pr is a followup to https://github.com/apache/spark/pull/46383, to `disable` structured logging for `Connect-Repl` by default.

### Why are the changes needed?
Before:
https://github.com/apache/spark/assets/15246973/10d93a09-f098-4653-9e95-571481dd03e9

After:
https://github.com/apache/spark/assets/15246973/e3354359-d6bc-4b2c-801b-8a2c3697f78e

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46387 from panbingkun/SPARK-48124_FOLLOWUP.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../main/scala/org/apache/spark/sql/application/ConnectRepl.scala| 5 +
 1 file changed, 5 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
index 0360a4057886..9fd3ae4368f4 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
@@ -26,6 +26,7 @@ import ammonite.compiler.iface.CodeWrapper
 import ammonite.util.{Bind, Imports, Name, Util}
 
 import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.connect.client.{SparkConnectClient, 
SparkConnectClientParser}
 
@@ -55,6 +56,10 @@ object ConnectRepl {
   inputStream: InputStream = System.in,
   outputStream: OutputStream = System.out,
   errorStream: OutputStream = System.err): Unit = {
+// For interpreters, structured logging is disabled by default to avoid 
generating mixed
+// plain text and structured logs on the same console.
+Logging.disableStructuredLogging()
+
 // Build the client.
 val client =
   try {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-48123][CORE] Provide a constant table schema for querying structured logs

2024-05-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e4453b480f98 [SPARK-48123][CORE] Provide a constant table schema for 
querying structured logs
e4453b480f98 is described below

commit e4453b480f988bf6683930ae14b7043a2cecffc4
Author: Gengliang Wang 
AuthorDate: Sat May 4 00:18:18 2024 -0700

[SPARK-48123][CORE] Provide a constant table schema for querying structured 
logs

### What changes were proposed in this pull request?

Providing a table schema LOG_SCHEMA, so that users can load structured logs 
with the following code:

```
import org.apache.spark.util.LogUtils.LOG_SCHEMA

val logDf = spark.read.schema(LOG_SCHEMA).json("path/to/logs")
```
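
Once loaded, the logs behave like any other DataFrame. A hedged follow-on sketch (the path above and the selected columns are illustrative; column names follow LOG_SCHEMA):

```scala
import org.apache.spark.sql.functions.col

// Keep only error-level records, with their MDC context and exception class,
// ordered by timestamp.
val errors = logDf
  .filter(col("level") === "ERROR")
  .select(col("ts"), col("msg"), col("context"), col("exception.class"))
  .orderBy(col("ts"))
errors.show(truncate = false)
```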

### Why are the changes needed?

Provide a convenient way to query Spark logs using Spark SQL.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46375 from gengliangwang/logSchema.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/util/LogUtils.scala | 50 +
 docs/configuration.md  | 10 ++-
 sql/core/src/test/resources/log4j2.properties  | 12 
 .../scala/org/apache/spark/sql/LogQuerySuite.scala | 83 ++
 4 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala
new file mode 100644
index ..5a798ffad3a9
--- /dev/null
+++ b/common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Utils for querying Spark logs with Spark SQL.
+ *
+ * @since 4.0.0
+ */
+@DeveloperApi
+object LogUtils {
+  /**
+   * Schema for structured Spark logs.
+   * Example usage:
+   *   val logDf = spark.read.schema(LOG_SCHEMA).json("path/to/logs")
+   */
+  val LOG_SCHEMA: String = """
+|ts TIMESTAMP,
+|level STRING,
+|msg STRING,
+|context map<STRING, STRING>,
+|exception STRUCT<
+|  class STRING,
+|  msg STRING,
+|  stacktrace ARRAY<STRUCT<class STRING, method STRING, file STRING, line STRING>>
+|>,
+|logger STRING""".stripMargin
+}
diff --git a/docs/configuration.md b/docs/configuration.md
index a3b4e731f057..7966aceccdea 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3675,7 +3675,15 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for 
logging. You can config
 ## Structured Logging
 Starting from version 4.0.0, Spark has adopted the [JSON Template 
Layout](https://logging.apache.org/log4j/2.x/manual/json-template-layout.html) 
for logging, which outputs logs in JSON format. This format facilitates 
querying logs using Spark SQL with the JSON data source. Additionally, the logs 
include all Mapped Diagnostic Context (MDC) information for search and 
debugging purposes.
 
-To implement structured logging, start with the `log4j2.properties.template` 
file.
+To configure the layout of structured logging, start with the 
`log4j2.properties.template` file.
+
+To query Spark logs using Spark SQL, you can use the following Scala code 
snippet:
+
+```scala
+import org.apache.spark.util.LogUtils.LOG_SCHEMA
+
+val logDf = spark.read.schema(LOG_SCHEMA).json("path/to/logs")
+```
 
 ## Plain Text Logging
 If you prefer plain text logging, you can use the 
`log4j2.properties.pattern-layout-template` file as a starting point. This is 
the default configuration used by Spark before the 4.0.0 release. This 
configuration uses the 
[PatternLayout](https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout)
 to log all the logs in plain text. MDC 

(spark) branch master updated: [SPARK-48059][CORE] Implement the structured log framework on the java side

2024-05-03 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5c01f196afc3 [SPARK-48059][CORE] Implement the structured log 
framework on the java side
5c01f196afc3 is described below

commit 5c01f196afc3ba75f10c4aedf2c8405b6f59336a
Author: panbingkun 
AuthorDate: Fri May 3 16:30:36 2024 -0700

[SPARK-48059][CORE] Implement the structured log framework on the java side

### What changes were proposed in this pull request?
The pr aims to implement the structured log framework on the `java side`.

### Why are the changes needed?
Currently, the structured log framework on the `scala side` is basically 
available, but the `Spark Core` code also includes some `Java code`, which also 
needs to be connected to the structured log framework.

### Does this PR introduce _any_ user-facing change?
Yes, only for developers.

### How was this patch tested?
- Add some new UT.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46301 from panbingkun/structured_logger_java.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../java/org/apache/spark/internal/Logger.java | 184 +++
 .../org/apache/spark/internal/LoggerFactory.java   |  26 +++
 .../scala/org/apache/spark/internal/Logging.scala  |   4 +
 .../org/apache/spark/util/LoggerSuiteBase.java | 248 +
 .../org/apache/spark/util/PatternLoggerSuite.java  |  89 
 .../apache/spark/util/StructuredLoggerSuite.java   | 164 ++
 common/utils/src/test/resources/log4j2.properties  |  28 ++-
 .../apache/spark/util/StructuredLoggingSuite.scala |   8 +-
 8 files changed, 739 insertions(+), 12 deletions(-)

diff --git a/common/utils/src/main/java/org/apache/spark/internal/Logger.java 
b/common/utils/src/main/java/org/apache/spark/internal/Logger.java
new file mode 100644
index ..f252f44b3b76
--- /dev/null
+++ b/common/utils/src/main/java/org/apache/spark/internal/Logger.java
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.internal;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Consumer;
+
+import org.apache.logging.log4j.CloseableThreadContext;
+import org.apache.logging.log4j.message.MessageFactory;
+import org.apache.logging.log4j.message.ParameterizedMessageFactory;
+
+public class Logger {
+
+  private static final MessageFactory MESSAGE_FACTORY = 
ParameterizedMessageFactory.INSTANCE;
+  private final org.slf4j.Logger slf4jLogger;
+
+  Logger(org.slf4j.Logger slf4jLogger) {
+this.slf4jLogger = slf4jLogger;
+  }
+
+  public void error(String msg) {
+slf4jLogger.error(msg);
+  }
+
+  public void error(String msg, Throwable throwable) {
+slf4jLogger.error(msg, throwable);
+  }
+
+  public void error(String msg, MDC... mdcs) {
+if (mdcs == null || mdcs.length == 0) {
+  slf4jLogger.error(msg);
+} else if (slf4jLogger.isErrorEnabled()) {
+  withLogContext(msg, mdcs, null, mt -> slf4jLogger.error(mt.message));
+}
+  }
+
+  public void error(String msg, Throwable throwable, MDC... mdcs) {
+if (mdcs == null || mdcs.length == 0) {
+  slf4jLogger.error(msg, throwable);
+} else if (slf4jLogger.isErrorEnabled()) {
+  withLogContext(msg, mdcs, throwable, mt -> slf4jLogger.error(mt.message, 
mt.throwable));
+}
+  }
+
+  public void warn(String msg) {
+slf4jLogger.warn(msg);
+  }
+
+  public void warn(String msg, Throwable throwable) {
+slf4jLogger.warn(msg, throwable);
+  }
+
+  public void warn(String msg, MDC... mdcs) {
+if (mdcs == null || mdcs.length == 0) {
+  slf4jLogger.warn(msg);
+} else if (slf4jLogger.isWarnEnabled()) {
+  withLogContext(msg, mdcs, null, mt -> slf4jLogger.warn(mt.message));
+}
+  }
+
+  public void warn(String msg, Throwable throwable, MDC... mdcs) {
+if (mdcs == null || mdcs.length == 0) {
+  slf4j

(spark) branch master updated: [SPARK-48067][SQL] Fix variant default columns

2024-05-02 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ffa4d198cec6 [SPARK-48067][SQL] Fix variant default columns
ffa4d198cec6 is described below

commit ffa4d198cec6620f0385a0e428b023d2ac4e3d5c
Author: Richard Chen 
AuthorDate: Thu May 2 12:22:02 2024 -0700

[SPARK-48067][SQL] Fix variant default columns

### What changes were proposed in this pull request?

Changes the literal `sql` representation of a variant value to 
`parse_json(variant.toJson)`. This is because there is no other representation 
of a literal variant.

This allows variant default columns to work because default columns store a 
literal string representation of the default value in the schema struct 
field's metadata.

### Why are the changes needed?

Previously, we could not set a variant default column like:
```
create table t(
v6 variant default parse_json('{\"k\": \"v\"}')
)
```
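
With the fix, an end-to-end sketch of what becomes possible (hedged: the table name, the `USING parquet` provider, and the rendering of the returned value are illustrative):

```scala
// Declare a variant column with a default, rely on the default on insert,
// and read the value back.
spark.sql("""CREATE TABLE t (v VARIANT DEFAULT parse_json('{"k": "v"}')) USING parquet""")
spark.sql("INSERT INTO t VALUES (DEFAULT)")
spark.sql("SELECT v FROM t").show()  // expected to show the default {"k":"v"} value
```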

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

added UT

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #46312 from richardc-db/fix_variant_default_cols.

Authored-by: Richard Chen 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/expressions/literals.scala  |   4 +
 .../scala/org/apache/spark/sql/VariantSuite.scala  | 145 -
 2 files changed, 146 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
index 0fad3eff2da5..4cffc7f0b53a 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
@@ -42,6 +42,7 @@ import org.json4s.JsonAST._
 
 import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, 
ScalaReflection}
 import org.apache.spark.sql.catalyst.expressions.codegen._
+import 
org.apache.spark.sql.catalyst.expressions.variant.VariantExpressionEvalUtils
 import org.apache.spark.sql.catalyst.trees.TreePattern
 import org.apache.spark.sql.catalyst.trees.TreePattern.{LITERAL, NULL_LITERAL, 
TRUE_OR_FALSE_LITERAL}
 import org.apache.spark.sql.catalyst.types._
@@ -204,6 +205,8 @@ object Literal {
   create(new GenericInternalRow(
 struct.fields.map(f => default(f.dataType).value)), struct)
 case udt: UserDefinedType[_] => Literal(default(udt.sqlType).value, udt)
+case VariantType =>
+  create(VariantExpressionEvalUtils.castToVariant(0, IntegerType), 
VariantType)
 case other =>
   throw QueryExecutionErrors.noDefaultForDataTypeError(dataType)
   }
@@ -549,6 +552,7 @@ case class Literal (value: Any, dataType: DataType) extends 
LeafExpression {
   s"${Literal(kv._1, mapType.keyType).sql}, ${Literal(kv._2, 
mapType.valueType).sql}"
 }
   s"MAP(${keysAndValues.mkString(", ")})"
+case (v: VariantVal, variantType: VariantType) => 
s"PARSE_JSON('${v.toJson(timeZoneId)}')"
 case _ => value.toString
   }
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala
index 19e5f9ba63e6..caab98b6239a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala
@@ -26,15 +26,17 @@ import scala.jdk.CollectionConverters._
 import scala.util.Random
 
 import org.apache.spark.SparkRuntimeException
-import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.expressions.{CodegenObjectFactoryMode, 
ExpressionEvalHelper, Literal}
+import 
org.apache.spark.sql.catalyst.expressions.variant.{VariantExpressionEvalUtils, 
VariantGet}
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, 
GenericArrayData}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.VariantVal
+import org.apache.spark.unsafe.types.{UTF8String, VariantVal}
 import org.apache.spark.util.ArrayImplicits._
 
-class VariantSuite extends QueryTest with SharedSparkSession {
+class VariantSuite extends QueryTest with SharedSparkSession with 
ExpressionEvalHelper {
   import testImplicits._
 
   test("basic tests") {
@@ -445,4 +447,141 @@ class VariantSuite extends QueryTest with 
SharedSparkSession {
   }
 }
   }
+

(spark) branch branch-3.5 updated: [SPARK-48016][SQL][3.5] Fix a bug in try_divide function when with decimals

2024-05-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6a4475c0b8cb [SPARK-48016][SQL][3.5] Fix a bug in try_divide function 
when with decimals
6a4475c0b8cb is described below

commit 6a4475c0b8cbcc6fca5fe7a9cd499d05c428c418
Author: Gengliang Wang 
AuthorDate: Wed May 1 14:32:52 2024 -0700

[SPARK-48016][SQL][3.5] Fix a bug in try_divide function when with decimals

### What changes were proposed in this pull request?

Currently, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
 ```
SELECT try_divide(1, decimal(0));
```

This is caused by the rule `DecimalPrecision`:
```
case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
  (left, right) match {
 ...
case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
l.dataType.isInstanceOf[IntegralType] &&
literalPickMinimumPrecision =>
  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
```
The result of the above makeCopy will contain `ANSI` as the `evalMode`, 
instead of `TRY`.
This PR is to fix this bug by replacing the makeCopy method calls with 
withNewChildren
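
A hedged sketch of the intended behavior once the operator is rebuilt with `withNewChildren` (the exact output formatting is illustrative):

```scala
// Under TRY evaluation mode, dividing by a zero decimal should yield NULL
// instead of raising DIVIDE_BY_ZERO.
spark.sql("SELECT try_divide(1, decimal(0)) AS r").show()
// +----+
// |   r|
// +----+
// |NULL|
// +----+
```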

### Why are the changes needed?

Bug fix in try_* functions.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a long-standing bug in the try_divide function.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46323 from gengliangwang/pickFix.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../sql/catalyst/analysis/DecimalPrecision.scala   | 14 ++---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 10 ++--
 .../analyzer-results/ansi/try_arithmetic.sql.out   | 56 +++
 .../analyzer-results/try_arithmetic.sql.out| 56 +++
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  8 +++
 .../sql-tests/results/ansi/try_arithmetic.sql.out  | 64 ++
 .../sql-tests/results/try_arithmetic.sql.out   | 64 ++
 7 files changed, 260 insertions(+), 12 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
index 09cf61a77955..f51127f53b38 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
@@ -83,7 +83,7 @@ object DecimalPrecision extends TypeCoercionRule {
   val resultType = widerDecimalType(p1, s1, p2, s2)
   val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType)
   val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType)
-  b.makeCopy(Array(newE1, newE2))
+  b.withNewChildren(Seq(newE1, newE2))
   }
 
   /**
@@ -202,21 +202,21 @@ object DecimalPrecision extends TypeCoercionRule {
 case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
 l.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
+  b.withNewChildren(Seq(Cast(l, DataTypeUtils.fromLiteral(l)), r))
 case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] &&
 r.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(l, Cast(r, DataTypeUtils.fromLiteral(r
+  b.withNewChildren(Seq(l, Cast(r, DataTypeUtils.fromLiteral(r
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case (l @ IntegralTypeExpression(), r @ DecimalExpression(_, _)) =>
-  b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r))
+  b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r))
 case (l @ DecimalExpression(_, _), r @ IntegralTypeExpression()) =>
-  b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType
+  b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType
 case (l, r @ DecimalExpression(_, _)) if isFloat(l.dataType) =>
-  b.makeCopy(Array(l, Cast(r, DoubleType)))
+  b.withNewChildren(Seq(l, Cast(r, DoubleType)))
 case (l @ DecimalExpression(_, _), r) if isFloat(r.dataType) =>
-  b.makeCopy(Array(Cast(l, 

(spark) branch master updated: [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework

2024-04-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 87b20b166c41 [SPARK-47585][SQL] SQL core: Migrate logInfo with 
variables to structured logging framework
87b20b166c41 is described below

commit 87b20b166c41d4c265ac54eed75707b7726d371f
Author: panbingkun 
AuthorDate: Mon Apr 29 22:10:59 2024 -0700

[SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured 
logging framework

### What changes were proposed in this pull request?
The pr aims to migrate `logInfo` calls with variables in the `SQL core` module to the `structured logging framework`.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46264 from panbingkun/SPARK-47585.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 55 +++---
 .../org/apache/spark/ml/util/Instrumentation.scala |  4 +-
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  9 ++--
 .../spark/sql/catalyst/rules/RuleExecutor.scala|  6 +--
 .../spark/sql/columnar/CachedBatchSerializer.scala |  6 ++-
 .../spark/sql/execution/DataSourceScanExec.scala   | 11 +++--
 .../ExternalAppendOnlyUnsafeRowArray.scala |  8 ++--
 .../sql/execution/WholeStageCodegenExec.scala  | 14 +++---
 .../sql/execution/adaptive/AQEOptimizer.scala  |  9 ++--
 .../aggregate/AggregateCodegenSupport.scala| 10 ++--
 .../execution/aggregate/HashAggregateExec.scala|  6 ++-
 .../aggregate/ObjectAggregationIterator.scala  | 13 +++--
 .../spark/sql/execution/command/CommandUtils.scala |  9 ++--
 .../execution/command/createDataSourceTables.scala |  2 +-
 .../apache/spark/sql/execution/command/ddl.scala   | 30 +++-
 .../datasources/BasicWriteStatsTracker.scala   |  8 ++--
 .../sql/execution/datasources/DataSource.scala |  4 +-
 .../execution/datasources/DataSourceStrategy.scala |  5 +-
 .../datasources/FileFormatDataWriter.scala | 11 +++--
 .../execution/datasources/FileFormatWriter.scala   |  9 ++--
 .../sql/execution/datasources/FilePartition.scala  |  8 ++--
 .../sql/execution/datasources/FileScanRDD.scala|  4 +-
 .../execution/datasources/FileSourceStrategy.scala | 12 ++---
 .../execution/datasources/InMemoryFileIndex.scala  |  7 +--
 .../datasources/PartitioningAwareFileIndex.scala   |  7 +--
 .../SQLHadoopMapReduceCommitProtocol.scala |  9 ++--
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   |  5 +-
 .../execution/datasources/jdbc/JDBCRelation.scala  |  7 +--
 .../datasources/parquet/ParquetUtils.scala |  7 +--
 .../execution/datasources/v2/FileBatchWrite.scala  |  9 ++--
 .../datasources/v2/FilePartitionReader.scala   |  4 +-
 .../GroupBasedRowLevelOperationScanPlanning.scala  | 17 ---
 .../datasources/v2/V2ScanRelationPushDown.scala| 32 +++--
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 52 +++-
 .../python/PythonStreamingSinkCommitRunner.scala   |  5 +-
 .../execution/exchange/EnsureRequirements.scala| 42 ++---
 .../python/PythonStreamingSourceRunner.scala   |  5 +-
 .../spark/sql/execution/r/ArrowRRunner.scala   | 24 +-
 .../WriteToContinuousDataSourceExec.scala  |  2 +-
 .../state/HDFSBackedStateStoreProvider.scala   | 16 ---
 .../apache/spark/sql/internal/SharedState.scala| 15 +++---
 .../sql/streaming/StreamingQueryManager.scala  |  4 +-
 42 files changed, 318 insertions(+), 204 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 2ca80a496ccb..238432d354f6 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -33,6 +33,7 @@ object LogKeys {
   case object ACCUMULATOR_ID extends LogKey
   case object ACTUAL_NUM_FILES extends LogKey
   case object ACTUAL_PARTITION_COLUMN extends LogKey
+  case object AGGREGATE_FUNCTIONS extends LogKey
   case object ALPHA extends LogKey
   case object ANALYSIS_ERROR extends LogKey
   case object APP_ATTEMPT_ID extends LogKey
@@ -43,10 +44,13 @@ object LogKeys {
   case object ARGS extends LogKey
   case object BACKUP_FILE extends LogKey
   case object BATCH_ID extends LogKey
+  case object BATCH_NAME extends LogKey
   case object BATCH_TIMESTAMP extends LogKey
   case object BATCH_WRITE extends LogKey
   case object BLOCK_ID extends LogKey
   case object BLOCK_MANAGER_ID

(spark) branch branch-3.4 updated: [SPARK-48016][SQL][3.4] Fix a bug in try_divide function when with decimals

2024-04-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 2870c76fb582 [SPARK-48016][SQL][3.4] Fix a bug in try_divide function 
when with decimals
2870c76fb582 is described below

commit 2870c76fb58266db69419e05a204c554f9733357
Author: Gengliang Wang 
AuthorDate: Mon Apr 29 22:01:16 2024 -0700

[SPARK-48016][SQL][3.4] Fix a bug in try_divide function when with decimals

### What changes were proposed in this pull request?

Currently, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
 ```
SELECT try_divide(1, decimal(0));
```

This is caused by the rule `DecimalPrecision`:
```
case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
  (left, right) match {
 ...
case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
l.dataType.isInstanceOf[IntegralType] &&
literalPickMinimumPrecision =>
  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
```
The result of the above makeCopy will contain `ANSI` as the `evalMode`, 
instead of `TRY`.
This PR is to fix this bug by replacing the makeCopy method calls with 
withNewChildren

### Why are the changes needed?

Bug fix in try_* functions.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a long-standing bug in the try_divide function.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46289 from gengliangwang/PICK_PR_46286_BRANCH-3.4.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../sql/catalyst/analysis/DecimalPrecision.scala   |  14 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  10 +-
 .../analyzer-results/ansi/try_arithmetic.sql.out   | 491 +
 .../analyzer-results/try_arithmetic.sql.out| 491 +
 .../resources/sql-tests/inputs/try_arithmetic.sql  |   8 +
 .../sql-tests/results/ansi/try_arithmetic.sql.out  |  64 +++
 .../sql-tests/results/try_arithmetic.sql.out   |  64 +++
 7 files changed, 1130 insertions(+), 12 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
index 46fbf071f437..19b6f2cf8dab 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
@@ -82,7 +82,7 @@ object DecimalPrecision extends TypeCoercionRule {
   val resultType = widerDecimalType(p1, s1, p2, s2)
   val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType)
   val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType)
-  b.makeCopy(Array(newE1, newE2))
+  b.withNewChildren(Seq(newE1, newE2))
   }
 
   /**
@@ -201,21 +201,21 @@ object DecimalPrecision extends TypeCoercionRule {
 case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
 l.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(Cast(l, DecimalType.fromLiteral(l)), r))
+  b.withNewChildren(Seq(Cast(l, DecimalType.fromLiteral(l)), r))
 case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] &&
 r.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(l, Cast(r, DecimalType.fromLiteral(r
+  b.withNewChildren(Seq(l, Cast(r, DecimalType.fromLiteral(r
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case (l @ IntegralType(), r @ DecimalType.Expression(_, _)) =>
-  b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r))
+  b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r))
 case (l @ DecimalType.Expression(_, _), r @ IntegralType()) =>
-  b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType
+  b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType
 case (l, r @ DecimalType.Expression(_, _)) if isFloat(l.dataType) =>
-  b.makeCopy(Array(l, Cast(r, DoubleType)))
+  b.withNewChildren(Seq(l, Cast(r, DoubleType)))
 case (l @ DecimalType.Expression(_, _), r) if isFloat(r.dataType) =>
-  b.makeCopy(Array(Cast(l, DoubleType), r))
+ 

(spark) branch branch-3.5 updated: [SPARK-48016][SQL] Fix a bug in try_divide function when with decimals

2024-04-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new e78ee2c57702 [SPARK-48016][SQL] Fix a bug in try_divide function when 
with decimals
e78ee2c57702 is described below

commit e78ee2c5770218a521340cb84f57a02dd00f7f3a
Author: Gengliang Wang 
AuthorDate: Mon Apr 29 16:40:56 2024 -0700

[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals

Currently, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
 ```
SELECT try_divide(1, decimal(0));
```

This is caused by the rule `DecimalPrecision`:
```
case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
  (left, right) match {
 ...
case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
l.dataType.isInstanceOf[IntegralType] &&
literalPickMinimumPrecision =>
  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
```
The result of the above makeCopy will contain `ANSI` as the `evalMode`, 
instead of `TRY`.
This PR is to fix this bug by replacing the makeCopy method calls with 
withNewChildren

Bug fix in try_* functions.

Yes, it fixes a long-standing bug in the try_divide function.

New UT

No

Closes #46286 from gengliangwang/avoidMakeCopy.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 3fbcb26d8e992c65a2778b96da4142e234786e53)
Signed-off-by: Gengliang Wang 
---
 .../sql/catalyst/analysis/DecimalPrecision.scala   | 14 ++---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 10 ++--
 sql/core/src/test/resources/log4j2.properties  |  2 +-
 .../analyzer-results/ansi/try_arithmetic.sql.out   | 56 +++
 .../analyzer-results/try_arithmetic.sql.out| 56 +++
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  8 +++
 .../sql-tests/results/ansi/try_arithmetic.sql.out  | 64 ++
 .../sql-tests/results/try_arithmetic.sql.out   | 64 ++
 8 files changed, 261 insertions(+), 13 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
index 09cf61a77955..f51127f53b38 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
@@ -83,7 +83,7 @@ object DecimalPrecision extends TypeCoercionRule {
   val resultType = widerDecimalType(p1, s1, p2, s2)
   val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType)
   val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType)
-  b.makeCopy(Array(newE1, newE2))
+  b.withNewChildren(Seq(newE1, newE2))
   }
 
   /**
@@ -202,21 +202,21 @@ object DecimalPrecision extends TypeCoercionRule {
 case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
 l.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
+  b.withNewChildren(Seq(Cast(l, DataTypeUtils.fromLiteral(l)), r))
 case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] &&
 r.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(l, Cast(r, DataTypeUtils.fromLiteral(r
+  b.withNewChildren(Seq(l, Cast(r, DataTypeUtils.fromLiteral(r
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case (l @ IntegralTypeExpression(), r @ DecimalExpression(_, _)) =>
-  b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r))
+  b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r))
 case (l @ DecimalExpression(_, _), r @ IntegralTypeExpression()) =>
-  b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType
+  b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType
 case (l, r @ DecimalExpression(_, _)) if isFloat(l.dataType) =>
-  b.makeCopy(Array(l, Cast(r, DoubleType)))
+  b.withNewChildren(Seq(l, Cast(r, DoubleType)))
 case (l @ DecimalExpression(_, _), r) if isFloat(r.dataType) =>
-  b.makeCopy(Array(Cast(l, DoubleType), r))
+  b.withNewChildren(Seq(Cast(l, DoubleType), r))
 case _ => b
   }
   }
diff 

(spark) branch master updated: [SPARK-48016][SQL] Fix a bug in try_divide function when with decimals

2024-04-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3fbcb26d8e99 [SPARK-48016][SQL] Fix a bug in try_divide function when 
with decimals
3fbcb26d8e99 is described below

commit 3fbcb26d8e992c65a2778b96da4142e234786e53
Author: Gengliang Wang 
AuthorDate: Mon Apr 29 16:40:56 2024 -0700

[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals

### What changes were proposed in this pull request?

Currently, the following query will throw a DIVIDE_BY_ZERO error instead of returning null:
 ```
SELECT try_divide(1, decimal(0));
```

This is caused by the rule `DecimalPrecision`:
```
case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
  (left, right) match {
 ...
case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
l.dataType.isInstanceOf[IntegralType] &&
literalPickMinimumPrecision =>
  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
```
The result of the above makeCopy will contain `ANSI` as the `evalMode`, 
instead of `TRY`.
This PR is to fix this bug by replacing the makeCopy method calls with 
withNewChildren

### Why are the changes needed?

Bug fix in try_* functions.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a long-standing bug in the try_divide function.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46286 from gengliangwang/avoidMakeCopy.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../sql/catalyst/analysis/DecimalPrecision.scala   | 14 ++---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 10 ++--
 .../analyzer-results/ansi/try_arithmetic.sql.out   | 56 +++
 .../analyzer-results/try_arithmetic.sql.out| 56 +++
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  8 +++
 .../sql-tests/results/ansi/try_arithmetic.sql.out  | 64 ++
 .../sql-tests/results/try_arithmetic.sql.out   | 64 ++
 7 files changed, 260 insertions(+), 12 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
index 9ad8368d007e..6524ff9b2c57 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala
@@ -92,7 +92,7 @@ object DecimalPrecision extends TypeCoercionRule {
   val resultType = widerDecimalType(p1, s1, p2, s2)
   val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType)
   val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType)
-  b.makeCopy(Array(newE1, newE2))
+  b.withNewChildren(Seq(newE1, newE2))
   }
 
   /**
@@ -211,21 +211,21 @@ object DecimalPrecision extends TypeCoercionRule {
 case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
 l.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r))
+  b.withNewChildren(Seq(Cast(l, DataTypeUtils.fromLiteral(l)), r))
 case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] &&
 r.dataType.isInstanceOf[IntegralType] &&
 literalPickMinimumPrecision =>
-  b.makeCopy(Array(l, Cast(r, DataTypeUtils.fromLiteral(r
+  b.withNewChildren(Seq(l, Cast(r, DataTypeUtils.fromLiteral(r
 // Promote integers inside a binary expression with fixed-precision 
decimals to decimals,
 // and fixed-precision decimals in an expression with floats / doubles 
to doubles
 case (l @ IntegralTypeExpression(), r @ DecimalExpression(_, _)) =>
-  b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r))
+  b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r))
 case (l @ DecimalExpression(_, _), r @ IntegralTypeExpression()) =>
-  b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType
+  b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType
 case (l, r @ DecimalExpression(_, _)) if isFloat(l.dataType) =>
-  b.makeCopy(Array(l, Cast(r, DoubleType)))
+  b.withNewChildren(Seq(l, Cast(r, DoubleType)))
 case (l @ DecimalExpression(_, _), r) if isFloat(r.dataType) =>
-  b.makeCopy(Array(Cast(l, DoubleType), r))

(spark) branch master updated: [SPARK-47597][STREAMING] Streaming: Migrate logInfo with variables to structured logging framework

2024-04-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d540786d9cea [SPARK-47597][STREAMING] Streaming: Migrate logInfo with 
variables to structured logging framework
d540786d9cea is described below

commit d540786d9ceacd7426803ad615f7ab32ec6faf67
Author: Daniel Tenedorio 
AuthorDate: Thu Apr 25 13:23:21 2024 -0700

[SPARK-47597][STREAMING] Streaming: Migrate logInfo with variables to 
structured logging framework

### What changes were proposed in this pull request?

Migrate logInfo calls with variables in the streaming module to the structured logging framework. This transforms calls to the following API
```
def logInfo(msg: => String): Unit
```
to
```
def logInfo(entry: LogEntry): Unit
```
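
A hedged sketch of a migrated call site and, roughly, how the entry surfaces in the JSON output (the class, message, and sample record are illustrative; this commit still models keys as the `LogKey` enumeration shown in the diff below):

```scala
import org.apache.spark.internal.{LogKey, Logging, MDC}

class CommitLogger extends Logging {
  def onCommit(batchId: Long): Unit = {
    // The MDC-wrapped variable appears both in the rendered message and as a
    // structured context field of the emitted JSON record, e.g. roughly:
    // {"ts":"...","level":"INFO","msg":"Committed batch 3",
    //  "context":{"batch_id":"3"},"logger":"CommitLogger"}
    logInfo(log"Committed batch ${MDC(LogKey.BATCH_ID, batchId)}")
  }
}
```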

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46192 from dtenedor/streaming-log-info.

Authored-by: Daniel Tenedorio 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 47 -
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../AsyncProgressTrackingMicroBatchExecution.scala |  5 +-
 .../streaming/CheckpointFileManager.scala  |  8 ++-
 .../streaming/CompactibleFileStreamLog.scala   | 11 ++--
 .../sql/execution/streaming/FileStreamSink.scala   |  4 +-
 .../execution/streaming/FileStreamSinkLog.scala|  4 +-
 .../sql/execution/streaming/FileStreamSource.scala | 13 ++--
 .../sql/execution/streaming/HDFSMetadataLog.scala  |  7 +-
 .../execution/streaming/IncrementalExecution.scala |  4 +-
 .../streaming/ManifestFileCommitProtocol.scala |  4 +-
 .../execution/streaming/MetadataLogFileIndex.scala |  4 +-
 .../execution/streaming/MicroBatchExecution.scala  | 34 ++
 .../sql/execution/streaming/ProgressReporter.scala |  8 +--
 .../execution/streaming/ResolveWriteToStream.scala |  5 +-
 .../sql/execution/streaming/StreamExecution.scala  |  7 +-
 .../sql/execution/streaming/WatermarkTracker.scala |  7 +-
 .../streaming/continuous/ContinuousExecution.scala | 13 ++--
 .../continuous/ContinuousQueuedDataReader.scala|  6 +-
 .../streaming/continuous/ContinuousWriteRDD.scala  | 10 +--
 .../WriteToContinuousDataSourceExec.scala  |  7 +-
 .../sources/RateStreamMicroBatchStream.scala   |  5 +-
 .../state/HDFSBackedStateStoreProvider.scala   | 34 ++
 .../sql/execution/streaming/state/RocksDB.scala| 54 ---
 .../streaming/state/RocksDBFileManager.scala   | 77 +-
 .../streaming/state/RocksDBMemoryManager.scala |  7 +-
 .../state/RocksDBStateStoreProvider.scala  | 12 ++--
 .../sql/execution/streaming/state/StateStore.scala | 23 ---
 .../streaming/state/StateStoreChangelog.scala  |  8 ++-
 .../state/StreamingSessionWindowStateManager.scala |  5 +-
 .../state/SymmetricHashJoinStateManager.scala  |  6 +-
 31 files changed, 286 insertions(+), 155 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index fab5e80dd0e6..6df7cb5a5867 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -34,6 +34,7 @@ object LogKey extends Enumeration {
   val ARGS = Value
   val BACKUP_FILE = Value
   val BATCH_ID = Value
+  val BATCH_TIMESTAMP = Value
   val BATCH_WRITE = Value
   val BLOCK_ID = Value
   val BLOCK_MANAGER_ID = Value
@@ -48,8 +49,12 @@ object LogKey extends Enumeration {
   val CATALOG_NAME = Value
   val CATEGORICAL_FEATURES = Value
   val CHECKPOINT_FILE = Value
+  val CHECKPOINT_LOCATION = Value
+  val CHECKPOINT_PATH = Value
+  val CHECKPOINT_ROOT = Value
   val CHECKPOINT_TIME = Value
   val CHECKSUM_FILE_NUM = Value
+  val CHOSEN_WATERMARK = Value
   val CLASS_LOADER = Value
   val CLASS_NAME = Value
   val CLUSTER_CENTROIDS = Value
@@ -66,6 +71,8 @@ object LogKey extends Enumeration {
   val COLUMN_NAME = Value
   val COMMAND = Value
   val COMMAND_OUTPUT = Value
+  val COMMITTED_VERSION = Value
+  val COMPACT_INTERVAL = Value
   val COMPONENT = Value
   val CONFIG = Value
   val CONFIG2 = Value
@@ -86,6 +93,7 @@ object LogKey extends Enumeration {
   val CSV_SCHEMA_FIELD_NAME = Value
   val CSV_SCHEMA_FIELD_NAMES = Value
   val CSV_SOURCE = Value
+  val CURRENT_BATCH_ID = Value
   

(spark) branch master updated (08caa567fb29 -> 775bc54fcd0d)

2024-04-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 08caa567fb29 [SPARK-47980][SQL][TESTS] Reactivate test 'Empty 
float/double array columns raise EOFException'
 add 775bc54fcd0d [SPARK-47580][SQL] SQL catalyst: eliminate unnamed 
variables in error logs

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/codegen/CodeGenerator.scala  | 6 ++
 .../scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala| 2 +-
 .../expressions/CodeGeneratorWithInterpretedFallbackSuite.scala | 2 +-
 3 files changed, 4 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47583][CORE] SQL core: Migrate logError with variables to structured logging framework

2024-04-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 62dd64a5d13d [SPARK-47583][CORE] SQL core: Migrate logError with 
variables to structured logging framework
62dd64a5d13d is described below

commit 62dd64a5d13d14a4e3bce50d9c264f8e494c7863
Author: Daniel Tenedorio 
AuthorDate: Wed Apr 24 13:43:05 2024 -0700

[SPARK-47583][CORE] SQL core: Migrate logError with variables to structured 
logging framework

### What changes were proposed in this pull request?

Migrate logError calls with variables in the sql/core module to the structured 
logging framework. This changes logError calls from the following API
```
def logError(msg: => String): Unit
```
to
```
def logError(entry: LogEntry): Unit
```
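
As a hedged sketch of the resulting call-site shape (the class and keys below are illustrative; the files actually touched are listed further down), a migrated logError tags each variable with a LogKey and can still pass the exception along as a separate argument:

```scala
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.{BATCH_ID, STAGE_ID}

class ExampleWriter extends Logging {
  def abort(batchId: Long, stageId: Int, e: Throwable): Unit = {
    // Multi-part messages are concatenated from log"..." fragments; the
    // throwable is still passed separately, as it was before the migration.
    logError(log"Aborting batch ${MDC(BATCH_ID, batchId)} " +
      log"in stage ${MDC(STAGE_ID, stageId)}", e)
  }
}
```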

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45969 from dtenedor/log-error-sql-core.

Authored-by: Daniel Tenedorio 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  6 +
 .../execution/BaseScriptTransformationExec.scala   | 10 ---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 31 +-
 .../command/InsertIntoDataSourceDirCommand.scala   |  4 ++-
 .../execution/command/createDataSourceTables.scala |  5 +++-
 .../apache/spark/sql/execution/command/ddl.scala   | 13 -
 .../execution/datasources/FileFormatWriter.scala   |  7 ++---
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 20 --
 .../execution/exchange/BroadcastExchangeExec.scala |  4 ++-
 9 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index b9b0e372a2b0..fab5e80dd0e6 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -34,6 +34,7 @@ object LogKey extends Enumeration {
   val ARGS = Value
   val BACKUP_FILE = Value
   val BATCH_ID = Value
+  val BATCH_WRITE = Value
   val BLOCK_ID = Value
   val BLOCK_MANAGER_ID = Value
   val BROADCAST_ID = Value
@@ -116,6 +117,7 @@ object LogKey extends Enumeration {
   val ESTIMATOR_PARAMETER_MAP = Value
   val EVENT_LOOP = Value
   val EVENT_QUEUE = Value
+  val EXCEPTION = Value
   val EXECUTE_INFO = Value
   val EXECUTE_KEY = Value
   val EXECUTION_PLAN_LEAVES = Value
@@ -162,6 +164,7 @@ object LogKey extends Enumeration {
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
   val HOST_PORT = Value
+  val IDENTIFIER = Value
   val INCOMPATIBLE_TYPES = Value
   val INDEX = Value
   val INDEX_FILE_NUM = Value
@@ -330,11 +333,13 @@ object LogKey extends Enumeration {
   val SPARK_PLAN_ID = Value
   val SQL_TEXT = Value
   val SRC_PATH = Value
+  val STAGE_ATTEMPT = Value
   val STAGE_ID = Value
   val START_INDEX = Value
   val STATEMENT_ID = Value
   val STATE_STORE_PROVIDER = Value
   val STATUS = Value
+  val STDERR = Value
   val STORAGE_LEVEL = Value
   val STORAGE_LEVEL_DESERIALIZED = Value
   val STORAGE_LEVEL_REPLICATION = Value
@@ -402,6 +407,7 @@ object LogKey extends Enumeration {
   val WEIGHTED_NUM = Value
   val WORKER_URL = Value
   val WRITE_AHEAD_LOG_INFO = Value
+  val WRITE_JOB_UUID = Value
   val XSD_PATH = Value
 
   type LogKey = Value
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
index 91042b59677b..6e54bde46942 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
@@ -27,7 +27,8 @@ import scala.util.control.NonFatal
 import org.apache.hadoop.conf.Configuration
 
 import org.apache.spark.{SparkFiles, TaskContext}
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey._
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
 import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, 
Cast, Expression, GenericInternalRow, JsonToStructs, Literal, StructsToJson, 
UnsafeProjection}
@@ -185,7 +186,7 @@ trait BaseScriptTransformationExec extends UnaryExecN

(spark) branch master updated: [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework

2024-04-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c88fabfee41d [SPARK-47604][CORE] Resource managers: Migrate logInfo 
with variables to structured logging framework
c88fabfee41d is described below

commit c88fabfee41df1ca4729058450ec6f798641c936
Author: panbingkun 
AuthorDate: Tue Apr 23 11:00:44 2024 -0700

[SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to 
structured logging framework

### What changes were proposed in this pull request?
This PR aims to migrate `logInfo` calls with variables in the `Resource managers` 
module to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46130 from panbingkun/SPARK-47604.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  45 +++-
 .../execution/ExecuteResponseObserver.scala|   2 +-
 .../deploy/k8s/SparkKubernetesClientFactory.scala  |   9 +-
 .../k8s/submit/KubernetesClientApplication.scala   |   8 +-
 .../deploy/k8s/submit/KubernetesClientUtils.scala  |   4 +-
 .../k8s/submit/LoggingPodStatusWatcher.scala   |  20 ++--
 .../cluster/k8s/ExecutorPodsAllocator.scala|  25 ++--
 .../cluster/k8s/ExecutorPodsLifecycleManager.scala |   8 +-
 .../scheduler/cluster/k8s/ExecutorRollPlugin.scala |   4 +-
 .../cluster/k8s/KubernetesClusterManager.scala |   5 +-
 .../k8s/KubernetesClusterSchedulerBackend.scala|  11 +-
 ...ernetesLocalDiskShuffleExecutorComponents.scala |  21 ++--
 .../spark/deploy/yarn/ApplicationMaster.scala  |  40 ---
 .../org/apache/spark/deploy/yarn/Client.scala  |  59 ++
 .../apache/spark/deploy/yarn/ClientArguments.scala |   5 +-
 .../spark/deploy/yarn/ExecutorRunnable.scala   |  29 ++---
 .../spark/deploy/yarn/SparkRackResolver.scala  |   7 +-
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 126 -
 .../yarn/YarnAllocatorNodeHealthTracker.scala  |  12 +-
 .../cluster/YarnClientSchedulerBackend.scala   |   4 +-
 .../scheduler/cluster/YarnSchedulerBackend.scala   |  17 +--
 21 files changed, 283 insertions(+), 178 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 585373f1782b..b9b0e372a2b0 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -26,9 +26,12 @@ object LogKey extends Enumeration {
   val ACTUAL_PARTITION_COLUMN = Value
   val ALPHA = Value
   val ANALYSIS_ERROR = Value
+  val APP_ATTEMPT_ID = Value
   val APP_DESC = Value
   val APP_ID = Value
+  val APP_NAME = Value
   val APP_STATE = Value
+  val ARGS = Value
   val BACKUP_FILE = Value
   val BATCH_ID = Value
   val BLOCK_ID = Value
@@ -45,6 +48,7 @@ object LogKey extends Enumeration {
   val CATEGORICAL_FEATURES = Value
   val CHECKPOINT_FILE = Value
   val CHECKPOINT_TIME = Value
+  val CHECKSUM_FILE_NUM = Value
   val CLASS_LOADER = Value
   val CLASS_NAME = Value
   val CLUSTER_CENTROIDS = Value
@@ -70,6 +74,7 @@ object LogKey extends Enumeration {
   val CONSUMER = Value
   val CONTAINER = Value
   val CONTAINER_ID = Value
+  val CONTAINER_STATE = Value
   val COST = Value
   val COUNT = Value
   val CROSS_VALIDATION_METRIC = Value
@@ -85,6 +90,7 @@ object LogKey extends Enumeration {
   val DATABASE_NAME = Value
   val DATAFRAME_CACHE_ENTRY = Value
   val DATAFRAME_ID = Value
+  val DATA_FILE_NUM = Value
   val DATA_SOURCE = Value
   val DATA_SOURCES = Value
   val DATA_SOURCE_PROVIDER = Value
@@ -113,10 +119,16 @@ object LogKey extends Enumeration {
   val EXECUTE_INFO = Value
   val EXECUTE_KEY = Value
   val EXECUTION_PLAN_LEAVES = Value
+  val EXECUTOR_DESIRED_COUNT = Value
+  val EXECUTOR_ENVS = Value
   val EXECUTOR_ENV_REGEX = Value
   val EXECUTOR_ID = Value
   val EXECUTOR_IDS = Value
+  val EXECUTOR_LAUNCH_COMMANDS = Value
+  val EXECUTOR_LAUNCH_COUNT = Value
+  val EXECUTOR_RESOURCES = Value
   val EXECUTOR_STATE = Value
+  val EXECUTOR_TARGET_COUNT = Value
   val EXIT_CODE = Value
   val EXPECTED_NUM_FILES = Value
   val EXPECTED_PARTITION_COLUMN = Value
@@ -129,8 +141,10 @@ object LogKey extends Enumeration {
   val FEATURE_COLUMN = Value
   val FEATURE_DIMENSION = Value
   val FIELD_NAME = Value
+  val FILE_ABSOLUTE_PATH = Value
   val FILE_FORMAT = Value
   val FILE_FORMAT2 = Value
+  val FILE_NAME = Value
   val FILE_VERSION

(spark) branch master updated (e01ac581f46a -> 3c7905e00d2e)

2024-04-22 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e01ac581f46a [SPARK-47933][PYTHON] Parent Column class for Spark 
Connect and Spark Classic
 add 3c7905e00d2e [SPARK-47600][CORE] MLLib: Migrate logInfo with variables 
to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   | 47 ++
 .../org/apache/spark/ml/clustering/KMeans.scala| 22 +-
 .../optim/IterativelyReweightedLeastSquares.scala  | 10 +++--
 .../spark/ml/optim/WeightedLeastSquares.scala  |  5 ++-
 .../org/apache/spark/ml/r/RWrapperUtils.scala  | 11 ++---
 .../ml/regression/RandomForestRegressor.scala  |  2 +-
 .../spark/ml/tree/impl/GradientBoostedTrees.scala  |  5 ++-
 .../apache/spark/ml/tree/impl/RandomForest.scala   | 14 ---
 .../apache/spark/ml/tuning/CrossValidator.scala| 10 +++--
 .../spark/ml/tuning/TrainValidationSplit.scala | 11 +++--
 .../org/apache/spark/ml/util/DatasetUtils.scala|  8 ++--
 .../org/apache/spark/ml/util/Instrumentation.scala | 18 ++---
 .../scala/org/apache/spark/ml/util/ReadWrite.scala |  5 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 18 +
 .../org/apache/spark/mllib/clustering/KMeans.scala | 23 ++-
 .../spark/mllib/clustering/LocalKMeans.scala   |  8 ++--
 .../clustering/PowerIterationClustering.scala  | 15 +++
 .../spark/mllib/clustering/StreamingKMeans.scala   |  9 +++--
 .../evaluation/BinaryClassificationMetrics.scala   |  8 ++--
 .../org/apache/spark/mllib/feature/Word2Vec.scala  | 11 +++--
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala| 24 ++-
 .../regression/StreamingLinearAlgorithm.scala  |  7 ++--
 .../apache/spark/ml/recommendation/ALSSuite.scala  | 13 +++---
 .../apache/spark/mllib/linalg/VectorsSuite.scala   |  4 +-
 24 files changed, 206 insertions(+), 102 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (86563169eef8 -> f2d0cf23018f)

2024-04-22 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 86563169eef8 [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to 
`33.1.0-jre` in Docker IT
 add f2d0cf23018f [SPARK-47907][SQL] Put bang under a config

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-conditions.json |  15 ++
 ...-conditions-syntax-discontinued-error-class.md} |  16 +-
 docs/sql-migration-guide.md|   1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4 |  55 ++---
 .../spark/sql/catalyst/parser/AstBuilder.scala |  62 +-
 .../org/apache/spark/sql/internal/SQLConf.scala|  10 +
 .../sql/catalyst/parser/ErrorParserSuite.scala |  33 +++
 .../analyzer-results/predicate-functions.sql.out   | 194 ++
 .../sql-tests/inputs/predicate-functions.sql   |  36 
 .../sql-tests/results/predicate-functions.sql.out  | 224 +
 10 files changed, 602 insertions(+), 44 deletions(-)
 copy docs/{sql-error-conditions-illegal-state-store-value-error-class.md => 
sql-error-conditions-syntax-discontinued-error-class.md} (71%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (fe47edece059 -> 8aa2dad46b79)

2024-04-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from fe47edece059 [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy 
with RowQueue
 add 8aa2dad46b79 [SPARK-47596][DSTREAMS] Streaming: Migrate logWarn with 
variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala| 18 ++
 .../org/apache/spark/streaming/Checkpoint.scala | 21 +
 .../apache/spark/streaming/dstream/DStream.scala|  9 ++---
 .../streaming/dstream/DStreamCheckpointData.scala   |  9 ++---
 .../spark/streaming/dstream/FileInputDStream.scala  | 13 ++---
 .../spark/streaming/dstream/InputDStream.scala  |  6 --
 .../spark/streaming/receiver/BlockGenerator.scala   |  6 --
 .../streaming/receiver/ReceivedBlockHandler.scala   | 18 +++---
 .../streaming/receiver/ReceiverSupervisor.scala |  6 +++---
 .../streaming/receiver/ReceiverSupervisorImpl.scala |  5 +++--
 .../spark/streaming/scheduler/JobGenerator.scala|  6 --
 .../streaming/scheduler/ReceivedBlockTracker.scala  |  5 +++--
 .../spark/streaming/scheduler/ReceiverTracker.scala | 14 --
 .../spark/streaming/util/BatchedWriteAheadLog.scala |  5 +++--
 .../streaming/util/FileBasedWriteAheadLog.scala |  5 +++--
 15 files changed, 95 insertions(+), 51 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (47d783bc6489 -> 9718573ce748)

2024-04-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 47d783bc6489 [SPARK-47882][SQL] createTableColumnTypes need to be 
mapped to database types instead of using directly
 add 9718573ce748 [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo 
with variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/internal/LogKey.scala  |  3 +++
 .../thriftserver/SparkExecuteStatementOperation.scala  | 18 --
 .../hive/thriftserver/SparkGetCatalogsOperation.scala  |  5 +++--
 .../hive/thriftserver/SparkGetColumnsOperation.scala   | 12 ++--
 .../hive/thriftserver/SparkGetSchemasOperation.scala   | 11 +--
 .../thriftserver/SparkGetTableTypesOperation.scala |  5 +++--
 .../hive/thriftserver/SparkGetTablesOperation.scala| 12 ++--
 .../hive/thriftserver/SparkGetTypeInfoOperation.scala  |  5 +++--
 .../spark/sql/hive/thriftserver/SparkOperation.scala   |  2 +-
 9 files changed, 54 insertions(+), 19 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework

2024-04-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4957a40d6e6b [SPARK-47584][SQL] SQL core: Migrate logWarn with 
variables to structured logging framework
4957a40d6e6b is described below

commit 4957a40d6e6bf68226c8047687e8f30c93adb8ce
Author: panbingkun 
AuthorDate: Wed Apr 17 11:59:09 2024 -0700

[SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured 
logging framework

### What changes were proposed in this pull request?
This PR aims to migrate `logWarning` calls with variables in the `SQL core` module 
to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46057 from panbingkun/SPARK-47584.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 65 +-
 .../catalyst/analysis/StreamingJoinHelper.scala|  4 +-
 .../ReplaceNullWithFalseInPredicate.scala  |  4 +-
 .../main/scala/org/apache/spark/sql/Column.scala   | 13 +++--
 .../scala/org/apache/spark/sql/SparkSession.scala  | 20 ---
 .../spark/sql/api/python/PythonSQLUtils.scala  |  7 ++-
 .../org/apache/spark/sql/api/r/SQLUtils.scala  |  9 +--
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  9 ++-
 .../apache/spark/sql/execution/ExistingRDD.scala   | 12 ++--
 .../spark/sql/execution/QueryExecution.scala   |  6 +-
 .../spark/sql/execution/SparkStrategies.scala  | 10 ++--
 .../sql/execution/WholeStageCodegenExec.scala  |  8 ++-
 .../adaptive/InsertAdaptiveSparkPlan.scala |  6 +-
 .../execution/command/AnalyzeTablesCommand.scala   |  6 +-
 .../spark/sql/execution/command/CommandUtils.scala |  9 +--
 .../spark/sql/execution/command/SetCommand.scala   | 28 ++
 .../apache/spark/sql/execution/command/ddl.scala   |  8 ++-
 .../datasources/BasicWriteStatsTracker.scala   |  9 +--
 .../sql/execution/datasources/DataSource.scala | 10 ++--
 .../execution/datasources/DataSourceManager.scala  |  6 +-
 .../sql/execution/datasources/FilePartition.scala  | 11 ++--
 .../sql/execution/datasources/FileScanRDD.scala|  8 ++-
 .../execution/datasources/FileStatusCache.scala| 14 +++--
 .../execution/datasources/csv/CSVDataSource.scala  |  9 +--
 .../execution/datasources/jdbc/JDBCRelation.scala  | 15 ++---
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 11 ++--
 .../datasources/json/JsonOutputWriter.scala|  8 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   |  5 +-
 .../datasources/parquet/ParquetFileFormat.scala| 13 +++--
 .../datasources/parquet/ParquetUtils.scala |  9 +--
 .../execution/datasources/v2/CacheTableExec.scala  |  4 +-
 .../execution/datasources/v2/CreateIndexExec.scala |  5 +-
 .../datasources/v2/CreateNamespaceExec.scala   |  5 +-
 .../execution/datasources/v2/CreateTableExec.scala |  5 +-
 .../datasources/v2/DataSourceV2Strategy.scala  |  5 +-
 .../execution/datasources/v2/DropIndexExec.scala   |  4 +-
 .../datasources/v2/FilePartitionReader.scala   |  7 ++-
 .../sql/execution/datasources/v2/FileScan.scala|  7 ++-
 .../v2/V2ScanPartitioningAndOrdering.scala |  8 ++-
 .../ApplyInPandasWithStatePythonRunner.scala   |  8 ++-
 .../python/AttachDistributedSequenceExec.scala |  5 +-
 .../streaming/AvailableNowDataStreamWrapper.scala  | 15 ++---
 .../streaming/CheckpointFileManager.scala  | 24 
 .../streaming/CompactibleFileStreamLog.scala   |  5 +-
 .../sql/execution/streaming/FileStreamSink.scala   |  9 +--
 .../sql/execution/streaming/FileStreamSource.scala | 16 --
 .../execution/streaming/IncrementalExecution.scala |  9 +--
 .../streaming/ManifestFileCommitProtocol.scala |  6 +-
 .../execution/streaming/MicroBatchExecution.scala  | 24 
 .../spark/sql/execution/streaming/OffsetSeq.scala  | 15 +++--
 .../sql/execution/streaming/ProgressReporter.scala | 19 ---
 .../execution/streaming/ResolveWriteToStream.scala | 15 +++--
 .../sql/execution/streaming/StreamExecution.scala  |  8 ++-
 .../sql/execution/streaming/TimerStateImpl.scala   | 11 ++--
 .../sql/execution/streaming/TriggerExecutor.scala  |  8 ++-
 .../continuous/ContinuousTextSocketSource.scala|  5 +-
 .../sources/TextSocketMicroBatchStream.scala   |  5 +-
 .../state/HDFSBackedStateStoreProvider.scala   | 25 +
 .../sql/execution/streaming/state/RocksDB.scala|  6 +-
 .../streaming/state/RocksDBFileManager.scala   | 14 +++--
 .../sql/execution/streaming/state

(spark) branch master updated: [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution

2024-04-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 898838a239d3 [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema 
evolution
898838a239d3 is described below

commit 898838a239d370429e49108a56c6a7fb22d6b399
Author: Paddy Xu 
AuthorDate: Wed Apr 17 10:53:02 2024 -0700

[SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution

### Why are the changes needed?

This PR introduces a syntax `WITH SCHEMA EVOLUTION` to the SQL MERGE 
command, which allows the user to specify automatic schema evolution for a 
specific operation.

```sql
MERGE WITH SCHEMA EVOLUTION
INTO tgt
USING src
ON ...
WHEN ...
```

When `WITH SCHEMA EVOLUTION` is specified, schema evolution-related 
features must be turned on for this single statement and only in this statement.
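
For illustration only (the table and column names are hypothetical, and an active `spark` session is assumed), a fully spelled-out statement of this shape could be issued from Scala as:

```scala
// Assume `source` carries a column that `target` does not have yet; with the
// clause below, a data source that supports schema evolution may add it as
// part of the merge instead of rejecting the statement.
spark.sql("""
  MERGE WITH SCHEMA EVOLUTION
  INTO target AS t
  USING source AS s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```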

Spark is only responsible for recognizing whether the `WITH SCHEMA EVOLUTION` 
syntax is present, and the result is passed down to the MERGE command. Data sources 
must respect the syntax and react accordingly, turning on the features they 
categorise as "schema evolution" when the clause is present. For example, when the 
underlying table is Delta Lake, the feature "mergeSchema" will be turned on (see 
https://github.com/delta-io/delta/blob/c41977db3529a3139d6306abe5ded161 [...]

### Does this PR introduce _any_ user-facing change?

Yes, see the previous section.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45748 from xupefei/merge-schema-evolution.

Authored-by: Paddy Xu 
Signed-off-by: Gengliang Wang 
---
 .../CheckConnectJvmClientCompatibility.scala   |  1 +
 docs/sql-ref-ansi-compliance.md|  1 +
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4  |  1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4 |  4 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala |  2 +-
 .../catalyst/analysis/RewriteMergeIntoTable.scala  |  6 +--
 .../spark/sql/catalyst/parser/AstBuilder.scala |  4 +-
 .../sql/catalyst/plans/logical/v2Commands.scala|  3 +-
 .../sql/catalyst/analysis/AnalysisSuite.scala  |  7 ++--
 .../PullupCorrelatedPredicatesSuite.scala  |  5 ++-
 .../ReplaceNullWithFalseInPredicateSuite.scala |  6 ++-
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 42 +
 .../org/apache/spark/sql/MergeIntoWriter.scala | 19 +-
 .../sql-tests/results/ansi/keywords.sql.out|  1 +
 .../resources/sql-tests/results/keywords.sql.out   |  1 +
 .../execution/command/PlanResolutionSuite.scala| 43 +-
 .../ThriftServerWithSparkContextSuite.scala|  2 +-
 17 files changed, 113 insertions(+), 35 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
index 0f383d007f29..f73290c5ce29 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
@@ -304,6 +304,7 @@ object CheckConnectJvmClientCompatibility {
 
   // MergeIntoWriter
   
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.MergeIntoWriter"),
+  
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.MergeIntoWriter$"),
   
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.WhenMatched"),
   
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.WhenMatched$"),
   
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.WhenNotMatched"),
diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index bf1819b9767b..0256a3e0869d 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -492,6 +492,7 @@ Below is a list of all the keywords in Spark SQL.
 |END|reserved|non-reserved|reserved|
 |ESCAPE|reserved|non-reserved|reserved|
 |ESCAPED|non-reserved|non-reserved|non-reserved|
+|EVOLUTION|non-reserved|non-reserved|non-reserved|
 |EXCEPT|reserved|strict-non-reserved|reserved|
 |EXCHANGE|non-reserved|non-reserved|non-reserved|
 |EXCLUDE|non-reserved|non-reserved|non-reserved|
diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
inde

(spark) branch master updated: [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework

2024-04-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f77495909b29 [SPARK-47588][CORE] Hive module: Migrate logInfo with 
variables to structured logging framework
f77495909b29 is described below

commit f77495909b29fe4883afcfd8fec7be048fe494a3
Author: Gengliang Wang 
AuthorDate: Tue Apr 16 22:32:34 2024 -0700

[SPARK-47588][CORE] Hive module: Migrate logInfo with variables to 
structured logging framework

### What changes were proposed in this pull request?

Migrate logInfo calls with variables in the Hive module to the structured logging 
framework.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46086 from gengliangwang/hive_loginfo.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  4 +++
 .../spark/sql/hive/HiveExternalCatalog.scala   | 30 --
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  9 ---
 .../org/apache/spark/sql/hive/HiveUtils.scala  | 27 +++
 .../spark/sql/hive/client/HiveClientImpl.scala |  5 ++--
 .../sql/hive/client/IsolatedClientLoader.scala |  4 +--
 .../spark/sql/hive/orc/OrcFileOperator.scala   |  4 +--
 7 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index bfeb733af30a..838ef0355e3a 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -95,10 +95,13 @@ object LogKey extends Enumeration {
   val GROUP_ID = Value
   val HADOOP_VERSION = Value
   val HISTORY_DIR = Value
+  val HIVE_CLIENT_VERSION = Value
+  val HIVE_METASTORE_VERSION = Value
   val HIVE_OPERATION_STATE = Value
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
   val HOST_PORT = Value
+  val INCOMPATIBLE_TYPES = Value
   val INDEX = Value
   val INFERENCE_MODE = Value
   val INITIAL_CAPACITY = Value
@@ -152,6 +155,7 @@ object LogKey extends Enumeration {
   val POLICY = Value
   val PORT = Value
   val PRODUCER_ID = Value
+  val PROVIDER = Value
   val QUERY_CACHE_VALUE = Value
   val QUERY_HINT = Value
   val QUERY_ID = Value
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index 8c35e10b383f..60f2d2f3e5fe 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -34,7 +34,7 @@ import org.apache.thrift.TException
 
 import org.apache.spark.{SparkConf, SparkException}
 import org.apache.spark.internal.{Logging, MDC}
-import org.apache.spark.internal.LogKey.{DATABASE_NAME, SCHEMA, SCHEMA2, 
TABLE_NAME}
+import org.apache.spark.internal.LogKey.{DATABASE_NAME, INCOMPATIBLE_TYPES, 
PROVIDER, SCHEMA, SCHEMA2, TABLE_NAME}
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
@@ -338,35 +338,37 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 val (hiveCompatibleTable, logMessage) = maybeSerde match {
   case _ if options.skipHiveMetadata =>
 val message =
-  s"Persisting data source table $qualifiedTableName into Hive 
metastore in" +
-"Spark SQL specific format, which is NOT compatible with Hive."
+  log"Persisting data source table ${MDC(TABLE_NAME, 
qualifiedTableName)} into Hive " +
+log"metastore in Spark SQL specific format, which is NOT 
compatible with Hive."
 (None, message)
 
   case _ if incompatibleTypes.nonEmpty =>
+val incompatibleTypesStr = incompatibleTypes.mkString(", ")
 val message =
-  s"Hive incompatible types found: ${incompatibleTypes.mkString(", 
")}. " +
-s"Persisting data source table $qualifiedTableName into Hive 
metastore in " +
-"Spark SQL specific format, which is NOT compatible with Hive."
+  log"Hive incompatible types found: ${MDC(INCOMPATIBLE_TYPES, 
incompatibleTypesStr)}. " +
+log"Persisting data source table ${MDC(TABLE_NAME, 
qualifiedTableN

(spark) branch master updated: [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework

2024-04-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f7440f384191 [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn 
with variables to structured logging framework
f7440f384191 is described below

commit f7440f3841918f2cdb4a8e710cfe31d3fc85230c
Author: Haejoon Lee 
AuthorDate: Tue Apr 16 13:56:03 2024 -0700

[SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to 
structured logging framework

### What changes were proposed in this pull request?

This PR proposes to migrate `logWarning` calls with variables in the 
Hive-thriftserver module to the structured logging framework.

### Why are the changes needed?

To improve the existing logging system by migrating to structured logging.

### Does this PR introduce _any_ user-facing change?

No API changes, but the logs will contain MDC (Mapped Diagnostic Context) from 
now on.

### How was this patch tested?

Ran Scala auto-formatting and style checks; the existing CI should also pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45923 from itholic/hive-ts-logwarn.

Lead-authored-by: Haejoon Lee 
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  1 +
 .../SparkExecuteStatementOperation.scala   |  4 ++-
 .../sql/hive/thriftserver/SparkSQLCLIDriver.scala  | 15 -
 .../ui/HiveThriftServer2Listener.scala | 36 --
 4 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 41289c641424..bfeb733af30a 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -94,6 +94,7 @@ object LogKey extends Enumeration {
   val FUNCTION_PARAMETER = Value
   val GROUP_ID = Value
   val HADOOP_VERSION = Value
+  val HISTORY_DIR = Value
   val HIVE_OPERATION_STATE = Value
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 628925007f7e..f8f58cd422b6 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -256,7 +256,9 @@ private[hive] class SparkExecuteStatementOperation(
 val currentState = getStatus().getState()
 if (currentState.isTerminal) {
   // This may happen if the execution was cancelled, and then closed 
from another thread.
-  logWarning(s"Ignore exception in terminal state with $statementId: 
$e")
+  logWarning(
+log"Ignore exception in terminal state with ${MDC(STATEMENT_ID, 
statementId)}", e
+  )
 } else {
   logError(log"Error executing query with ${MDC(STATEMENT_ID, 
statementId)}, " +
 log"currentState ${MDC(HIVE_OPERATION_STATE, currentState)}, ", e)
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index 03d8fd0c8ff2..888c086e9042 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -41,7 +41,7 @@ import sun.misc.{Signal, SignalHandler}
 import org.apache.spark.{ErrorMessageFormat, SparkConf, SparkThrowable, 
SparkThrowableHelper}
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.{Logging, MDC}
-import org.apache.spark.internal.LogKey.ERROR
+import org.apache.spark.internal.LogKey._
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
 import org.apache.spark.sql.catalyst.util.SQLKeywordUtils
@@ -232,14 +232,14 @@ private[hive] object SparkSQLCLIDriver extends Logging {
 val historyFile = historyDirectory + File.separator + ".hivehistory"
 reader.setHistory(new FileHistory(new File(historyFile)))
   } else {
-logWarning("WARNING: Directory for Hive history file: " 

(spark) branch master updated (9a1fc112677f -> 6919febfcc87)

2024-04-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9a1fc112677f [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP 
WITH LOCAL TIME ZONE
 add 6919febfcc87 [SPARK-47594] Connector module: Migrate logInfo with 
variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   | 36 +-
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  7 +++--
 .../execution/ExecuteGrpcResponseSender.scala  | 33 +++-
 .../execution/ExecuteResponseObserver.scala| 19 +++-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  7 +++--
 .../planner/StreamingForeachBatchHelper.scala  | 20 
 .../planner/StreamingQueryListenerHelper.scala |  7 +++--
 .../sql/connect/service/LoggingInterceptor.scala   |  9 --
 .../spark/sql/connect/service/SessionHolder.scala  | 15 ++---
 .../service/SparkConnectExecutionManager.scala | 17 ++
 .../sql/connect/service/SparkConnectServer.scala   |  7 +++--
 .../sql/connect/service/SparkConnectService.scala  |  5 +--
 .../service/SparkConnectSessionManager.scala   | 11 +--
 .../service/SparkConnectStreamingQueryCache.scala  | 26 +++-
 .../spark/sql/connect/utils/ErrorUtils.scala   |  4 +--
 .../sql/kafka010/KafkaBatchPartitionReader.scala   | 14 ++---
 .../spark/sql/kafka010/KafkaContinuousStream.scala |  4 +--
 .../spark/sql/kafka010/KafkaMicroBatchStream.scala |  4 +--
 .../sql/kafka010/KafkaOffsetReaderAdmin.scala  |  4 +--
 .../sql/kafka010/KafkaOffsetReaderConsumer.scala   |  4 +--
 .../apache/spark/sql/kafka010/KafkaRelation.scala  |  7 +++--
 .../org/apache/spark/sql/kafka010/KafkaSink.scala  |  5 +--
 .../apache/spark/sql/kafka010/KafkaSource.scala| 11 ---
 .../apache/spark/sql/kafka010/KafkaSourceRDD.scala |  6 ++--
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  | 13 +---
 .../kafka010/producer/CachedKafkaProducer.scala|  5 +--
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 10 +++---
 .../kafka010/DirectKafkaInputDStream.scala |  9 --
 .../streaming/kafka010/KafkaDataConsumer.scala | 18 ++-
 .../apache/spark/streaming/kafka010/KafkaRDD.scala | 12 +---
 .../spark/streaming/kinesis/KinesisReceiver.scala  |  7 +++--
 .../streaming/kinesis/KinesisRecordProcessor.scala | 12 +---
 .../executor/profiler/ExecutorJVMProfiler.scala|  5 +--
 .../executor/profiler/ExecutorProfilerPlugin.scala |  6 ++--
 .../scala/org/apache/spark/deploy/Client.scala |  6 ++--
 .../spark/deploy/yarn/ApplicationMaster.scala  |  4 +--
 .../scheduler/cluster/YarnSchedulerBackend.scala   |  6 ++--
 37 files changed, 257 insertions(+), 138 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47804] Add Dataframe cache debug log

2024-04-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f10ad3d56896 [SPARK-47804] Add Dataframe cache debug log
f10ad3d56896 is described below

commit f10ad3d56896f8a0eb9b0c73a6ee628cfc7df3a2
Author: Xinyi Yu 
AuthorDate: Mon Apr 15 18:37:36 2024 -0700

[SPARK-47804] Add Dataframe cache debug log

### What changes were proposed in this pull request?

This PR adds debug logging for the Dataframe cache, turned on via a SQL conf. 
It logs the necessary information on
* cache hits during cache application (which happens on essentially every query)
* cache misses
* adding new cache entries
* removing cache entries (including clearing all entries)

Because every query applies the cache, this log can be very large; it should only 
be turned on while debugging and should not be enabled by default in production.

Example:
```
spark.conf.set("spark.sql.dataframeCache.logLevel", "warn")
val df = spark.range(1, 10)

df.collect()
{"ts":"2024-04-10T16:41:10.010-0700","level":"WARN","msg":"Dataframe cache 
miss for input plan:\nRange (1, 10, step=1, 
splits=Some(10))\n","logger":"org.apache.spark.sql.execution.CacheManager"}
{"ts":"2024-04-10T16:41:10.010-0700","level":"WARN","msg":"Last 20 
Dataframe cache entry logical 
plans:\n[]","logger":"org.apache.spark.sql.execution.CacheManager"}

df.cache()
{"ts":"2024-04-10T16:42:18.647-0700","level":"WARN","msg":"Dataframe cache 
miss for input plan:\nRange (1, 10, step=1, 
splits=Some(10))\n","logger":"org.apache.spark.sql.execution.CacheManager"}
{"ts":"2024-04-10T16:42:18.647-0700","level":"WARN","msg":"Last 20 
Dataframe cache entry logical 
plans:\n[]","logger":"org.apache.spark.sql.execution.CacheManager"}
{"ts":"2024-04-10T16:42:18.662-0700","level":"WARN","msg":"Added Dataframe 
cache entry:\nCachedData(\nlogicalPlan=Range (1, 10, step=1, 
splits=Some(10))\n\nInMemoryRelation=InMemoryRelation [id#2L], 
StorageLevel(disk, memory, deserialized, 1 replicas)\n   +- *(1) Range (1, 10, 
step=1, splits=10)\n)\n","logger":"org.apache.spark.sql.execution.CacheManager"}

df.count()
{"ts":"2024-04-10T16:43:36.033-0700","level":"WARN","msg":"Dataframe cache 
hit for input plan:\nRange (1, 10, step=1, splits=Some(10))\nmatched with cache 
entry:\nCachedData(\nlogicalPlan=Range (1, 10, step=1, 
splits=Some(10))\n\nInMemoryRelation=InMemoryRelation [id#2L], 
StorageLevel(disk, memory, deserialized, 1 replicas)\n   +- *(1) Range (1, 10, 
step=1, splits=10)\n)\n","logger":"org.apache.spark.sql.execution.CacheManager"}
{"ts":"2024-04-10T16:43:36.041-0700","level":"WARN","msg":"Dataframe cache 
hit plan change summary:\n Aggregate [count(1) AS count#13L]   
Aggregate [count(1) AS count#13L]\n!+- Range (1, 10, step=1, splits=Some(10))   
+- InMemoryRelation [id#2L], StorageLevel(disk, memory, deserialized, 1 
replicas)\n!  +- *(1) Range (1, 
10, step=1, splits=10)","logger":"org.apache.spark.sql.execution.CacheManager"}

df.unpersist()
{"ts":"2024-04-10T16:44:15.965-0700","level":"WARN","msg":"Removed 1 
Dataframe cache entries, with logical plans being \n[Range (1, 10, step=1, 
splits=Some(10))\n]","logger":"org.apache.spark.sql.execution.CacheManager"}
```

### Why are the changes needed?
Easier debugging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Run local spark shell.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45990 from anchovYu/SPARK-47804.

Authored-by: Xinyi Yu 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  2 +
 .../scala/org/apache/spark/internal/Logging.scala  | 32 
 common/utils/src/test/resources/log4j2.properties  |  4 +-
 .../apache/spark/util/StructuredLoggingSuite.scala | 20 ++--
 .../org/apache/spark/sql/internal/SQLConf.scala| 16 ++
 .../apache/spark/sql/execution/CacheManager.scala  | 59 --
 6 fil

(spark) branch master updated (83fe9b16ab5a -> 61264f77fd68)

2024-04-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 83fe9b16ab5a [SPARK-47694][CONNECT] Make max message size configurable 
on the client side
 add 61264f77fd68 [SPARK-47603][KUBERNETES][YARN] Resource managers: 
Migrate logWarn with variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   | 10 +++-
 .../apache/spark/deploy/k8s/KubernetesConf.scala   | 11 +++--
 .../apache/spark/deploy/k8s/KubernetesUtils.scala  |  7 +--
 .../k8s/features/DriverCommandFeatureStep.scala| 12 +++--
 .../deploy/k8s/submit/KubernetesClientUtils.scala  | 12 +++--
 .../cluster/k8s/ExecutorPodsAllocator.scala| 11 +++--
 .../cluster/k8s/ExecutorPodsSnapshot.scala |  8 +--
 .../scheduler/cluster/k8s/ExecutorRollPlugin.scala | 11 +++--
 .../spark/deploy/yarn/ApplicationMaster.scala  |  7 +--
 .../org/apache/spark/deploy/yarn/Client.scala  | 27 +-
 .../spark/deploy/yarn/ResourceRequestHelper.scala  |  9 ++--
 .../apache/spark/deploy/yarn/YarnAllocator.scala   | 57 --
 .../cluster/YarnClientSchedulerBackend.scala   |  4 +-
 .../scheduler/cluster/YarnSchedulerBackend.scala   | 15 +++---
 14 files changed, 118 insertions(+), 83 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (9ffdbc65029a -> 1ee3496f4836)

2024-04-11 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9ffdbc65029a [SPARK-47784][SS] Merge TTLMode and TimeoutMode into a 
single TimeMode
 add 1ee3496f4836 [SPARK-47792][CORE] Make the value of MDC can support 
`null` & cannot be `MessageWithContext`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/Logging.scala  | 10 +---
 .../scala/org/apache/spark/util/MDCSuite.scala | 15 +++
 .../apache/spark/util/PatternLoggingSuite.scala|  3 +++
 .../apache/spark/util/StructuredLoggingSuite.scala | 30 ++
 4 files changed, 55 insertions(+), 3 deletions(-)
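
For context on what the MDC change enables, a small hedged sketch follows (the handler class, method, and the use of `EXIT_CODE` are illustrative, not part of this commit):

```scala
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.EXIT_CODE

class ExampleExitLogger extends Logging {
  def onExit(exitCode: java.lang.Integer): Unit = {
    // exitCode may legitimately be null; with this change MDC accepts a null
    // value directly instead of requiring callers to substitute a placeholder.
    logInfo(log"Process exited with code ${MDC(EXIT_CODE, exitCode)}")
  }
}
```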


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47587][SQL] Hive module: Migrate logWarn with variables to structured logging framework

2024-04-10 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new eaf6b518f67c [SPARK-47587][SQL] Hive module: Migrate logWarn with 
variables to structured logging framework
eaf6b518f67c is described below

commit eaf6b518f67c0e3ed04f264c3a89573bd7e74fe7
Author: panbingkun 
AuthorDate: Wed Apr 10 22:34:14 2024 -0700

[SPARK-47587][SQL] Hive module: Migrate logWarn with variables to 
structured logging framework

### What changes were proposed in this pull request?
This PR aims to migrate `logWarning` calls with variables in the `Hive` module to 
the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45927 from panbingkun/SPARK-47587.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  9 +++
 .../security/HBaseDelegationTokenProvider.scala|  7 ++---
 .../main/scala/org/apache/spark/util/Utils.scala   | 10 
 .../spark/sql/hive/HiveExternalCatalog.scala   | 28 ++--
 .../spark/sql/hive/HiveMetastoreCatalog.scala  | 30 ++
 .../org/apache/spark/sql/hive/HiveUtils.scala  |  5 ++--
 .../spark/sql/hive/client/HiveClientImpl.scala |  8 +++---
 .../apache/spark/sql/hive/client/HiveShim.scala| 23 +
 .../sql/hive/client/IsolatedClientLoader.scala | 13 ++
 .../spark/sql/hive/execution/HiveFileFormat.scala  | 11 
 .../spark/sql/hive/execution/HiveTempPath.scala|  5 ++--
 .../spark/sql/hive/orc/OrcFileOperator.scala   |  5 ++--
 .../security/HiveDelegationTokenProvider.scala |  8 +++---
 13 files changed, 97 insertions(+), 65 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index a9a79de05c27..28b06f448784 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -44,6 +44,7 @@ object LogKey extends Enumeration {
   val COMPONENT = Value
   val CONFIG = Value
   val CONFIG2 = Value
+  val CONFIG3 = Value
   val CONTAINER = Value
   val CONTAINER_ID = Value
   val COUNT = Value
@@ -58,6 +59,7 @@ object LogKey extends Enumeration {
   val DRIVER_ID = Value
   val DROPPED_PARTITIONS = Value
   val END_POINT = Value
+  val ENGINE = Value
   val ERROR = Value
   val EVENT_LOOP = Value
   val EVENT_QUEUE = Value
@@ -66,14 +68,19 @@ object LogKey extends Enumeration {
   val EXIT_CODE = Value
   val EXPRESSION_TERMS = Value
   val FAILURES = Value
+  val FALLBACK_VERSION = Value
   val FIELD_NAME = Value
+  val FILE_FORMAT = Value
+  val FILE_FORMAT2 = Value
   val FUNCTION_NAME = Value
   val FUNCTION_PARAMETER = Value
   val GROUP_ID = Value
+  val HADOOP_VERSION = Value
   val HIVE_OPERATION_STATE = Value
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
   val INDEX = Value
+  val INFERENCE_MODE = Value
   val JOB_ID = Value
   val JOIN_CONDITION = Value
   val JOIN_CONDITION_SUB_EXPRESSION = Value
@@ -132,6 +139,8 @@ object LogKey extends Enumeration {
   val RULE_BATCH_NAME = Value
   val RULE_NAME = Value
   val RULE_NUMBER_OF_RUNS = Value
+  val SCHEMA = Value
+  val SCHEMA2 = Value
   val SERVICE_NAME = Value
   val SESSION_ID = Value
   val SHARD_ID = Value
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala
 
b/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala
index d60e5975071d..1b2e41bc0a2e 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala
@@ -27,7 +27,8 @@ import org.apache.hadoop.security.Credentials
 import org.apache.hadoop.security.token.{Token, TokenIdentifier}
 
 import org.apache.spark.SparkConf
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.SERVICE_NAME
 import org.apache.spark.security.HadoopDelegationTokenProvider
 import org.apache.spark.util.Utils
 
@@ -53,8 +54,8 @@ private[security] class HBaseDelegationTokenProvider
   creds.addToken(token.getService, token)
 } catch {
   case NonFatal(e) =>
-logWarning(Utils.createFailedToGetTokenMessage(serviceName, e) +
-  s" Retrying to fetch HBase security token with $serviceName 
connection p

(spark) branch master updated (3da52fb4490e -> 75d43dd05757)

2024-04-10 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3da52fb4490e [SPARK-47798][SQL] Enrich the error message for the 
reading failures of decimal values
 add 75d43dd05757 [SPARK-47601][GRAPHX] Graphx: Migrate logs with variables 
to structured logging framework

No new revisions were added by this update.

Summary of changes:
 graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala  | 8 +---
 graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala   | 5 +++--
 graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala | 9 +
 3 files changed, 13 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47595][STREAMING] Streaming: Migrate logError with variables to structured logging framework

2024-04-10 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 52fd8c01cc8b [SPARK-47595][STREAMING] Streaming: Migrate logError with 
variables to structured logging framework
52fd8c01cc8b is described below

commit 52fd8c01cc8b2a6ce1db3e059b0b962d258f4342
Author: panbingkun 
AuthorDate: Wed Apr 10 15:21:13 2024 -0700

[SPARK-47595][STREAMING] Streaming: Migrate logError with variables to 
structured logging framework

### What changes were proposed in this pull request?
This PR aims to migrate `logError` calls with variables in the `Streaming` module 
to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45910 from panbingkun/SPARK-47595.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../utils/src/main/scala/org/apache/spark/internal/LogKey.scala   | 2 ++
 .../org/apache/spark/streaming/dstream/FileInputDStream.scala | 8 +---
 .../org/apache/spark/streaming/receiver/ReceiverSupervisor.scala  | 8 +---
 .../apache/spark/streaming/scheduler/ReceivedBlockTracker.scala   | 6 --
 .../org/apache/spark/streaming/scheduler/ReceiverTracker.scala| 6 --
 .../org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala  | 5 +++--
 .../test/scala/org/apache/spark/streaming/MasterFailureTest.scala | 5 +++--
 7 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 6cdec011e2ae..a9a79de05c27 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -121,6 +121,7 @@ object LogKey extends Enumeration {
   val RANGE = Value
   val RDD_ID = Value
   val REASON = Value
+  val RECEIVED_BLOCK_INFO = Value
   val REDUCE_ID = Value
   val RELATION_NAME = Value
   val REMAINING_PARTITIONS = Value
@@ -143,6 +144,7 @@ object LogKey extends Enumeration {
   val STAGE_ID = Value
   val STATEMENT_ID = Value
   val STATUS = Value
+  val STREAM_ID = Value
   val STREAM_NAME = Value
   val SUBMISSION_ID = Value
   val SUBSAMPLING_RATE = Value
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
 
b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
index 414fdf5d619d..e301311c922a 100644
--- 
a/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
+++ 
b/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
@@ -26,6 +26,8 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
 import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}
 
+import org.apache.spark.internal.LogKey.PATH
+import org.apache.spark.internal.MDC
 import org.apache.spark.rdd.{RDD, UnionRDD}
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.scheduler.StreamInputInfo
@@ -288,9 +290,9 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
 case None => context.sparkContext.newAPIHadoopFile[K, V, F](file)
   }
   if (rdd.partitions.isEmpty) {
-logError("File " + file + " has no data in it. Spark Streaming can 
only ingest " +
-  "files that have been \"moved\" to the directory assigned to the 
file stream. " +
-  "Refer to the streaming programming guide for more details.")
+logError(log"File ${MDC(PATH, file)} has no data in it. Spark 
Streaming can only ingest " +
+  log"""files that have been "moved" to the directory assigned to the 
file stream. """ +
+  log"Refer to the streaming programming guide for more details.")
   }
   rdd
 }
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala
 
b/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala
index 672452a4af4f..15f346484864 100644
--- 
a/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala
+++ 
b/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala
@@ -25,7 +25,8 @@ import scala.concurrent._
 import scala.util.control.NonFatal
 
 import org.apache.spark.SparkConf
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, 

(spark) branch master updated: [SPARK-47593][CORE] Connector module: Migrate logWarn with variables to structured logging framework

2024-04-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 520f3b1c192b [SPARK-47593][CORE] Connector module: Migrate logWarn 
with variables to structured logging framework
520f3b1c192b is described below

commit 520f3b1c192b1bae53509fdad770f5711ca3791f
Author: panbingkun 
AuthorDate: Tue Apr 9 21:42:39 2024 -0700

[SPARK-47593][CORE] Connector module: Migrate logWarn with variables to 
structured logging framework

### What changes were proposed in this pull request?
This PR migrates `logWarning` calls with variables in the `Connector` module to the structured logging framework.
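
As an illustration of the pattern being applied (a minimal sketch, not code from this PR; the class, method, and variable names are made up, while `CLUSTER_ID` and `OFFSET` are among the `LogKey` values this change adds):

```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.{CLUSTER_ID, OFFSET}

// Hypothetical call site, shown only to illustrate the before/after shape.
class OffsetChecker extends Logging {
  def warnStaleOffset(clusterId: String, offset: Long): Unit = {
    // Before: plain string interpolation, no structured context.
    //   logWarning(s"Stale offset $offset reported for cluster $clusterId")
    // After: each variable is wrapped in an MDC with a LogKey, so the value is
    // rendered in the message and also carried as structured log context.
    logWarning(log"Stale offset ${MDC(OFFSET, offset)} reported for " +
      log"cluster ${MDC(CLUSTER_ID, clusterId)}")
  }
}
```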

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45879 from panbingkun/SPARK-47593_warning.

Lead-authored-by: panbingkun 
Co-authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 23 +
 .../scala/org/apache/spark/util/MDCSuite.scala | 15 +++-
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  9 ++---
 .../ExecutePlanResponseReattachableIterator.scala  |  2 +-
 .../sql/connect/client/GrpcRetryHandler.scala  | 18 +-
 .../execution/ExecuteGrpcResponseSender.scala  | 15 +---
 .../service/SparkConnectStreamingQueryCache.scala  | 14 +---
 .../connect/ui/SparkConnectServerListener.scala| 36 +--
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala  |  7 ++--
 .../spark/sql/kafka010/KafkaContinuousStream.scala |  5 +--
 .../spark/sql/kafka010/KafkaMicroBatchStream.scala |  5 +--
 .../sql/kafka010/KafkaOffsetReaderAdmin.scala  | 10 +++---
 .../sql/kafka010/KafkaOffsetReaderConsumer.scala   | 10 +++---
 .../apache/spark/sql/kafka010/KafkaSource.scala|  5 +--
 .../sql/kafka010/consumer/FetchedDataPool.scala|  7 ++--
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  | 40 +-
 .../producer/InternalKafkaProducerPool.scala   |  6 ++--
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala |  9 ++---
 .../kafka010/KafkaDelegationTokenProvider.scala| 10 +++---
 .../streaming/kafka010/ConsumerStrategy.scala  | 11 +++---
 .../streaming/kafka010/KafkaDataConsumer.scala |  7 ++--
 .../spark/streaming/kafka010/KafkaUtils.scala  | 16 +
 .../streaming/kinesis/KinesisBackedBlockRDD.scala  |  6 ++--
 .../streaming/kinesis/KinesisRecordProcessor.scala |  2 +-
 .../spark/streaming/kinesis/KinesisTestUtils.scala |  7 ++--
 25 files changed, 195 insertions(+), 100 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 2cb5eac4548c..6cdec011e2ae 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -34,6 +34,7 @@ object LogKey extends Enumeration {
   val CATEGORICAL_FEATURES = Value
   val CLASS_LOADER = Value
   val CLASS_NAME = Value
+  val CLUSTER_ID = Value
   val COLUMN_DATA_TYPE_SOURCE = Value
   val COLUMN_DATA_TYPE_TARGET = Value
   val COLUMN_DEFAULT_VALUE = Value
@@ -43,6 +44,7 @@ object LogKey extends Enumeration {
   val COMPONENT = Value
   val CONFIG = Value
   val CONFIG2 = Value
+  val CONTAINER = Value
   val CONTAINER_ID = Value
   val COUNT = Value
   val CSV_HEADER_COLUMN_NAME = Value
@@ -51,6 +53,7 @@ object LogKey extends Enumeration {
   val CSV_SCHEMA_FIELD_NAME = Value
   val CSV_SCHEMA_FIELD_NAMES = Value
   val CSV_SOURCE = Value
+  val DATA = Value
   val DATABASE_NAME = Value
   val DRIVER_ID = Value
   val DROPPED_PARTITIONS = Value
@@ -70,9 +73,11 @@ object LogKey extends Enumeration {
   val HIVE_OPERATION_STATE = Value
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
+  val INDEX = Value
   val JOB_ID = Value
   val JOIN_CONDITION = Value
   val JOIN_CONDITION_SUB_EXPRESSION = Value
+  val KEY = Value
   val LEARNING_RATE = Value
   val LINE = Value
   val LINE_NUM = Value
@@ -80,17 +85,23 @@ object LogKey extends Enumeration {
   val LOG_TYPE = Value
   val MASTER_URL = Value
   val MAX_ATTEMPTS = Value
+  val MAX_CAPACITY = Value
   val MAX_CATEGORIES = Value
   val MAX_EXECUTOR_FAILURES = Value
   val MAX_SIZE = Value
   val MERGE_DIR_NAME = Value
   val METHOD_NAME = Value
   val MIN_SIZE = Value
+  val NEW_VALUE = Value
   val NUM_COLUMNS = Value
   val NUM_ITERATIONS = Value
   val OBJECT_ID = Value
+  val OFFSET = Value
+  val OFFSETS = Value
   val OLD_BLOCK_MANAGER_ID = Value
+  val OLD_VALUE = Value

(spark) branch master updated: [SPARK-47586][SQL] Hive module: Migrate logError with variables to structured logging framework

2024-04-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ec509b49dcaa [SPARK-47586][SQL] Hive module: Migrate logError with 
variables to structured logging framework
ec509b49dcaa is described below

commit ec509b49dcaa21d6dcdf18c1b40ac9d6df1827d7
Author: Haejoon Lee 
AuthorDate: Tue Apr 9 18:22:40 2024 -0700

[SPARK-47586][SQL] Hive module: Migrate logError with variables to 
structured logging framework

### What changes were proposed in this pull request?

This PR proposes to migrate `logError` calls with variables in the Hive module to the structured logging framework.

### Why are the changes needed?

To improve the existing logging system by migrating to structured logging.

### Does this PR introduce _any_ user-facing change?

No API changes, but the Hive module logs will contain MDC (Mapped Diagnostic Context) from now on.

### How was this patch tested?

Run Scala auto formatting and style check. Also the existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45876 from itholic/hive-logerror.

Lead-authored-by: Haejoon Lee 
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Gengliang Wang 
---
 .../main/scala/org/apache/spark/internal/LogKey.scala   |  6 ++
 .../scala/org/apache/spark/sql/hive/TableReader.scala   |  8 ++--
 .../apache/spark/sql/hive/client/HiveClientImpl.scala   | 17 ++---
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 7fa0331515cb..2cb5eac4548c 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -51,7 +51,9 @@ object LogKey extends Enumeration {
   val CSV_SCHEMA_FIELD_NAME = Value
   val CSV_SCHEMA_FIELD_NAMES = Value
   val CSV_SOURCE = Value
+  val DATABASE_NAME = Value
   val DRIVER_ID = Value
+  val DROPPED_PARTITIONS = Value
   val END_POINT = Value
   val ERROR = Value
   val EVENT_LOOP = Value
@@ -61,6 +63,7 @@ object LogKey extends Enumeration {
   val EXIT_CODE = Value
   val EXPRESSION_TERMS = Value
   val FAILURES = Value
+  val FIELD_NAME = Value
   val FUNCTION_NAME = Value
   val FUNCTION_PARAMETER = Value
   val GROUP_ID = Value
@@ -92,6 +95,7 @@ object LogKey extends Enumeration {
   val PARSE_MODE = Value
   val PARTITION_ID = Value
   val PARTITION_SPECIFICATION = Value
+  val PARTITION_SPECS = Value
   val PATH = Value
   val PATHS = Value
   val POD_ID = Value
@@ -105,6 +109,7 @@ object LogKey extends Enumeration {
   val REASON = Value
   val REDUCE_ID = Value
   val RELATION_NAME = Value
+  val REMAINING_PARTITIONS = Value
   val REMOTE_ADDRESS = Value
   val RETRY_COUNT = Value
   val RETRY_INTERVAL = Value
@@ -124,6 +129,7 @@ object LogKey extends Enumeration {
   val STATEMENT_ID = Value
   val SUBMISSION_ID = Value
   val SUBSAMPLING_RATE = Value
+  val TABLE_NAME = Value
   val TASK_ATTEMPT_ID = Value
   val TASK_ID = Value
   val TASK_NAME = Value
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index d72406f094a6..60970eecc2df 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -36,7 +36,8 @@ import org.apache.hadoop.mapred.{FileInputFormat, InputFormat 
=> oldInputClass,
 import org.apache.hadoop.mapreduce.{InputFormat => newInputClass}
 
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey._
 import org.apache.spark.rdd.{EmptyRDD, HadoopRDD, NewHadoopRDD, RDD, UnionRDD}
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.{InternalRow, SQLConfHelper}
@@ -518,7 +519,10 @@ private[hive] object HadoopTableReader extends 
HiveInspectors with Logging {
   i += 1
 } catch {
   case ex: Throwable =>
-logError(s"Exception thrown in field 
<${fieldRefs(i).getFieldName}>")
+logError(
+  log"Exception thrown in field <${MDC(FIELD_NAME, 
fieldRefs(i).getFieldName)}>",
+  ex
+)
 throw ex
 }
   }
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
index 46dc56372334..92561be

(spark) branch master updated (07b1346db477 -> a68337892246)

2024-04-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 07b1346db477 [SPARK-47581][CORE][FOLLOWUP] Fix GA failure
 add a68337892246 [SPARK-47783] Add some missing SQLSTATEs and clean up the 
YY000 to use…

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-categories.json |   2 +-
 .../src/main/resources/error/error-states.json | 356 +
 2 files changed, 219 insertions(+), 139 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (2793397140af -> 07b1346db477)

2024-04-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2793397140af [SPARK-47782][BUILD] Remove redundant json4s-jackson 
definition in sql/api POM
 add 07b1346db477 [SPARK-47581][CORE][FOLLOWUP] Fix GA failure

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/csv/CSVHeaderChecker.scala   | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to structured logging framework

2024-04-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 149ac0f8893b [SPARK-47581][CORE] SQL catalyst: Migrate logWarning with 
variables to structured logging framework
149ac0f8893b is described below

commit 149ac0f8893b5be8b8b0556ef47a2384aaad1850
Author: Daniel Tenedorio 
AuthorDate: Mon Apr 8 22:56:10 2024 -0700

[SPARK-47581][CORE] SQL catalyst: Migrate logWarning with variables to 
structured logging framework

### What changes were proposed in this pull request?

Migrate logWarning with variables in the Catalyst module to the structured 
logging framework. This changes logWarning call sites from the following API
```
def logWarning(msg: => String): Unit
```
to
```
def logWarning(entry: LogEntry): Unit
```
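
For example (a hedged sketch, not code from this PR; the class and column name are made up, while `COLUMN_NAME` is one of the `LogKey` values added here), a migrated call site builds the structured entry with the `log` interpolator:

```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.COLUMN_NAME

// Hypothetical call site illustrating the old and new overloads.
class ColumnChecker extends Logging {
  def warnMissingColumn(name: String): Unit = {
    // Old API: logWarning(msg: => String)
    //   logWarning(s"Column $name was not found in the schema")
    // New API: logWarning(entry: LogEntry); the log"" interpolator produces the
    // entry, carrying both the rendered message and the MDC key/value pair.
    logWarning(log"Column ${MDC(COLUMN_NAME, name)} was not found in the schema")
  }
}
```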

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45904 from dtenedor/log-warn-catalyst.

Lead-authored-by: Daniel Tenedorio 
Co-authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 26 ++
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  6 +++--
 .../sql/catalyst/analysis/HintErrorLogger.scala| 19 ++--
 .../catalyst/analysis/StreamingJoinHelper.scala| 19 ++--
 .../analysis/UnsupportedOperationChecker.scala |  6 +++--
 .../spark/sql/catalyst/catalog/interface.scala |  6 +++--
 .../spark/sql/catalyst/csv/CSVHeaderChecker.scala  | 25 -
 .../catalyst/expressions/V2ExpressionUtils.scala   | 10 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  4 ++--
 .../ReplaceNullWithFalseInPredicate.scala  |  7 --
 .../spark/sql/catalyst/optimizer/joins.scala   |  7 --
 .../spark/sql/catalyst/parser/AstBuilder.scala |  8 ---
 .../spark/sql/catalyst/rules/RuleExecutor.scala| 14 +++-
 .../spark/sql/catalyst/util/CharVarcharUtils.scala | 11 -
 .../apache/spark/sql/catalyst/util/ParseMode.scala |  9 +---
 .../catalyst/util/ResolveDefaultColumnsUtil.scala  | 10 ++---
 .../spark/sql/catalyst/util/StringUtils.scala  | 11 +
 17 files changed, 133 insertions(+), 65 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index a0e99f1edc34..7fa0331515cb 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -22,6 +22,7 @@ package org.apache.spark.internal
  */
 object LogKey extends Enumeration {
   val ACCUMULATOR_ID = Value
+  val ANALYSIS_ERROR = Value
   val APP_DESC = Value
   val APP_ID = Value
   val APP_STATE = Value
@@ -33,6 +34,10 @@ object LogKey extends Enumeration {
   val CATEGORICAL_FEATURES = Value
   val CLASS_LOADER = Value
   val CLASS_NAME = Value
+  val COLUMN_DATA_TYPE_SOURCE = Value
+  val COLUMN_DATA_TYPE_TARGET = Value
+  val COLUMN_DEFAULT_VALUE = Value
+  val COLUMN_NAME = Value
   val COMMAND = Value
   val COMMAND_OUTPUT = Value
   val COMPONENT = Value
@@ -40,6 +45,12 @@ object LogKey extends Enumeration {
   val CONFIG2 = Value
   val CONTAINER_ID = Value
   val COUNT = Value
+  val CSV_HEADER_COLUMN_NAME = Value
+  val CSV_HEADER_COLUMN_NAMES = Value
+  val CSV_HEADER_LENGTH = Value
+  val CSV_SCHEMA_FIELD_NAME = Value
+  val CSV_SCHEMA_FIELD_NAMES = Value
+  val CSV_SOURCE = Value
   val DRIVER_ID = Value
   val END_POINT = Value
   val ERROR = Value
@@ -48,13 +59,17 @@ object LogKey extends Enumeration {
   val EXECUTOR_ID = Value
   val EXECUTOR_STATE = Value
   val EXIT_CODE = Value
+  val EXPRESSION_TERMS = Value
   val FAILURES = Value
+  val FUNCTION_NAME = Value
+  val FUNCTION_PARAMETER = Value
   val GROUP_ID = Value
   val HIVE_OPERATION_STATE = Value
   val HIVE_OPERATION_TYPE = Value
   val HOST = Value
   val JOB_ID = Value
   val JOIN_CONDITION = Value
+  val JOIN_CONDITION_SUB_EXPRESSION = Value
   val LEARNING_RATE = Value
   val LINE = Value
   val LINE_NUM = Value
@@ -68,21 +83,28 @@ object LogKey extends Enumeration {
   val MERGE_DIR_NAME = Value
   val METHOD_NAME = Value
   val MIN_SIZE = Value
+  val NUM_COLUMNS = Value
   val NUM_ITERATIONS = Value
   val OBJECT_ID = Value
   val OLD_BLOCK_MANAGER_ID = Value
   val OPTIMIZER_CLASS_NAME = Value
   val OP_TYPE = Value
+  

(spark) branch master updated: [SPARK-47589][SQL] Hive-Thriftserver: Migrate logError with variables to structured logging framework

2024-04-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f8e652e88320 [SPARK-47589][SQL] Hive-Thriftserver: Migrate logError 
with variables to structured logging framework
f8e652e88320 is described below

commit f8e652e88320528a70e605a6a3cf986725e153a5
Author: Gengliang Wang 
AuthorDate: Mon Apr 8 17:13:28 2024 -0700

[SPARK-47589][SQL] Hive-Thriftserver: Migrate logError with variables to 
structured logging framework

### What changes were proposed in this pull request?

Migrate logError with variables in the Hive-Thriftserver module to the 
structured logging framework.

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Existing UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45936 from gengliangwang/LogError_HiveThriftServer2.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../main/scala/org/apache/spark/internal/LogKey.scala   |  3 +++
 .../thriftserver/SparkExecuteStatementOperation.scala   | 17 +++--
 .../spark/sql/hive/thriftserver/SparkOperation.scala|  6 --
 .../spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala |  5 +++--
 .../spark/sql/hive/thriftserver/SparkSQLDriver.scala|  5 +++--
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 1144887e0b47..a0e99f1edc34 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -50,6 +50,8 @@ object LogKey extends Enumeration {
   val EXIT_CODE = Value
   val FAILURES = Value
   val GROUP_ID = Value
+  val HIVE_OPERATION_STATE = Value
+  val HIVE_OPERATION_TYPE = Value
   val HOST = Value
   val JOB_ID = Value
   val JOIN_CONDITION = Value
@@ -96,6 +98,7 @@ object LogKey extends Enumeration {
   val SIZE = Value
   val SLEEP_TIME = Value
   val STAGE_ID = Value
+  val STATEMENT_ID = Value
   val SUBMISSION_ID = Value
   val SUBSAMPLING_RATE = Value
   val TASK_ATTEMPT_ID = Value
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 77b2aa131a24..628925007f7e 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -30,9 +30,11 @@ import 
org.apache.hive.service.cli.operation.ExecuteStatementOperation
 import org.apache.hive.service.cli.session.HiveSession
 import org.apache.hive.service.rpc.thrift.{TCLIServiceConstants, TColumnDesc, 
TPrimitiveTypeEntry, TRowSet, TTableSchema, TTypeDesc, TTypeEntry, TTypeId, 
TTypeQualifiers, TTypeQualifierValue}
 
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.{HIVE_OPERATION_STATE, STATEMENT_ID, 
TIMEOUT, USER_NAME}
 import org.apache.spark.sql.{DataFrame, Row, SQLContext}
 import org.apache.spark.sql.catalyst.util.CharVarcharUtils
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.MILLIS_PER_SECOND
 import org.apache.spark.sql.execution.HiveResult.getTimeFormatters
 import org.apache.spark.sql.internal.{SQLConf, VariableSubstitution}
 import org.apache.spark.sql.types._
@@ -142,7 +144,9 @@ private[hive] class SparkExecuteStatementOperation(
   } catch {
 case NonFatal(e) =>
   setOperationException(new HiveSQLException(e))
-  logError(s"Error cancelling the query after timeout: $timeout 
seconds")
+  val timeout_ms = timeout * MILLIS_PER_SECOND
+  logError(
+log"Error cancelling the query after timeout: ${MDC(TIMEOUT, 
timeout_ms)} ms")
   } finally {
 timeoutExecutor.shutdown()
   }
@@ -178,8 +182,8 @@ private[hive] class SparkExecuteStatementOperation(
   } catch {
 case e: Exception =>
   setOperationException(new HiveSQLException(e))
-  logError("Error running hive query as user : " +
-sparkServiceUGI.getShortUserName(), e)
+  logError(log"Error running hive query as user : " +
+log"${MDC(

(spark) branch master updated (42dc815b8446 -> 7385f19c7539)

2024-04-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 42dc815b8446 [SPARK-47743][CORE] Use milliseconds as the time unit in 
logging
 add 7385f19c7539 [SPARK-47592][CORE] Connector module: Migrate logError 
with variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/internal/LogKey.scala |  6 ++
 .../org/apache/spark/sql/connect/utils/ErrorUtils.scala   |  7 ---
 .../spark/sql/connect/ProtoToParsedPlanTestSuite.scala|  4 +++-
 .../spark/sql/jdbc/DockerJDBCIntegrationSuite.scala   |  5 -
 .../org/apache/spark/streaming/kafka010/KafkaUtils.scala  |  6 --
 .../spark/streaming/kinesis/KinesisCheckpointer.scala |  8 +---
 .../spark/streaming/kinesis/KinesisRecordProcessor.scala  | 15 +--
 7 files changed, 35 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (11abc64a731d -> 42dc815b8446)

2024-04-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 11abc64a731d [SPARK-47094][SQL] SPJ : Dynamically rebalance number of 
buckets when they are not equal
 add 42dc815b8446 [SPARK-47743][CORE] Use milliseconds as the time unit in 
logging

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/internal/LogKey.scala  |  5 +++--
 .../utils/src/main/scala/org/apache/spark/internal/README.md   |  1 +
 .../src/main/scala/org/apache/spark/storage/BlockManager.scala |  8 
 .../spark/sql/catalyst/expressions/codegen/CodeGenerator.scala |  2 +-
 .../org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 10 +-
 5 files changed, 14 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (1efbf43160aa -> d1ace24f8fac)

2024-04-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1efbf43160aa [SPARK-47310][SS] Add micro-benchmark for merge 
operations for multiple values in value portion of state store
 add d1ace24f8fac [SPARK-47582][SQL] Migrate Catalyst logInfo with 
variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   |  9 +
 .../scala/org/apache/spark/internal/Logging.scala  |  2 ++
 .../catalyst/analysis/StreamingJoinHelper.scala|  8 +++--
 .../expressions/codegen/CodeGenerator.scala|  7 ++--
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  9 +++--
 .../spark/sql/catalyst/rules/RuleExecutor.scala| 42 +++---
 .../spark/sql/catalyst/xml/ValidatorUtil.scala |  5 ++-
 7 files changed, 54 insertions(+), 28 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47577][CORE][PART2] Migrate logError with variables to structured logging framework

2024-04-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 18072b5357a5 [SPARK-47577][CORE][PART2] Migrate logError with 
variables to structured logging framework
18072b5357a5 is described below

commit 18072b5357a5fd671829e312ca359fcf34d47c63
Author: Gengliang Wang 
AuthorDate: Fri Apr 5 14:04:51 2024 -0700

[SPARK-47577][CORE][PART2] Migrate logError with variables to structured 
logging framework

### What changes were proposed in this pull request?

Migrate logError with variables in the core module to the structured logging 
framework. This is Part 2, which changes logError call sites from the following 
API
```
def logError(msg: => String, throwable: Throwable): Unit
```
to
```
def logError(entry: LogEntry, throwable: Throwable): Unit
```

Migration Part 1 was in https://github.com/apache/spark/pull/45834.
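
A minimal sketch of a migrated call site that also passes a `Throwable` (not code from this PR; the class and message are made up, while `EXECUTOR_ID` is an existing `LogKey`):

```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.EXECUTOR_ID

// Hypothetical call site; only the message argument changes from a plain
// String to a structured entry, the Throwable is still passed as before.
class ExecutorMonitor extends Logging {
  def reportFailure(executorId: String, e: Throwable): Unit =
    logError(log"Executor ${MDC(EXECUTOR_ID, executorId)} failed unexpectedly", e)
}
```
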
### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.
### Does this PR introduce _any_ user-facing change?

Yes, Spark core logs will contain additional MDC

### How was this patch tested?

Compiler and scala style checks, as well as code review.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45890 from gengliangwang/coreError2.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   | 23 +-
 .../scala/org/apache/spark/ContextCleaner.scala| 19 +++---
 .../scala/org/apache/spark/MapOutputTracker.scala  |  2 +-
 .../main/scala/org/apache/spark/SparkContext.scala | 15 +-
 .../scala/org/apache/spark/TaskContextImpl.scala   |  5 +++--
 .../org/apache/spark/api/r/RBackendHandler.scala   |  7 ---
 .../scala/org/apache/spark/deploy/Client.scala |  4 ++--
 .../spark/deploy/StandaloneResourceUtils.scala |  8 +---
 .../main/scala/org/apache/spark/deploy/Utils.scala |  6 --
 .../spark/deploy/history/FsHistoryProvider.scala   |  7 ---
 .../org/apache/spark/deploy/worker/Worker.scala| 18 ++---
 .../apache/spark/deploy/worker/ui/LogPage.scala|  6 --
 .../scala/org/apache/spark/executor/Executor.scala | 11 ++-
 .../spark/executor/ExecutorClassLoader.scala   |  5 +++--
 .../spark/internal/io/SparkHadoopWriter.scala  |  4 ++--
 .../spark/mapred/SparkHadoopMapRedUtil.scala   |  7 +--
 .../org/apache/spark/metrics/MetricsConfig.scala   |  5 +++--
 .../org/apache/spark/metrics/MetricsSystem.scala   |  3 ++-
 .../main/scala/org/apache/spark/rdd/PipedRDD.scala |  7 ---
 .../scala/org/apache/spark/rpc/netty/Inbox.scala   |  6 --
 .../org/apache/spark/rpc/netty/MessageLoop.scala   |  5 +++--
 .../org/apache/spark/scheduler/DAGScheduler.scala  | 10 ++
 .../apache/spark/scheduler/ReplayListenerBus.scala |  4 ++--
 .../spark/scheduler/SchedulableBuilder.scala   | 14 -
 .../apache/spark/scheduler/TaskSetManager.scala|  4 ++--
 .../apache/spark/serializer/KryoSerializer.scala   |  5 +++--
 .../org/apache/spark/status/AppStatusStore.scala   |  5 +++--
 .../org/apache/spark/storage/BlockManager.scala|  7 ---
 .../spark/storage/BlockManagerDecommissioner.scala | 10 ++
 .../spark/storage/BlockManagerMasterEndpoint.scala |  5 +++--
 .../storage/BlockManagerStorageEndpoint.scala  | 21 +++-
 .../apache/spark/storage/DiskBlockManager.scala| 11 +++
 .../spark/storage/DiskBlockObjectWriter.scala  |  3 ++-
 .../spark/storage/PushBasedFetchHelper.scala   | 15 --
 .../storage/ShuffleBlockFetcherIterator.scala  |  5 +++--
 .../scala/org/apache/spark/ui/DriverLogPage.scala  |  6 --
 .../main/scala/org/apache/spark/ui/SparkUI.scala   |  5 +++--
 .../src/main/scala/org/apache/spark/ui/WebUI.scala |  5 +++--
 .../scala/org/apache/spark/util/EventLoop.scala|  7 ---
 .../scala/org/apache/spark/util/ListenerBus.scala  |  8 +---
 .../apache/spark/util/ShutdownHookManager.scala|  6 --
 .../spark/util/SparkUncaughtExceptionHandler.scala | 20 +--
 .../main/scala/org/apache/spark/util/Utils.scala   | 22 +
 .../apache/spark/util/logging/FileAppender.scala   |  5 +++--
 .../spark/util/logging/RollingFileAppender.scala   |  8 +---
 45 files changed, 244 insertions(+), 140 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 66f3b803c0d4..1d8161282c5b 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/inter

(spark) branch master updated: [SPARK-47719][SQL] Change spark.sql.legacy.timeParserPolicy default to CORRECTED

2024-04-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c34baebb36d4 [SPARK-47719][SQL] Change 
spark.sql.legacy.timeParserPolicy default to CORRECTED
c34baebb36d4 is described below

commit c34baebb36d4e4c8895085b3114da8dc07165469
Author: Serge Rielau 
AuthorDate: Fri Apr 5 11:35:38 2024 -0700

[SPARK-47719][SQL] Change spark.sql.legacy.timeParserPolicy default to 
CORRECTED

### What changes were proposed in this pull request?

We changed the time parser policy in Spark 3.0.0.
The config has since defaulted to raising an exception if there is a 
potential conflict between the legacy and the new policy.
Spark 4.0.0 is a good time to default to the new policy.
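
A minimal sketch of what the new default means in practice (assumes a running `SparkSession` named `spark`; the timestamp literal is made up):

```
// Under the CORRECTED default, a value the new parser cannot handle yields NULL
// (or a CANNOT_PARSE_TIMESTAMP error when ANSI mode is enabled) instead of the
// legacy INCONSISTENT_BEHAVIOR_CROSS_VERSION upgrade error.
spark.sql("SELECT to_timestamp('2024-02-30 10:00:00', 'yyyy-MM-dd HH:mm:ss')").show()

// Workloads that depend on the old behavior can restore it explicitly:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
```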

### Why are the changes needed?

Move the product forward and retire legacy behavior over time.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Run existing unit tests and verify changes.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45859 from srielau/SPARK-47719-parser-policy-default-to-corrected.

Lead-authored-by: Serge Rielau 
Co-authored-by: Wenchen Fan 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |  4 +-
 docs/sql-migration-guide.md|  2 +
 .../sql/tests/connect/test_connect_session.py  |  1 +
 .../org/apache/spark/sql/internal/SqlApiConf.scala |  2 +-
 .../org/apache/spark/sql/internal/SQLConf.scala|  6 +-
 .../sql/catalyst/util/DateFormatterSuite.scala |  2 +-
 .../sql/catalyst/util/DatetimeFormatterSuite.scala | 59 +++
 .../catalyst/util/TimestampFormatterSuite.scala| 36 +-
 .../results/ansi/datetime-parsing-invalid.sql.out  | 72 +--
 .../results/datetime-parsing-invalid.sql.out   | 84 --
 .../sql-tests/results/json-functions.sql.out   | 24 ++-
 .../sql-tests/results/xml-functions.sql.out| 24 ++-
 12 files changed, 115 insertions(+), 201 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index f2f1571452c0..95ee69d2a47d 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -74,7 +74,9 @@ class ClientE2ETestSuite extends RemoteSparkSession with 
SQLHelper with PrivateM
 
   for (enrichErrorEnabled <- Seq(false, true)) {
 test(s"cause exception - ${enrichErrorEnabled}") {
-  withSQLConf("spark.sql.connect.enrichError.enabled" -> 
enrichErrorEnabled.toString) {
+  withSQLConf(
+"spark.sql.connect.enrichError.enabled" -> enrichErrorEnabled.toString,
+"spark.sql.legacy.timeParserPolicy" -> "EXCEPTION") {
 val ex = intercept[SparkUpgradeException] {
   spark
 .sql("""
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 13d6702c4cf9..019728a45f40 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -46,6 +46,8 @@ license: |
 - Since Spark 4.0, MySQL JDBC datasource will read FLOAT as FloatType, while 
in Spark 3.5 and previous, it was read as DoubleType. To restore the previous 
behavior, you can cast the column to the old type.
 - Since Spark 4.0, MySQL JDBC datasource will read BIT(n > 1) as BinaryType, 
while in Spark 3.5 and previous, read as LongType. To restore the previous 
behavior, set `spark.sql.legacy.mysql.bitArrayMapping.enabled` to `true`.
 - Since Spark 4.0, MySQL JDBC datasource will write ShortType as SMALLINT, 
while in Spark 3.5 and previous, write as INTEGER. To restore the previous 
behavior, you can replace the column with IntegerType whenever before writing.
+- Since Spark 4.0, The default value for 
`spark.sql.legacy.ctePrecedencePolicy` has been changed from `EXCEPTION` to 
`CORRECTED`. Instead of raising an error, inner CTE definitions take precedence 
over outer definitions.
+- Since Spark 4.0, The default value for `spark.sql.legacy.timeParserPolicy` 
has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an 
`INCONSISTENT_BEHAVIOR_CROSS_VERSION` error, `CANNOT_PARSE_TIMESTAMP` will be 
raised if ANSI mode is enable. `NULL` will be returned if ANSI mode is 
disabled. See [Datetime Patterns for Formatting and 
Parsing](sql-ref-datetime-pattern.html).
 
 ## Upgrading from Spark SQL 3.5.1 to 3.5.2
 
diff --git a/python/pyspark/sql/tests/connect/test_connect_session.p

(spark) branch master updated: [SPARK-47723][CORE][TESTS] Introduce a tool that can sort alphabetically enumeration field in `LogEntry` automatically

2024-04-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb96b1a8d648 [SPARK-47723][CORE][TESTS] Introduce a tool that can sort 
alphabetically enumeration field in `LogEntry` automatically
fb96b1a8d648 is described below

commit fb96b1a8d6480612ca61ec39f62c8db0b341327b
Author: panbingkun 
AuthorDate: Thu Apr 4 17:04:53 2024 -0700

[SPARK-47723][CORE][TESTS] Introduce a tool that can sort alphabetically 
enumeration field in `LogEntry` automatically

### What changes were proposed in this pull request?
This PR introduces a tool that can automatically sort the enumeration fields in 
`LogKey` alphabetically.

### Why are the changes needed?
This lets developers more conveniently keep the enumeration values in `LogKey` 
in alphabetical order, as required by the structured logging development 
guidelines.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
  ```
  SPARK_GENERATE_GOLDEN_FILES=1
  build/sbt "common-utils/testOnly *LogKeySuite -- -t \"LogKey enumeration 
fields are correctly sorted\""
  ```
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45867 from panbingkun/SPARK-47723.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/util/LogKeySuite.scala  | 71 --
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
index 1f3c2d77d35f..24a24538ad72 100644
--- a/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
+++ b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
@@ -17,17 +17,80 @@
 
 package org.apache.spark.util
 
+import java.nio.charset.StandardCharsets
+import java.nio.file.{Files, Path}
+import java.util.{ArrayList => JList}
+
+import scala.jdk.CollectionConverters._
+
+import org.apache.commons.io.FileUtils
 import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
 
 import org.apache.spark.internal.{Logging, LogKey}
+import org.apache.spark.internal.LogKey.LogKey
 
+// scalastyle:off line.size.limit
+/**
+ * To re-generate the LogKey class file, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "common-utils/testOnly 
org.apache.spark.util.LogKeySuite"
+ * }}}
+ */
+// scalastyle:on line.size.limit
 class LogKeySuite
 extends AnyFunSuite // scalastyle:ignore funsuite
 with Logging {
 
-  test("LogKey enumeration fields must be sorted alphabetically") {
-val keys = LogKey.values.toSeq
-assert(keys === keys.sortBy(_.toString),
-  "LogKey enumeration fields must be sorted alphabetically")
+  /**
+   * Get a Path relative to the root project. It is assumed that a spark home 
is set.
+   */
+  protected final def getWorkspaceFilePath(first: String, more: String*): Path 
= {
+if (!(sys.props.contains("spark.test.home") || 
sys.env.contains("SPARK_HOME"))) {
+  fail("spark.test.home or SPARK_HOME is not set.")
+}
+val sparkHome = sys.props.getOrElse("spark.test.home", 
sys.env("SPARK_HOME"))
+java.nio.file.Paths.get(sparkHome, first +: more: _*)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val logKeyFilePath = getWorkspaceFilePath("common", "utils", "src", 
"main", "scala",
+"org", "apache", "spark", "internal", "LogKey.scala")
+
+  // regenerate the file `LogKey.scala` with its enumeration fields sorted 
alphabetically
+  private def regenerateLogKeyFile(
+  originalKeys: Seq[LogKey], sortedKeys: Seq[LogKey]): Unit = {
+if (originalKeys != sortedKeys) {
+  val logKeyFile = logKeyFilePath.toFile
+  logInfo(s"Regenerating LogKey file $logKeyFile")
+  val originalContents = FileUtils.readLines(logKeyFile, 
StandardCharsets.UTF_8)
+  val sortedContents = new JList[String]()
+  var firstMatch = false
+  originalContents.asScala.foreach { line =>
+if (line.trim.startsWith("val ") && line.trim.endsWith(" = Value")) {
+  if (!firstMatch) {
+sortedKeys.foreach { logKey =>
+  sortedContents.add(s"  val ${logKey.toString} = Value")
+}
+firstMatch = true
+  }
+} else {
+  sortedConten

(spark) branch master updated: [SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework

2024-04-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3fd0cd609df6 [SPARK-47598][CORE] MLLib: Migrate logError with 
variables to structured logging framework
3fd0cd609df6 is described below

commit 3fd0cd609df65051920c56861fa6da54caf4cc9e
Author: panbingkun 
AuthorDate: Thu Apr 4 10:46:54 2024 -0700

[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured 
logging framework

### What changes were proposed in this pull request?
This PR migrates `logError` calls with variables in the `MLLib` module to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45837 from panbingkun/SPARK-47598.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  7 +
 .../scala/org/apache/spark/internal/Logging.scala  |  2 +-
 .../apache/spark/ml/classification/LinearSVC.scala | 15 +-
 .../ml/classification/LogisticRegression.scala | 14 -
 .../ml/regression/AFTSurvivalRegression.scala  |  5 +---
 .../spark/ml/regression/LinearRegression.scala |  5 +---
 .../org/apache/spark/ml/util/Instrumentation.scala | 35 +-
 .../apache/spark/mllib/util/DataValidators.scala   | 11 ---
 .../org/apache/spark/mllib/util/MLUtils.scala  | 10 ++-
 .../spark/ml/feature/VectorIndexerSuite.scala  | 17 ++-
 .../mllib/tree/GradientBoostedTreesSuite.scala | 20 +
 11 files changed, 99 insertions(+), 42 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 608c0c6d521e..66f3b803c0d4 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -28,6 +28,7 @@ object LogKey extends Enumeration {
   val BLOCK_MANAGER_ID = Value
   val BROADCAST_ID = Value
   val BUCKET = Value
+  val CATEGORICAL_FEATURES = Value
   val CLASS_LOADER = Value
   val CLASS_NAME = Value
   val COMMAND = Value
@@ -44,17 +45,22 @@ object LogKey extends Enumeration {
   val EXIT_CODE = Value
   val HOST = Value
   val JOB_ID = Value
+  val LEARNING_RATE = Value
   val LINE = Value
   val LINE_NUM = Value
   val MASTER_URL = Value
   val MAX_ATTEMPTS = Value
+  val MAX_CATEGORIES = Value
   val MAX_EXECUTOR_FAILURES = Value
   val MAX_SIZE = Value
   val MIN_SIZE = Value
+  val NUM_ITERATIONS = Value
   val OLD_BLOCK_MANAGER_ID = Value
+  val OPTIMIZER_CLASS_NAME = Value
   val PARTITION_ID = Value
   val PATH = Value
   val POD_ID = Value
+  val RANGE = Value
   val REASON = Value
   val REMOTE_ADDRESS = Value
   val RETRY_COUNT = Value
@@ -63,6 +69,7 @@ object LogKey extends Enumeration {
   val SIZE = Value
   val STAGE_ID = Value
   val SUBMISSION_ID = Value
+  val SUBSAMPLING_RATE = Value
   val TASK_ATTEMPT_ID = Value
   val TASK_ID = Value
   val TASK_NAME = Value
diff --git 
a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
index 84b9debb2afd..2132e166eacf 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
@@ -117,7 +117,7 @@ trait Logging {
 }
   }
 
-  private def withLogContext(context: java.util.HashMap[String, String])(body: 
=> Unit): Unit = {
+  protected def withLogContext(context: java.util.HashMap[String, 
String])(body: => Unit): Unit = {
 val threadContext = CloseableThreadContext.putAll(context)
 try {
   body
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala 
b/mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
index 13898a304b3d..024693ba06f2 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
@@ -25,7 +25,8 @@ import org.apache.hadoop.fs.Path
 
 import org.apache.spark.SparkException
 import org.apache.spark.annotation.Since
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.{COUNT, RANGE}
 import org.apache.spark.ml.feature._
 import org.apache.spark.ml.linalg._
 import org.apache.spark.ml.optim.aggregator._
@@ -36,6 +37,7 @@ import org.apache.spark.ml.stat._
 

(spark) branch master updated (d75c77562d27 -> 3f6ac60e9966)

2024-04-03 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d75c77562d27 [SPARK-46812][PYTHON][TESTS][FOLLOWUP] Skip 
`pandas`-required tests if pandas is not available
 add 3f6ac60e9966 [SPARK-47577][CORE][PART1] Migrate logError with 
variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   | 43 +-
 .../scala/org/apache/spark/MapOutputTracker.scala  | 22 ++-
 .../spark/api/python/PythonGatewayServer.scala |  7 ++--
 .../apache/spark/api/python/PythonHadoopUtil.scala |  5 ++-
 .../apache/spark/broadcast/TorrentBroadcast.scala  |  7 +++-
 .../scala/org/apache/spark/deploy/Client.scala | 16 
 .../org/apache/spark/deploy/SparkSubmit.scala  |  9 +++--
 .../apache/spark/deploy/history/EventFilter.scala  |  7 ++--
 .../org/apache/spark/deploy/master/Master.scala| 12 --
 .../spark/deploy/rest/RestSubmissionClient.scala   | 20 ++
 .../org/apache/spark/deploy/worker/Worker.scala| 12 +++---
 .../apache/spark/deploy/worker/WorkerWatcher.scala | 10 +++--
 .../executor/CoarseGrainedExecutorBackend.scala| 11 +++---
 .../scala/org/apache/spark/executor/Executor.scala | 14 ---
 .../spark/internal/io/SparkHadoopWriter.scala  |  5 ++-
 .../org/apache/spark/metrics/MetricsSystem.scala   |  5 ++-
 .../main/scala/org/apache/spark/rdd/PipedRDD.scala |  6 ++-
 .../apache/spark/scheduler/AsyncEventQueue.scala   |  9 +++--
 .../org/apache/spark/scheduler/DAGScheduler.scala  | 21 +--
 .../org/apache/spark/scheduler/HealthTracker.scala |  8 ++--
 .../apache/spark/scheduler/LiveListenerBus.scala   |  8 ++--
 .../apache/spark/scheduler/ReplayListenerBus.scala |  5 ++-
 .../apache/spark/scheduler/TaskResultGetter.scala  |  6 ++-
 .../apache/spark/scheduler/TaskSchedulerImpl.scala | 17 +
 .../apache/spark/scheduler/TaskSetManager.scala| 24 +++-
 .../cluster/CoarseGrainedSchedulerBackend.scala|  7 ++--
 .../cluster/StandaloneSchedulerBackend.scala   |  5 ++-
 .../spark/shuffle/IndexShuffleBlockResolver.scala  | 10 +++--
 .../org/apache/spark/storage/BlockManager.scala|  6 +--
 .../spark/storage/BlockManagerMasterEndpoint.scala |  8 ++--
 .../spark/storage/DiskBlockObjectWriter.scala  | 11 +++---
 .../storage/ShuffleBlockFetcherIterator.scala  | 17 +
 .../main/scala/org/apache/spark/util/Utils.scala   | 12 --
 .../org/apache/spark/deploy/yarn/Client.scala  |  4 +-
 .../cluster/YarnClientSchedulerBackend.scala   |  4 +-
 35 files changed, 242 insertions(+), 151 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47705][INFRA][FOLLOWUP] Sort LogKey alphabetically and build a test to ensure it

2024-04-03 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c25fd93199cc [SPARK-47705][INFRA][FOLLOWUP] Sort LogKey alphabetically 
and build a test to ensure it
c25fd93199cc is described below

commit c25fd93199cc2d8795414cdb09a7129793a3e206
Author: panbingkun 
AuthorDate: Wed Apr 3 19:38:37 2024 -0700

[SPARK-47705][INFRA][FOLLOWUP] Sort LogKey alphabetically and build a test 
to ensure it

### What changes were proposed in this pull request?
The pr aims to fix bug about https://github.com/apache/spark/pull/45857

### Why are the changes needed?
In fact, `LogKey.values.toSeq.sorted` did not sort alphabetically as expected: 
`Enumeration` values have a natural ordering based on their numeric id 
(declaration order), not on their names, so `sorted` left the fields in 
declaration order.
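
A self-contained sketch of the underlying Scala behavior (plain Scala, not Spark code; the key names are just examples):

```
object EnumSortDemo extends App {
  object Keys extends Enumeration {
    val REMOTE_ADDRESS = Value  // id 0 (declaration order)
    val POD_ID = Value          // id 1
  }
  // Enumeration#Value orders by its numeric id, so `sorted` keeps declaration order.
  println(Keys.values.toSeq.sorted)              // REMOTE_ADDRESS, POD_ID
  // Sorting by the name gives the alphabetical order the test expects.
  println(Keys.values.toSeq.sortBy(_.toString))  // POD_ID, REMOTE_ADDRESS
}
```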

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45864 from panbingkun/fix_sort_logkey.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala  | 2 +-
 common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index b8a43a03d8b6..86ea648d47c1 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -30,8 +30,8 @@ object LogKey extends Enumeration {
   val MAX_EXECUTOR_FAILURES = Value
   val MAX_SIZE = Value
   val MIN_SIZE = Value
-  val REMOTE_ADDRESS = Value
   val POD_ID = Value
+  val REMOTE_ADDRESS = Value
 
   type LogKey = Value
 }
diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
index 39229f4b910b..1f3c2d77d35f 100644
--- a/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
+++ b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
@@ -27,6 +27,7 @@ class LogKeySuite
 
   test("LogKey enumeration fields must be sorted alphabetically") {
 val keys = LogKey.values.toSeq
-assert(keys === keys.sorted, "LogKey enumeration fields must be sorted 
alphabetically")
+assert(keys === keys.sortBy(_.toString),
+  "LogKey enumeration fields must be sorted alphabetically")
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47705][INFRA] Sort LogKey alphabetically and build a test to ensure it

2024-04-03 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7dec5eb14644 [SPARK-47705][INFRA] Sort LogKey alphabetically and build 
a test to ensure it
7dec5eb14644 is described below

commit 7dec5eb14644aee6c0562bad1d14421d9fa07f17
Author: Daniel Tenedorio 
AuthorDate: Wed Apr 3 14:38:52 2024 -0700

[SPARK-47705][INFRA] Sort LogKey alphabetically and build a test to ensure 
it

### What changes were proposed in this pull request?

This PR adds a unit test to ensure that the fields of the `LogKey` 
enumeration are sorted alphabetically, as specified by 
https://issues.apache.org/jira/browse/SPARK-47705.

### Why are the changes needed?

This will make sure that the fields of the enumeration remain easy to read 
in the future as we add more cases.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR adds testing coverage only.

### Was this patch authored or co-authored using generative AI tooling?

GitHub Copilot offered some suggestions, but I rejected them.

Closes #45857 from dtenedor/logs.

Authored-by: Daniel Tenedorio 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/util/LogKeySuite.scala  | 32 ++
 1 file changed, 32 insertions(+)

diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
new file mode 100644
index ..39229f4b910b
--- /dev/null
+++ b/common/utils/src/test/scala/org/apache/spark/util/LogKeySuite.scala
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, LogKey}
+
+class LogKeySuite
+extends AnyFunSuite // scalastyle:ignore funsuite
+with Logging {
+
+  test("LogKey enumeration fields must be sorted alphabetically") {
+val keys = LogKey.values.toSeq
+assert(keys === keys.sorted, "LogKey enumeration fields must be sorted 
alphabetically")
+  }
+}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (8d4e9647c971 -> db0975cb2a1c)

2024-04-02 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8d4e9647c971 [SPARK-47684][SQL] Postgres: Map length unspecified 
bpchar to StringType
 add db0975cb2a1c [SPARK-47602][CORE] Resource managers: Migrate logError 
with variables to structured logging framework

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/internal/LogKey.scala   | 14 ++-
 .../scala/org/apache/spark/internal/Logging.scala  |  7 ++--
 .../scala/org/apache/spark/util/MDCSuite.scala | 49 ++
 .../apache/spark/util/PatternLoggingSuite.scala|  6 ++-
 .../apache/spark/util/StructuredLoggingSuite.scala | 40 --
 .../cluster/k8s/ExecutorPodsAllocator.scala|  8 ++--
 .../k8s/integrationtest/DepsTestsSuite.scala   |  3 +-
 .../spark/deploy/yarn/ApplicationMaster.scala  | 11 +++--
 .../org/apache/spark/deploy/yarn/Client.scala  |  5 ++-
 .../apache/spark/deploy/yarn/YarnAllocator.scala   |  6 ++-
 .../cluster/YarnClientSchedulerBackend.scala   |  7 ++--
 11 files changed, 133 insertions(+), 23 deletions(-)
 create mode 100644 
common/utils/src/test/scala/org/apache/spark/util/MDCSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47659][CORE][TESTS] Improve `*LoggingSuite*`

2024-04-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 14811974338e [SPARK-47659][CORE][TESTS] Improve `*LoggingSuite*`
14811974338e is described below

commit 14811974338ed30d990039c84a563f5e6cd0b26e
Author: panbingkun 
AuthorDate: Mon Apr 1 10:16:49 2024 -0700

[SPARK-47659][CORE][TESTS] Improve `*LoggingSuite*`

### What changes were proposed in this pull request?
The PR aims to improve the unit tests related to structured logging, including 
`LoggingSuiteBase`, `StructuredLoggingSuite` and `PatternLoggingSuite`.

### Why are the changes needed?
Enhance readability and make it more elegant.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45784 from panbingkun/SPARK-47659.

Authored-by: panbingkun 
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/util/PatternLoggingSuite.scala|  21 +--
 .../apache/spark/util/StructuredLoggingSuite.scala | 148 ++---
 2 files changed, 115 insertions(+), 54 deletions(-)

diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
index 7e4318306c82..02895f708ff0 100644
--- 
a/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
+++ 
b/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
@@ -16,30 +16,33 @@
  */
 package org.apache.spark.util
 
+import org.apache.logging.log4j.Level
 import org.scalatest.BeforeAndAfterAll
 
 import org.apache.spark.internal.Logging
 
 class PatternLoggingSuite extends LoggingSuiteBase with BeforeAndAfterAll {
 
-  override protected def logFilePath: String = "target/pattern.log"
+  override def className: String = classOf[PatternLoggingSuite].getSimpleName
+  override def logFilePath: String = "target/pattern.log"
 
   override def beforeAll(): Unit = Logging.disableStructuredLogging()
 
   override def afterAll(): Unit = Logging.enableStructuredLogging()
 
-  override def expectedPatternForBasicMsg(level: String): String =
-s""".*$level PatternLoggingSuite: This is a log message\n"""
+  override def expectedPatternForBasicMsg(level: Level): String = {
+s""".*$level $className: This is a log message\n"""
+  }
 
-  override def expectedPatternForMsgWithMDC(level: String): String =
-s""".*$level PatternLoggingSuite: Lost executor 1.\n"""
+  override def expectedPatternForMsgWithMDC(level: Level): String =
+s""".*$level $className: Lost executor 1.\n"""
 
-  override def expectedPatternForMsgWithMDCAndException(level: String): String 
=
-s""".*$level PatternLoggingSuite: Error in executor 
1.\njava.lang.RuntimeException: OOM\n.*"""
+  override def expectedPatternForMsgWithMDCAndException(level: Level): String =
+s""".*$level $className: Error in executor 1.\njava.lang.RuntimeException: 
OOM\n.*"""
 
-  override def verifyMsgWithConcat(level: String, logOutput: String): Unit = {
+  override def verifyMsgWithConcat(level: Level, logOutput: String): Unit = {
 val pattern =
-  s""".*$level PatternLoggingSuite: Min Size: 2, Max Size: 4. Please 
double check.\n"""
+  s""".*$level $className: Min Size: 2, Max Size: 4. Please double 
check.\n"""
 assert(pattern.r.matches(logOutput))
   }
 }
diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
 
b/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
index 8165c5f5b751..fe42c7fec990 100644
--- 
a/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
+++ 
b/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
@@ -19,23 +19,28 @@ package org.apache.spark.util
 import java.io.File
 import java.nio.file.Files
 
+import com.fasterxml.jackson.databind.ObjectMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+import org.apache.logging.log4j.Level
 import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
 
 import org.apache.spark.internal.{LogEntry, Logging, MDC}
 import org.apache.spark.internal.LogKey.{EXECUTOR_ID, MAX_SIZE, MIN_SIZE}
 
-abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore 
funsuite
-  with Logging {
+trait LoggingSuiteBase
+extends AnyFunSuite // scalastyle:ignore funsuite
+with Logging {
 
-  protected def logFilePath:

(spark) branch master updated: [SPARK-47654][INFRA] Structured logging framework: support log concatenation

2024-04-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a5ca5867298f [SPARK-47654][INFRA] Structured logging framework: 
support log concatenation
a5ca5867298f is described below

commit a5ca5867298f8ad6d40f3132ad74cbf078cc62b3
Author: Gengliang Wang 
AuthorDate: Mon Apr 1 00:15:24 2024 -0700

[SPARK-47654][INFRA] Structured logging framework: support log concatenation

### What changes were proposed in this pull request?

Support the log concatenation in the structured logging framework. For 
example
```
log"${MDC(CONFIG, SHUFFLE_MAPOUTPUT_MIN_SIZE_FOR_BROADCAST.key)} " +
  log"(${MDC(MIN_SIZE, minSizeForBroadcast.toString)} bytes) " +
  log"must be <= spark.rpc.message.maxSize (${MDC(MAX_SIZE, 
maxRpcMessageSize.toString)} " +
  log"bytes) to prevent sending an rpc message that is too large."
```

### Why are the changes needed?

Although most of the Spark logs are short, we need this convenient syntax 
when handling long logs with multiple variables.
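
Below is a minimal, self-contained sketch of the concatenation semantics added here. It models the merge behaviour with an immutable Scala `Map` rather than the `java.util.HashMap` used by the actual `MessageWithContext` shown in the diff below; the object name and values are purely illustrative.
```
object LogConcatSketch {
  // Simplified stand-in for MessageWithContext: concatenating two entries
  // joins the message text and merges the MDC key/value pairs.
  final case class MessageWithContext(message: String, context: Map[String, String]) {
    def +(other: MessageWithContext): MessageWithContext =
      MessageWithContext(message + other.message, context ++ other.context)
  }

  def main(args: Array[String]): Unit = {
    val left   = MessageWithContext("Min Size: 2, ", Map("min_size" -> "2"))
    val right  = MessageWithContext("Max Size: 4.", Map("max_size" -> "4"))
    val merged = left + right
    println(merged.message) // Min Size: 2, Max Size: 4.
    println(merged.context) // Map(min_size -> 2, max_size -> 4)
  }
}
```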

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

Yes, GitHub copilot

Closes #45779 from gengliangwang/logConcat.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/LogKey.scala   |  2 +-
 .../scala/org/apache/spark/internal/Logging.scala  | 62 +-
 .../apache/spark/util/PatternLoggingSuite.scala|  6 +++
 .../apache/spark/util/StructuredLoggingSuite.scala | 33 +++-
 4 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
index 6ab6ac0eb58a..760077af6d3e 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala
@@ -21,5 +21,5 @@ package org.apache.spark.internal
  * All structured logging keys should be defined here for standardization.
  */
 object LogKey extends Enumeration {
-  val EXECUTOR_ID = Value
+  val EXECUTOR_ID, MIN_SIZE, MAX_SIZE = Value
 }
diff --git 
a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
index 0aa93d6289d1..5765a6eed542 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
@@ -22,7 +22,6 @@ import java.util.Locale
 import scala.jdk.CollectionConverters._
 
 import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
-import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => 
Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
@@ -43,7 +42,13 @@ case class MDC(key: LogKey.Value, value: String)
  * Wrapper class for log messages that include a logging context.
  * This is used as the return type of the string interpolator 
`LogStringContext`.
  */
-case class MessageWithContext(message: String, context: Option[Instance])
+case class MessageWithContext(message: String, context: 
java.util.HashMap[String, String]) {
+  def +(mdc: MessageWithContext): MessageWithContext = {
+val resultMap = new java.util.HashMap(context)
+resultMap.putAll(mdc.context)
+MessageWithContext(message + mdc.message, resultMap)
+  }
+}
 
 /**
  * Companion class for lazy evaluation of the MessageWithContext instance.
@@ -51,7 +56,7 @@ case class MessageWithContext(message: String, context: 
Option[Instance])
 class LogEntry(messageWithContext: => MessageWithContext) {
   def message: String = messageWithContext.message
 
-  def context: Option[Instance] = messageWithContext.context
+  def context: java.util.HashMap[String, String] = messageWithContext.context
 }
 
 /**
@@ -94,12 +99,12 @@ trait Logging {
 def log(args: MDC*): MessageWithContext = {
   val processedParts = sc.parts.iterator
   val sb = new StringBuilder(processedParts.next())
-  lazy val map = new java.util.HashMap[String, String]()
+  val context = new java.util.HashMap[String, String]()
 
   args.foreach { mdc =>
 sb.append(mdc.value)
 if (Logging.isStructuredLoggingEnabled) {
-  map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+  context.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
 }
 
 

(spark) branch master updated: [SPARK-47576][INFRA] Implement logInfo API in structured logging framework

2024-03-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 11d76c96554c [SPARK-47576][INFRA] Implement logInfo API in structured 
logging framework
11d76c96554c is described below

commit 11d76c96554cc71c6a941c99222c08c76bd04bf2
Author: Gengliang Wang 
AuthorDate: Fri Mar 29 13:10:40 2024 -0700

[SPARK-47576][INFRA] Implement logInfo API in structured logging framework

### What changes were proposed in this pull request?

Implement the logInfo API in the structured logging framework. Also, revise the 
test case names to make them more appropriate for the `PatternLoggingSuite`.
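
As a rough illustration, the sketch below shows how the new `logInfo` overloads are called, mirroring the usages exercised in `StructuredLoggingSuite`. The `ExecutorEventsExample` class and its methods are hypothetical, not Spark code; only the `Logging`, `MDC`, and `LogKey` imports come from the framework itself.
```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.EXECUTOR_ID

class ExecutorEventsExample extends Logging {
  def onExecutorLost(executorId: String): Unit = {
    // INFO-level entry with an MDC variable; emitted as JSON when structured
    // logging is enabled, as pattern-layout text otherwise.
    logInfo(log"Lost executor ${MDC(EXECUTOR_ID, executorId)}.")
  }

  def onExecutorError(executorId: String, e: Throwable): Unit = {
    // Overload that also records the throwable with the entry.
    logInfo(log"Error in executor ${MDC(EXECUTOR_ID, executorId)}.", e)
  }
}
```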

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45777 from gengliangwang/logInfo.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/Logging.scala   | 14 ++
 .../apache/spark/util/StructuredLoggingSuite.scala  | 21 +++--
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git 
a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
index 2fed115f3dbb..0aa93d6289d1 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
@@ -122,6 +122,20 @@ trait Logging {
 if (log.isInfoEnabled) log.info(msg)
   }
 
+  protected def logInfo(entry: LogEntry): Unit = {
+if (log.isInfoEnabled) {
+  log.info(entry.message)
+  entry.context.map(_.close())
+}
+  }
+
+  protected def logInfo(entry: LogEntry, throwable: Throwable): Unit = {
+if (log.isInfoEnabled) {
+  log.info(entry.message, throwable)
+  entry.context.map(_.close())
+}
+  }
+
   protected def logDebug(msg: => String): Unit = {
 if (log.isDebugEnabled) log.debug(msg)
   }
diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
 
b/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
index b032649170bc..5dfd3bb46021 100644
--- 
a/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
+++ 
b/common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala
@@ -58,33 +58,34 @@ abstract class LoggingSuiteBase extends AnyFunSuite // 
scalastyle:ignore funsuit
 
   def expectedPatternForMsgWithMDCAndException(level: String): String
 
-  test("Structured logging") {
+  test("Basic logging") {
 val msg = "This is a log message"
 Seq(
   ("ERROR", () => logError(msg)),
-  ("WARN", () => logWarning(msg))).foreach { case (level, logFunc) =>
+  ("WARN", () => logWarning(msg)),
+  ("INFO", () => logInfo(msg))).foreach { case (level, logFunc) =>
   val logOutput = captureLogOutput(logFunc)
   assert(expectedPatternForBasicMsg(level).r.matches(logOutput))
 }
   }
 
-  test("Structured logging with MDC") {
+  test("Logging with MDC") {
 Seq(
-  ("ERROR", () => logError(log"Lost executor ${MDC(EXECUTOR_ID, "1")}.")),
-  ("WARN", () => logWarning(log"Lost executor ${MDC(EXECUTOR_ID, "1")}.")))
-  .foreach {
+  ("ERROR", () => logError(msgWithMDC)),
+  ("WARN", () => logWarning(msgWithMDC)),
+  ("INFO", () => logInfo(msgWithMDC))).foreach {
 case (level, logFunc) =>
   val logOutput = captureLogOutput(logFunc)
   assert(expectedPatternForMsgWithMDC(level).r.matches(logOutput))
   }
   }
 
-  test("Structured exception logging with MDC") {
+  test("Logging with MDC and Exception") {
 val exception = new RuntimeException("OOM")
 Seq(
-  ("ERROR", () => logError(log"Error in executor ${MDC(EXECUTOR_ID, 
"1")}.", exception)),
-  ("WARN", () => logWarning(log"Error in executor ${MDC(EXECUTOR_ID, 
"1")}.", exception)))
-  .foreach {
+  ("ERROR", () => logError(msgWithMDCAndException, exception)),
+  ("WARN", () => logWarning(msgWithMDCAndException, exception)),
+  ("INFO", () => logInfo(msgWithMDCAndException, exception))).foreach {
 case (level, logFunc) =>
   val logOutput = captureLogOutput(logFunc)
   
assert(expectedPatternForMsgWithMDCAndException(level).r.findFirstIn(logOutput).isDefined)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (d182810abcd8 -> db14be8ab5f7)

2024-03-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d182810abcd8 [SPARK-47575][INFRA] Implement logWarning API in 
structured logging framework
 add db14be8ab5f7 [SPARK-47637][SQL] Use errorCapturingIdentifier in more 
places

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/SqlBaseParser.g4  | 18 +-
 .../sql/catalyst/parser/DataTypeAstBuilder.scala|  2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala  | 14 +++---
 .../sql/catalyst/parser/ErrorParserSuite.scala  | 21 +
 .../apache/spark/sql/execution/SparkSqlParser.scala |  4 ++--
 5 files changed, 40 insertions(+), 19 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47575][INFRA] Implement logWarning API in structured logging framework

2024-03-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d182810abcd8 [SPARK-47575][INFRA] Implement logWarning API in 
structured logging framework
d182810abcd8 is described below

commit d182810abcd8ff6a86211b90f0b4217100546688
Author: Gengliang Wang 
AuthorDate: Fri Mar 29 11:13:21 2024 -0700

[SPARK-47575][INFRA] Implement logWarning API in structured logging 
framework

### What changes were proposed in this pull request?

 Implement logWarning API in structured logging framework. Also, refactor 
the logging test suites to reduce duplicated code.
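
Below is a loose, framework-independent sketch of the test-suite refactor described above: the shared verification logic lives in a base trait while each layout only supplies its expected pattern. All names here are illustrative and are not the actual Spark test classes.
```
trait LogPatternCheck {
  // Each layout (pattern text vs. structured JSON) supplies its expectation.
  def expectedPatternForMsgWithMDC(level: String): String

  // The shared assertion logic lives in one place instead of being duplicated.
  def verify(level: String, logOutput: String): Boolean =
    expectedPatternForMsgWithMDC(level).r.matches(logOutput)
}

object PatternLayoutCheck extends LogPatternCheck {
  override def expectedPatternForMsgWithMDC(level: String): String =
    s""".*$level PatternLoggingSuite: Lost executor 1.\n"""
}

object RefactorSketch {
  def main(args: Array[String]): Unit = {
    val line = "24/03/29 11:13:21 WARN PatternLoggingSuite: Lost executor 1.\n"
    println(PatternLayoutCheck.verify("WARN", line)) // true
  }
}
```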

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45770 from gengliangwang/logWarning.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/internal/Logging.scala  | 14 
 .../apache/spark/util/PatternLoggingSuite.scala| 33 ++
 .../apache/spark/util/StructuredLoggingSuite.scala | 74 +++---
 3 files changed, 72 insertions(+), 49 deletions(-)

diff --git 
a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala 
b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
index 7f380a9c7887..2fed115f3dbb 100644
--- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
+++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala
@@ -134,6 +134,20 @@ trait Logging {
 if (log.isWarnEnabled) log.warn(msg)
   }
 
+  protected def logWarning(entry: LogEntry): Unit = {
+if (log.isWarnEnabled) {
+  log.warn(entry.message)
+  entry.context.map(_.close())
+}
+  }
+
+  protected def logWarning(entry: LogEntry, throwable: Throwable): Unit = {
+if (log.isWarnEnabled) {
+  log.warn(entry.message, throwable)
+  entry.context.map(_.close())
+}
+  }
+
   protected def logError(msg: => String): Unit = {
 if (log.isErrorEnabled) log.error(msg)
   }
diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
index 0c6ed89172e0..ef0aa7050b07 100644
--- 
a/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
+++ 
b/common/utils/src/test/scala/org/apache/spark/util/PatternLoggingSuite.scala
@@ -18,8 +18,7 @@ package org.apache.spark.util
 
 import org.scalatest.BeforeAndAfterAll
 
-import org.apache.spark.internal.{Logging, MDC}
-import org.apache.spark.internal.LogKey.EXECUTOR_ID
+import org.apache.spark.internal.Logging
 
 class PatternLoggingSuite extends LoggingSuiteBase with BeforeAndAfterAll {
 
@@ -29,30 +28,12 @@ class PatternLoggingSuite extends LoggingSuiteBase with 
BeforeAndAfterAll {
 
   override def afterAll(): Unit = Logging.enableStructuredLogging()
 
-  test("Pattern layout logging") {
-val msg = "This is a log message"
+  override def expectedPatternForBasicMsg(level: String): String =
+s""".*$level PatternLoggingSuite: This is a log message\n"""
 
-val logOutput = captureLogOutput(() => logError(msg))
-// scalastyle:off line.size.limit
-val pattern = """.*ERROR PatternLoggingSuite: This is a log message\n""".r
-// scalastyle:on
-assert(pattern.matches(logOutput))
-  }
+  override def expectedPatternForMsgWithMDC(level: String): String =
+s""".*$level PatternLoggingSuite: Lost executor 1.\n"""
 
-  test("Pattern layout logging with MDC") {
-logError(log"Lost executor ${MDC(EXECUTOR_ID, "1")}.")
-
-val logOutput = captureLogOutput(() => logError(log"Lost executor 
${MDC(EXECUTOR_ID, "1")}."))
-val pattern = """.*ERROR PatternLoggingSuite: Lost executor 1.\n""".r
-assert(pattern.matches(logOutput))
-  }
-
-  test("Pattern layout exception logging") {
-val exception = new RuntimeException("OOM")
-
-val logOutput = captureLogOutput(() =>
-  logError(log"Error in executor ${MDC(EXECUTOR_ID, "1")}.", exception))
-assert(logOutput.contains("ERROR PatternLoggingSuite: Error in executor 
1."))
-assert(logOutput.contains("java.lang.RuntimeException: OOM"))
-  }
+  override def expectedPatternForMsgWithMDCAndException(level: String): String 
=
+s""".*$level PatternLoggingSuite: Error in executor 
1.\n

(spark) branch master updated: [SPARK-47574][INFRA] Introduce Structured Logging Framework

2024-03-28 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 874d033fc61b [SPARK-47574][INFRA] Introduce Structured Logging 
Framework
874d033fc61b is described below

commit 874d033fc61becb5679db70c804592a0f9cc37ed
Author: Gengliang Wang 
AuthorDate: Thu Mar 28 22:58:51 2024 -0700

[SPARK-47574][INFRA] Introduce Structured Logging Framework

### What changes were proposed in this pull request?

Introduce Structured Logging Framework as per [SPIP: Structured Logging 
Framework for Apache 
Spark](https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing)
 .
* The default logging output format will be json lines. For example
```
{
   "ts":"2023-03-12T12:02:46.661-0700",
   "level":"ERROR",
   "msg":"Cannot determine whether executor 289 is alive or not",
   "context":{
   "executor_id":"289"
   },
   "exception":{
  "class":"org.apache.spark.SparkException",
  "msg":"Exception thrown in awaitResult",
  "stackTrace":"..."
   },
   "source":"BlockManagerMasterEndpoint"
}
```
* Introduce a new configuration `spark.log.structuredLogging.enabled` to 
set the default log4j configuration. It is true by default. Users can disable 
it to get plain text log outputs.
* The change will start with the `logError` method. Example changes on the 
API:
from
```
logError(s"Cannot determine whether executor $executorId is alive or not.", 
e)
```
to
```
logError(log"Cannot determine whether executor ${MDC(EXECUTOR_ID, 
executorId)} is alive or not.", e)
```
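
As a rough sketch of the new call-site shape, the snippet below assumes a user-defined class that mixes in `org.apache.spark.internal.Logging`; the class name and method are hypothetical, and only the logging API itself comes from Spark.
```
import org.apache.spark.internal.{Logging, MDC}
import org.apache.spark.internal.LogKey.EXECUTOR_ID

class HeartbeatTrackerExample extends Logging {
  def reportUnreachable(executorId: String, e: Throwable): Unit = {
    // With spark.log.structuredLogging.enabled=true (the new default) this is
    // rendered as a JSON line whose "context" field carries the executor_id;
    // with the flag disabled it falls back to plain pattern-layout text.
    logError(log"Cannot determine whether executor ${MDC(EXECUTOR_ID, executorId)} is alive or not.", e)
  }
}
```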

### Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured 
logging. This transition will change the format of the default log output from 
plain text to JSON lines, making it more analyzable.

### Does this PR introduce _any_ user-facing change?

Yes, the default log output format will be JSON lines instead of plain 
text. Users can restore the default plain text output by disabling the 
configuration `spark.log.structuredLogging.enabled`.
If a user has a customized log4j configuration, there are no changes in the 
log output.

### How was this patch tested?

New Unit tests

### Was this patch authored or co-authored using generative AI tooling?

Yes, some of the code comments are from github copilot

Closes #45729 from gengliangwang/LogInterpolator.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 common/utils/pom.xml   |   4 +
 .../resources/org/apache/spark/SparkLayout.json|  38 
 .../org/apache/spark/log4j2-defaults.properties|   4 +-
 ...s => log4j2-pattern-layout-defaults.properties} |   0
 .../scala/org/apache/spark/internal/LogKey.scala   |  25 +
 .../scala/org/apache/spark/internal/Logging.scala  | 105 -
 common/utils/src/test/resources/log4j2.properties  |  50 ++
 .../apache/spark/util/PatternLoggingSuite.scala|  58 
 .../apache/spark/util/StructuredLoggingSuite.scala |  83 
 .../org/apache/spark/deploy/SparkSubmit.scala  |   5 +
 .../org/apache/spark/internal/config/package.scala |  10 ++
 dev/deps/spark-deps-hadoop-3-hive-2.3  |   1 +
 docs/core-migration-guide.md   |   2 +
 pom.xml|   5 +
 14 files changed, 386 insertions(+), 4 deletions(-)

diff --git a/common/utils/pom.xml b/common/utils/pom.xml
index d360e041dd64..1dbf2a769fff 100644
--- a/common/utils/pom.xml
+++ b/common/utils/pom.xml
@@ -98,6 +98,10 @@
   org.apache.logging.log4j
   log4j-1.2-api
 
+
+  org.apache.logging.log4j
+  log4j-layout-template-json
+
   
   
 
target/scala-${scala.binary.version}/classes
diff --git a/common/utils/src/main/resources/org/apache/spark/SparkLayout.json 
b/common/utils/src/main/resources/org/apache/spark/SparkLayout.json
new file mode 100644
index ..b0d8ea27ffbc
--- /dev/null
+++ b/common/utils/src/main/resources/org/apache/spark/SparkLayout.json
@@ -0,0 +1,38 @@
+{
+  "ts": {
+"$resolver": "timestamp"
+  },
+  "level": {
+"$resolver": "level",
+"field": "name"
+  },
+  "msg": {
+"$resolver": "message",
+"stringified": true
+  },
+  "context": {
+"$resolver": "mdc

(spark) branch master updated: [SPARK-47492][SQL] Widen whitespace rules in lexer

2024-03-28 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c832e2ac1d04 [SPARK-47492][SQL] Widen whitespace rules in lexer
c832e2ac1d04 is described below

commit c832e2ac1d04668c77493577662c639785808657
Author: Serge Rielau 
AuthorDate: Thu Mar 28 15:51:32 2024 -0700

[SPARK-47492][SQL] Widen whitespace rules in lexer

### What changes were proposed in this pull request?

In this PR we extend the lexer's understanding of whitespace (what 
separates tokens) from the ASCII whitespace characters (space, tab, carriage return and line feed) to the various Unicode 
flavors of "space", such as "narrow" and "wide".

### Why are the changes needed?

SQL statements are frequently copy-pasted from various sources. Many of 
these sources are "rich text" and based on Unicode.
When doing so, it is inevitable that non-ASCII whitespace characters are 
copied along.
Today this often results in incomprehensible syntax errors: the error message 
prints the "bad" whitespace just like an ASCII whitespace, so the user stands 
little chance of finding the root cause unless they use editor options to 
highlight non-ASCII space or, by sheer luck, happen to remove the whitespace.

So in this PR we acknowledge the reality and stop "discriminating" against 
non-ASCII whitespace.

### Does this PR introduce _any_ user-facing change?

Queries that used to fail with a syntax error now succeed.
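
A hedged, minimal sketch of the effect; the session setup and query are illustrative, and the behaviour mirrors the new unit tests in `SparkSqlParserSuite`.
```
import org.apache.spark.sql.SparkSession

object UnicodeWhitespaceExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("ws-demo").getOrCreate()
    // "SELECT 1 AS c" written with a Unicode no-break space (U+00A0) between
    // tokens, as often happens when SQL is pasted from rich-text sources.
    val pasted = "SELECT\u00A01 AS c"
    spark.sql(pasted).show() // previously a syntax error, now parsed and executed
    spark.stop()
  }
}
```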

### How was this patch tested?

Added a new set of unit tests in SparkSQLParserSuite

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45620 from srielau/SPARK-47492-Widen-whitespace-rules-in-lexer.

Lead-authored-by: Serge Rielau 
    Co-authored-by: Serge Rielau 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4  |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 80 ++
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index 7c376e226850..f5565f0a63fb 100644
--- 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -554,7 +554,7 @@ BRACKETED_COMMENT
 ;
 
 WS
-: [ \r\n\t]+ -> channel(HIDDEN)
+: [ 
\t\n\f\r\u000B\u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u2028\u202F\u205F\u3000]+
 -> channel(HIDDEN)
 ;
 
 // Catch-all for anything we can't recognize.
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
index c3768afa90f1..f60df77b7e9b 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
@@ -800,4 +800,84 @@ class SparkSqlParserSuite extends AnalysisTest with 
SharedSparkSession {
 start = 0,
 stop = 63))
   }
+
+  test("verify whitespace handling - standard whitespace") {
+parser.parsePlan("SELECT 1") // ASCII space
+parser.parsePlan("SELECT\r1") // ASCII carriage return
+parser.parsePlan("SELECT\n1") // ASCII line feed
+parser.parsePlan("SELECT\t1") // ASCII tab
+parser.parsePlan("SELECT\u000B1") // ASCII vertical tab
+parser.parsePlan("SELECT\f1") // ASCII form feed
+  }
+
+  // Need to switch off scala style for Unicode characters
+  // scalastyle:off
+  test("verify whitespace handling - Unicode no-break space") {
+parser.parsePlan("SELECT\u00A01") // Unicode no-break space
+  }
+
+  test("verify whitespace handling - Unicode ogham space mark") {
+parser.parsePlan("SELECT\u16801") // Unicode ogham space mark
+  }
+
+  test("verify whitespace handling - Unicode en quad") {
+parser.parsePlan("SELECT\u20001") // Unicode en quad
+  }
+
+  test("verify whitespace handling - Unicode em quad") {
+parser.parsePlan("SELECT\u20011") // Unicode em quad
+  }
+
+  test("verify whitespace handling - Unicode en space") {
+parser.parsePlan("SELECT\u20021") // Unicode en space
+  }
+
+  test("verify whitespace handling - Unicode em space") {
+parser.parsePlan("SELECT\u20031") // Unicode em space
+  }
+
+  test("verify whitespace handling - Unicode three-p

(spark) branch master updated: [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ

2024-03-20 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c3a04fa59ce1 [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as 
TimestampNTZ
c3a04fa59ce1 is described below

commit c3a04fa59ce1aabe4818430ae294fb8d210c0e4b
Author: Gengliang Wang 
AuthorDate: Tue Mar 19 23:04:59 2024 -0700

[SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ

### What changes were proposed in this pull request?

Currently, Parquet TimestampNTZ type columns can be read as TimestampLTZ, 
while reading TimestampLTZ as TimestampNTZ will cause errors. This makes it 
impossible to read parquet files containing both TimestampLTZ and TimestampNTZ 
as TimestampNTZ.

To make the data type system on Parquet simpler, this PR allows reading 
TimestampLTZ as TimestampNTZ in the Parquet data source.
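
A hedged sketch of the resulting behaviour; the path and session setup are illustrative, and the displayed values depend on the session time zone.
```
import org.apache.spark.sql.SparkSession

object ReadLtzAsNtzExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("ltz-as-ntz").getOrCreate()
    import spark.implicits._

    val path = "/tmp/ltz_as_ntz_demo"
    // Written as a regular timestamp (Parquet INT64 with isAdjustedToUTC=true).
    Seq("2024-03-19 23:04:59").toDF("ts_str")
      .selectExpr("CAST(ts_str AS TIMESTAMP) AS ts")
      .write.mode("overwrite").parquet(path)

    // Reading the same column back with a TIMESTAMP_NTZ schema used to raise a
    // converter error; after this change the values are read without a zone.
    spark.read.schema("ts TIMESTAMP_NTZ").parquet(path).show(false)
    spark.stop()
  }
}
```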

### Why are the changes needed?

* Make it possible  to read parquet files containing both TimestampLTZ and 
TimestampNTZ as TimestampNTZ
* Make the data type system on Parquet simpler

### Does this PR introduce _any_ user-facing change?

Yes, Parquet TimestampLTZ type columns are now allowed to be read as 
TimestampNTZ.

### How was this patch tested?

UT
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45571 from gengliangwang/allowReadLTZAsNTZ.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../parquet/ParquetVectorUpdaterFactory.java   | 19 ++-
 .../datasources/parquet/ParquetRowConverter.scala  | 16 
 .../datasources/parquet/ParquetQuerySuite.scala| 22 +++---
 3 files changed, 21 insertions(+), 36 deletions(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java
index abb44915cbcd..b6065c24f2ec 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java
@@ -148,12 +148,10 @@ public class ParquetVectorUpdaterFactory {
   }
 } else if (sparkType == DataTypes.TimestampNTZType &&
   isTimestampTypeMatched(LogicalTypeAnnotation.TimeUnit.MICROS)) {
-  validateTimestampNTZType();
   // TIMESTAMP_NTZ is a new data type and has no legacy files that 
need to do rebase.
   return new LongUpdater();
 } else if (sparkType == DataTypes.TimestampNTZType &&
   isTimestampTypeMatched(LogicalTypeAnnotation.TimeUnit.MILLIS)) {
-  validateTimestampNTZType();
   // TIMESTAMP_NTZ is a new data type and has no legacy files that 
need to do rebase.
   return new LongAsMicrosUpdater();
 } else if (sparkType instanceof DayTimeIntervalType) {
@@ -176,7 +174,8 @@ public class ParquetVectorUpdaterFactory {
   }
   case INT96 -> {
 if (sparkType == DataTypes.TimestampNTZType) {
-  convertErrorForTimestampNTZ(typeName.name());
+  // TimestampNTZ type does not require rebasing due to its lack of 
time zone context.
+  return new BinaryToSQLTimestampUpdater();
 } else if (sparkType == DataTypes.TimestampType) {
   final boolean failIfRebase = "EXCEPTION".equals(int96RebaseMode);
   if (!shouldConvertTimestamps()) {
@@ -232,20 +231,6 @@ public class ParquetVectorUpdaterFactory {
   annotation.getUnit() == unit;
   }
 
-  private void validateTimestampNTZType() {
-assert(logicalTypeAnnotation instanceof TimestampLogicalTypeAnnotation);
-// Throw an exception if the Parquet type is TimestampLTZ as the Catalyst 
type is TimestampNTZ.
-// This is to avoid mistakes in reading the timestamp values.
-if (((TimestampLogicalTypeAnnotation) 
logicalTypeAnnotation).isAdjustedToUTC()) {
-  convertErrorForTimestampNTZ("int64 time(" + logicalTypeAnnotation + ")");
-}
-  }
-
-  void convertErrorForTimestampNTZ(String parquetType) {
-throw new RuntimeException("Unable to create Parquet converter for data 
type " +
-  DataTypes.TimestampNTZType.json() + " whose Parquet type is " + 
parquetType);
-  }
-
   boolean isUnsignedIntTypeMatched(int bitWidth) {
 return logicalTypeAnnotation instanceof IntLogicalTypeAnnotation 
annotation &&
   !annotation.isSigned() && annotation.getBitWidth() == bitWidth;
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter

(spark) branch branch-3.4 updated: [SPARK-47375][DOC][FOLLOWUP] Correct the preferTimestampNTZ option description in JDBC doc

2024-03-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 922f5f686dc7 [SPARK-47375][DOC][FOLLOWUP] Correct the 
preferTimestampNTZ option description in JDBC doc
922f5f686dc7 is described below

commit 922f5f686dc72433a9028bbe471a35a5b84f2855
Author: Gengliang Wang 
AuthorDate: Wed Mar 13 21:00:35 2024 -0700

[SPARK-47375][DOC][FOLLOWUP] Correct the preferTimestampNTZ option 
description in JDBC doc

### What changes were proposed in this pull request?

Correct the preferTimestampNTZ option description in JDBC doc as per 
https://github.com/apache/spark/pull/45496

### Why are the changes needed?

The current doc is wrong about the jdbc option preferTimestampNTZ
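
A hedged sketch of the option as the corrected doc describes it; the URL, table name and credentials are placeholders.
```
import org.apache.spark.sql.SparkSession

object PreferTimestampNtzJdbcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("jdbc-ntz").getOrCreate()
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/demo") // placeholder
      .option("dbtable", "events")                          // placeholder
      .option("user", "demo").option("password", "demo")    // placeholders
      // Affects only TIMESTAMP WITHOUT TIME ZONE columns: with true they are
      // inferred as TimestampNTZ, otherwise as Spark's Timestamp type.
      .option("preferTimestampNTZ", "true")
      .load()
    df.printSchema()
    spark.stop()
  }
}
```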

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Just doc change
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45502 from gengliangwang/ntzJdbc.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit abfbd2718159d62e3322cca8c2d4ef1c29781b21)
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-jdbc.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index ef11a3a77dd8..004244b7328c 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -368,8 +368,9 @@ logging into the data sources.
 preferTimestampNTZ
 false
 
-  When the option is set to true, all timestamps are inferred 
as TIMESTAMP WITHOUT TIME ZONE.
-  Otherwise, timestamps are read as TIMESTAMP with local time zone.
+  When the option is set to true, TIMESTAMP WITHOUT TIME ZONE 
type are inferred as Spark's TimestampNTZ type.
+  Otherwise, it is interpreted as Spark's Timestamp type(equivalent to 
TIMESTAMP WITHOUT LOCAL TIME ZONE).
+  This setting specifically affects only the inference of TIMESTAMP 
WITHOUT TIME ZONE data type. Both TIMESTAMP WITHOUT LOCAL TIME ZONE and 
TIMESTAMP WITH TIME ZONE data types are consistently interpreted as Spark's 
Timestamp type regardless of this setting.
 
 read
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47375][DOC][FOLLOWUP] Correct the preferTimestampNTZ option description in JDBC doc

2024-03-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 706f54f69fe7 [SPARK-47375][DOC][FOLLOWUP] Correct the 
preferTimestampNTZ option description in JDBC doc
706f54f69fe7 is described below

commit 706f54f69fe797027b5fcf1cfb4867811fb41c3d
Author: Gengliang Wang 
AuthorDate: Wed Mar 13 21:00:35 2024 -0700

[SPARK-47375][DOC][FOLLOWUP] Correct the preferTimestampNTZ option 
description in JDBC doc

### What changes were proposed in this pull request?

Correct the preferTimestampNTZ option description in JDBC doc as per 
https://github.com/apache/spark/pull/45496

### Why are the changes needed?

The current doc is wrong about the jdbc option preferTimestampNTZ

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Just doc change
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45502 from gengliangwang/ntzJdbc.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit abfbd2718159d62e3322cca8c2d4ef1c29781b21)
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-jdbc.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md
index edcdef4bf008..d794116091fe 100644
--- a/docs/sql-data-sources-jdbc.md
+++ b/docs/sql-data-sources-jdbc.md
@@ -368,8 +368,9 @@ logging into the data sources.
 preferTimestampNTZ
 false
 
-  When the option is set to true, all timestamps are inferred 
as TIMESTAMP WITHOUT TIME ZONE.
-  Otherwise, timestamps are read as TIMESTAMP with local time zone.
+  When the option is set to true, TIMESTAMP WITHOUT TIME ZONE 
type are inferred as Spark's TimestampNTZ type.
+  Otherwise, it is interpreted as Spark's Timestamp type(equivalent to 
TIMESTAMP WITHOUT LOCAL TIME ZONE).
+  This setting specifically affects only the inference of TIMESTAMP 
WITHOUT TIME ZONE data type. Both TIMESTAMP WITHOUT LOCAL TIME ZONE and 
TIMESTAMP WITH TIME ZONE data types are consistently interpreted as Spark's 
Timestamp type regardless of this setting.
 
 read
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (bc0ba7dccdd5 -> abfbd2718159)

2024-03-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from bc0ba7dccdd5 [SPARK-41762][PYTHON][CONNECT][TESTS] Enable column name 
comparsion in `test_column_arithmetic_ops`
 add abfbd2718159 [SPARK-47375][DOC][FOLLOWUP] Correct the 
preferTimestampNTZ option description in JDBC doc

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al.

2024-03-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ebe9f669e3fd [SPARK-47344] Extend INVALID_IDENTIFIER error beyond 
catching '-' in an unquoted identifier and fix "IS ! NULL" et al.
ebe9f669e3fd is described below

commit ebe9f669e3fd4f391336c12c2e15df048eaa11bc
Author: Serge Rielau 
AuthorDate: Wed Mar 13 13:19:29 2024 -0700

[SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an 
unquoted identifier and fix "IS ! NULL" et al.

### What changes were proposed in this pull request?

In this PR we propose to extend the lexing of IDENTIFIER beyond what is 
legitimate for unquoted identifiers to include
"plausible" identifiers. We then use the "exit" hook in the parser to raise the 
INVALID_IDENTIFIER error, which is more meaningful than a syntax error.

Specifically we allow:
* general letters beyond the ASCII a-z. This will catch locale-specific names
* URIs, which are used for tables represented by a path.

As part of this PR we also found that rolling `NOT` and `!` into one token 
is a "bad idea".
Previously we allowed statements such as:

CREATE TABLE t(c1 INT ! NULL); etc.
This is clearly not intended.

! is now ONLY allowed as a boolean prefix operator.
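
A hedged sketch of the improved error from the user's side; the table name is a placeholder and the exact message text may differ.
```
import org.apache.spark.sql.{AnalysisException, SparkSession}

object InvalidIdentifierExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("invalid-id").getOrCreate()
    try {
      // '-' is not legal in an unquoted identifier, so this is expected to
      // report INVALID_IDENTIFIER with a hint to back quote it as `sales-2024`.
      spark.sql("SELECT * FROM sales-2024")
    } catch {
      case e: AnalysisException => println(e.getMessage)
    }
    spark.stop()
  }
}
```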

### Why are the changes needed?

This change improves the user experience by returning a more meaningful 
error message when such an identifier is encountered.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test suite + new unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45470 from srielau/SPARK-47344.

Lead-authored-by: Serge Rielau 
    Co-authored-by: Wenchen Fan 
Signed-off-by: Gengliang Wang 
---
 .../utils/src/main/resources/error/error-classes.json   |  5 -
 docs/sql-error-conditions.md|  5 -
 .../apache/spark/sql/catalyst/parser/SqlBaseLexer.g4| 14 --
 .../apache/spark/sql/catalyst/parser/SqlBaseParser.g4   |  3 ++-
 .../org/apache/spark/sql/catalyst/parser/parsers.scala  | 12 
 .../apache/spark/sql/errors/QueryParsingErrors.scala|  2 +-
 .../spark/sql/catalyst/parser/ErrorParserSuite.scala| 17 +
 .../resources/sql-tests/results/ansi/keywords.sql.out   |  2 ++
 .../test/resources/sql-tests/results/keywords.sql.out   |  1 +
 .../ThriftServerWithSparkContextSuite.scala |  2 +-
 10 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 92c72e03e483..8272c442ddfa 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2098,7 +2098,10 @@
   },
   "INVALID_IDENTIFIER" : {
 "message" : [
-  "The identifier  is invalid. Please, consider quoting it with 
back-quotes as ``."
+  "The unquoted identifier  is invalid and must be back quoted as: 
``.",
+  "Unquoted identifiers can only contain ASCII letters ('a' - 'z', 'A' - 
'Z'), digits ('0' - '9'), and underbar ('_').",
+  "Unquoted identifiers must also not start with a digit.",
+  "Different data sources and meta stores may impose additional 
restrictions on valid identifiers."
 ],
 "sqlState" : "42602"
   },
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index efead13251c1..dba87bf0136e 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1240,7 +1240,10 @@ For more details see 
[INVALID_HANDLE](sql-error-conditions-invalid-handle-error-
 
 [SQLSTATE: 
42602](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
-The identifier `` is invalid. Please, consider quoting it with 
back-quotes as .
+The unquoted identifier `` is invalid and must be back quoted as: 
.
+Unquoted identifiers can only contain ASCII letters ('a' - 'z', 'A' - 'Z'), 
digits ('0' - '9'), and underbar ('_').
+Unquoted identifiers must also not start with a digit.
+Different data sources and meta stores may impose additional restrictions on 
valid identifiers.
 
 ### INVALID_INDEX_OF_ZERO
 
diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index 174887def66d..7c376e226850 100644
--- 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4

(spark) branch branch-3.4 updated: [SPARK-47368][SQL]][3.5] Remove inferTimestampNTZ config check in ParquetRo…

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 60b4c0b690ad [SPARK-47368][SQL]][3.5] Remove inferTimestampNTZ config 
check in ParquetRo…
60b4c0b690ad is described below

commit 60b4c0b690ad980ff4eef93180c70d6e64e5e347
Author: Gengliang Wang 
AuthorDate: Tue Mar 12 22:42:45 2024 -0700

[SPARK-47368][SQL]][3.5] Remove inferTimestampNTZ config check in ParquetRo…

### What changes were proposed in this pull request?

The configuration `spark.sql.parquet.inferTimestampNTZ.enabled` is not 
related to the Parquet row converter. This PR removes the check of 
`spark.sql.parquet.inferTimestampNTZ.enabled` in the ParquetRowConverter.

### Why are the changes needed?

Bug fix.  Otherwise reading TimestampNTZ columns may fail when 
`spark.sql.parquet.inferTimestampNTZ.enabled` is disabled.
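
A hedged reproduction sketch of the scenario the fix targets; the path and session setup are illustrative.
```
import org.apache.spark.sql.SparkSession

object NtzReadWithInferenceDisabled {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("ntz-read").getOrCreate()
    val path = "/tmp/ntz_read_demo"

    spark.sql("SELECT CAST('2024-03-12 22:42:45' AS TIMESTAMP_NTZ) AS ts")
      .write.mode("overwrite").parquet(path)

    // The flag only controls schema inference; with this fix it no longer
    // breaks reads that request TimestampNTZ explicitly.
    spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
    spark.read.schema("ts TIMESTAMP_NTZ").parquet(path).show(false)
    spark.stop()
  }
}
```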
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45492 from gengliangwang/PR_TOOL_PICK_PR_45480_BRANCH-3.5.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 3018a5d8cd96a569b3bfe7e11b4b26fb4fb54f32)
Signed-off-by: Gengliang Wang 
---
 .../datasources/parquet/ParquetRowConverter.scala  |  9 +++---
 .../parquet/ParquetSchemaConverter.scala   |  7 -
 .../datasources/parquet/ParquetQuerySuite.scala| 36 +-
 3 files changed, 25 insertions(+), 27 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
index 9101e7d0ac52..1e07c6db2a06 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
@@ -505,11 +505,10 @@ private[parquet] class ParquetRowConverter(
   // can be read as Spark's TimestampNTZ type. This is to avoid mistakes in 
reading the timestamp
   // values.
   private def canReadAsTimestampNTZ(parquetType: Type): Boolean =
-schemaConverter.isTimestampNTZEnabled() &&
-  parquetType.asPrimitiveType().getPrimitiveTypeName == INT64 &&
-  
parquetType.getLogicalTypeAnnotation.isInstanceOf[TimestampLogicalTypeAnnotation]
 &&
-  !parquetType.getLogicalTypeAnnotation
-.asInstanceOf[TimestampLogicalTypeAnnotation].isAdjustedToUTC
+parquetType.asPrimitiveType().getPrimitiveTypeName == INT64 &&
+
parquetType.getLogicalTypeAnnotation.isInstanceOf[TimestampLogicalTypeAnnotation]
 &&
+!parquetType.getLogicalTypeAnnotation
+  .asInstanceOf[TimestampLogicalTypeAnnotation].isAdjustedToUTC
 
   /**
* Parquet converter for strings. A dictionary is used to minimize string 
decoding cost.
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
index 9c9e7ce729c1..a78b96ae6fcc 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
@@ -72,13 +72,6 @@ class ParquetToSparkSchemaConverter(
 inferTimestampNTZ = 
conf.get(SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.key).toBoolean,
 nanosAsLong = conf.get(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.key).toBoolean)
 
-  /**
-   * Returns true if TIMESTAMP_NTZ type is enabled in this 
ParquetToSparkSchemaConverter.
-   */
-  def isTimestampNTZEnabled(): Boolean = {
-inferTimestampNTZ
-  }
-
   /**
* Converts Parquet [[MessageType]] `parquetSchema` to a Spark SQL 
[[StructType]].
*/
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
index 828ec39c7d72..29cb224c8787 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
@@ -160,21 +160,27 @@ abstract class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedS
 }
   }
 
-  test("SPARK-36182: writing and reading TimestampNTZType column") {
-withTable("ts") {
-  sql("create table ts (c1

(spark) branch branch-3.5 updated (6629b9a6a5ae -> 3018a5d8cd96)

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6629b9a6a5ae [SPARK-47370][DOC] Add migration doc: TimestampNTZ type 
inference on Parquet files
 add 3018a5d8cd96 [SPARK-47368][SQL]][3.5] Remove inferTimestampNTZ config 
check in ParquetRo…

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetRowConverter.scala  |  9 +++---
 .../parquet/ParquetSchemaConverter.scala   |  7 -
 .../datasources/parquet/ParquetQuerySuite.scala| 36 +-
 3 files changed, 25 insertions(+), 27 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (5d32e62436dc -> 625589f736fe)

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5d32e62436dc [MINOR][TESTS] Enable nullability check in 
`test_create_dataframe_from_arrays`
 add 625589f736fe [SPARK-47368][SQL] Remove inferTimestampNTZ config check 
in ParquetRowConverter

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetRowConverter.scala  | 14 -
 .../parquet/ParquetSchemaConverter.scala   |  7 -
 .../datasources/parquet/ParquetQuerySuite.scala| 36 +-
 3 files changed, 27 insertions(+), 30 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 982fbc5b63e6 [SPARK-47370][DOC] Add migration doc: TimestampNTZ type 
inference on Parquet files
982fbc5b63e6 is described below

commit 982fbc5b63e61cbc280f8049caf60fbb6e178423
Author: Gengliang Wang 
AuthorDate: Tue Mar 12 15:11:34 2024 -0700

[SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on 
Parquet files

### What changes were proposed in this pull request?

Add migration doc: TimestampNTZ type inference on Parquet files

### Why are the changes needed?

Update docs. The behavior change was not mentioned in the SQL migration 
guide

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

It's just doc change
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45482 from gengliangwang/ntzMigrationDoc.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 621f2c88f3e56257ee517d65e093d32fb44b783e)
Signed-off-by: Gengliang Wang 
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 1ad6c8faa3db..b83745e75c79 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -43,6 +43,7 @@ license: |
   - Since Spark 3.4, vectorized readers are enabled by default for the nested 
data types (array, map and struct). To restore the legacy behavior, set 
`spark.sql.orc.enableNestedColumnVectorizedReader` and 
`spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
   - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 
3.3 or earlier, users can write binary columns in CSV datasource, but the 
output content in CSV files is `Object.toString()` which is meaningless; 
meanwhile, if users read CSV tables with binary columns, Spark will throw an 
`Unsupported type: binary` exception.
   - Since Spark 3.4, bloom filter joins are enabled by default. To restore the 
legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to 
`false`.
+  - Since Spark 3.4, when schema inference on external Parquet files, INT64 
timestamps with annotation `isAdjustedToUTC=false` will be inferred as 
TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, 
set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.
 
 ## Upgrading from Spark SQL 3.2 to 3.3
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6629b9a6a5ae [SPARK-47370][DOC] Add migration doc: TimestampNTZ type 
inference on Parquet files
6629b9a6a5ae is described below

commit 6629b9a6a5ae35db486dc69ce1ce5a86246daf1d
Author: Gengliang Wang 
AuthorDate: Tue Mar 12 15:11:34 2024 -0700

[SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on 
Parquet files

### What changes were proposed in this pull request?

Add migration doc: TimestampNTZ type inference on Parquet files

### Why are the changes needed?

Update docs. The behavior change was not mentioned in the SQL migration 
guide

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

It's just doc change
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45482 from gengliangwang/ntzMigrationDoc.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 621f2c88f3e56257ee517d65e093d32fb44b783e)
Signed-off-by: Gengliang Wang 
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 2eba9500e907..0e54c33c6d12 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -49,6 +49,7 @@ license: |
   - Since Spark 3.4, vectorized readers are enabled by default for the nested 
data types (array, map and struct). To restore the legacy behavior, set 
`spark.sql.orc.enableNestedColumnVectorizedReader` and 
`spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
   - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 
3.3 or earlier, users can write binary columns in CSV datasource, but the 
output content in CSV files is `Object.toString()` which is meaningless; 
meanwhile, if users read CSV tables with binary columns, Spark will throw an 
`Unsupported type: binary` exception.
   - Since Spark 3.4, bloom filter joins are enabled by default. To restore the 
legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to 
`false`.
+  - Since Spark 3.4, when schema inference on external Parquet files, INT64 
timestamps with annotation `isAdjustedToUTC=false` will be inferred as 
TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, 
set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.
 
 ## Upgrading from Spark SQL 3.2 to 3.3
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

2024-03-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 621f2c88f3e5 [SPARK-47370][DOC] Add migration doc: TimestampNTZ type 
inference on Parquet files
621f2c88f3e5 is described below

commit 621f2c88f3e56257ee517d65e093d32fb44b783e
Author: Gengliang Wang 
AuthorDate: Tue Mar 12 15:11:34 2024 -0700

[SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on 
Parquet files

### What changes were proposed in this pull request?

Add migration doc: TimestampNTZ type inference on Parquet files

### Why are the changes needed?

Update docs. The behavior change was not mentioned in the SQL migration 
guide

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

It's just a doc change
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45482 from gengliangwang/ntzMigrationDoc.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 3d0c7280496a..9f92d6fc8347 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -67,6 +67,7 @@ license: |
   - Since Spark 3.4, vectorized readers are enabled by default for the nested 
data types (array, map and struct). To restore the legacy behavior, set 
`spark.sql.orc.enableNestedColumnVectorizedReader` and 
`spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
   - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 
3.3 or earlier, users can write binary columns in CSV datasource, but the 
output content in CSV files is `Object.toString()` which is meaningless; 
meanwhile, if users read CSV tables with binary columns, Spark will throw an 
`Unsupported type: binary` exception.
   - Since Spark 3.4, bloom filter joins are enabled by default. To restore the 
legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to 
`false`.
+  - Since Spark 3.4, when schema inference on external Parquet files, INT64 
timestamps with annotation `isAdjustedToUTC=false` will be inferred as 
TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, 
set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.
 
 ## Upgrading from Spark SQL 3.2 to 3.3
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option

2024-02-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 5067447bf9a4 [SPARK-42285][DOC] Update Parquet data source doc on the 
timestamp_ntz inference option
5067447bf9a4 is described below

commit 5067447bf9a420b2f972a03351058ebfa61e0e41
Author: Gengliang Wang 
AuthorDate: Fri Feb 16 18:21:19 2024 -0800

[SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz 
inference option

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/39856. The 
configuration changes should be reflected in the Parquet data source doc

### Why are the changes needed?

To fix doc
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Preview:
https://github.com/apache/spark/assets/1097932/618df731-49ad-49e7-afa2-22381cb3bbef

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45145 from gengliangwang/changeConfigName.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit dc2f2673a73ccde44b59cada00e95e869ad64c01)
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-parquet.md | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index f49bbd7a9d04..707871e79802 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -616,14 +616,15 @@ Configuration of Parquet can be done using the `setConf` 
method on `SparkSession
   3.3.0
 
 
-  spark.sql.parquet.timestampNTZ.enabled
+  spark.sql.parquet.inferTimestampNTZ.enabled
   true
   
-Enables TIMESTAMP_NTZ support for Parquet reads and writes.
-When enabled, TIMESTAMP_NTZ values are written as Parquet 
timestamp
-columns with annotation isAdjustedToUTC = false and are inferred in a 
similar way.
-When disabled, such values are read as TIMESTAMP_LTZ and have 
to be
-converted to TIMESTAMP_LTZ for writes.
+When enabled, Parquet timestamp columns with annotation 
isAdjustedToUTC = false
+are inferred as TIMESTAMP_NTZ type during schema inference. Otherwise, all 
the Parquet
+timestamp columns are inferred as TIMESTAMP_LTZ types. Note that Spark 
writes the
+output schema into Parquet's footer metadata on file writing and leverages 
it on file
+reading. Thus this configuration only affects the schema inference on 
Parquet files
+which are not written by Spark.
   
   3.4.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option

2024-02-16 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dc2f2673a73c [SPARK-42285][DOC] Update Parquet data source doc on the 
timestamp_ntz inference option
dc2f2673a73c is described below

commit dc2f2673a73ccde44b59cada00e95e869ad64c01
Author: Gengliang Wang 
AuthorDate: Fri Feb 16 18:21:19 2024 -0800

[SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz 
inference option

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/39856. The 
configuration changes should be reflected in the Parquet data source doc

### Why are the changes needed?

To fix doc
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Preview:
https://github.com/apache/spark/assets/1097932/618df731-49ad-49e7-afa2-22381cb3bbef

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45145 from gengliangwang/changeConfigName.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-parquet.md | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index e944db24d76b..f5c5ccd3b89a 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -616,14 +616,15 @@ Configuration of Parquet can be done via `spark.conf.set` 
or by running
   3.3.0
 
 
-  spark.sql.parquet.timestampNTZ.enabled
+  spark.sql.parquet.inferTimestampNTZ.enabled
   true
   
-Enables TIMESTAMP_NTZ support for Parquet reads and writes.
-When enabled, TIMESTAMP_NTZ values are written as Parquet 
timestamp
-columns with annotation isAdjustedToUTC = false and are inferred in a 
similar way.
-When disabled, such values are read as TIMESTAMP_LTZ and have 
to be
-converted to TIMESTAMP_LTZ for writes.
+When enabled, Parquet timestamp columns with annotation 
isAdjustedToUTC = false
+are inferred as TIMESTAMP_NTZ type during schema inference. Otherwise, all 
the Parquet
+timestamp columns are inferred as TIMESTAMP_LTZ types. Note that Spark 
writes the
+output schema into Parquet's footer metadata on file writing and leverages 
it on file
+reading. Thus this configuration only affects the schema inference on 
Parquet files
+which are not written by Spark.
   
   3.4.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46849][SQL][FOLLOWUP] Column default value cannot reference session variables

2024-02-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1e13243ca394 [SPARK-46849][SQL][FOLLOWUP] Column default value cannot 
reference session variables
1e13243ca394 is described below

commit 1e13243ca394b04e0b1d2972d7c8eab2c63414e5
Author: Wenchen Fan 
AuthorDate: Mon Feb 5 13:31:58 2024 -0800

[SPARK-46849][SQL][FOLLOWUP] Column default value cannot reference session 
variables

### What changes were proposed in this pull request?

One more followup of https://github.com/apache/spark/pull/44876 .

Previously, by using a fake analyzer, session variables couldn't be resolved 
and thus couldn't appear in the default value expression. Now that we use the 
actual analyzer and optimizer, session variables can be properly resolved and 
replaced with literals at the end. This is not expected, as default values 
shouldn't reference temporary things.

This PR fixes this by explicitly failing the check if the default value 
references session variables.
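
Distilled from the new test in InsertSuite, a short sketch of the statement that is now rejected (the table and variable names are only illustrative):

```
// Session variables are temporary, so a column default may not reference them.
spark.sql("DECLARE test_var INT")

// Expected to fail analysis with the "default value is not constant" style error,
// because the default refers to the session variable test_var.
spark.sql("CREATE TABLE t(i BOOLEAN, s INT DEFAULT test_var) USING parquet")
```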

### Why are the changes needed?

Fix the behavior change.

### Does this PR introduce _any_ user-facing change?

No, the behavior change has not been released.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45032 from cloud-fan/default-value.

Authored-by: Wenchen Fan 
Signed-off-by: Gengliang Wang 
---
 .../catalyst/util/ResolveDefaultColumnsUtil.scala  |  5 ++-
 .../org/apache/spark/sql/sources/InsertSuite.scala | 36 +-
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
index da03de73557f..a2bfc6e08da8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
@@ -468,10 +468,13 @@ object ResolveDefaultColumns extends QueryErrorsBase
   }
   // Our analysis check passes here. We do not further inspect whether the
   // expression is `foldable` here, as the plan is not optimized yet.
-} else if (default.references.nonEmpty) {
+}
+
+if (default.references.nonEmpty || 
default.exists(_.isInstanceOf[VariableReference])) {
   // Ideally we should let the rest of `CheckAnalysis` report errors about 
why the default
   // expression is unresolved. But we should report a better error here if 
the default
   // expression references columns, which means it's not a constant for 
sure.
+  // Note that, session variable should be considered as non-constant as 
well.
   throw QueryCompilationErrors.defaultValueNotConstantError(
 statement, colName, default.originalSQL)
 }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
index 704df9d78ffa..2cc434318aa2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
@@ -28,6 +28,7 @@ import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
CatalogTable, CatalogTableType}
 import org.apache.spark.sql.catalyst.parser.ParseException
+import org.apache.spark.sql.connector.FakeV2Provider
 import org.apache.spark.sql.execution.datasources.DataSourceUtils
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.PartitionOverwriteMode
@@ -1150,7 +1151,7 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
   }
 
   test("SPARK-38336 INSERT INTO statements with tables with default columns: 
negative tests") {
-// The default value fails to analyze.
+// The default value references columns.
 withTable("t") {
   checkError(
 exception = intercept[AnalysisException] {
@@ -1162,6 +1163,39 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
   "colName" -> "`s`",
   "defaultValue" -> "badvalue"))
 }
+try {
+  // The default value references session variables.
+  sql("DECLARE test_var INT")
+  withTable("t") {
+checkError(
+  exception = intercept[AnalysisException] {
+sql("create table t(i boolean, s int default test_var) using 
parquet")
+  },
+  // V1 command still u

(spark) branch master updated: [SPARK-46964][SQL] Change the signature of the hllInvalidLgK query execution error to take an integer as 4th argument

2024-02-02 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0965412d5174 [SPARK-46964][SQL] Change the signature of the 
hllInvalidLgK query execution error to take an integer as 4th argument
0965412d5174 is described below

commit 0965412d517441a15d4da0b5fc8fe34a9b5ec40f
Author: Menelaos Karavelas 
AuthorDate: Fri Feb 2 11:55:21 2024 -0800

[SPARK-46964][SQL] Change the signature of the hllInvalidLgK query 
execution error to take an integer as 4th argument

### What changes were proposed in this pull request?

The current signature of the `hllInvalidLgK` query execution error takes 
four arguments:
1. The SQL function (a string).
2. The minimum possible `lgk` value (an integer).
3. The maximum possible `lgk` value (an integer).
4. The actual invalid `lgk` value (a string).

There is no meaningful reason for the 4th argument to be a string. In this 
PR we change it to be an integer, just like the minimum and maximum valid 
values.
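
For context, the error whose signature changes here is the one raised when `lgConfigK` falls outside the sketch library's supported range; a rough way to trigger it from SQL (the value 1 is just an arbitrary out-of-range choice, the valid range being roughly 4 to 21) might look like:

```
// hll_sketch_agg(column, lgConfigK): an out-of-range lgConfigK such as 1
// should fail with HLL_INVALID_LG_K, now reporting the value as an integer.
spark.sql("SELECT hll_sketch_agg(col, 1) FROM VALUES (1), (2), (3) AS t(col)").show()
```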

### Why are the changes needed?

Seeking to make the signature of the `hllInvalidLgK` error more meaningful 
and self-consistent.

### Does this PR introduce _any_ user-facing change?

No, there are no user-facing changes from this PR. This is just an 
internal change.

### How was this patch tested?

Existing tests suffice.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44995 from mkaravel/hll-invalid-lgk-error-arg.

Authored-by: Menelaos Karavelas 
Signed-off-by: Gengliang Wang 
---
 .../sql/catalyst/expressions/aggregate/datasketchesAggregates.scala   | 2 +-
 .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
index 595ae32d77b9..02925f3625d2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
@@ -196,7 +196,7 @@ object HllSketchAgg {
   def checkLgK(lgConfigK: Int): Unit = {
 if (lgConfigK < minLgConfigK || lgConfigK > maxLgConfigK) {
   throw QueryExecutionErrors.hllInvalidLgK(function = "hll_sketch_agg",
-min = minLgConfigK, max = maxLgConfigK, value = lgConfigK.toString)
+min = minLgConfigK, max = maxLgConfigK, value = lgConfigK)
 }
   }
 }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 9ff076c5fd50..af5cafdc8a3a 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -2601,14 +2601,14 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase with ExecutionE
   cause = e)
   }
 
-  def hllInvalidLgK(function: String, min: Int, max: Int, value: String): 
Throwable = {
+  def hllInvalidLgK(function: String, min: Int, max: Int, value: Int): 
Throwable = {
 new SparkRuntimeException(
   errorClass = "HLL_INVALID_LG_K",
   messageParameters = Map(
 "function" -> toSQLId(function),
 "min" -> toSQLValue(min, IntegerType),
 "max" -> toSQLValue(max, IntegerType),
-"value" -> value))
+"value" -> toSQLValue(value, IntegerType)))
   }
 
   def hllInvalidInputSketchBuffer(function: String): Throwable = {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-46637][DOCS] Enhancing the Visual Appeal of Spark doc website

2024-01-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d3e30848084 [SPARK-46637][DOCS] Enhancing the Visual Appeal of Spark 
doc website
d3e30848084 is described below

commit d3e3084808453769ba0cd4278ee8650e40c185ea
Author: Gengliang Wang 
AuthorDate: Wed Jan 10 09:32:30 2024 +0900

[SPARK-46637][DOCS] Enhancing the Visual Appeal of Spark doc website

### What changes were proposed in this pull request?

Enhance the visual appeal of the Spark doc website after 
https://github.com/apache/spark/pull/40269:
 1. There is a weird indent on the top right side of the first 
paragraph of the Spark 3.5.0 doc overview page.
Before this PR:
https://github.com/apache/spark/assets/1097932/84d21ca1-a4d0-4bd4-8f20-a34fa5db4000

After this PR:
https://github.com/apache/spark/assets/1097932/4ffc0d5a-ed75-44c5-b20a-475ff401afa8

 2. All the titles are too big and therefore less readable. On the 
website https://spark.apache.org/downloads.html, titles are h2, while on the doc 
site https://spark.apache.org/docs/latest/ they are h1. So we should make the 
font size of titles smaller.
Before this PR:
https://github.com/apache/spark/assets/1097932/5bbbd9eb-432a-42c0-98be-ff00a9099cd6
After this PR:
https://github.com/apache/spark/assets/1097932/dc94c1fb-6ac1-41a8-b4a4-19b3034125d7

 3. The banner image can't be displayed correctly. Even when it shows up, 
it is covered by the text. To keep it simple, let's not show the banner 
image, as we did in https://spark.apache.org/docs/3.4.2/
https://github.com/apache/spark/assets/1097932/f6d34261-a352-44e2-9633-6e96b311a0b3
https://github.com/apache/spark/assets/1097932/c49ce6b6-13d9-4d8f-97a9-7ed8b037be57

### Why are the changes needed?

Improve the Visual Appeal of Spark doc website

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually build doc and verify on local setup.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44642 from gengliangwang/enhance_doc.

Authored-by: Gengliang Wang 
Signed-off-by: Hyukjin Kwon 
---
 docs/_layouts/global.html  |  26 +++---
 docs/css/custom.css|  35 ++-
 docs/img/spark-hero-thin-light.jpg | Bin 278664 -> 0 bytes
 3 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 8c4435fdf31..5116472eaa7 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -138,25 +138,21 @@
 
 {% if page.url == "/" %}
 
-
-
 
 
   Apache Spark - A Unified 
engine for large-scale data analytics
 
-
-  
-Apache Spark is a unified analytics engine for large-scale 
data processing.
-It provides high-level APIs in Java, Scala, Python and R,
-and an optimized engine that supports general execution 
graphs.
-It also supports a rich set of higher-level tools including
-Spark SQL for SQL 
and structured data processing,
-pandas API on Spark 
for pandas workloads,
-MLlib for machine learning,
-GraphX for 
graph processing,
- and Structured Streaming
- for incremental computation and stream processing.
-  
+
+  Apache Spark is a unified analytics engine for large-scale 
data processing.
+  It provides high-level APIs in Java, Scala, Python and R,
+  and an optimized engine that supports general execution 
graphs.
+  It also supports a rich set of higher-level tools including
+  Spark SQL for SQL 
and structured data processing,
+  pandas API on Spark 
for pandas workloads,
+  MLlib for machine learning,
+  GraphX for graph 
processing,
+   and Structured Streaming
+   for incremental computation and stream processing.
 
 
   
diff --git a/docs/css/custom.css b/docs/css/custom.css
index 1239c0ed440..8158938866c 100644
--- a/docs/css/custom.css
+++ b/docs/css/custom.css
@@ -95,18 +95,7 @@ section {
   border-color: transparent;
 }
 
-.hero-banner .bg {
-  background: url(/img/spark-hero-thin-light.jpg) no-repeat;
-  transform: translate(36%, 0%);
-  height: 475px;
-  top: 0;

(spark) branch branch-3.5 updated: [SPARK-46396][SQL] Timestamp inference should not throw exception

2023-12-14 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 908c472728f2 [SPARK-46396][SQL] Timestamp inference should not throw 
exception
908c472728f2 is described below

commit 908c472728f24034baf0b59f03b04ca148eabeca
Author: Gengliang Wang 
AuthorDate: Thu Dec 14 00:06:22 2023 -0800

[SPARK-46396][SQL] Timestamp inference should not throw exception

### What changes were proposed in this pull request?

When setting `spark.sql.legacy.timeParserPolicy=LEGACY`, Spark will use the 
LegacyFastTimestampFormatter to infer potential timestamp columns. The 
inference shouldn't throw an exception.

However, when the input is 23012150952, an exception is thrown:

```

For input string: "23012150952"

java.lang.NumberFormatException: For input string: "23012150952"

at 
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)

at java.base/java.lang.Integer.parseInt(Integer.java:668)

at java.base/java.lang.Integer.parseInt(Integer.java:786)

at 
org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304)

at 
org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045)

at 
org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651)

at 
org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418)

```

This PR is to fix the issue.
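
A rough way to reproduce the scenario (the input string comes from the stack trace above; the CSV schema-inference setup around it is an illustrative assumption, not part of the patch):

```
// Under the legacy policy, schema inference probes candidate timestamp
// formats through LegacyFastTimestampFormatter.parseOptional.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

import spark.implicits._
val df = spark.read
  .option("inferSchema", "true")
  .csv(Seq("23012150952").toDS())

// Before the fix, inference could surface the NumberFormatException above;
// after it, the value simply fails to parse as a timestamp and the column
// falls back to a different inferred type.
df.printSchema()
```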

### Why are the changes needed?

Bug fix: timestamp inference should not throw an exception.
### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

New test case + existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44338 from gengliangwang/fixParseOptional.
    
Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 4a79ae9d821e9b04fbe949251050c3e4819dff92)
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/sql/catalyst/util/TimestampFormatter.scala  | 12 
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala|  3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index 55eee41c14ca..0866cee9334c 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -414,10 +414,14 @@ class LegacyFastTimestampFormatter(
 
   override def parseOptional(s: String): Option[Long] = {
 cal.clear() // Clear the calendar because it can be re-used many times
-if (fastDateFormat.parse(s, new ParsePosition(0), cal)) {
-  Some(extractMicros(cal))
-} else {
-  None
+try {
+  if (fastDateFormat.parse(s, new ParsePosition(0), cal)) {
+Some(extractMicros(cal))
+  } else {
+None
+  }
+} catch {
+  case NonFatal(_) => None
 }
   }
 
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index 2134a0d6ecd3..27d60815766d 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -502,10 +502,11 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
 
 assert(fastFormatter.parseOptional("2023-12-31 
23:59:59.9990").contains(170406719000L))
 assert(fastFormatter.parseOptional("abc").isEmpty)
+assert(fastFormatter.parseOptional("23012150952").isEmpty)
 
 assert(simpleFormatter.parseOptional("2023-12-31 
23:59:59.9990").contains(170406720899L))
 assert(simpleFormatter.parseOptional("abc").isEmpty)
-
+assert(simpleFormatter.parseOptional("23012150952").isEmpty)
   }
 
   test("SPARK-45424: do not return optional parse results when only prefix 
match") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (f1e5a136fa79 -> 4a79ae9d821e)

2023-12-14 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f1e5a136fa79 [SPARK-46393][SQL] Classify exceptions in the JDBC table 
catalog
 add 4a79ae9d821e [SPARK-46396][SQL] Timestamp inference should not throw 
exception

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/util/TimestampFormatter.scala  | 12 
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala|  3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-website) branch asf-site updated: Fix the CSS of Spark 3.5.0 doc's generated tables (#492)

2023-11-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0ceaaaf528 Fix the CSS of Spark 3.5.0 doc's generated tables (#492)
0ceaaaf528 is described below

commit 0ceaaaf528ec1d0201e1eab1288f37cce607268b
Author: Gengliang Wang 
AuthorDate: Thu Nov 30 15:06:18 2023 -0800

Fix the CSS of Spark 3.5.0 doc's generated tables (#492)

After https://github.com/apache/spark/pull/40269, there is no border in the 
generated tables of the Spark doc (for example, 
[sql-ref-ansi-compliance.html](https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html)).
Currently only the doc of Spark 3.5.0 is affected.
This PR applies the changes in 
https://github.com/apache/spark/pull/44096 to the current Spark 3.5.0 doc by:
1. changing `site/docs/3.5.0/css/custom.css`
2. executing `sed -i '' 's/table class="table table-striped"/table/' *.html` under the `site/docs/3.5.0/` directory.

This should be a safe change. I have verified it on my local env.
---
 site/docs/3.5.0/building-spark.html|  2 +-
 site/docs/3.5.0/cluster-overview.html  |  2 +-
 site/docs/3.5.0/configuration.html | 40 +++---
 site/docs/3.5.0/css/custom.css | 13 +++
 site/docs/3.5.0/ml-classification-regression.html  | 14 
 site/docs/3.5.0/ml-clustering.html |  8 ++---
 .../3.5.0/mllib-classification-regression.html |  2 +-
 site/docs/3.5.0/mllib-decision-tree.html   |  2 +-
 site/docs/3.5.0/mllib-ensembles.html   |  2 +-
 site/docs/3.5.0/mllib-evaluation-metrics.html  | 10 +++---
 site/docs/3.5.0/mllib-linear-methods.html  |  4 +--
 site/docs/3.5.0/mllib-pmml-model-export.html   |  2 +-
 site/docs/3.5.0/monitoring.html| 10 +++---
 site/docs/3.5.0/rdd-programming-guide.html |  8 ++---
 site/docs/3.5.0/running-on-kubernetes.html |  8 ++---
 site/docs/3.5.0/running-on-mesos.html  |  2 +-
 site/docs/3.5.0/running-on-yarn.html   |  8 ++---
 site/docs/3.5.0/security.html  | 26 +++---
 site/docs/3.5.0/spark-standalone.html  | 12 +++
 site/docs/3.5.0/sparkr.html|  6 ++--
 site/docs/3.5.0/sql-data-sources-avro.html | 12 +++
 site/docs/3.5.0/sql-data-sources-csv.html  |  2 +-
 site/docs/3.5.0/sql-data-sources-hive-tables.html  |  4 +--
 site/docs/3.5.0/sql-data-sources-jdbc.html |  2 +-
 site/docs/3.5.0/sql-data-sources-json.html |  2 +-
 .../sql-data-sources-load-save-functions.html  |  2 +-
 site/docs/3.5.0/sql-data-sources-orc.html  |  4 +--
 site/docs/3.5.0/sql-data-sources-parquet.html  |  4 +--
 site/docs/3.5.0/sql-data-sources-text.html |  2 +-
 .../sql-distributed-sql-engine-spark-sql-cli.html  |  4 +--
 .../docs/3.5.0/sql-error-conditions-sqlstates.html | 26 +++---
 site/docs/3.5.0/sql-migration-guide.html   |  4 +--
 site/docs/3.5.0/sql-performance-tuning.html| 16 -
 site/docs/3.5.0/storage-openstack-swift.html   |  2 +-
 site/docs/3.5.0/streaming-custom-receivers.html|  2 +-
 site/docs/3.5.0/streaming-programming-guide.html   | 10 +++---
 .../structured-streaming-kafka-integration.html| 20 +--
 .../structured-streaming-programming-guide.html| 12 +++
 site/docs/3.5.0/submitting-applications.html   |  2 +-
 site/docs/3.5.0/web-ui.html|  2 +-
 40 files changed, 164 insertions(+), 151 deletions(-)

diff --git a/site/docs/3.5.0/building-spark.html 
b/site/docs/3.5.0/building-spark.html
index 0af9dd6517..672d686bc3 100644
--- a/site/docs/3.5.0/building-spark.html
+++ b/site/docs/3.5.0/building-spark.html
@@ -481,7 +481,7 @@ Change the major Scala version using (e.g. 2.13):
 
 Related environment variables
 
-
+
 Variable NameDefaultMeaning
 
   SPARK_PROJECT_URL
diff --git a/site/docs/3.5.0/cluster-overview.html 
b/site/docs/3.5.0/cluster-overview.html
index d6015a8686..552b24b729 100644
--- a/site/docs/3.5.0/cluster-overview.html
+++ b/site/docs/3.5.0/cluster-overview.html
@@ -216,7 +216,7 @@ The job scheduling 
overview describes this in
 
 The following table summarizes terms youll see used to refer to 
cluster concepts:
 
-
+
   
 TermMeaning
   
diff --git a/site/docs/3.5.0/configuration.html 
b/site/docs/3.5.0/configuration.html
index d6c9255302..3ca1684ffd 100644
--- a/site/docs/3.5.0/configuration.html
+++ b/site/docs/3.5.0/configuration.html
@@ -309,7 +309,7 @@ of the most common options to set are:
 
 Application Properties
 
-
+
 Property NameDefaultMeaningSince 
Version
 
   spark.app.name
@@ -694,7 +694,7 @@ of the most common options to set are:
 
 Runtime Environment
 
-
+

(spark) branch branch-3.5 updated: [SPARK-46188][DOC][3.5] Fix the CSS of Spark doc's generated tables

2023-11-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 00bb4ad46e37 [SPARK-46188][DOC][3.5] Fix the CSS of Spark doc's 
generated tables
00bb4ad46e37 is described below

commit 00bb4ad46e373311a6303952f3944680b08e03d7
Author: Gengliang Wang 
AuthorDate: Thu Nov 30 14:56:48 2023 -0800

[SPARK-46188][DOC][3.5] Fix the CSS of Spark doc's generated tables

### What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/40269, there is no border in the 
generated tables of the Spark doc (for example, 
https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html). This PR 
fixes it by restoring part of the table style in 
https://github.com/apache/spark/pull/40269/files#diff-309b964023ca899c9505205f36d3f4d5b36a6487e5c9b2e242204ee06bbc9ce9L26

This PR also unifies the styles of all tables by removing the `class="table 
table-striped"` attribute from HTML-style tables in the markdown docs.

### Why are the changes needed?

Fix a regression in the table CSS of Spark docs

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually build docs and verify.
Before changes:
https://github.com/apache/spark/assets/1097932/1eb7abff-65af-4c4c-bbd5-9077f38c1b43

After changes:
https://github.com/apache/spark/assets/1097932/be77d4c6-1279-43ec-a234-b69ee02e3dc6

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: ChatGPT 4

Closes #44097 from gengliangwang/fixTable3.5.

    Authored-by: Gengliang Wang 
    Signed-off-by: Gengliang Wang 
---
 docs/building-spark.md   |  2 +-
 docs/cluster-overview.md |  2 +-
 docs/configuration.md| 40 
 docs/css/custom.css  | 13 
 docs/ml-classification-regression.md | 14 -
 docs/ml-clustering.md|  8 ++---
 docs/mllib-classification-regression.md  |  2 +-
 docs/mllib-decision-tree.md  |  2 +-
 docs/mllib-ensembles.md  |  2 +-
 docs/mllib-evaluation-metrics.md | 10 +++---
 docs/mllib-linear-methods.md |  4 +--
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/monitoring.md   | 10 +++---
 docs/rdd-programming-guide.md|  8 ++---
 docs/running-on-kubernetes.md|  8 ++---
 docs/running-on-mesos.md |  2 +-
 docs/running-on-yarn.md  |  8 ++---
 docs/security.md | 26 +++
 docs/spark-standalone.md | 12 +++
 docs/sparkr.md   |  6 ++--
 docs/sql-data-sources-avro.md| 12 +++
 docs/sql-data-sources-csv.md |  2 +-
 docs/sql-data-sources-hive-tables.md |  4 +--
 docs/sql-data-sources-jdbc.md|  2 +-
 docs/sql-data-sources-json.md|  2 +-
 docs/sql-data-sources-load-save-functions.md |  2 +-
 docs/sql-data-sources-orc.md |  4 +--
 docs/sql-data-sources-parquet.md |  4 +--
 docs/sql-data-sources-text.md|  2 +-
 docs/sql-distributed-sql-engine-spark-sql-cli.md |  4 +--
 docs/sql-error-conditions-sqlstates.md   | 26 +++
 docs/sql-migration-guide.md  |  4 +--
 docs/sql-performance-tuning.md   | 16 +-
 docs/storage-openstack-swift.md  |  2 +-
 docs/streaming-custom-receivers.md   |  2 +-
 docs/streaming-programming-guide.md  | 10 +++---
 docs/structured-streaming-kafka-integration.md   | 20 ++--
 docs/structured-streaming-programming-guide.md   | 12 +++
 docs/submitting-applications.md  |  2 +-
 docs/web-ui.md   |  2 +-
 40 files changed, 164 insertions(+), 151 deletions(-)

diff --git a/docs/building-spark.md b/docs/building-spark.md
index 4b8e70655d59..33d253a49dbf 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -286,7 +286,7 @@ If use an individual repository or a repository on GitHub 
Enterprise, export bel
 
 ### Related environment variables
 
-
+
 Variable NameDefaultMeaning
 
   SPARK_PROJECT_URL
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 7da06a852089..34913bd97a41 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -91,7 +91,7 @@ The [job scheduling overview](job-scheduling.html) describ

(spark) branch master updated (9bb358b51e30 -> 99b80a7f17e2)

2023-11-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9bb358b51e30 [SPARK-46170][SQL] Support inject adaptive query post 
planner strategy rules in SparkSessionExtensions
 add 99b80a7f17e2 [SPARK-46188][DOC] Fix the CSS of Spark doc's generated 
tables

No new revisions were added by this update.

Summary of changes:
 docs/building-spark.md   |  2 +-
 docs/cluster-overview.md |  2 +-
 docs/configuration.md| 38 
 docs/css/custom.css  | 13 
 docs/ml-classification-regression.md | 14 -
 docs/ml-clustering.md|  8 ++---
 docs/mllib-classification-regression.md  |  2 +-
 docs/mllib-decision-tree.md  |  2 +-
 docs/mllib-ensembles.md  |  2 +-
 docs/mllib-evaluation-metrics.md | 10 +++
 docs/mllib-linear-methods.md |  4 +--
 docs/mllib-pmml-model-export.md  |  2 +-
 docs/monitoring.md   | 10 +++
 docs/rdd-programming-guide.md|  8 ++---
 docs/running-on-kubernetes.md|  8 ++---
 docs/running-on-yarn.md  |  8 ++---
 docs/security.md | 26 
 docs/spark-standalone.md | 14 -
 docs/sparkr.md   |  6 ++--
 docs/sql-data-sources-avro.md| 12 
 docs/sql-data-sources-csv.md |  2 +-
 docs/sql-data-sources-hive-tables.md |  4 +--
 docs/sql-data-sources-jdbc.md|  2 +-
 docs/sql-data-sources-json.md|  2 +-
 docs/sql-data-sources-load-save-functions.md |  2 +-
 docs/sql-data-sources-orc.md |  4 +--
 docs/sql-data-sources-parquet.md |  4 +--
 docs/sql-data-sources-protobuf.md|  6 ++--
 docs/sql-data-sources-text.md|  2 +-
 docs/sql-data-sources-xml.md |  2 +-
 docs/sql-distributed-sql-engine-spark-sql-cli.md |  4 +--
 docs/sql-error-conditions-sqlstates.md   | 26 
 docs/sql-migration-guide.md  |  4 +--
 docs/sql-performance-tuning.md   | 16 +-
 docs/storage-openstack-swift.md  |  2 +-
 docs/streaming-custom-receivers.md   |  2 +-
 docs/streaming-programming-guide.md  | 10 +++
 docs/structured-streaming-kafka-integration.md   | 20 ++---
 docs/structured-streaming-programming-guide.md   | 12 
 docs/structured-streaming-state-data-source.md   |  8 ++---
 docs/submitting-applications.md  |  2 +-
 docs/web-ui.md   |  2 +-
 42 files changed, 171 insertions(+), 158 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46144][SQL] Fail INSERT INTO ... REPLACE statement if the condition contains subquery

2023-11-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c162f6df1b3d [SPARK-46144][SQL] Fail INSERT INTO ... REPLACE statement 
if the condition contains subquery
c162f6df1b3d is described below

commit c162f6df1b3d6ccc2944b6fb6db033482c9f01ee
Author: Gengliang Wang 
AuthorDate: Wed Nov 29 21:19:58 2023 -0800

[SPARK-46144][SQL] Fail INSERT INTO ... REPLACE statement if the condition 
contains subquery

### What changes were proposed in this pull request?

For the following query:
```
INSERT INTO tbl REPLACE WHERE id = (select c2 from values(1) as t(c2)) 
SELECT * FROM source
```
There will be an analysis error:
```
[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function 
parameter with name `c2` cannot be resolved.  SQLSTATE: 42703; line 1 pos 51;
'OverwriteByExpression RelationV2[id#27L, data#28] testcat.tbl testcat.tbl, 
(id#27L = scalar-subquery#26 []), false
```
The error message is confusing. The actual reason is that the 
OverwriteByExpression plan doesn't support subqueries. Since supporting the 
feature is non-trivial, this PR improves the error message to:
```
[UNSUPPORTED_FEATURE.OVERWRITE_BY_SUBQUERY] The feature is not supported: 
INSERT OVERWRITE with a subquery condition. SQLSTATE: 0A000; line 1 pos 43;
```

### Why are the changes needed?

Error message improvement
### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44060 from gengliangwang/replace.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../src/main/resources/error/error-classes.json|  5 +
 ...r-conditions-unsupported-feature-error-class.md |  4 
 .../sql/catalyst/analysis/CheckAnalysis.scala  |  5 +
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 26 ++
 4 files changed, 40 insertions(+)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 5b70edf249d1..9e0019b34728 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3529,6 +3529,11 @@
   "Unable to convert  of Orc to data type ."
 ]
   },
+  "OVERWRITE_BY_SUBQUERY" : {
+"message" : [
+  "INSERT OVERWRITE with a subquery condition."
+]
+  },
   "PANDAS_UDAF_IN_PIVOT" : {
 "message" : [
   "Pandas user defined aggregate function in the PIVOT clause."
diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md 
b/docs/sql-error-conditions-unsupported-feature-error-class.md
index 0541b9d0589e..1143aff634c2 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -121,6 +121,10 @@ The target JDBC server hosting table `` does 
not support ALTER TABLE
 
 Unable to convert `` of Orc to data type ``.
 
+## OVERWRITE_BY_SUBQUERY
+
+INSERT OVERWRITE with a subquery condition.
+
 ## PANDAS_UDAF_IN_PIVOT
 
 Pandas user defined aggregate function in the PIVOT clause.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 3843901a2e01..ea1af1d3c8cd 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -271,6 +271,11 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
   case _ =>
 }
 
+  case o: OverwriteByExpression if 
o.deleteExpr.exists(_.isInstanceOf[SubqueryExpression]) =>
+o.deleteExpr.failAnalysis (
+  errorClass = "UNSUPPORTED_FEATURE.OVERWRITE_BY_SUBQUERY",
+  messageParameters = Map.empty)
+
   case operator: LogicalPlan =>
 operator transformExpressionsDown {
   // Check argument data types of higher-order functions downwards 
first.
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
index c2e759efe402..b92b512aa1d3 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
@@ -3226,

(spark) branch branch-3.5 updated: [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 64242bf6a64 [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read
64242bf6a64 is described below

commit 64242bf6a6425274b83bc1191230437c2d3fbc71
Author: zeruibao 
AuthorDate: Tue Oct 31 16:46:40 2023 -0700

[SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

### What changes were proposed in this pull request?
Fix a slowdown in Avro reads. https://github.com/apache/spark/pull/42503 
introduced the performance regression: it seems that 
`SQLConf.get.getConf(confKey)` is very costly, so move the lookup out of the 
`newWriter` function.
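
A minimal sketch of the pattern being applied — hoisting the per-call conf lookup into a lazily initialized field — with a made-up wrapper class standing in for the real deserializer:

```
import org.apache.spark.sql.internal.SQLConf

class RecordConverter {
  // Resolved once per converter instance rather than once per nested writer.
  private lazy val preventReadingIncorrectType: Boolean =
    !SQLConf.get.getConf(SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA)

  def convert(value: Any): Any = {
    // Hot path: no SQLConf lookup here any more.
    if (preventReadingIncorrectType) {
      // ... strict schema-compatibility checks would go here ...
    }
    value
  }
}
```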

### Why are the changes needed?
Need to fix the performance regression in Avro reads.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43606 from zeruibao/SPARK-43380-FIX-SLOWDOWN.

Authored-by: zeruibao 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 45f73bc69655a236323be1bcb2988341d2aa5203)
Signed-off-by: Gengliang Wang 
---
 .../src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index fe0bd7392b6..ec34d10a5ff 100644
--- 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -105,6 +105,9 @@ private[sql] class AvroDeserializer(
   s"Cannot convert Avro type $rootAvroType to SQL type 
${rootCatalystType.sql}.", ise)
   }
 
+  private lazy val preventReadingIncorrectType = !SQLConf.get
+.getConf(SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA)
+
   def deserialize(data: Any): Option[Any] = converter(data)
 
   /**
@@ -122,8 +125,6 @@ private[sql] class AvroDeserializer(
 s"schema is incompatible (avroType = $avroType, sqlType = 
${catalystType.sql})"
 
 val realDataType = SchemaConverters.toSqlType(avroType, 
useStableIdForUnionType).dataType
-val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
-val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
 
 (avroType.getType, catalystType) match {
   case (NULL, NullType) => (updater, ordinal, _) =>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 45f73bc6965 [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read
45f73bc6965 is described below

commit 45f73bc69655a236323be1bcb2988341d2aa5203
Author: zeruibao 
AuthorDate: Tue Oct 31 16:46:40 2023 -0700

[SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

### What changes were proposed in this pull request?
Fix a slowdown in Avro reads. https://github.com/apache/spark/pull/42503 
introduced the performance regression: it seems that 
`SQLConf.get.getConf(confKey)` is very costly, so move the lookup out of the 
`newWriter` function.

### Why are the changes needed?
Need to fix the performance regression in Avro reads.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43606 from zeruibao/SPARK-43380-FIX-SLOWDOWN.

Authored-by: zeruibao 
Signed-off-by: Gengliang Wang 
---
 .../src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index c04fe820f0b..29b9fdf9dfb 100644
--- 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -105,6 +105,9 @@ private[sql] class AvroDeserializer(
   s"Cannot convert Avro type $rootAvroType to SQL type 
${rootCatalystType.sql}.", ise)
   }
 
+  private lazy val preventReadingIncorrectType = !SQLConf.get
+.getConf(SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA)
+
   def deserialize(data: Any): Option[Any] = converter(data)
 
   /**
@@ -122,8 +125,6 @@ private[sql] class AvroDeserializer(
 s"schema is incompatible (avroType = $avroType, sqlType = 
${catalystType.sql})"
 
 val realDataType = SchemaConverters.toSqlType(avroType, 
useStableIdForUnionType).dataType
-val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
-val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
 
 (avroType.getType, catalystType) match {
   case (NULL, NullType) => (updater, ordinal, _) =>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (49f9e74973f -> af8907a0873)

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 49f9e74973f [SPARK-45481][SPARK-45664][SPARK-45711][SQL][FOLLOWUP] 
Avoid magic strings copy from parquet|orc|avro compression codes
 add af8907a0873 [SPARK-45242][SQL][FOLLOWUP] Canonicalize DataFrame ID in 
CollectMetrics

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 4 
 .../scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala | 5 +
 2 files changed, 9 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45581] Make SQLSTATE mandatory

2023-10-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e82e1bc43e [SPARK-45581] Make SQLSTATE mandatory
7e82e1bc43e is described below

commit 7e82e1bc43e0297c3036d802b3a151d2b93db2f6
Author: srielau 
AuthorDate: Wed Oct 18 11:04:44 2023 -0700

[SPARK-45581] Make SQLSTATE mandatory

### What changes were proposed in this pull request?

We propose to make SQLSTATE a mandatory field when using error classes in 
the new error framework.

### Why are the changes needed?

Being able to rely on the existence of SQLSTATEs allows easier 
classification of errors as well as usage of tooling to intercept SQLSTATEs.
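
As an illustration of what a mandatory SQLSTATE enables, a small sketch of intercepting it from caller code (the failing query is arbitrary):

```
import org.apache.spark.SparkThrowable

try {
  spark.sql("SELECT raise_error('boom')").collect()
} catch {
  case e: SparkThrowable =>
    // With SQLSTATE mandatory, every error class carries a state that
    // tooling can branch on, alongside the error class itself.
    println(s"error class = ${e.getErrorClass}, sqlstate = ${e.getSqlState}")
}
```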

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A new test was added to SparkThrowableSuite to enforce SQLSTATE existence

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43412 from srielau/SPARK-45581-make-SQLSTATEs-mandatory.

Authored-by: srielau 
Signed-off-by: Gengliang Wang 
---
 common/utils/src/main/resources/error/README.md| 26 ++
 .../src/main/resources/error/error-classes.json|  9 +---
 .../org/apache/spark/ErrorClassesJSONReader.scala  |  8 +++
 .../org/apache/spark/SparkThrowableSuite.scala | 16 -
 docs/sql-error-conditions.md   |  6 ++---
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/common/utils/src/main/resources/error/README.md 
b/common/utils/src/main/resources/error/README.md
index ac388c29250..8d8529bea56 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1,6 +1,6 @@
 # Guidelines
 
-To throw a standardized user-facing error or exception, developers should 
specify the error class
+To throw a standardized user-facing error or exception, developers should 
specify the error class, a SQLSTATE,
 and message parameters rather than an arbitrary error message.
 
 ## Usage
@@ -10,7 +10,7 @@ and message parameters rather than an arbitrary error message.
If true, use the error class `INTERNAL_ERROR` and skip to step 4.
 2. Check if an appropriate error class already exists in `error-classes.json`.
If true, use the error class and skip to step 4.
-3. Add a new class to `error-classes.json`; keep in mind the invariants below.
+3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; 
keep in mind the invariants below.
 4. Check if the exception type already extends `SparkThrowable`.
If true, skip to step 6.
 5. Mix `SparkThrowable` into the exception.
@@ -26,9 +26,9 @@ Throw with arbitrary error message:
 
 `error-classes.json`
 
-"PROBLEM_BECAUSE": {
-  "message": ["Problem  because "],
-  "sqlState": "X"
+"PROBLEM_BECAUSE" : {
+  "message" : ["Problem  because "],
+  "sqlState" : "X"
 }
 
 `SparkException.scala`
@@ -70,6 +70,8 @@ Error classes are a succinct, human-readable representation 
of the error categor
 
 An uncategorized errors can be assigned to a legacy error class with the 
prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance 
`_LEGACY_ERROR_TEMP_0053`.
 
+You should not introduce new uncategorized errors. Instead, convert them to 
proper errors whenever encountering them in new code.
+
  Invariants
 
 - Unique
@@ -79,7 +81,10 @@ An uncategorized errors can be assigned to a legacy error 
class with the prefix
 ### Message
 
 Error messages provide a descriptive, human-readable representation of the 
error.
-The message format accepts string parameters via the C-style printf syntax.
+The message format accepts string parameters via the HTML tag syntax: e.g. 
.
+
+The values passed to the message shoudl not themselves be messages.
+They should be: runtime-values, keywords, identifiers, or other values that 
are not translated.
 
 The quality of the error message should match the
 [guidelines](https://spark.apache.org/error-message-guidelines.html).
@@ -90,21 +95,24 @@ The quality of the error message should match the
 
 ### SQLSTATE
 
-SQLSTATE is an optional portable error identifier across SQL engines.
+SQLSTATE is an mandatory portable error identifier across SQL engines.
 SQLSTATE comprises a 2-character class value followed by a 3-character 
subclass value.
 Spark prefers to re-use existing SQLSTATEs, preferably used by multiple 
vendors.
 For extension Spark claims the 'K**' subclass range.
 If a new class is needed it will also claim the 'K0' class.
 
+Internal errors should use the 'XX' class. You can subdivide internal errors 
by component. 
+For exam

[spark] branch master updated (e3b1bb117fe9 -> 3593f8a8919d)

2023-10-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e3b1bb117fe9 [SPARK-45262][SQL][TESTS][DOCS] Improve examples for 
regexp parameters
 add 3593f8a8919d [SPARK-45425][SQL] Mapped TINYINT to ShortType for 
MsSqlServerDialect

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/MsSqlServerIntegrationSuite.scala| 12 ++--
 .../org/apache/spark/sql/jdbc/MsSqlServerDialect.scala  |  4 +++-
 .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 17 +
 3 files changed, 30 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44838][SQL] raise_error improvement

2023-09-27 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9109d7037f4 [SPARK-44838][SQL] raise_error improvement
9109d7037f4 is described below

commit 9109d7037f44158e72d14019eb33f9c7b8838868
Author: srielau 
AuthorDate: Wed Sep 27 10:02:44 2023 -0700

[SPARK-44838][SQL] raise_error improvement

### What changes were proposed in this pull request?

Extend the raise_error() function to a two-argument version:
raise_error(errorClassStr, errorParamMap)
This new form will accept any error class defined in error-classes.json and 
require a map to provide values for the parameters in the error 
class's template.
Externally an error raised via raise_error() is indistinguishable from an 
error raised from within the Spark engine.

The single-parameter raise_error(str) will raise USER_RAISED_EXCEPTION 
(SQLSTATE P0001 - borrowed from PostgreSQL).
The USER_RAISED_EXCEPTION message is a single placeholder, which will be filled in 
with the str value.

We will also provide `spark.sql.legacy.raiseErrorWithoutErrorClass` 
(default: false) to revert to the old behavior for the single-parameter version.

Naturally assert_true() will also return `USER_RAISED_EXCEPTION`.

 Examples
```
SELECT raise_error('VIEW_NOT_FOUND', map('relationName', '`v1`');
  [VIEW_NOT_FOUND] The view `v1` cannot be found. Verify the spelling ...

SELECT raise_error('Error!');
  [USER_RAISED_EXCEPTION] Error!

SELECT assert_true(1 < 0);
 [USER_RAISED_EXCEPTION] '(1 < 0)' is not true!

SELECT assert_true(1 < 0, 'bad!')
  [USER_RAISED_EXCEPTION] bad!
```

### Why are the changes needed?

This change moves raise_error() and assert_true() to the new error 
framework.
It greatly expands the ability of users to raise error messages which can 
be intercepted via SQLSTATE and/or error class.

### Does this PR introduce _any_ user-facing change?

Yes, the result of assert_true() changes and raise_error() gains a new 
signature.

### How was this patch tested?

Run existing QA and add new tests for assert_true and raise_error

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42985 from srielau/SPARK-44838-raise_error.

Lead-authored-by: srielau 
Co-authored-by: Serge Rielau 
Co-authored-by: Wenchen Fan 
Signed-off-by: Gengliang Wang 
---
 .../src/main/resources/error/error-classes.json|  26 +++
 .../org/apache/spark/ErrorClassesJSONReader.scala  |  18 ++
 .../org/apache/spark/SparkThrowableHelper.scala|  10 +-
 .../scala/org/apache/spark/sql/functions.scala |   8 +
 .../function_assert_true_with_message.explain  |   2 +-
 .../explain-results/function_raise_error.explain   |   2 +-
 .../org/apache/spark/SparkThrowableSuite.scala |   4 +-
 docs/sql-error-conditions.md   |  20 ++
 python/pyspark/sql/tests/test_functions.py |   4 +-
 .../spark/sql/catalyst/expressions/misc.scala  |  71 ---
 .../spark/sql/errors/QueryExecutionErrors.scala|  49 -
 .../org/apache/spark/sql/internal/SQLConf.scala|  14 ++
 .../expressions/ExpressionEvalHelper.scala |   4 +
 .../expressions/MiscExpressionsSuite.scala |  10 +-
 .../catalyst/optimizer/ConstantFoldingSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/functions.scala |   8 +
 .../sql-functions/sql-expression-schema.md |   2 +-
 .../analyzer-results/misc-functions.sql.out|  86 +++-
 .../resources/sql-tests/inputs/misc-functions.sql  |  22 +++
 .../sql-tests/results/misc-functions.sql.out   | 220 +++--
 .../apache/spark/sql/ColumnExpressionSuite.scala   |  26 ++-
 .../spark/sql/execution/ui/UISeleniumSuite.scala   |   9 +-
 .../sql/expressions/ExpressionInfoSuite.scala  |   1 +
 23 files changed, 551 insertions(+), 67 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index dd0190c3462..0882e387176 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3502,6 +3502,26 @@
   "3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use 
this API with caution."
 ]
   },
+  "USER_RAISED_EXCEPTION" : {
+"message" : [
+  ""
+],
+"sqlState" : "P0001"
+  },
+  "USER_RAISED_EXCEPTION_PARAMETER_MISMATCH" : {
+"message" : [
+  "The `raise_error()` function was used to raise error class: 
 which expects parameters: .",
+   

  1   2   3   4   5   6   7   8   9   10   >