(spark) branch master updated: [SPARK-46587][SQL] XML: Fix XSD big integer conversion
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c63e0641f2f3 [SPARK-46587][SQL] XML: Fix XSD big integer conversion c63e0641f2f3 is described below commit c63e0641f2f39c9812b58165d1f78daa120a990b Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com> AuthorDate: Thu Jan 4 16:42:36 2024 +0900 [SPARK-46587][SQL] XML: Fix XSD big integer conversion ### What changes were proposed in this pull request? Fix XSD type conversion for some big integer types in XSDToSchema helper utility. NOTE: This is a deviation from spark-xml. ### Why are the changes needed? To correctly map XSD data types to spark data types ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #44587 from sandip-db/xml-xsd-datatype. Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com> Signed-off-by: Hyukjin Kwon --- .../execution/datasources/xml/XSDToSchema.scala| 13 +-- .../datasources/xml/util/XSDToSchemaSuite.scala| 113 - 2 files changed, 119 insertions(+), 7 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala index 356ffd57698c..87082299615c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala @@ -96,17 +96,18 @@ object XSDToSchema extends Logging{ case facet: XmlSchemaTotalDigitsFacet => facet.getValue.toString.toInt }.getOrElse(38) DecimalType(totalDigits, math.min(totalDigits, fracDigits)) - case Constants.XSD_UNSIGNEDLONG => DecimalType(38, 0) + case Constants.XSD_UNSIGNEDLONG | + Constants.XSD_INTEGER | + Constants.XSD_NEGATIVEINTEGER | + Constants.XSD_NONNEGATIVEINTEGER | + Constants.XSD_NONPOSITIVEINTEGER | + Constants.XSD_POSITIVEINTEGER => DecimalType(38, 0) case Constants.XSD_DOUBLE => DoubleType case Constants.XSD_FLOAT => FloatType case Constants.XSD_BYTE => ByteType case Constants.XSD_SHORT | Constants.XSD_UNSIGNEDBYTE => ShortType - case Constants.XSD_INTEGER | - Constants.XSD_NEGATIVEINTEGER | - Constants.XSD_NONNEGATIVEINTEGER | - Constants.XSD_NONPOSITIVEINTEGER | - Constants.XSD_POSITIVEINTEGER | + case Constants.XSD_INT | Constants.XSD_UNSIGNEDSHORT => IntegerType case Constants.XSD_LONG | Constants.XSD_UNSIGNEDINT => LongType diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/util/XSDToSchemaSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/util/XSDToSchemaSuite.scala index 434b4655d408..1b8059340067 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/util/XSDToSchemaSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/util/XSDToSchemaSuite.scala @@ -23,7 +23,7 @@ import org.apache.hadoop.fs.Path import org.apache.spark.sql.execution.datasources.xml.TestUtils._ import org.apache.spark.sql.execution.datasources.xml.XSDToSchema import org.apache.spark.sql.test.SharedSparkSession -import org.apache.spark.sql.types.{ArrayType, DecimalType, FloatType, LongType, StringType} +import org.apache.spark.sql.types._ class 
XSDToSchemaSuite extends SharedSparkSession {
@@ -183,4 +183,115 @@ class XSDToSchemaSuite extends SharedSparkSession {
       XSDToSchema.read(new Path("/path/not/found"))
     }
   }
+
+  test("Basic DataTypes parsing") {
+    val xsdString =
+      """
+        [inline XSD literal (a schema under xmlns="http://www.w3.org/2001/XMLSchema" declaring the
+         basic datatypes) and the test's expected StructType assertions not preserved in this archive]
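For context, here is a minimal sketch of what the fix changes, since the test's own XSD literal did not survive the archive rendering. It assumes the String overload of `XSDToSchema.read` that the new test's `xsdString` variable suggests, and uses a made-up schema with a single unbounded-integer element; it is an illustration, not the committed test.

```
import org.apache.spark.sql.execution.datasources.xml.XSDToSchema
import org.apache.spark.sql.types.StructType

// Hypothetical XSD: one element typed with the unbounded XSD integer.
val xsdString =
  """<schema xmlns="http://www.w3.org/2001/XMLSchema">
    |  <element name="test">
    |    <complexType>
    |      <sequence>
    |        <element name="bigNumber" type="integer"/>
    |      </sequence>
    |    </complexType>
    |  </element>
    |</schema>""".stripMargin

val parsed: StructType = XSDToSchema.read(xsdString)
// After this fix, the unbounded integer family (integer, negativeInteger,
// nonNegativeInteger, nonPositiveInteger, positiveInteger, unsignedLong) maps to
// DecimalType(38, 0), while the bounded int/unsignedShort types keep IntegerType,
// so `bigNumber` above comes back as a DecimalType(38, 0) field.
```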
(spark) branch master updated: [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Uses path separator instead of file separator to correctly check PySpark library existence
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b303eced7f86 [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Uses path separator instead of file separator to correctly check PySpark library existence b303eced7f86 is described below commit b303eced7f8639887278db34e0080ffa0c19bd0c Author: Hyukjin Kwon AuthorDate: Thu Jan 4 15:49:45 2024 +0900 [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Uses path separator instead of file separator to correctly check PySpark library existence ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/44519 that fixes a mistake of separating the paths. It should use `Files.pathSeparator`. ### Why are the changes needed? It works with testing mode, but it doesn't work with production mode otherwise. ### Does this PR introduce _any_ user-facing change? No, because the main change has not been released. ### How was this patch tested? Manually as described in "How was this patch tested?" at https://github.com/apache/spark/pull/44504. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44590 from HyukjinKwon/SPARK-46530-followup. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala | 6 -- .../apache/spark/sql/execution/datasources/DataSourceManager.scala | 4 +--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index 26c790a12447..929058fb7185 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -36,7 +36,7 @@ private[spark] object PythonUtils extends Logging { val PY4J_ZIP_NAME = "py4j-0.10.9.7-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ - def sparkPythonPath: String = { + def sparkPythonPaths: Seq[String] = { val pythonPath = new ArrayBuffer[String] for (sparkHome <- sys.env.get("SPARK_HOME")) { pythonPath += Seq(sparkHome, "python", "lib", "pyspark.zip").mkString(File.separator) @@ -44,9 +44,11 @@ private[spark] object PythonUtils extends Logging { Seq(sparkHome, "python", "lib", PY4J_ZIP_NAME).mkString(File.separator) } pythonPath ++= SparkContext.jarOfObject(this) -pythonPath.mkString(File.pathSeparator) +pythonPath.toSeq } + def sparkPythonPath: String = sparkPythonPaths.mkString(File.pathSeparator) + /** Merge PYTHONPATHS with the appropriate separator. Ignores blank strings. 
*/ def mergePythonPaths(paths: String*): String = { paths.filter(_ != "").mkString(File.pathSeparator) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala index 4fc636a59e5a..236ab98969e5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala @@ -20,7 +20,6 @@ package org.apache.spark.sql.execution.datasources import java.io.File import java.util.Locale import java.util.concurrent.ConcurrentHashMap -import java.util.regex.Pattern import scala.jdk.CollectionConverters._ @@ -91,8 +90,7 @@ object DataSourceManager extends Logging { private lazy val shouldLoadPythonDataSources: Boolean = { Utils.checkCommandAvailable(PythonUtils.defaultPythonExec) && // Make sure PySpark zipped files also exist. - PythonUtils.sparkPythonPath -.split(Pattern.quote(File.separator)).forall(new File(_).exists()) + PythonUtils.sparkPythonPaths.forall(new File(_).exists()) } private def initialDataSourceBuilders: Map[String, UserDefinedPythonDataSource] = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
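The root cause is the difference between the two separators exposed by `java.io.File`. A small standalone sketch of the distinction (illustrative only, with a made-up `/opt/spark` home; not code from the patch):

```
import java.io.File
import java.util.regex.Pattern

// File.separator joins components *within* one path: "/" on Unix, "\" on Windows.
val pysparkZip = Seq("/opt/spark", "python", "lib", "pyspark.zip").mkString(File.separator)
// => /opt/spark/python/lib/pyspark.zip

// File.pathSeparator joins *multiple* paths into a PYTHONPATH-style list: ":" on Unix, ";" on Windows.
val py4jZip = Seq("/opt/spark", "python", "lib", "py4j-0.10.9.7-src.zip").mkString(File.separator)
val pythonPath = Seq(pysparkZip, py4jZip).mkString(File.pathSeparator)

// The bug: splitting on File.separator breaks each path into fragments ("opt", "spark", ...)
// that never exist as files, so the existence check cannot succeed in a production layout.
val wrong = pythonPath.split(Pattern.quote(File.separator))
// Splitting on File.pathSeparator (or, as the patch does, keeping the entries as a Seq and
// never joining them at all) yields the actual zip paths to check.
val right = pythonPath.split(Pattern.quote(File.pathSeparator))
```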
(spark) branch master updated: [SPARK-46576][SQL] Improve error messages for unsupported data source save mode
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 69c46876b5a7 [SPARK-46576][SQL] Improve error messages for unsupported data source save mode 69c46876b5a7 is described below commit 69c46876b5a76c2de6a149ea7663fad18027e387 Author: allisonwang-db AuthorDate: Thu Jan 4 09:40:40 2024 +0300 [SPARK-46576][SQL] Improve error messages for unsupported data source save mode ### What changes were proposed in this pull request? This PR renames the error class `_LEGACY_ERROR_TEMP_1308` to `UNSUPPORTED_DATA_SOURCE_SAVE_MODE` and improves its error messages. ### Why are the changes needed? To make the error more user-friendly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #44576 from allisonwang-db/spark-46576-unsupported-save-mode. Authored-by: allisonwang-db Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json | 11 ++- .../apache/spark/sql/kafka010/KafkaSinkSuite.scala | 2 +- docs/sql-error-conditions.md| 6 ++ .../spark/sql/errors/QueryCompilationErrors.scala | 4 ++-- .../spark/sql/connector/DataSourceV2Suite.scala | 8 .../execution/python/PythonDataSourceSuite.scala| 21 +++-- 6 files changed, 34 insertions(+), 18 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index bcaf8a74c08d..9cade1197dca 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -3588,6 +3588,12 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_DATA_SOURCE_SAVE_MODE" : { +"message" : [ + "The data source '' cannot be written in the mode. Please use either the \"Append\" or \"Overwrite\" mode instead." +], +"sqlState" : "0A000" + }, "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE" : { "message" : [ "The datasource doesn't support the column of the type ." @@ -5403,11 +5409,6 @@ "There is a 'path' option set and save() is called with a path parameter. Either remove the path option, or call save() without the parameter. To ignore this check, set '' to 'true'." ] }, - "_LEGACY_ERROR_TEMP_1308" : { -"message" : [ - "TableProvider implementation cannot be written with mode, please use Append or Overwrite modes instead." -] - }, "_LEGACY_ERROR_TEMP_1309" : { "message" : [ "insertInto() can't be used together with partitionBy(). Partition columns have already been defined for the table. It is not necessary to use partitionBy()." 
diff --git a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala index 6753f8be54bf..5566785c4d56 100644 --- a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala +++ b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala @@ -557,7 +557,7 @@ class KafkaSinkBatchSuiteV2 extends KafkaSinkBatchSuiteBase { test("batch - unsupported save modes") { testUnsupportedSaveModes((mode) => - Seq(s"cannot be written with ${mode.name} mode", "does not support truncate")) + Seq(s"cannot be written in the \"${mode.name}\" mode", "does not support truncate")) } test("generic - write big data with small producer buffer") { diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index c6108e97b4c5..89de607b0f22 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -2332,6 +2332,12 @@ Unsupported data source type for direct query on files: `` Unsupported data type ``. +### UNSUPPORTED_DATA_SOURCE_SAVE_MODE + +[SQLSTATE: 0A000](sql-error-conditions-sqlstates.html#class-0A-feature-not-supported) + +The data source '``' cannot be written in the `` mode. Please use either the "Append" or "Overwrite" mode instead. + ### UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE [SQLSTATE: 0A000](sql-error-conditions-sqlstates.html#class-0A-feature-not-supported) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index b844ee2bdc45..90e7ab610f7a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.s
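For a sense of when the renamed error surfaces, here is a rough user-level sketch based on the Kafka sink test touched above. It assumes a spark-shell session and a hypothetical broker at `localhost:9092`; the exact message parameters come from the new `UNSUPPORTED_DATA_SOURCE_SAVE_MODE` definition, so treat the wording in the comment as approximate.

```
import org.apache.spark.sql.SaveMode

val df = spark.range(3).selectExpr("CAST(id AS STRING) AS value")

// The Kafka batch sink only supports Append and Overwrite. With any other mode the write
// now fails with an error along the lines of:
//   [UNSUPPORTED_DATA_SOURCE_SAVE_MODE] The data source ... cannot be written in the
//   "Ignore" mode. Please use either the "Append" or "Overwrite" mode instead.
df.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "events")
  .mode(SaveMode.Ignore)
  .save()
```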
(spark) branch master updated: [SPARK-46504][PS][TESTS][FOLLOWUP] Break the remaining part of `IndexesTests` into small test files
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 59d147a4f48f [SPARK-46504][PS][TESTS][FOLLOWUP] Break the remaining part of `IndexesTests` into small test files 59d147a4f48f is described below commit 59d147a4f48ff6112c682e9797dbd982022bfc10 Author: Ruifeng Zheng AuthorDate: Thu Jan 4 14:33:42 2024 +0800 [SPARK-46504][PS][TESTS][FOLLOWUP] Break the remaining part of `IndexesTests` into small test files ### What changes were proposed in this pull request? Break the remaining part of `IndexesTests` into small test files ### Why are the changes needed? testing parallelism ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #44588 from zhengruifeng/ps_test_idx_base_lastlast. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- dev/sparktestsupport/modules.py| 8 +- .../{test_parity_base.py => test_parity_basic.py} | 17 +- ...{test_parity_base.py => test_parity_getattr.py} | 17 +- .../{test_parity_base.py => test_parity_name.py} | 17 +- .../tests/indexes/{test_base.py => test_basic.py} | 155 + .../pyspark/pandas/tests/indexes/test_getattr.py | 79 + python/pyspark/pandas/tests/indexes/test_name.py | 183 + 7 files changed, 296 insertions(+), 180 deletions(-) diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py index a97e6afdc356..699a9d07452d 100644 --- a/dev/sparktestsupport/modules.py +++ b/dev/sparktestsupport/modules.py @@ -795,7 +795,9 @@ pyspark_pandas_slow = Module( "pyspark.pandas.generic", "pyspark.pandas.series", # unittests -"pyspark.pandas.tests.indexes.test_base", +"pyspark.pandas.tests.indexes.test_basic", +"pyspark.pandas.tests.indexes.test_getattr", +"pyspark.pandas.tests.indexes.test_name", "pyspark.pandas.tests.indexes.test_conversion", "pyspark.pandas.tests.indexes.test_drop", "pyspark.pandas.tests.indexes.test_level", @@ -1095,7 +1097,9 @@ pyspark_pandas_connect_part0 = Module( "pyspark.pandas.tests.connect.test_parity_sql", "pyspark.pandas.tests.connect.test_parity_typedef", "pyspark.pandas.tests.connect.test_parity_utils", -"pyspark.pandas.tests.connect.indexes.test_parity_base", +"pyspark.pandas.tests.connect.indexes.test_parity_basic", +"pyspark.pandas.tests.connect.indexes.test_parity_getattr", +"pyspark.pandas.tests.connect.indexes.test_parity_name", "pyspark.pandas.tests.connect.indexes.test_parity_conversion", "pyspark.pandas.tests.connect.indexes.test_parity_drop", "pyspark.pandas.tests.connect.indexes.test_parity_level", diff --git a/python/pyspark/pandas/tests/connect/indexes/test_parity_base.py b/python/pyspark/pandas/tests/connect/indexes/test_parity_basic.py similarity index 72% copy from python/pyspark/pandas/tests/connect/indexes/test_parity_base.py copy to python/pyspark/pandas/tests/connect/indexes/test_parity_basic.py index 83ce92eb34b2..94651552ea8d 100644 --- a/python/pyspark/pandas/tests/connect/indexes/test_parity_base.py +++ b/python/pyspark/pandas/tests/connect/indexes/test_parity_basic.py @@ -16,22 +16,21 @@ # import unittest -from pyspark import pandas as ps -from pyspark.pandas.tests.indexes.test_base import IndexesTestsMixin +from pyspark.pandas.tests.indexes.test_basic import IndexBasicMixin from pyspark.testing.connectutils import ReusedConnectTestCase -from pyspark.testing.pandasutils import 
PandasOnSparkTestUtils, TestUtils +from pyspark.testing.pandasutils import PandasOnSparkTestUtils -class IndexesParityTests( -IndexesTestsMixin, PandasOnSparkTestUtils, TestUtils, ReusedConnectTestCase +class IndexBasicParityTests( +IndexBasicMixin, +PandasOnSparkTestUtils, +ReusedConnectTestCase, ): -@property -def psdf(self): -return ps.from_pandas(self.pdf) +pass if __name__ == "__main__": -from pyspark.pandas.tests.connect.indexes.test_parity_base import * # noqa: F401 +from pyspark.pandas.tests.connect.indexes.test_parity_basic import * # noqa: F401 try: import xmlrunner # type: ignore[import] diff --git a/python/pyspark/pandas/tests/connect/indexes/test_parity_base.py b/python/pyspark/pandas/tests/connect/indexes/test_parity_getattr.py similarity index 72% copy from python/pyspark/pandas/tests/connect/indexes/test_parity_base.py copy to python/pyspark/pandas/tests/connect/indexes/test_parity_getattr.py index 83ce92eb34b2..47d893bda3be 100644 --- a/python/pyspark/pandas/tests/connect/indexes
(spark) branch master updated (56023635ab8 -> 1cd3a1b0e1c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 56023635ab8 [SPARK-46412][K8S][DOCS] Update Java and JDK info in K8S testing
  add 1cd3a1b0e1c Revert "[SPARK-46582][R][INFRA] Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor"

No new revisions were added by this update.

Summary of changes:
 dev/appveyor-install-dependencies.ps1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated (f3e454a8323 -> 56023635ab8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from f3e454a8323 [SPARK-45292][SQL][HIVE] Remove Guava from shared classes from IsolatedClientLoader
  add 56023635ab8 [SPARK-46412][K8S][DOCS] Update Java and JDK info in K8S testing

No new revisions were added by this update.

Summary of changes:
 resource-managers/kubernetes/integration-tests/README.md | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)
(spark) branch master updated: [SPARK-45292][SQL][HIVE] Remove Guava from shared classes from IsolatedClientLoader
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3e454a8323a [SPARK-45292][SQL][HIVE] Remove Guava from shared classes from IsolatedClientLoader f3e454a8323a is described below commit f3e454a8323aa1f1948b0fe7981ac43aa674a32a Author: Cheng Pan AuthorDate: Wed Jan 3 21:28:24 2024 -0800 [SPARK-45292][SQL][HIVE] Remove Guava from shared classes from IsolatedClientLoader ### What changes were proposed in this pull request? Try removing Guava from `sharedClasses` as suggested by JoshRosen in https://github.com/apache/spark/pull/33989#issuecomment-928616327 and https://github.com/apache/spark/pull/42493#issuecomment-1687092403 ### Why are the changes needed? Unblock Guava upgrading. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed (embedded HMS) and verified in the internal YARN cluster (remote HMS with kerberos-enabled). ``` # already setup hive-site.xml stuff properly to make sure to use remote HMS bin/spark-shell --conf spark.sql.hive.metastore.jars=maven ... scala> spark.sql("show databases").show warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation` https://maven-central.storage-download.googleapis.com/maven2/ added as a remote repository with the name: repo-1 Ivy Default Cache set to: /home/hadoop/.ivy2/cache The jars for the packages stored in: /home/hadoop/.ivy2/jars org.apache.hive#hive-metastore added as a dependency org.apache.hive#hive-exec added as a dependency org.apache.hive#hive-common added as a dependency org.apache.hive#hive-serde added as a dependency org.apache.hadoop#hadoop-client-api added as a dependency org.apache.hadoop#hadoop-client-runtime added as a dependency :: resolving dependencies :: org.apache.spark#spark-submit-parent-d0d2962d-ae27-4526-a0c7-040a542e1e54;1.0 confs: [default] found org.apache.hive#hive-metastore;2.3.9 in central found org.apache.hive#hive-serde;2.3.9 in central found org.apache.hive#hive-common;2.3.9 in central found org.apache.hive#hive-shims;2.3.9 in central found org.apache.hive.shims#hive-shims-common;2.3.9 in central found org.apache.logging.log4j#log4j-slf4j-impl;2.6.2 in central found org.slf4j#slf4j-api;1.7.10 in central found com.google.guava#guava;14.0.1 in central found commons-lang#commons-lang;2.6 in central found org.apache.thrift#libthrift;0.9.3 in central found org.apache.httpcomponents#httpclient;4.4 in central found org.apache.httpcomponents#httpcore;4.4 in central found commons-logging#commons-logging;1.2 in central found commons-codec#commons-codec;1.4 in central found org.apache.zookeeper#zookeeper;3.4.6 in central found org.slf4j#slf4j-log4j12;1.6.1 in central found log4j#log4j;1.2.16 in central found jline#jline;2.12 in central found io.netty#netty;3.7.0.Final in central found org.apache.hive.shims#hive-shims-0.23;2.3.9 in central found org.apache.hadoop#hadoop-yarn-server-resourcemanager;2.7.2 in central found org.apache.hadoop#hadoop-annotations;2.7.2 in central found com.google.inject.extensions#guice-servlet;3.0 in central found com.google.inject#guice;3.0 in central found javax.inject#javax.inject;1 in central found aopalliance#aopalliance;1.0 in central found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central found asm#asm;3.2 in central found com.google.protobuf#protobuf-java;2.5.0 in central found 
commons-io#commons-io;2.4 in central found com.sun.jersey#jersey-json;1.14 in central found org.codehaus.jettison#jettison;1.1 in central found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central found javax.xml.bind#jaxb-api;2.2.2 in central found javax.xml.stream#stax-api;1.0-2 in central found javax.activation#activation;1.1 in central found org.codehaus.jackson#jackson-core-asl;1.9.13 in central found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central found org.codehaus.jackson#jackson-xc;1.9.13 in central found com.sun.jersey#jersey-core;1.14 in central found com.sun.jersey.contribs#jersey-guice;1.9 in central found com.sun.jersey#jersey-server;1.14 in central found org.ap
(spark) branch master updated (7b6077a02fc3 -> 733be49a8078)
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 7b6077a02fc3 [SPARK-46584][SQL][TESTS] Remove invalid attachCleanupResourceChecker in JoinSuite
  add 733be49a8078 [SPARK-46539][SQL][FOLLOWUP] fix golden files

No new revisions were added by this update.

Summary of changes:
 .../src/test/resources/sql-tests/analyzer-results/selectExcept.sql.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-46584][SQL][TESTS] Remove invalid attachCleanupResourceChecker in JoinSuite
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b6077a02fc3 [SPARK-46584][SQL][TESTS] Remove invalid attachCleanupResourceChecker in JoinSuite 7b6077a02fc3 is described below commit 7b6077a02fc3e619465fb21511ea16e71e6d4c7e Author: zml1206 AuthorDate: Thu Jan 4 10:32:46 2024 +0800 [SPARK-46584][SQL][TESTS] Remove invalid attachCleanupResourceChecker in JoinSuite ### What changes were proposed in this pull request? Remove `attachCleanupResourceChecker` in `JoinSuite`. ### Why are the changes needed? `attachCleanupResourceChecker` is invalid: 1. The matching of `SortExec` needs to be in `QueryExecution.executePlan` not `QueryExecution.sparkPlan`, The correct way is `foreachUp(df.queryExecution.executedPlan){f()}`. 2. `Mockito` counts the number of function calls, only for objects after `spy`. Calls to the original object are not counted. eg ``` test() { val data = new java.util.ArrayList[String]() val _data = spy(data) data.add("a"); data.add("b"); data.add("b"); _data.add("b") verify(_data, times(0)).add("a") verify(_data, times(1)).add("b") } ``` Therefore, when using `df.queryExecution.executedPlan` correctly to match, count is always 0. 3. Not all `SortMergeJoin` joinTypes will trigger `cleanupResources()`, such as 'full outer join'. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Local test, update `attachCleanupResourceChecker` atLeastOnce to nerver, ut is still successful. ``` verify(sortExec, atLeastOnce).cleanupResources() verify(sortExec.rowSorter, atLeastOnce).cleanupResources() ``` to ``` verify(sortExec, never).cleanupResources() verify(sortExec.rowSorter, never).cleanupResources() ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44573 from zml1206/SPARK-21492. 
Authored-by: zml1206 Signed-off-by: Kent Yao --- .../test/scala/org/apache/spark/sql/JoinSuite.scala | 19 --- 1 file changed, 19 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala index 909a05ce26f7..f31f60e8df56 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala @@ -22,8 +22,6 @@ import java.util.Locale import scala.collection.mutable.ListBuffer import scala.jdk.CollectionConverters._ -import org.mockito.Mockito._ - import org.apache.spark.TestUtils.{assertNotSpilled, assertSpilled} import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation @@ -44,23 +42,6 @@ import org.apache.spark.tags.SlowSQLTest class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlanHelper { import testImplicits._ - private def attachCleanupResourceChecker(plan: SparkPlan): Unit = { -// SPARK-21492: Check cleanupResources are finally triggered in SortExec node for every -// test case -plan.foreachUp { - case s: SortExec => -val sortExec = spy[SortExec](s) -verify(sortExec, atLeastOnce).cleanupResources() -verify(sortExec.rowSorter, atLeastOnce).cleanupResources() - case _ => -} - } - - override protected def checkAnswer(df: => DataFrame, rows: Seq[Row]): Unit = { -attachCleanupResourceChecker(df.queryExecution.sparkPlan) -super.checkAnswer(df, rows) - } - setupTestData() def statisticSizeInByte(df: DataFrame): BigInt = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
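The second bullet in the commit message (Mockito only records invocations that go through the spy, not calls on the original object) can be reproduced with a self-contained test. This is a generic Mockito/ScalaTest sketch, not code from the Spark tree:

```
import java.util.{ArrayList => JArrayList}

import org.mockito.Mockito.{spy, times, verify}
import org.scalatest.funsuite.AnyFunSuite

class SpyCountingSuite extends AnyFunSuite {
  test("Mockito counts only invocations made through the spy") {
    val data = new JArrayList[String]()
    val spied = spy(data)

    // Calls on the original list are invisible to the spy's bookkeeping...
    data.add("a")
    data.add("b")
    data.add("b")
    // ...only this one is recorded.
    spied.add("b")

    verify(spied, times(0)).add("a")
    verify(spied, times(1)).add("b")
  }
}
```

In the removed checker, the spied `SortExec` was never the node that actually executed, so its counters could not reflect real `cleanupResources()` calls.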
(spark) branch master updated: [SPARK-46582][R][INFRA] Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dbdeecbc3ffc [SPARK-46582][R][INFRA] Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor dbdeecbc3ffc is described below commit dbdeecbc3ffc2a048ba720a688e1e6bfff4e8b4b Author: Hyukjin Kwon AuthorDate: Thu Jan 4 11:09:33 2024 +0900 [SPARK-46582][R][INFRA] Upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor ### What changes were proposed in this pull request? This PR proposes to upgrade R Tools version from 4.0.2 to 4.3.2 in AppVeyor ### Why are the changes needed? R Tools 4.3.X is for R 4.3.X. We did not upgrade because of the test failure previously. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Checking the CI in this PR. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44584 from HyukjinKwon/r-tools-ver. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- dev/appveyor-install-dependencies.ps1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index b37f1ee45f30..a3a440ef83f2 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -141,7 +141,7 @@ Pop-Location # == R $rVer = "4.3.2" -$rToolsVer = "4.0.2" +$rToolsVer = "4.3.2" InstallR InstallRtools - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (5c10fb3e509a -> 893e69172560)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 5c10fb3e509a [SPARK-44556][SQL] Reuse `OrcTail` when enable vectorizedReader add 893e69172560 [SPARK-46580][TESTS] Regenerate benchmark results No new revisions were added by this update. Summary of changes: .../benchmarks/AvroReadBenchmark-jdk21-results.txt | 110 +-- .../avro/benchmarks/AvroReadBenchmark-results.txt | 110 +-- .../AvroWriteBenchmark-jdk21-results.txt | 20 +- .../avro/benchmarks/AvroWriteBenchmark-results.txt | 20 +- .../CoalescedRDDBenchmark-jdk21-results.txt| 64 +- core/benchmarks/CoalescedRDDBenchmark-results.txt | 64 +- core/benchmarks/KryoBenchmark-jdk21-results.txt| 40 +- core/benchmarks/KryoBenchmark-results.txt | 40 +- .../KryoIteratorBenchmark-jdk21-results.txt| 40 +- core/benchmarks/KryoIteratorBenchmark-results.txt | 40 +- .../KryoSerializerBenchmark-jdk21-results.txt | 8 +- .../benchmarks/KryoSerializerBenchmark-results.txt | 8 +- .../MapStatusesConvertBenchmark-jdk21-results.txt | 6 +- .../MapStatusesConvertBenchmark-results.txt| 6 +- .../MapStatusesSerDeserBenchmark-jdk21-results.txt | 52 +- .../MapStatusesSerDeserBenchmark-results.txt | 50 +- .../PersistenceEngineBenchmark-jdk21-results.txt | 32 +- .../PersistenceEngineBenchmark-results.txt | 32 +- .../PropertiesCloneBenchmark-jdk21-results.txt | 40 +- .../PropertiesCloneBenchmark-results.txt | 40 +- .../XORShiftRandomBenchmark-jdk21-results.txt | 38 +- .../benchmarks/XORShiftRandomBenchmark-results.txt | 38 +- .../ZStandardBenchmark-jdk21-results.txt | 48 +- core/benchmarks/ZStandardBenchmark-results.txt | 48 +- .../benchmarks/BLASBenchmark-jdk21-results.txt | 208 ++--- mllib-local/benchmarks/BLASBenchmark-results.txt | 208 ++--- .../UDTSerializationBenchmark-jdk21-results.txt| 8 +- .../UDTSerializationBenchmark-results.txt | 8 +- .../CalendarIntervalBenchmark-jdk21-results.txt| 6 +- .../CalendarIntervalBenchmark-results.txt | 6 +- .../EnumTypeSetBenchmark-jdk21-results.txt | 120 +-- .../benchmarks/EnumTypeSetBenchmark-results.txt| 120 +-- .../GenericArrayDataBenchmark-jdk21-results.txt| 14 +- .../GenericArrayDataBenchmark-results.txt | 14 +- .../benchmarks/HashBenchmark-jdk21-results.txt | 60 +- sql/catalyst/benchmarks/HashBenchmark-results.txt | 60 +- .../HashByteArrayBenchmark-jdk21-results.txt | 90 +- .../benchmarks/HashByteArrayBenchmark-results.txt | 90 +- .../UnsafeProjectionBenchmark-jdk21-results.txt| 12 +- .../UnsafeProjectionBenchmark-results.txt | 12 +- .../AggregateBenchmark-jdk21-results.txt | 130 +-- sql/core/benchmarks/AggregateBenchmark-results.txt | 130 +-- .../AnsiIntervalSortBenchmark-jdk21-results.txt| 32 +- .../AnsiIntervalSortBenchmark-results.txt | 32 +- .../benchmarks/Base64Benchmark-jdk21-results.txt | 64 +- sql/core/benchmarks/Base64Benchmark-results.txt| 64 +- .../BloomFilterBenchmark-jdk21-results.txt | 128 +-- .../benchmarks/BloomFilterBenchmark-results.txt| 128 +-- ...iltInDataSourceWriteBenchmark-jdk21-results.txt | 70 +- .../BuiltInDataSourceWriteBenchmark-results.txt| 70 +- .../ByteArrayBenchmark-jdk21-results.txt | 22 +- sql/core/benchmarks/ByteArrayBenchmark-results.txt | 22 +- sql/core/benchmarks/CSVBenchmark-jdk21-results.txt | 94 +-- sql/core/benchmarks/CSVBenchmark-results.txt | 94 +-- .../CharVarcharBenchmark-jdk21-results.txt | 140 ++-- .../benchmarks/CharVarcharBenchmark-results.txt| 140 ++-- .../ColumnarBatchBenchmark-jdk21-results.txt | 54 +- .../benchmarks/ColumnarBatchBenchmark-results.txt 
| 54 +- .../CompressionSchemeBenchmark-jdk21-results.txt | 168 ++-- .../CompressionSchemeBenchmark-results.txt | 168 ++-- ...ConstantColumnVectorBenchmark-jdk21-results.txt | 350 .../ConstantColumnVectorBenchmark-results.txt | 350 .../DataSourceReadBenchmark-jdk21-results.txt | 634 +++--- .../benchmarks/DataSourceReadBenchmark-results.txt | 634 +++--- .../benchmarks/DatasetBenchmark-jdk21-results.txt | 52 +- sql/core/benchmarks/DatasetBenchmark-results.txt | 52 +- .../benchmarks/DateTimeBenchmark-jdk21-results.txt | 482 +-- sql/core/benchmarks/DateTimeBenchmark-results.txt | 482 +-- .../DateTimeRebaseBenchmark-jdk21-results.txt | 230 +++--- .../benchmarks/DateTimeRebaseBenchmark-results.txt | 230 +++--- ...ndOnlyUnsafeRowArrayBenchmark-jdk21-results.txt | 40 +- ...alAppendOnlyUnsafeRowArrayBenchmark-results.txt | 40 +- .../benchmarks/ExtractBenchmark-jdk21-results.txt | 172 ++-- sql/core/benchmarks/ExtractBenchmark-results.txt | 172 ++
(spark) branch master updated (85b44ccef4c4 -> 5c10fb3e509a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 85b44ccef4c4 [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages
  add 5c10fb3e509a [SPARK-44556][SQL] Reuse `OrcTail` when enable vectorizedReader

No new revisions were added by this update.

Summary of changes:
 .../datasources/orc/OrcColumnarBatchReader.java        | 11 ++-
 .../sql/execution/datasources/orc/OrcFileFormat.scala  |  2 +-
 .../datasources/v2/orc/OrcPartitionReaderFactory.scala | 18 ++
 3 files changed, 21 insertions(+), 10 deletions(-)
(spark) branch branch-3.5 updated: [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new fb90ade2c739 [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages fb90ade2c739 is described below commit fb90ade2c7390077d2755fc43b73e63f5cf44f21 Author: panbingkun AuthorDate: Wed Jan 3 12:07:15 2024 -0800 [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages ### What changes were proposed in this pull request? The pr aims to fix the formatting of tables in `running-on-yarn` pages. ### Why are the changes needed? Make the tables on the page display normally. Before: https://github.com/apache/spark/assets/15246973/26facec4-d805-4549-a640-120c499bd7fd";> After: https://github.com/apache/spark/assets/15246973/cf6c20ef-a4ce-4532-9acd-ab9cec41881a";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually check. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44540 from panbingkun/SPARK-46546. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun (cherry picked from commit 85b44ccef4c4aeec302c12e03833590c7d8d6b9e) Signed-off-by: Dongjoon Hyun --- docs/running-on-yarn.md | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index 9b4e59a119ee..ce7121b806cb 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -866,7 +866,7 @@ to avoid garbage collection issues during shuffle. The following extra configuration options are available when the shuffle service is running on YARN: -Property NameDefaultMeaning +Property NameDefaultMeaningSince Version spark.yarn.shuffle.stopOnFailure false @@ -875,6 +875,7 @@ The following extra configuration options are available when the shuffle service initialization. This prevents application failures caused by running containers on NodeManagers where the Spark Shuffle Service is not running. + 2.1.0 spark.yarn.shuffle.service.metrics.namespace @@ -883,6 +884,7 @@ The following extra configuration options are available when the shuffle service The namespace to use when emitting shuffle service metrics into Hadoop metrics2 system of the NodeManager. + 3.2.0 spark.yarn.shuffle.service.logs.namespace @@ -894,6 +896,7 @@ The following extra configuration options are available when the shuffle service may expect the logger name to look like a class name, it's generally recommended to provide a value which would be a valid Java package or class name and not include spaces. + 3.3.0 spark.shuffle.service.db.backend - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 85b44ccef4c4 [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages 85b44ccef4c4 is described below commit 85b44ccef4c4aeec302c12e03833590c7d8d6b9e Author: panbingkun AuthorDate: Wed Jan 3 12:07:15 2024 -0800 [SPARK-46546][DOCS] Fix the formatting of tables in `running-on-yarn` pages ### What changes were proposed in this pull request? The pr aims to fix the formatting of tables in `running-on-yarn` pages. ### Why are the changes needed? Make the tables on the page display normally. Before: https://github.com/apache/spark/assets/15246973/26facec4-d805-4549-a640-120c499bd7fd";> After: https://github.com/apache/spark/assets/15246973/cf6c20ef-a4ce-4532-9acd-ab9cec41881a";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually check. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44540 from panbingkun/SPARK-46546. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- docs/running-on-yarn.md | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index 3dfa63e1cb2e..02547b30d2e5 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -866,7 +866,7 @@ to avoid garbage collection issues during shuffle. The following extra configuration options are available when the shuffle service is running on YARN: -Property NameDefaultMeaning +Property NameDefaultMeaningSince Version spark.yarn.shuffle.stopOnFailure false @@ -875,6 +875,7 @@ The following extra configuration options are available when the shuffle service initialization. This prevents application failures caused by running containers on NodeManagers where the Spark Shuffle Service is not running. + 2.1.0 spark.yarn.shuffle.service.metrics.namespace @@ -883,6 +884,7 @@ The following extra configuration options are available when the shuffle service The namespace to use when emitting shuffle service metrics into Hadoop metrics2 system of the NodeManager. + 3.2.0 spark.yarn.shuffle.service.logs.namespace @@ -894,6 +896,7 @@ The following extra configuration options are available when the shuffle service may expect the logger name to look like a class name, it's generally recommended to provide a value which would be a valid Java package or class name and not include spaces. + 3.3.0 spark.shuffle.service.db.backend - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46579][SQL] Redact JDBC url in errors and logs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 49f94eb6e88a [SPARK-46579][SQL] Redact JDBC url in errors and logs 49f94eb6e88a is described below commit 49f94eb6e88a9e5aaff675fb53125ce6091529fa Author: Max Gekk AuthorDate: Wed Jan 3 12:02:02 2024 -0800 [SPARK-46579][SQL] Redact JDBC url in errors and logs ### What changes were proposed in this pull request? In the PR, I propose to redact the JDBC url in error message parameters and logs. ### Why are the changes needed? To avoid leaking of user's secrets. ### Does this PR introduce _any_ user-facing change? Yes, it can. ### How was this patch tested? By running the modified test suites: ``` $ build/sbt "test:testOnly *JDBCTableCatalogSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44574 from MaxGekk/redact-jdbc-url. Authored-by: Max Gekk Signed-off-by: Dongjoon Hyun --- .../execution/datasources/jdbc/JDBCOptions.scala | 3 +++ .../jdbc/connection/BasicConnectionProvider.scala | 3 ++- .../execution/datasources/v2/jdbc/JDBCTable.scala | 4 ++-- .../datasources/v2/jdbc/JDBCTableCatalog.scala | 22 +++--- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 2 +- .../v2/jdbc/JDBCTableCatalogSuite.scala| 15 +++ 6 files changed, 30 insertions(+), 19 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala index 28fa7b8bf561..43db0c6eef11 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.TimestampNTZType +import org.apache.spark.util.Utils /** * Options for the JDBC data source. 
@@ -248,6 +249,8 @@ class JDBCOptions( otherOption.parameters.equals(this.parameters) case _ => false } + + def getRedactUrl(): String = Utils.redact(SQLConf.get.stringRedactionPattern, url) } class JdbcOptionsInWrite( diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/BasicConnectionProvider.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/BasicConnectionProvider.scala index 369cf59e0599..57902336ebf2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/BasicConnectionProvider.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/BasicConnectionProvider.scala @@ -45,7 +45,8 @@ private[jdbc] class BasicConnectionProvider extends JdbcConnectionProvider with jdbcOptions.asConnectionProperties.asScala.foreach { case(k, v) => properties.put(k, v) } -logDebug(s"JDBC connection initiated with URL: ${jdbcOptions.url} and properties: $properties") +logDebug(s"JDBC connection initiated with URL: ${jdbcOptions.getRedactUrl()} " + + s"and properties: $properties") driver.connect(jdbcOptions.url, properties) } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala index c251010881f3..120a68075a8f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala @@ -66,7 +66,7 @@ case class JDBCTable(ident: Identifier, schema: StructType, jdbcOptions: JDBCOpt JdbcUtils.classifyException( errorClass = "FAILED_JDBC.CREATE_INDEX", messageParameters = Map( - "url" -> jdbcOptions.url, + "url" -> jdbcOptions.getRedactUrl(), "indexName" -> toSQLId(indexName), "tableName" -> toSQLId(name)), dialect = JdbcDialects.get(jdbcOptions.url)) { @@ -87,7 +87,7 @@ case class JDBCTable(ident: Identifier, schema: StructType, jdbcOptions: JDBCOpt JdbcUtils.classifyException( errorClass = "FAILED_JDBC.DROP_INDEX", messageParameters = Map( - "url" -> jdbcOptions.url, + "url" -> jdbcOptions.getRedactUrl(), "indexName" -> toSQLId(indexName), "tableName" -> toSQLId(name)), dialect = JdbcDialects.get(jdbcOptions.url)) { diff --git a/sql/core/s
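For readers unfamiliar with the redaction helper: `Utils.redact(SQLConf.get.stringRedactionPattern, url)` replaces every substring of the URL that matches the pattern configured via `spark.sql.redaction.string.regex`. A standalone sketch of the idea, using a made-up pattern rather than Spark's configured one:

```
import scala.util.matching.Regex

// Hypothetical pattern: hide password/secret key-value pairs embedded in a JDBC URL.
val redactionPattern: Regex = "(?i)(password|secret)=[^&;]*".r

def redactUrl(url: String): String =
  redactionPattern.replaceAllIn(url, "$1=*********(redacted)")

redactUrl("jdbc:postgresql://db:5432/sales?user=alice&password=s3cr3t")
// => jdbc:postgresql://db:5432/sales?user=alice&password=*********(redacted)
```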
(spark) branch master updated: [SPARK-46539][SQL] SELECT * EXCEPT(all fields from a struct) results in an assertion failure
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9c46d9dcd195 [SPARK-46539][SQL] SELECT * EXCEPT(all fields from a struct) results in an assertion failure 9c46d9dcd195 is described below commit 9c46d9dcd19551dbdef546adec73d5799364ab0b Author: Stefan Kandic AuthorDate: Wed Jan 3 21:52:37 2024 +0300 [SPARK-46539][SQL] SELECT * EXCEPT(all fields from a struct) results in an assertion failure ### What changes were proposed in this pull request? Fixing the assertion error which occurs when we do SELECT .. EXCEPT(every field from a struct) by adding a check for an empty struct ### Why are the changes needed? Because this is a valid query that should just return an empty struct rather than fail during serialization. ### Does this PR introduce _any_ user-facing change? Yes, users should no longer see this error and instead get an empty struct '{ }' ### How was this patch tested? By adding new UT to existing selectExcept tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #44527 from stefankandic/select-except-err. Authored-by: Stefan Kandic Signed-off-by: Max Gekk --- .../spark/sql/catalyst/encoders/ExpressionEncoder.scala| 12 ++-- .../sql-tests/analyzer-results/selectExcept.sql.out| 12 .../src/test/resources/sql-tests/inputs/selectExcept.sql | 1 + .../test/resources/sql-tests/results/selectExcept.sql.out | 14 ++ 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala index 74d7a5e7a675..654f39393636 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala @@ -325,11 +325,19 @@ case class ExpressionEncoder[T]( assert(serializer.forall(_.references.isEmpty), "serializer cannot reference any attributes.") assert(serializer.flatMap { ser => val boundRefs = ser.collect { case b: BoundReference => b } -assert(boundRefs.nonEmpty, - "each serializer expression should contain at least one `BoundReference`") +assert(boundRefs.nonEmpty || isEmptyStruct(ser), + "each serializer expression should contain at least one `BoundReference` or it " + + "should be an empty struct. This is required to ensure that there is a reference point " + + "for the serialized object or that the serialized object is intentionally left empty." +) boundRefs }.distinct.length <= 1, "all serializer expressions must use the same BoundReference.") + private def isEmptyStruct(expr: NamedExpression): Boolean = expr.dataType match { +case struct: StructType => struct.isEmpty +case _ => false + } + /** * Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the * given schema. 
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/selectExcept.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/selectExcept.sql.out index 3b8594d832c6..49ea7ed4edcf 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/selectExcept.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/selectExcept.sql.out @@ -121,6 +121,18 @@ Project [id#x, name#x, named_struct(f1, data#x.f1, s2, named_struct(f3, data#x.s +- LocalRelation [id#x, name#x, data#x] +-- !query +SELECT * EXCEPT (data.f1, data.s2) FROM tbl_view +-- !query analysis +Project [id#x, name#x, named_struct() AS data#x] ++- SubqueryAlias tbl_view + +- View (`tbl_view`, [id#x,name#x,data#x]) + +- Project [cast(id#x as int) AS id#x, cast(name#x as string) AS name#x, cast(data#x as struct>) AS data#x] + +- Project [id#x, name#x, data#x] ++- SubqueryAlias tbl_view + +- LocalRelation [id#x, name#x, data#x] + + -- !query SELECT * EXCEPT (id, name, data) FROM tbl_view -- !query analysis diff --git a/sql/core/src/test/resources/sql-tests/inputs/selectExcept.sql b/sql/core/src/test/resources/sql-tests/inputs/selectExcept.sql index e07e4f1117c2..08d56aeda0a8 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/selectExcept.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/selectExcept.sql @@ -20,6 +20,7 @@ SELECT * EXCEPT (data) FROM tbl_view; SELECT * EXCEPT (data.f1) FROM tbl_view; SELECT * EXCEPT (data.s2) FROM tbl_view; SELECT * EXCEPT (data.s2.f2) FROM tbl_view; +SELECT * EXCEPT (data.f1, data.s2) FROM tbl_view; -- EXCEPT all columns
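The user-visible effect is easiest to see with a tiny reproduction. The sketch below builds a throwaway view shaped like the `tbl_view` in the golden files (an `id`, a `name`, and a two-field `data` struct); the names and values are made up:

```
spark.sql("""
  CREATE OR REPLACE TEMP VIEW tbl_view AS
  SELECT 1 AS id, 'a' AS name,
         named_struct('f1', 10, 's2', named_struct('f2', 20, 'f3', 30)) AS data
""")

// Excepting every field of `data` used to trip the BoundReference assertion in
// ExpressionEncoder; with this fix the column is returned as an empty struct.
spark.sql("SELECT * EXCEPT (data.f1, data.s2) FROM tbl_view").show()
// The `data` column now shows an empty struct, rendered as { }.
```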
(spark) branch master updated (605fecd22cc1 -> 06f9e7419966)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 605fecd22cc1 [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState
  add 06f9e7419966 [SPARK-46550][BUILD][SQL] Upgrade `datasketches-java` to 5.0.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3                              | 4 ++--
 pom.xml                                                            | 2 +-
 .../catalyst/expressions/aggregate/datasketchesAggregates.scala    | 3 ++-
 .../spark/sql/catalyst/expressions/datasketchesExpressions.scala   | 7 ---
 4 files changed, 9 insertions(+), 7 deletions(-)
(spark) branch branch-3.4 updated: [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 2eb603c09fb5 [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState 2eb603c09fb5 is described below commit 2eb603c09fb5e81ae24f4e43a17fa45fb071c358 Author: Kent Yao AuthorDate: Wed Jan 3 05:54:57 2024 -0800 [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState ### What changes were proposed in this pull request? The upcoming tests with the new hive configurations will have no effect due to the leaked SessionState. ``` 06:21:12.848 pool-1-thread-1 INFO ThriftServerWithSparkContextInHttpSuite: Trying to start HiveThriftServer2: mode=http, attempt=0 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:OperationManager is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:SessionManager is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: CLIService is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:ThriftBinaryCLIService is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: HiveServer2 is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:OperationManager is started. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:SessionManager is started. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: CLIService is started. 06:21:12.852 pool-1-thread-1 INFO AbstractService: Service:ThriftBinaryCLIService is started. 06:21:12.852 pool-1-thread-1 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 5...500 worker threads 06:21:12.852 pool-1-thread-1 INFO AbstractService: Service:HiveServer2 is started. ``` As the logs above revealed, ThriftServerWithSparkContextInHttpSuite started the ThriftBinaryCLIService instead of the ThriftHttpCLIService. This is because in HiveClientImpl, the new configurations are only applied to hive conf during initializing but not for existing ones. This cause ThriftServerWithSparkContextInHttpSuite retrying or even aborting. ### Why are the changes needed? Fix flakiness in tests ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ran tests locally with the hive-thriftserver module locally, ### Was this patch authored or co-authored using generative AI tooling? no Closes #44578 from yaooqinn/SPARK-46577. 
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit 605fecd22cc18fc9b93fb26d4aa6088f5a314f92) Signed-off-by: Dongjoon Hyun --- .../spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala index b8739ce56e41..cb85993e5e09 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.hive +import org.apache.hadoop.hive.ql.metadata.Hive +import org.apache.hadoop.hive.ql.session.SessionState import org.apache.logging.log4j.LogManager import org.apache.logging.log4j.core.Logger @@ -69,6 +71,10 @@ class HiveMetastoreLazyInitializationSuite extends SparkFunSuite { } finally { Thread.currentThread().setContextClassLoader(originalClassLoader) spark.sparkContext.setLogLevel(originalLevel.toString) + SparkSession.clearActiveSession() + SparkSession.clearDefaultSession() + SessionState.detachSession() + Hive.closeCurrent() spark.stop() } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 2891d92e9d8a [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState 2891d92e9d8a is described below commit 2891d92e9d8a5050f457bb116530d46de3babf97 Author: Kent Yao AuthorDate: Wed Jan 3 05:54:57 2024 -0800 [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState ### What changes were proposed in this pull request? The upcoming tests with the new hive configurations will have no effect due to the leaked SessionState. ``` 06:21:12.848 pool-1-thread-1 INFO ThriftServerWithSparkContextInHttpSuite: Trying to start HiveThriftServer2: mode=http, attempt=0 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:OperationManager is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:SessionManager is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: CLIService is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:ThriftBinaryCLIService is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: HiveServer2 is inited. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:OperationManager is started. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service:SessionManager is started. 06:21:12.851 pool-1-thread-1 INFO AbstractService: Service: CLIService is started. 06:21:12.852 pool-1-thread-1 INFO AbstractService: Service:ThriftBinaryCLIService is started. 06:21:12.852 pool-1-thread-1 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 5...500 worker threads 06:21:12.852 pool-1-thread-1 INFO AbstractService: Service:HiveServer2 is started. ``` As the logs above revealed, ThriftServerWithSparkContextInHttpSuite started the ThriftBinaryCLIService instead of the ThriftHttpCLIService. This is because in HiveClientImpl, the new configurations are only applied to hive conf during initializing but not for existing ones. This cause ThriftServerWithSparkContextInHttpSuite retrying or even aborting. ### Why are the changes needed? Fix flakiness in tests ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ran tests locally with the hive-thriftserver module locally, ### Was this patch authored or co-authored using generative AI tooling? no Closes #44578 from yaooqinn/SPARK-46577. 
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun (cherry picked from commit 605fecd22cc18fc9b93fb26d4aa6088f5a314f92) Signed-off-by: Dongjoon Hyun --- .../spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala index b8739ce56e41..cb85993e5e09 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.hive +import org.apache.hadoop.hive.ql.metadata.Hive +import org.apache.hadoop.hive.ql.session.SessionState import org.apache.logging.log4j.LogManager import org.apache.logging.log4j.core.Logger @@ -69,6 +71,10 @@ class HiveMetastoreLazyInitializationSuite extends SparkFunSuite { } finally { Thread.currentThread().setContextClassLoader(originalClassLoader) spark.sparkContext.setLogLevel(originalLevel.toString) + SparkSession.clearActiveSession() + SparkSession.clearDefaultSession() + SessionState.detachSession() + Hive.closeCurrent() spark.stop() } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
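The fix itself is the four extra calls in the `finally` block of the diff above. A minimal sketch of the same teardown pattern, wrapped in a reusable helper for tests that create their own Hive-enabled session, is shown below; the helper name and the `local[1]` master are illustrative assumptions, not part of the patch.
```
import org.apache.hadoop.hive.ql.metadata.Hive
import org.apache.hadoop.hive.ql.session.SessionState
import org.apache.spark.sql.SparkSession

// Sketch only: run a test body against a throwaway Hive-enabled session and
// clear the JVM-global state afterwards so later suites start from a clean slate.
def withIsolatedHiveSession(body: SparkSession => Unit): Unit = {
  val spark = SparkSession.builder()
    .master("local[1]")
    .enableHiveSupport()
    .getOrCreate()
  try {
    body(spark)
  } finally {
    // Same cleanup order as the patch: drop the Spark session handles first,
    // then detach and close the Hive-side session state, then stop Spark.
    SparkSession.clearActiveSession()
    SparkSession.clearDefaultSession()
    SessionState.detachSession()
    Hive.closeCurrent()
    spark.stop()
  }
}
```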
(spark) branch master updated (3b1d843da2de -> 605fecd22cc1)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3b1d843da2de [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream add 605fecd22cc1 [SPARK-46577][SQL] HiveMetastoreLazyInitializationSuite leaks hive's SessionState No new revisions were added by this update. Summary of changes: .../spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala | 6 ++ 1 file changed, 6 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b1d843da2de [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream 3b1d843da2de is described below commit 3b1d843da2de524c781757b4823cc8b8e7d2f5f7 Author: beliefer AuthorDate: Wed Jan 3 05:50:44 2024 -0800 [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream ### What changes were proposed in this pull request? This PR proposes to remove the `ThreadLocal` from `ReadAheadInputStream`. ### Why are the changes needed? `ReadAheadInputStream` has a field `oneByte` declared as a `ThreadLocal`. In fact, `oneByte` is only used in `read`. We can remove it because a local variable inside an instance method is confined to the calling thread and is therefore already thread-safe. On the other hand, the `ThreadLocal` occupies heap space and incurs allocation and GC costs. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #44563 from beliefer/SPARK-46567. Authored-by: beliefer Signed-off-by: Dongjoon Hyun --- core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java b/core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java index 1b76aae8dd22..33dfa4422906 100644 --- a/core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java +++ b/core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java @@ -89,8 +89,6 @@ public class ReadAheadInputStream extends InputStream { private final Condition asyncReadComplete = stateChangeLock.newCondition(); - private static final ThreadLocal<byte[]> oneByte = ThreadLocal.withInitial(() -> new byte[1]); - /** * Creates a ReadAheadInputStream with the specified buffer size and read-ahead * threshold @@ -247,7 +245,7 @@ public class ReadAheadInputStream extends InputStream { // short path - just get one byte. return activeBuffer.get() & 0xFF; } else { - byte[] oneByteArray = oneByte.get(); + byte[] oneByteArray = new byte[1]; return read(oneByteArray, 0, 1) == -1 ? -1 : oneByteArray[0] & 0xFF; } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
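The reasoning above is general, not specific to `ReadAheadInputStream`: a buffer allocated inside the method is already confined to the calling thread, so a `ThreadLocal` adds heap residency and lookup overhead without adding safety. Below is a hedged Scala sketch of the trade-off; the class and method names are illustrative rather than Spark's.
```
import java.io.InputStream

// Sketch only: contrast a ThreadLocal buffer with a per-call local buffer.
class SingleByteReader(in: InputStream) {
  // Before (illustrative): one byte[1] retained per thread for the thread's lifetime.
  // private val oneByte: ThreadLocal[Array[Byte]] =
  //   ThreadLocal.withInitial(() => new Array[Byte](1))

  // After: a short-lived local allocation; it is visible only to the calling
  // thread and is reclaimed cheaply by the GC.
  def readOne(): Int = {
    val oneByteArray = new Array[Byte](1)
    if (in.read(oneByteArray, 0, 1) == -1) -1 else oneByteArray(0) & 0xFF
  }
}
```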
(spark) branch master updated: [SPARK-46524][SQL] Improve error messages for invalid save mode
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a3d999292f8e [SPARK-46524][SQL] Improve error messages for invalid save mode a3d999292f8e is described below commit a3d999292f8e99269dfd0289e2f5aca7e5ea4fae Author: allisonwang-db AuthorDate: Wed Jan 3 15:43:53 2024 +0300 [SPARK-46524][SQL] Improve error messages for invalid save mode ### What changes were proposed in this pull request? This PR improves the error messages when writing a data frame with an invalid save mode. ### Why are the changes needed? To improve the error messages. Before this PR, Spark throws an java.lang.IllegalArgumentException: `java.lang.IllegalArgumentException: Unknown save mode: foo. Accepted save modes are 'overwrite', 'append', 'ignore', 'error', 'errorifexists', 'default'.` After this PR, the error will have a proper error class: `[INVALID_SAVE_MODE] The specified save mode "foo" is invalid. Valid save modes include "append", "overwrite", "ignore", "error", "errorifexists", and "default"." ` ### Does this PR introduce _any_ user-facing change? Yes. The error messages will be changed. ### How was this patch tested? New unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #44508 from allisonwang-db/spark-46524-invalid-save-mode. Authored-by: allisonwang-db Signed-off-by: Max Gekk --- R/pkg/tests/fulltests/test_sparkSQL.R| 2 +- common/utils/src/main/resources/error/error-classes.json | 6 ++ docs/sql-error-conditions.md | 6 ++ .../org/apache/spark/sql/errors/QueryCompilationErrors.scala | 7 +++ .../src/main/scala/org/apache/spark/sql/DataFrameWriter.scala| 3 +-- .../spark/sql/execution/python/PythonDataSourceSuite.scala | 9 + 6 files changed, 30 insertions(+), 3 deletions(-) diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R index 0d96f708a544..c1a5292195af 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -1414,7 +1414,7 @@ test_that("test HiveContext", { # Invalid mode expect_error(saveAsTable(df, "parquetest", "parquet", mode = "abc", path = parquetDataPath), - "illegal argument - Unknown save mode: abc") + "Error in mode : analysis error - \\[INVALID_SAVE_MODE\\].*") unsetHiveContext() } }) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 87e43fe0e38c..bcaf8a74c08d 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2239,6 +2239,12 @@ ], "sqlState" : "42613" }, + "INVALID_SAVE_MODE" : { +"message" : [ + "The specified save mode is invalid. Valid save modes include \"append\", \"overwrite\", \"ignore\", \"error\", \"errorifexists\", and \"default\"." +], +"sqlState" : "42000" + }, "INVALID_SCHEMA" : { "message" : [ "The input schema is not a valid schema string." diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 3f4074af9b78..c6108e97b4c5 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -1271,6 +1271,12 @@ For more details see [INVALID_PARTITION_OPERATION](sql-error-conditions-invalid- Parameterized query must either use positional, or named parameters, but not both. 
+### INVALID_SAVE_MODE + +[SQLSTATE: 42000](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation) + +The specified save mode `` is invalid. Valid save modes include "append", "overwrite", "ignore", "error", "errorifexists", and "default". + ### [INVALID_SCHEMA](sql-error-conditions-invalid-schema-error-class.html) [SQLSTATE: 42K07](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index bc847d1c0069..b844ee2bdc45 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -3184,6 +3184,13 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat "config" -> SQLConf.LEGACY_PATH_OPTION_BEHAVIOR.key)) } + def invalidSaveModeError(saveMode: String): Throwable = { +new AnalysisException( + errorClass = "INVALID_SA
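A short usage sketch of the behaviour described above follows. The paths and the DataFrame are illustrative, and the error-class check assumes the `AnalysisException` raised after this change carries the `INVALID_SAVE_MODE` class, as the commit message states.
```
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
val df = spark.range(10).toDF("id")

// Valid save modes: "append", "overwrite", "ignore", "error"/"errorifexists", "default".
df.write.mode("overwrite").parquet("/tmp/ids")

try {
  // DataFrameWriter.mode validates the string eagerly, so the error surfaces here.
  df.write.mode("foo").parquet("/tmp/ids")
} catch {
  case e: AnalysisException =>
    // Expected to report the INVALID_SAVE_MODE error class after this change.
    println(e.getMessage)
}
```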
(spark-website) branch asf-site updated: docs: update examples page (#494)
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 67c90a03a7 docs: update examples page (#494) 67c90a03a7 is described below commit 67c90a03a706fec13d6356d009ea19270391c4b1 Author: Matthew Powers AuthorDate: Wed Jan 3 04:49:55 2024 -0500 docs: update examples page (#494) * docs: update examples page * add examples html --- examples.md| 745 + site/examples.html | 576 +++-- 2 files changed, 617 insertions(+), 704 deletions(-) diff --git a/examples.md b/examples.md index d9362784d9..d29cd40bba 100644 --- a/examples.md +++ b/examples.md @@ -6,397 +6,364 @@ navigation: weight: 4 show: true --- -Apache Spark™ examples - -These examples give a quick overview of the Spark API. -Spark is built on the concept of distributed datasets, which contain arbitrary Java or -Python objects. You create a dataset from external data, then apply parallel operations -to it. The building block of the Spark API is its [RDD API](https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds). -In the RDD API, -there are two types of operations: transformations, which define a new dataset based on previous ones, -and actions, which kick off a job to execute on a cluster. -On top of Spark’s RDD API, high level APIs are provided, e.g. -[DataFrame API](https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes) and -[Machine Learning API](https://spark.apache.org/docs/latest/mllib-guide.html). -These high level APIs provide a concise way to conduct certain data operations. -In this page, we will show examples using RDD API as well as examples using high level APIs. - -RDD API examples - -Word count -In this example, we use a few transformations to build a dataset of (String, Int) pairs called counts and then save it to a file. - - - Python - Scala - Java - - - - - -{% highlight python %} -text_file = sc.textFile("hdfs://...") -counts = text_file.flatMap(lambda line: line.split(" ")) \ - .map(lambda word: (word, 1)) \ - .reduceByKey(lambda a, b: a + b) -counts.saveAsTextFile("hdfs://...") -{% endhighlight %} - - - - - -{% highlight scala %} -val textFile = sc.textFile("hdfs://...") -val counts = textFile.flatMap(line => line.split(" ")) - .map(word => (word, 1)) - .reduceByKey(_ + _) -counts.saveAsTextFile("hdfs://...") -{% endhighlight %} - - - - - -{% highlight java %} -JavaRDD textFile = sc.textFile("hdfs://..."); -JavaPairRDD counts = textFile -.flatMap(s -> Arrays.asList(s.split(" ")).iterator()) -.mapToPair(word -> new Tuple2<>(word, 1)) -.reduceByKey((a, b) -> a + b); -counts.saveAsTextFile("hdfs://..."); -{% endhighlight %} - - - - -Pi estimation -Spark can also be used for compute-intensive tasks. This code estimates π by "throwing darts" at a circle. We pick random points in the unit square ((0, 0) to (1,1)) and see how many fall in the unit circle. The fraction should be π / 4, so we use this to get our estimate. 
- - - Python - Scala - Java - - - - - -{% highlight python %} -def inside(p): -x, y = random.random(), random.random() -return x*x + y*y < 1 - -count = sc.parallelize(range(0, NUM_SAMPLES)) \ - .filter(inside).count() -print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)) -{% endhighlight %} - - - - - -{% highlight scala %} -val count = sc.parallelize(1 to NUM_SAMPLES).filter { _ => - val x = math.random - val y = math.random - x*x + y*y < 1 -}.count() -println(s"Pi is roughly ${4.0 * count / NUM_SAMPLES}") -{% endhighlight %} - - - - - -{% highlight java %} -List l = new ArrayList<>(NUM_SAMPLES); -for (int i = 0; i < NUM_SAMPLES; i++) { - l.add(i); -} - -long count = sc.parallelize(l).filter(i -> { - double x = Math.random(); - double y = Math.random(); - return x*x + y*y < 1; -}).count(); -System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES); -{% endhighlight %} - - - - -DataFrame API examples - -In Spark, a https://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes";>DataFrame -is a distributed collection of data organized into named columns. -Users can use DataFrame API to perform various relational operations on both external -data sources and Spark’s built-in distributed collections without providing specific procedures for processing data. -Also, programs based on DataFrame API will be automatically optimized by Spark’s built-in optimizer, Catalyst. - - -Text search -In this example, we search through the error messages in a log file. - - - Python - Scala - Java - - - - - -{% highlight python %} -textFile = sc.textFile("hdfs://...") - -# Creates a DataFrame hav
Re: [PR] docs: update examples page [spark-website]
zhengruifeng merged PR #494: URL: https://github.com/apache/spark-website/pull/494 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46573][K8S] Use `appId` instead of `conf.appId` in `LoggingPodStatusWatcherImpl`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 347c955fe723 [SPARK-46573][K8S] Use `appId` instead of `conf.appId` in `LoggingPodStatusWatcherImpl` 347c955fe723 is described below commit 347c955fe7231eb2912c6678ea7769024f6dc5df Author: yangjie01 AuthorDate: Wed Jan 3 01:00:22 2024 -0800 [SPARK-46573][K8S] Use `appId` instead of `conf.appId` in `LoggingPodStatusWatcherImpl` ### What changes were proposed in this pull request? This PR replaces the call to `conf.appId` with direct use of `appId` in `LoggingPodStatusWatcherImpl`, as it is already defined in `LoggingPodStatusWatcherImpl`: https://github.com/apache/spark/blob/b74b1592c9ec07b3d29b6d4d900b1d3ba1417cd1/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala#L42 ### Why are the changes needed? Should use the already defined `val appId` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #44569 from LuciferYang/SPARK-46573. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala index bc8b023b5ecd..3227a72a8371 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala @@ -96,7 +96,7 @@ private[k8s] class LoggingPodStatusWatcherImpl(conf: KubernetesDriverConf) } override def watchOrStop(sId: String): Boolean = { -logInfo(s"Waiting for application ${conf.appName} with application ID ${conf.appId} " + +logInfo(s"Waiting for application ${conf.appName} with application ID $appId " + s"and submission ID $sId to finish...") val interval = conf.get(REPORT_INTERVAL) synchronized { @@ -110,7 +110,7 @@ private[k8s] class LoggingPodStatusWatcherImpl(conf: KubernetesDriverConf) logInfo( pod.map { p => s"Container final statuses:\n\n${containersDescription(p)}" } .getOrElse("No containers were found in the driver pod.")) - logInfo(s"Application ${conf.appName} with application ID ${conf.appId} " + + logInfo(s"Application ${conf.appName} with application ID $appId " + s"and submission ID $sId finished") } podCompleted - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
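The change is small, but the pattern is worth spelling out: resolve the application ID once into a `val` and interpolate that value at every log site, instead of re-deriving it from the conf each time. A minimal sketch with simplified, illustrative names (not the actual Kubernetes submit classes) follows.
```
// Sketch only: cache the resolved application ID once and reuse it in log messages.
class StatusWatcher(appName: String, resolvedAppId: String) {
  private val appId: String = resolvedAppId

  def logWaiting(sId: String): Unit =
    println(s"Waiting for application $appName with application ID $appId " +
      s"and submission ID $sId to finish...")

  def logFinished(sId: String): Unit =
    println(s"Application $appName with application ID $appId " +
      s"and submission ID $sId finished")
}
```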
(spark) branch master updated: [SPARK-46525][DOCKER][TESTS] Fix docker-integration-tests on Apple Silicon
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9208c42b3a11 [SPARK-46525][DOCKER][TESTS] Fix docker-integration-tests on Apple Sillicon 9208c42b3a11 is described below commit 9208c42b3a110099d1cc0249b6be364aacff0f2a Author: Kent Yao AuthorDate: Wed Jan 3 00:58:11 2024 -0800 [SPARK-46525][DOCKER][TESTS] Fix docker-integration-tests on Apple Sillicon ### What changes were proposed in this pull request? `com.spotify.docker.client` is not going to support Apple Silicons as it has already been archived and the [jnr-unixsocket](https://mvnrepository.com/artifact/com.github.jnr/jnr-unixsocket) 0.18 it uses is not compatible with Apple Silicons. If we run our docker IT tests on Apple Silicons, it will fail like ```java [info] org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite *** ABORTED *** (2 seconds, 264 milliseconds) [info] com.spotify.docker.client.exceptions.DockerException: java.util.concurrent.ExecutionException: com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException: java.lang.UnsatisfiedLinkError: could not load FFI provider jnr.ffi.provider.jffi.Provider [info] at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2828) [info] at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2692) [info] at com.spotify.docker.client.DefaultDockerClient.ping(DefaultDockerClient.java:574) [info] at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:124) [info] at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49) [info] at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47) [info] at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95) [info] at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118) [info] at org.apache.spark.sql.jdbc.DockerKrbJDBCIntegrationSuite.super$beforeAll(DockerKrbJDBCIntegrationSuite.scala:65) [info] at org.apache.spark.sql.jdbc.DockerKrbJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerKrbJDBCIntegrationSuite.scala:65) [info] at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49) [info] at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47) [info] at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95) [info] at org.apache.spark.sql.jdbc.DockerKrbJDBCIntegrationSuite.beforeAll(DockerKrbJDBCIntegrationSuite.scala:44) [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414) [info] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [info] at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [info] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [info] at java.base/java.lang.Thread.run(Thread.java:840) [info] Cause: java.util.concurrent.ExecutionException: com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException: java.lang.UnsatisfiedLinkError: could not load FFI provider jnr.ffi.provider.jffi.Provider [info] at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) [info] at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) [info] at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) [info] at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2690) [info] at com.spotify.docker.client.DefaultDockerClient.ping(DefaultDockerClient.java:574) [info] at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:124) [info] at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrat