[spark] branch master updated: [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dbe4ffebfc [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) 4dbe4ffebfc is described below commit 4dbe4ffebfc8cc3a894c9e798c5a7b364cf7a399 Author: Hyukjin Kwon AuthorDate: Tue Oct 10 14:26:45 2023 +0900 [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) ### What changes were proposed in this pull request? This PR proposes to fix imports according to PEP8 in `pyspark.pandas` and `pyspark.*` (core), see https://peps.python.org/pep-0008/#imports. ### Why are the changes needed? I have not been fixing them as they are too minor. However, this practice is being propagated across the whole PySpark codebase, and I think we should fix them all so other users do not follow the non-standard practice. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing linters and tests should cover this. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43257 from HyukjinKwon/SPARK-45450. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/pyspark/conf.py | 1 + python/pyspark/errors_doc_gen.py | 1 + python/pyspark/java_gateway.py | 1 + python/pyspark/join.py | 3 ++- python/pyspark/pandas/accessors.py | 1 - python/pyspark/pandas/base.py | 2 +- python/pyspark/pandas/config.py | 1 - python/pyspark/pandas/correlation.py | 1 - python/pyspark/pandas/data_type_ops/date_ops.py | 1 - python/pyspark/pandas/data_type_ops/datetime_ops.py | 1 - python/pyspark/pandas/data_type_ops/string_ops.py | 1 - python/pyspark/pandas/frame.py | 7 ++- python/pyspark/pandas/generic.py | 1 - python/pyspark/pandas/groupby.py | 2 -- python/pyspark/pandas/indexes/base.py | 1 - python/pyspark/pandas/indexes/multi.py | 2 -- python/pyspark/pandas/indexing.py | 6 ++ python/pyspark/pandas/internal.py | 11 +++ python/pyspark/pandas/mlflow.py | 4 ++-- python/pyspark/pandas/namespace.py | 2 +- python/pyspark/pandas/numpy_compat.py | 2 +- python/pyspark/pandas/plot/core.py | 6 +++--- python/pyspark/pandas/plot/matplotlib.py | 1 - python/pyspark/pandas/resample.py | 2 -- python/pyspark/pandas/series.py | 2 +- python/pyspark/pandas/spark/accessors.py | 3 --- python/pyspark/pandas/spark/functions.py | 2 -- python/pyspark/pandas/sql_processor.py | 2 +- python/pyspark/pandas/strings.py | 3 +- python/pyspark/pandas/supported_api_gen.py | 6 +++--- python/pyspark/pandas/tests/computation/test_corrwith.py | 1 - python/pyspark/pandas/tests/computation/test_cov.py | 1 - .../pandas/tests/connect/data_type_ops/testing_utils.py | 1 - python/pyspark/pandas/tests/connect/test_parity_extension.py | 1 + python/pyspark/pandas/tests/connect/test_parity_indexing.py | 1 + .../pyspark/pandas/tests/connect/test_parity_numpy_compat.py | 1 + python/pyspark/pandas/tests/data_type_ops/testing_utils.py | 2 -- python/pyspark/pandas/tests/frame/test_reshaping.py | 1 - python/pyspark/pandas/tests/frame/test_spark.py | 2 +- python/pyspark/pandas/tests/series/test_series.py | 1 - python/pyspark/pandas/tests/series/test_stat.py | 2 +- python/pyspark/pandas/tests/test_indexops_spark.py | 2 +- python/pyspark/pandas/tests/test_stats.py | 1 + python/pyspark/pandas/utils.py | 7 +++ python/pyspark/pandas/window.py | 2 -- python/pyspark/profiler.py | 1 - python/pyspark/rdd.py | 6 +++---
python/pyspark/shuffle.py | 2 +- python/pyspark/tests/test_statcounter.py | 3 ++- python/pyspark/util.py
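For reference, the PEP8 convention the commit above enforces groups imports into three blocks, standard library, then third-party, then local, separated by blank lines, with each import on its own line. A minimal sketch of the target style (the specific modules are illustrative, not taken from the patch):

    # Standard library imports come first.
    import os
    from typing import Any, Optional

    # Third-party imports form the second group.
    import numpy as np
    import pandas as pd

    # Local (PySpark) imports come last.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType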
Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]
panbingkun commented on PR #482: URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754307942 (base) panbingkun:~/Developer/spark/spark-website-community$ git show | grep "diff --git a/site/docs" | wc -l 23474 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]
panbingkun commented on PR #482: URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754305344 This PR has completed the addition of `canonical links` to all HTML documents in all versions (3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1), excluding the following files: - V3.1.1 site/docs/3.1.1/api/python/_static/webpack-macros.html site/docs/3.1.1/api/python/search.html site/docs/3.1.1/api/python/_modules/abc.html site/docs/3.1.1/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html site/docs/3.1.1/api/python/migration_guide/pyspark_1.4_to_1.5.html site/docs/3.1.1/api/python/migration_guide/pyspark_2.2_to_2.3.html site/docs/3.1.1/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html site/docs/3.1.1/api/python/migration_guide/pyspark_2.3_to_2.4.html site/docs/3.1.1/api/python/migration_guide/pyspark_2.4_to_3.0.html - V3.1.2 site/docs/3.1.2/api/python/_static/webpack-macros.html site/docs/3.1.2/api/python/search.html site/docs/3.1.2/api/python/_modules/abc.html site/docs/3.1.2/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html site/docs/3.1.2/api/python/migration_guide/pyspark_1.4_to_1.5.html site/docs/3.1.2/api/python/migration_guide/pyspark_2.2_to_2.3.html site/docs/3.1.2/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html site/docs/3.1.2/api/python/migration_guide/pyspark_2.3_to_2.4.html site/docs/3.1.2/api/python/migration_guide/pyspark_2.4_to_3.0.html - V3.1.3 site/docs/3.1.3/api/python/_static/webpack-macros.html site/docs/3.1.3/api/python/search.html site/docs/3.1.3/api/python/_modules/abc.html site/docs/3.1.3/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html site/docs/3.1.3/api/python/migration_guide/pyspark_1.4_to_1.5.html site/docs/3.1.3/api/python/migration_guide/pyspark_2.2_to_2.3.html site/docs/3.1.3/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html site/docs/3.1.3/api/python/migration_guide/pyspark_2.3_to_2.4.html site/docs/3.1.3/api/python/migration_guide/pyspark_2.4_to_3.0.html - V3.2.0 site/docs/3.2.0/api/python/_static/webpack-macros.html site/docs/3.2.0/api/python/search.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-bar-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-bar-3.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-pie-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-density-3.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-hist-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-line-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-kde-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-barh-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-hist-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-4.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-kde-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-barh-5.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-area-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-barh-4.html
site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-box-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-hist-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-5.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-barh-3.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-density-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-pie-1.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-bar-3.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-bar-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-2.html site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-scatter-2.html
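For background, a canonical link is a `<link rel="canonical" href="..."/>` tag in a page's `<head>` that tells search engines which URL is the authoritative copy of the page. A sketch of how such tags could be injected into already-published HTML files follows; this is an illustration under assumed paths and helper names, not the actual tool used for this PR:

    from pathlib import Path

    BASE = "https://spark.apache.org/docs/latest/api/python"

    def add_canonical_link(html_file: Path, docs_root: Path) -> None:
        # Map the on-disk page to its URL under the latest docs tree.
        rel = html_file.relative_to(docs_root).as_posix()
        tag = f'<link rel="canonical" href="{BASE}/{rel}" />'
        text = html_file.read_text(encoding="utf-8")
        # Skip pages that already carry a canonical link.
        if 'rel="canonical"' not in text:
            html_file.write_text(text.replace("</head>", f"{tag}\n</head>", 1),
                                 encoding="utf-8")

    # Example: process every HTML page of one published version.
    root = Path("site/docs/3.1.1/api/python")
    for page in root.rglob("*.html"):
        add_canonical_link(page, root)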
[spark] branch master updated: [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 08640961e3b [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence 08640961e3b is described below commit 08640961e3bad7de38ed3358df8706bad028c27a Author: Siying Dong AuthorDate: Tue Oct 10 11:24:31 2023 +0900 [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence ### What changes were proposed in this pull request? In RocksDBFileManager, we add a variable to indicate that the root path has already been checked (and created if it did not exist), so that we don't need to recheck it a second time. ### Why are the changes needed? Right now, every time RocksDB.load() is called, we check whether the checkpoint directory exists and create it if not. This is relatively expensive and shows up in performance profiling. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing CI tests cover it. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43299 from siying/rootPath. Authored-by: Siying Dong Signed-off-by: Jungtaek Lim --- .../execution/streaming/state/RocksDBFileManager.scala | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala index eae9aac3c0a..3d0745c2fb3 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala @@ -146,10 +146,15 @@ class RocksDBFileManager( private def codec = CompressionCodec.createCodec(sparkConf, codecName) + @volatile private var rootDirChecked: Boolean = false + def getChangeLogWriter(version: Long): StateStoreChangelogWriter = { -val rootDir = new Path(dfsRootDir) val changelogFile = dfsChangelogFile(version) -if (!fm.exists(rootDir)) fm.mkdirs(rootDir) +if (!rootDirChecked) { + val rootDir = new Path(dfsRootDir) + if (!fm.exists(rootDir)) fm.mkdirs(rootDir) + rootDirChecked = true +} val changelogWriter = new StateStoreChangelogWriter(fm, changelogFile, codec) changelogWriter } @@ -193,8 +198,11 @@ class RocksDBFileManager( // CheckpointFileManager.createAtomic API which doesn't auto-initialize parent directories. // Moreover, once we disable to track the number of keys, in which the numKeys is -1, we // still need to create the initial dfs root directory anyway. - val path = new Path(dfsRootDir) - if (!fm.exists(path)) fm.mkdirs(path) + if (!rootDirChecked) { +val path = new Path(dfsRootDir) +if (!fm.exists(path)) fm.mkdirs(path) +rootDirChecked = true + } } zipToDfsFile(localOtherFiles :+ metadataFile, dfsBatchZipFile(version)) logInfo(s"Saved checkpoint file for version $version") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
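The pattern in the diff above, replacing a repeated and relatively expensive existence check with a once-per-instance flag, is language-agnostic. A rough Python sketch of the same idea (the class and method names are stand-ins, not Spark's CheckpointFileManager API):

    import os
    import threading

    class FileManager:
        def __init__(self, dfs_root_dir: str) -> None:
            self._dfs_root_dir = dfs_root_dir
            self._root_dir_checked = False   # plays the role of the @volatile flag
            self._lock = threading.Lock()    # Python needs a lock for similar safety

        def _ensure_root_dir(self) -> None:
            # Only the first call pays for the existence check and mkdir.
            if not self._root_dir_checked:
                with self._lock:
                    if not self._root_dir_checked:
                        os.makedirs(self._dfs_root_dir, exist_ok=True)
                        self._root_dir_checked = True

        def get_changelog_writer(self, version: int):
            self._ensure_root_dir()
            return open(os.path.join(self._dfs_root_dir, f"{version}.changelog"), "wb")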
[spark] branch branch-3.5 updated: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new ac4b9154b58 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance ac4b9154b58 is described below commit ac4b9154b5822779023e66f2efb24d05e20b1cca Author: Chaoqin Li AuthorDate: Tue Oct 10 11:03:19 2023 +0900 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance ### What changes were proposed in this pull request? When loading a rocksdb instance, remove the file version map entries of larger versions to avoid a rocksdb sst file unique id mismatch exception. The SST files in larger versions can't be reused even if they have the same size and name because they belong to another rocksdb instance. ### Why are the changes needed? Avoid a rocksdb file mismatch exception that may occur at runtime. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a rocksdb unit test. Closes #43174 from chaoqin-li1123/rocksdb_mismatch. Authored-by: Chaoqin Li Signed-off-by: Jungtaek Lim --- .../streaming/state/RocksDBFileManager.scala | 4 +++ .../execution/streaming/state/RocksDBSuite.scala | 29 ++ 2 files changed, 33 insertions(+) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala index 0891d773713..faf9cd701ae 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala @@ -207,6 +207,10 @@ class RocksDBFileManager( */ def loadCheckpointFromDfs(version: Long, localDir: File): RocksDBCheckpointMetadata = { logInfo(s"Loading checkpoint files for version $version") +// The unique ids of SST files are checked when opening a rocksdb instance. The SST files +// in larger versions can't be reused even if they have the same size and name because +// they belong to another rocksdb instance. +versionToRocksDBFiles.keySet().removeIf(_ >= version) val metadata = if (version == 0) { if (localDir.exists) Utils.deleteRecursively(localDir) localDir.mkdirs() diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala index e31b05c362f..91dd8582207 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala @@ -214,6 +214,35 @@ class RocksDBSuite extends AlsoTestWithChangelogCheckpointingEnabled with Shared } } + testWithChangelogCheckpointingEnabled("SPARK-45419: Do not reuse SST files" + +" in different RocksDB instances") { +val remoteDir = Utils.createTempDir().toString +val conf = dbConf.copy(minDeltasForSnapshot = 0, compactOnCommit = false) +new File(remoteDir).delete() // to make sure that the directory gets created +withDB(remoteDir, conf = conf) { db => + for (version <- 0 to 2) { +db.load(version) +db.put(version.toString, version.toString) +db.commit() + } + // upload snapshot 3.zip + db.doMaintenance() + // Roll back to version 1 and start to process data.
+ for (version <- 1 to 3) { +db.load(version) +db.put(version.toString, version.toString) +db.commit() + } + // Upload snapshot 4.zip, should not reuse the SST files in 3.zip + db.doMaintenance() +} + +withDB(remoteDir, conf = conf) { db => + // Open the db to verify that the state in 4.zip is not corrupted. + db.load(4) +} + } + // A rocksdb instance with changelog checkpointing enabled should be able to load // an existing checkpoint without changelog. testWithChangelogCheckpointingEnabled( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff87c0958f7 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance ff87c0958f7 is described below commit ff87c0958f79b16c2f276e0dc53a855fca558347 Author: Chaoqin Li AuthorDate: Tue Oct 10 11:03:19 2023 +0900 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance ### What changes were proposed in this pull request? When loading a rocksdb instance, remove the file version map entries of larger versions to avoid a rocksdb sst file unique id mismatch exception. The SST files in larger versions can't be reused even if they have the same size and name because they belong to another rocksdb instance. ### Why are the changes needed? Avoid a rocksdb file mismatch exception that may occur at runtime. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a rocksdb unit test. Closes #43174 from chaoqin-li1123/rocksdb_mismatch. Authored-by: Chaoqin Li Signed-off-by: Jungtaek Lim --- .../streaming/state/RocksDBFileManager.scala | 4 +++ .../execution/streaming/state/RocksDBSuite.scala | 29 ++ 2 files changed, 33 insertions(+) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala index 05fd845accb..eae9aac3c0a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala @@ -208,6 +208,10 @@ class RocksDBFileManager( */ def loadCheckpointFromDfs(version: Long, localDir: File): RocksDBCheckpointMetadata = { logInfo(s"Loading checkpoint files for version $version") +// The unique ids of SST files are checked when opening a rocksdb instance. The SST files +// in larger versions can't be reused even if they have the same size and name because +// they belong to another rocksdb instance. +versionToRocksDBFiles.keySet().removeIf(_ >= version) val metadata = if (version == 0) { if (localDir.exists) Utils.deleteRecursively(localDir) localDir.mkdirs() diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala index b4b67f381d2..764358dc1f0 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala @@ -240,6 +240,35 @@ class RocksDBSuite extends AlsoTestWithChangelogCheckpointingEnabled with Shared } } + testWithChangelogCheckpointingEnabled("SPARK-45419: Do not reuse SST files" + +" in different RocksDB instances") { +val remoteDir = Utils.createTempDir().toString +val conf = dbConf.copy(minDeltasForSnapshot = 0, compactOnCommit = false) +new File(remoteDir).delete() // to make sure that the directory gets created +withDB(remoteDir, conf = conf) { db => + for (version <- 0 to 2) { +db.load(version) +db.put(version.toString, version.toString) +db.commit() + } + // upload snapshot 3.zip + db.doMaintenance() + // Roll back to version 1 and start to process data.
+ for (version <- 1 to 3) { +db.load(version) +db.put(version.toString, version.toString) +db.commit() + } + // Upload snapshot 4.zip, should not reuse the SST files in 3.zip + db.doMaintenance() +} + +withDB(remoteDir, conf = conf) { db => + // Open the db to verify that the state in 4.zip is not corrupted. + db.load(4) +} + } + // A rocksdb instance with changelog checkpointing enabled should be able to load // an existing checkpoint without changelog. testWithChangelogCheckpointingEnabled( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
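The key line in the diff, `versionToRocksDBFiles.keySet().removeIf(_ >= version)`, says: when reloading at version V, forget all bookkeeping for snapshots at V or later, since their SST files belong to an abandoned instance lineage and must not be reused even when name and size match. A small Python sketch of that invariant (illustrative names, not Spark's internal API):

    version_to_sst_files = {
        1: ["000005.sst"],
        2: ["000005.sst", "000008.sst"],
        3: ["000005.sst", "000008.sst", "000011.sst"],
    }

    def load_checkpoint(version: int) -> None:
        # Files recorded for this version or later came from a different
        # (now-abandoned) instance lineage; drop them so they are re-uploaded.
        for v in [v for v in version_to_sst_files if v >= version]:
            del version_to_sst_files[v]

    load_checkpoint(2)
    assert sorted(version_to_sst_files) == [1]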
Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]
panbingkun commented on PR #482: URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754195368 I will perform similar operations on other versions of HTML documents. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 26527c43e65 [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client 26527c43e65 is described below commit 26527c43e652718c6d6be8f2ae2f92e835e1b328 Author: Yihong He AuthorDate: Tue Oct 10 08:41:07 2023 +0900 [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client ### What changes were proposed in this pull request? - Convert StreamingQueryException in Scala Client - Move StreamingQueryException to common/utils module - Implement (message, cause) constructor and getMessage() for StreamingQueryException - Get StreamingQueryException in StreamingQuery.exception by throw instead of GRPC response ### Why are the changes needed? - Compatibility with the existing behavior ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Existing tests ### Was this patch authored or co-authored using generative AI tooling? Closes #42859 from heyihong/SPARK-43299. Authored-by: Yihong He Signed-off-by: Hyukjin Kwon --- .../sql/streaming/StreamingQueryException.scala| 10 + .../spark/sql/streaming/StreamingQuery.scala | 18 - .../sql/streaming/StreamingQueryException.scala| 47 -- .../CheckConnectJvmClientCompatibility.scala | 12 -- .../sql/streaming/ClientStreamingQuerySuite.scala | 4 +- .../connect/client/GrpcExceptionConverter.scala| 7 .../sql/connect/planner/SparkConnectPlanner.scala | 6 +++ project/MimaExcludes.scala | 3 ++ 8 files changed, 37 insertions(+), 70 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala similarity index 89% rename from sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala rename to common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala index 738c79769bb..77415fb4759 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala +++ b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala @@ -42,6 +42,14 @@ class StreamingQueryException private[sql]( messageParameters: Map[String, String]) extends Exception(message, cause) with SparkThrowable { + private[spark] def this( + message: String, + cause: Throwable, + errorClass: String, + messageParameters: Map[String, String]) = { +this("", message, cause, null, null, errorClass, messageParameters) + } + def this( queryDebugString: String, cause: Throwable, @@ -62,6 +70,8 @@ class StreamingQueryException private[sql]( /** Time when the exception occurred */ val time: Long = System.currentTimeMillis + override def getMessage: String = s"${message}\n${queryDebugString}" + override def toString(): String = s"""${classOf[StreamingQueryException].getName}: ${cause.getMessage} |$queryDebugString""".stripMargin diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala index a48367b468d..48ef0a907b5 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala @@ 
-242,17 +242,15 @@ class RemoteStreamingQuery( } override def exception: Option[StreamingQueryException] = { -val exception = executeQueryCmd(_.setException(true)).getException -if (exception.hasExceptionMessage) { - Some( -new StreamingQueryException( - // message maps to the return value of original StreamingQueryException's toString method - message = exception.getExceptionMessage, - errorClass = exception.getErrorClass, - stackTrace = exception.getStackTrace)) -} else { - None +try { + // When exception field is set to false, the server throws a StreamingQueryException + // to the client. + executeQueryCmd(_.setException(false)) +} catch { + case e: StreamingQueryException => return Some(e) } + +None } private def executeQueryCmd( diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala deleted file mode
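The client-side change above boils down to a shape change: rather than having the server embed exception fields in the response and rebuilding the exception manually, the client asks the server to throw, lets the GRPC error converter reconstruct a genuine StreamingQueryException, and simply catches it. A Python sketch of that throw-and-catch shape (illustrative, not the Spark Connect Python client API):

    from typing import Optional

    class StreamingQueryException(Exception):
        pass

    class RemoteStreamingQuery:
        def _execute_query_cmd(self, exception: bool):
            # Stand-in for the RPC; in the real client, the GRPC error
            # converter raises a reconstructed StreamingQueryException.
            raise StreamingQueryException("query terminated with error")

        def exception(self) -> Optional[StreamingQueryException]:
            try:
                # With exception=False the server throws instead of
                # embedding the error fields in the response.
                self._execute_query_cmd(exception=False)
            except StreamingQueryException as e:
                return e
            return None

    assert RemoteStreamingQuery().exception() is not None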
[spark] branch master updated: [SPARK-45456][BUILD] Upgrade maven to 3.9.5
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7e8021b6f52 [SPARK-45456][BUILD] Upgrade maven to 3.9.5 7e8021b6f52 is described below commit 7e8021b6f52cbc50cd3e072972e613159177253f Author: YangJie AuthorDate: Mon Oct 9 13:54:48 2023 -0700 [SPARK-45456][BUILD] Upgrade maven to 3.9.5 ### What changes were proposed in this pull request? This PR aims to upgrade Maven to 3.9.5 from 3.9.4. ### Why are the changes needed? The full release notes are as follows: - https://maven.apache.org/docs/3.9.5/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.5 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions - Manual test: running `build/mvn -version` will trigger downloading `apache-maven-3.9.5-bin.tar.gz` ``` exec: curl --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.5/binaries/apache-maven-3.9.5-bin.tar.gz?action=download ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #43267 from LuciferYang/SPARK-45456. Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md | 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index c07405e01e9..f410e8774d9 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.9.4" +# $mavenVer = "3.9.5" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip" "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 95416f1e011..90a520a62a9 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.9.4 and Java 17/21. +Building Spark using Maven requires Maven 3.9.5 and Java 17/21. Spark requires Scala 2.13; support for Scala 2.12 was removed in Spark 4.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index a98e1c6fd99..3e43bc04707 100644 --- a/pom.xml +++ b/pom.xml @@ -115,7 +115,7 @@ 17 ${java.version} ${java.version} -3.9.4 +3.9.5 3.1.0 spark 9.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 454abbfd66b [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page 454abbfd66b is described below commit 454abbfd66be20ab7e8b8342b4e9c25c1332be14 Author: panbingkun AuthorDate: Mon Oct 9 13:53:34 2023 -0700 [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page ### What changes were proposed in this pull request? The PR aims to add canonical links to the PySpark docs page, backporting this to branch 3.3. Master branch pr: https://github.com/apache/spark/pull/42425. ### Why are the changes needed? Backport this to branch 3.3. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43286 from panbingkun/branch-3.3_SPARK-44729. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- python/docs/source/conf.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py index e1bc4006393..2e2b5201262 100644 --- a/python/docs/source/conf.py +++ b/python/docs/source/conf.py @@ -248,6 +248,8 @@ html_use_index = False # Output file base name for HTML help builder. htmlhelp_basename = 'pysparkdoc' +# The base URL which points to the root of the HTML documentation. +html_baseurl = 'https://spark.apache.org/docs/latest/api/python' # -- Options for LaTeX output - - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new b7850204170 [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page b7850204170 is described below commit b7850204170f0759f7fc60ddb597644370b5ecce Author: panbingkun AuthorDate: Mon Oct 9 13:48:37 2023 -0700 [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page ### What changes were proposed in this pull request? The PR aims to add canonical links to the PySpark docs page, backporting this to branch 3.4. Master branch pr: https://github.com/apache/spark/pull/42425. ### Why are the changes needed? Backport this to branch 3.4. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manual test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43285 from panbingkun/branch-3.4_SPARK-44729. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- python/docs/source/conf.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py index 8203a802053..38c331048e7 100644 --- a/python/docs/source/conf.py +++ b/python/docs/source/conf.py @@ -259,6 +259,8 @@ html_use_index = False # Output file base name for HTML help builder. htmlhelp_basename = 'pysparkdoc' +# The base URL which points to the root of the HTML documentation. +html_baseurl = 'https://spark.apache.org/docs/latest/api/python' # -- Options for LaTeX output - - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
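For context on how the one-line `conf.py` change above produces the canonical links: when `html_baseurl` is set, Sphinx (1.8 and later) derives a per-page URL from it, and the built-in themes render that URL as a `<link rel="canonical" href="..."/>` tag in each page's `<head>`. The setting itself is just:

    # python/docs/source/conf.py
    # The base URL which points to the root of the HTML documentation.
    # Sphinx joins this with each page's path to emit the canonical tag.
    html_baseurl = 'https://spark.apache.org/docs/latest/api/python'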
[spark] branch master updated (ced321c8b5a -> 1e4797e88aa)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with non-existing table add 1e4797e88aa [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic No new revisions were added by this update. Summary of changes: .../org/apache/spark/util/HadoopFSUtils.scala | 27 +- .../org/apache/spark/util/HadoopFSUtilsSuite.scala | 32 ++ 2 files changed, 52 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 8bf5a5bca3f [SPARK-45383][SQL] Fix error message for time travel with non-existing table 8bf5a5bca3f is described below commit 8bf5a5bca3f9f7db78182d14e56476d384f442fa Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. Authored-by: Wenchen Fan Signed-off-by: Max Gekk (cherry picked from commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 511f3622e7e..533ea8a2b79 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -365,6 +365,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -377,6 +380,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index 06f5600e0d1..7745e9c0a4e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = "TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with non-existing table ced321c8b5a is described below commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038 Author: Wenchen Fan AuthorDate: Mon Oct 9 22:15:45 2023 +0300 [SPARK-45383][SQL] Fix error message for time travel with non-existing table ### What changes were proposed in this pull request? Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for time travel. It was missed before because `RelationTimeTravel` is a leaf node but it may contain `UnresolvedRelation`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? Yes, the error message becomes reasonable ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43298 from cloud-fan/time-travel. Authored-by: Wenchen Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala| 4 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++ 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index e140625f47a..611dd7b3009 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -384,6 +384,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB }) operator match { + case RelationTimeTravel(u: UnresolvedRelation, _, _) => +u.tableNotFound(u.multipartIdentifier) + case etw: EventTimeWatermark => etw.eventTime.dataType match { case s: StructType @@ -396,6 +399,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "eventName" -> toSQLId(etw.eventTime.name), "eventType" -> toSQLType(etw.eventTime.dataType))) } + case f: Filter if f.condition.dataType != BooleanType => f.failAnalysis( errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN", diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala index ae639b272a2..047bc8de739 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala @@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter sqlState = None, parameters = Map("relationId" -> "`x`")) + checkError( +exception = intercept[AnalysisException] { + sql("SELECT * FROM non_exist VERSION AS OF 1") +}, +errorClass = "TABLE_OR_VIEW_NOT_FOUND", +parameters = Map("relationName" -> "`non_exist`"), +context = ExpectedContext( + fragment = "non_exist", + start = 14, + stop = 22)) + val subquery1 = "SELECT 1 FROM non_exist" checkError( exception = intercept[AnalysisException] { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove unnecessary check
This is an automated email from the ASF dual-hosted git repository. beliefer pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f5cf3101991 [SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove unnecessary check f5cf3101991 is described below commit f5cf310199132f62b779e0244d15f7680e2ba856 Author: Ruifeng Zheng AuthorDate: Mon Oct 9 18:07:59 2023 +0800 [SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove unnecessary check ### What changes were proposed in this pull request? Remove unnecessary check ### Why are the changes needed? https://github.com/apache/spark/pull/43215 already validates the plan in `__init__` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #43287 from zhengruifeng/SPARK-45412-followup. Authored-by: Ruifeng Zheng Signed-off-by: Jiaan Geng --- python/pyspark/sql/connect/dataframe.py | 16 +--- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index 2c0a75fad46..4044fab3bb3 100644 --- a/python/pyspark/sql/connect/dataframe.py +++ b/python/pyspark/sql/connect/dataframe.py @@ -169,7 +169,6 @@ class DataFrame: @property def write(self) -> "DataFrameWriter": -assert self._plan is not None return DataFrameWriter(self._plan, self._session) write.__doc__ = PySparkDataFrame.write.__doc__ @@ -1096,11 +1095,6 @@ class DataFrame: union.__doc__ = PySparkDataFrame.union.__doc__ def unionAll(self, other: "DataFrame") -> "DataFrame": -if other._plan is None: -raise PySparkValueError( -error_class="MISSING_VALID_PLAN", -message_parameters={"operator": "Union"}, -) self._check_same_session(other) return DataFrame.withPlan( plan.SetOperation(self._plan, other._plan, "union", is_all=True), session=self._session @@ -2030,8 +2024,6 @@ class DataFrame: mapInArrow.__doc__ = PySparkDataFrame.mapInArrow.__doc__ def foreach(self, f: Callable[[Row], None]) -> None: -assert self._plan is not None - def foreach_func(row: Any) -> None: f(row) @@ -2042,8 +2034,6 @@ class DataFrame: foreach.__doc__ = PySparkDataFrame.foreach.__doc__ def foreachPartition(self, f: Callable[[Iterator[Row]], None]) -> None: -assert self._plan is not None - schema = self.schema field_converters = [ ArrowTableToRowsConversion._create_converter(f.dataType) for f in schema.fields @@ -2069,14 +2059,12 @@ class DataFrame: @property def writeStream(self) -> DataStreamWriter: -assert self._plan is not None return DataStreamWriter(plan=self._plan, session=self._session) writeStream.__doc__ = PySparkDataFrame.writeStream.__doc__ def sameSemantics(self, other: "DataFrame") -> bool: -assert self._plan is not None -assert other._plan is not None +self._check_same_session(other) return self._session.client.same_semantics( plan=self._plan.to_proto(self._session.client), other=other._plan.to_proto(other._session.client), @@ -2085,7 +2073,6 @@ class DataFrame: sameSemantics.__doc__ = PySparkDataFrame.sameSemantics.__doc__ def semanticHash(self) -> int: -assert self._plan is not None return self._session.client.semantic_hash( plan=self._plan.to_proto(self._session.client), ) @@ -2093,7 +2080,6 @@ class DataFrame: semanticHash.__doc__ = PySparkDataFrame.semanticHash.__doc__ def writeTo(self, table: str) -> "DataFrameWriterV2": -assert self._plan is not None return DataFrameWriterV2(self._plan, self._session, table) 
writeTo.__doc__ = PySparkDataFrame.writeTo.__doc__ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 4841a404be3 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file 4841a404be3 is described below commit 4841a404be3c37fc16031a0119b321eefcb2faab Author: panbingkun AuthorDate: Mon Oct 9 12:32:14 2023 +0300 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file ### What changes were proposed in this pull request? The PR aims to remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file. ### Why are the changes needed? - While working on another PR, I used the following command: ``` SPARK_GENERATE_GOLDEN_FILES=1 build/sbt \ "core/testOnly *SparkThrowableSuite -- -t \"Error classes match with document\"" ``` I found that in the automatically generated `sql-error-conditions.md` file, there are 2 extra spaces added at the end. Obviously, this is not what we expect; otherwise we would need to remove them manually, which is not in line with automation. - Git shows us this difference, as follows: https://github.com/apache/spark/assets/15246973/a68b657f-3a00-4405-9623-1f7ab9d44d82 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43274 from panbingkun/SPARK-45459. Authored-by: panbingkun Signed-off-by: Max Gekk (cherry picked from commit af800b505956ff26e03c5fc56b6cb4ac5c0efe2f) Signed-off-by: Max Gekk --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 0249cde5488..299bcea3f9e 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -253,8 +253,7 @@ class SparkThrowableSuite extends SparkFunSuite { | |Also see [SQLSTATE Codes](sql-error-conditions-sqlstates.html). | - |$sqlErrorParentDocContent - |""".stripMargin + |$sqlErrorParentDocContent""".stripMargin errors.filter(_._2.subClass.isDefined).foreach(error => { val name = error._1 @@ -316,7 +315,7 @@ class SparkThrowableSuite extends SparkFunSuite { } FileUtils.writeStringToFile( parentDocPath.toFile, - sqlErrorParentDoc + lineSeparator, + sqlErrorParentDoc, StandardCharsets.UTF_8) } } else { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4493b431192 -> af800b50595)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match add af800b50595 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 5f8ae9a3dbd [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 5f8ae9a3dbd is described below commit 5f8ae9a3dbd2c7624bffd588483c9916c302c081 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When using a custom pattern to parse a timestamp, if only a prefix of the value matches (rather than the whole value), `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. with pattern `-MM-dd HH:mm:ss` and value `-12-31 23:59:59.999`: the pattern can parse `-12-31 23:59:59` normally, but the value has the suffix `.999`, so we must not return a non-empty result. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. Authored-by: Jia Fan Signed-off-by: Max Gekk (cherry picked from commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1) Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala | 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index eb173bc7f8c..2134a0d6ecd 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -507,4 +507,14 @@ class TimestampFormatterSuite extends
DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("-12-31 23:59:59.999", true).isEmpty) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match 4493b431192 is described below commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1 Author: Jia Fan AuthorDate: Mon Oct 9 12:30:20 2023 +0300 [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match ### What changes were proposed in this pull request? When using a custom pattern to parse a timestamp, if only a prefix of the value matches (rather than the whole value), `Iso8601TimestampFormatter::parseOptional` and `Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result. E.g. with pattern `-MM-dd HH:mm:ss` and value `-12-31 23:59:59.999`: the pattern can parse `-12-31 23:59:59` normally, but the value has the suffix `.999`, so we must not return a non-empty result. This bug affects schema inference in CSV/JSON. ### Why are the changes needed? Fix a schema inference bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add a new test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved. Authored-by: Jia Fan Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/util/TimestampFormatter.scala | 10 ++ .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 ++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index 8a288d0e9f3..55eee41c14c 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -167,8 +167,9 @@ class Iso8601TimestampFormatter( override def parseOptional(s: String): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicros(parsed)) } else { None @@ -196,8 +197,9 @@ class Iso8601TimestampFormatter( override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: Boolean): Option[Long] = { try { - val parsed = formatter.parseUnresolved(s, new ParsePosition(0)) - if (parsed != null) { + val parsePosition = new ParsePosition(0) + val parsed = formatter.parseUnresolved(s, parsePosition) + if (parsed != null && s.length == parsePosition.getIndex) { Some(extractMicrosNTZ(s, parsed, allowTimeZone)) } else { None diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala index ecd849dd3af..d2fc89a034f 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala @@ -491,4 +491,14 @@ class TimestampFormatterSuite extends DatetimeFormatterSuite { assert(simpleFormatter.parseOptional("abc").isEmpty) } + + test("SPARK-45424: do not return
optional parse results when only prefix match") { +val formatter = new Iso8601TimestampFormatter( + "-MM-dd HH:mm:ss", + locale = DateFormatter.defaultLocale, + legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT, + isParsing = true, zoneId = DateTimeTestUtils.LA) +assert(formatter.parseOptional("-12-31 23:59:59.999").isEmpty) +assert(formatter.parseWithoutTimeZoneOptional("-12-31 23:59:59.999", true).isEmpty) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
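The root cause in both pushes above is that `parseUnresolved` succeeds on any matching prefix; the fix additionally requires the `ParsePosition` index to reach the end of the input before accepting the result. For intuition, a Python analogue of the corrected `parseOptional` contract (a sketch; `datetime.strptime` happens to enforce full consumption itself by raising on trailing characters):

    from datetime import datetime
    from typing import Optional

    def parse_optional(s: str, pattern: str = "%Y-%m-%d %H:%M:%S") -> Optional[datetime]:
        # strptime raises ValueError ("unconverted data remains") when the
        # pattern matches only a prefix, which is the behavior the fix restores.
        try:
            return datetime.strptime(s, pattern)
        except ValueError:
            return None

    assert parse_optional("2023-12-31 23:59:59") is not None
    assert parse_optional("2023-12-31 23:59:59.999") is None  # prefix-only match rejected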
Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]
panbingkun commented on PR #482: URL: https://github.com/apache/spark-website/pull/482#issuecomment-1752580071 At present, this PR has only made changes to the historical documents of version 3.1.1. If there are no issues with these modifications, I will use tooling to make similar modifications to the other versions. @zhengruifeng @allisonport-db @MrPowers @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[PR] Add canonical links to the PySpark docs page for published docs [spark-website]
panbingkun opened a new pull request, #482: URL: https://github.com/apache/spark-website/pull/482 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org