[spark] branch master updated: [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core)

2023-10-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4dbe4ffebfc [SPARK-45450][PYTHON] Fix imports according to PEP8: 
pyspark.pandas and pyspark (core)
4dbe4ffebfc is described below

commit 4dbe4ffebfc8cc3a894c9e798c5a7b364cf7a399
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 10 14:26:45 2023 +0900

[SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and 
pyspark (core)

### What changes were proposed in this pull request?

This PR proposes to fix imports according to PEP8 in `pyspark.pandas` and 
`pyspark.*` (core), see https://peps.python.org/pep-0008/#imports.

### Why are the changes needed?

I have not been fixing them before because they are too minor. However, this practice
is spreading across the PySpark packages, and I think we should
fix them all so other contributors do not follow the non-standard practice.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing linters and tests should cover this.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43257 from HyukjinKwon/SPARK-45450.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/conf.py|  1 +
 python/pyspark/errors_doc_gen.py  |  1 +
 python/pyspark/java_gateway.py|  1 +
 python/pyspark/join.py|  3 ++-
 python/pyspark/pandas/accessors.py|  1 -
 python/pyspark/pandas/base.py |  2 +-
 python/pyspark/pandas/config.py   |  1 -
 python/pyspark/pandas/correlation.py  |  1 -
 python/pyspark/pandas/data_type_ops/date_ops.py   |  1 -
 python/pyspark/pandas/data_type_ops/datetime_ops.py   |  1 -
 python/pyspark/pandas/data_type_ops/string_ops.py |  1 -
 python/pyspark/pandas/frame.py|  7 ++-
 python/pyspark/pandas/generic.py  |  1 -
 python/pyspark/pandas/groupby.py  |  2 --
 python/pyspark/pandas/indexes/base.py |  1 -
 python/pyspark/pandas/indexes/multi.py|  2 --
 python/pyspark/pandas/indexing.py |  6 ++
 python/pyspark/pandas/internal.py | 11 +++
 python/pyspark/pandas/mlflow.py   |  4 ++--
 python/pyspark/pandas/namespace.py|  2 +-
 python/pyspark/pandas/numpy_compat.py |  2 +-
 python/pyspark/pandas/plot/core.py|  6 +++---
 python/pyspark/pandas/plot/matplotlib.py  |  1 -
 python/pyspark/pandas/resample.py |  2 --
 python/pyspark/pandas/series.py   |  2 +-
 python/pyspark/pandas/spark/accessors.py  |  3 ---
 python/pyspark/pandas/spark/functions.py  |  2 --
 python/pyspark/pandas/sql_processor.py|  2 +-
 python/pyspark/pandas/strings.py  |  3 +--
 python/pyspark/pandas/supported_api_gen.py|  6 +++---
 python/pyspark/pandas/tests/computation/test_corrwith.py  |  1 -
 python/pyspark/pandas/tests/computation/test_cov.py   |  1 -
 .../pandas/tests/connect/data_type_ops/testing_utils.py   |  1 -
 python/pyspark/pandas/tests/connect/test_parity_extension.py  |  1 +
 python/pyspark/pandas/tests/connect/test_parity_indexing.py   |  1 +
 .../pyspark/pandas/tests/connect/test_parity_numpy_compat.py  |  1 +
 python/pyspark/pandas/tests/data_type_ops/testing_utils.py|  2 --
 python/pyspark/pandas/tests/frame/test_reshaping.py   |  1 -
 python/pyspark/pandas/tests/frame/test_spark.py   |  2 +-
 python/pyspark/pandas/tests/series/test_series.py |  1 -
 python/pyspark/pandas/tests/series/test_stat.py   |  2 +-
 python/pyspark/pandas/tests/test_indexops_spark.py|  2 +-
 python/pyspark/pandas/tests/test_stats.py |  1 +
 python/pyspark/pandas/utils.py|  7 +++
 python/pyspark/pandas/window.py   |  2 --
 python/pyspark/profiler.py|  1 -
 python/pyspark/rdd.py |  6 +++---
 python/pyspark/shuffle.py |  2 +-
 python/pyspark/tests/test_statcounter.py  |  3 ++-
 python/pyspark/util.py 

Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]

2023-10-09 Thread via GitHub


panbingkun commented on PR #482:
URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754307942

   (base) panbingkun:~/Developer/spark/spark-website-community$ git show | grep "diff --git a/site/docs" | wc -l
  23474


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]

2023-10-09 Thread via GitHub


panbingkun commented on PR #482:
URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754305344

   This PR has completed the addition of `canonical links` to all HTML documents
in all versions (3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.3.0,
3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1), excluding the following files:
   
   - V3.1.1
site/docs/3.1.1/api/python/_static/webpack-macros.html
site/docs/3.1.1/api/python/search.html

site/docs/3.1.1/api/python/_modules/abc.html
site/docs/3.1.1/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html
site/docs/3.1.1/api/python/migration_guide/pyspark_1.4_to_1.5.html
site/docs/3.1.1/api/python/migration_guide/pyspark_2.2_to_2.3.html

site/docs/3.1.1/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html
site/docs/3.1.1/api/python/migration_guide/pyspark_2.3_to_2.4.html
site/docs/3.1.1/api/python/migration_guide/pyspark_2.4_to_3.0.html
   
   
   - V3.1.2
site/docs/3.1.2/api/python/_static/webpack-macros.html
site/docs/3.1.2/api/python/search.html

site/docs/3.1.2/api/python/_modules/abc.html
site/docs/3.1.2/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html
site/docs/3.1.2/api/python/migration_guide/pyspark_1.4_to_1.5.html
site/docs/3.1.2/api/python/migration_guide/pyspark_2.2_to_2.3.html

site/docs/3.1.2/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html
site/docs/3.1.2/api/python/migration_guide/pyspark_2.3_to_2.4.html
site/docs/3.1.2/api/python/migration_guide/pyspark_2.4_to_3.0.html
   
   
   - V3.1.3
site/docs/3.1.3/api/python/_static/webpack-macros.html
site/docs/3.1.3/api/python/search.html

site/docs/3.1.3/api/python/_modules/abc.html
site/docs/3.1.3/api/python/migration_guide/pyspark_1.0_1.2_to_1.3.html
site/docs/3.1.3/api/python/migration_guide/pyspark_1.4_to_1.5.html
site/docs/3.1.3/api/python/migration_guide/pyspark_2.2_to_2.3.html

site/docs/3.1.3/api/python/migration_guide/pyspark_2.3.0_to_2.3.1_above.html
site/docs/3.1.3/api/python/migration_guide/pyspark_2.3_to_2.4.html
site/docs/3.1.3/api/python/migration_guide/pyspark_2.4_to_3.0.html
   
   
   - V3.2.0
site/docs/3.2.0/api/python/_static/webpack-macros.html
site/docs/3.2.0/api/python/search.html


site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-bar-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-bar-3.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-pie-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-density-3.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-hist-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-line-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-kde-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-barh-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-hist-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-4.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-kde-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-barh-5.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-area-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-barh-4.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-box-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-hist-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-5.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-barh-3.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-density-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-pie-1.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-bar-3.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-bar-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-Series-plot-density-2.html

site/docs/3.2.0/api/python/reference/pyspark.pandas/api/pyspark-pandas-DataFrame-plot-scatter-2.html


[spark] branch master updated: [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence

2023-10-09 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 08640961e3b [SPARK-45472][SS] RocksDB State Store Doesn't Need to 
Recheck checkpoint path existence
08640961e3b is described below

commit 08640961e3bad7de38ed3358df8706bad028c27a
Author: Siying Dong 
AuthorDate: Tue Oct 10 11:24:31 2023 +0900

[SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint 
path existence

### What changes were proposed in this pull request?
In RocksDBFileManager, we add a variable indicating that the root path has
already been checked (and created if it did not exist), so that we don't need to
recheck it every time.

### Why are the changes needed?
Right now, every time RocksDB.load() is called, we check whether the checkpoint
directory exists and create it if it does not. This is relatively expensive and shows
up in performance profiling.
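
For context, a self-contained sketch of the check-once idiom this change applies (the names `dfsRootDir`, `existsOnDfs`, and `mkdirsOnDfs` are illustrative stand-ins, not the actual Spark members; the real change is in the diff below):

```scala
object CheckOnceExample {
  // Illustrative stand-ins for the checkpoint root and the (expensive) DFS calls.
  private val dfsRootDir = "/tmp/rocksdb-checkpoints"
  @volatile private var rootDirChecked: Boolean = false

  private def existsOnDfs(path: String): Boolean = { println(s"exists($path)"); false }
  private def mkdirsOnDfs(path: String): Unit = println(s"mkdirs($path)")

  // Called on every load/commit; the DFS round trip now happens at most once per manager.
  def ensureRootDir(): Unit = {
    if (!rootDirChecked) {
      if (!existsOnDfs(dfsRootDir)) mkdirsOnDfs(dfsRootDir)
      rootDirChecked = true
    }
  }

  def main(args: Array[String]): Unit = {
    ensureRootDir() // first call pays the existence check / mkdirs cost
    ensureRootDir() // subsequent calls are short-circuited by the flag
  }
}
```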

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing CI tests cover it.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43299 from siying/rootPath.

Authored-by: Siying Dong 
Signed-off-by: Jungtaek Lim 
---
 .../execution/streaming/state/RocksDBFileManager.scala   | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
index eae9aac3c0a..3d0745c2fb3 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
@@ -146,10 +146,15 @@ class RocksDBFileManager(
 
   private def codec = CompressionCodec.createCodec(sparkConf, codecName)
 
+  @volatile private var rootDirChecked: Boolean = false
+
   def getChangeLogWriter(version: Long): StateStoreChangelogWriter = {
-    val rootDir = new Path(dfsRootDir)
     val changelogFile = dfsChangelogFile(version)
-    if (!fm.exists(rootDir)) fm.mkdirs(rootDir)
+    if (!rootDirChecked) {
+      val rootDir = new Path(dfsRootDir)
+      if (!fm.exists(rootDir)) fm.mkdirs(rootDir)
+      rootDirChecked = true
+    }
     val changelogWriter = new StateStoreChangelogWriter(fm, changelogFile, codec)
     changelogWriter
   }
@@ -193,8 +198,11 @@ class RocksDBFileManager(
       // CheckpointFileManager.createAtomic API which doesn't auto-initialize parent directories.
       // Moreover, once we disable to track the number of keys, in which the numKeys is -1, we
       // still need to create the initial dfs root directory anyway.
-      val path = new Path(dfsRootDir)
-      if (!fm.exists(path)) fm.mkdirs(path)
+      if (!rootDirChecked) {
+        val path = new Path(dfsRootDir)
+        if (!fm.exists(path)) fm.mkdirs(path)
+        rootDirChecked = true
+      }
     }
     zipToDfsFile(localOtherFiles :+ metadataFile, dfsBatchZipFile(version))
     logInfo(s"Saved checkpoint file for version $version")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance

2023-10-09 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new ac4b9154b58 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance
ac4b9154b58 is described below

commit ac4b9154b5822779023e66f2efb24d05e20b1cca
Author: Chaoqin Li 
AuthorDate: Tue Oct 10 11:03:19 2023 +0900

[SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance

### What changes were proposed in this pull request?
When loading a RocksDB instance, remove the file version map entries for larger
versions to avoid a RocksDB SST file unique-id mismatch exception. The SST files
in larger versions can't be reused even if they have the same size and name,
because they belong to another RocksDB instance.
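
A self-contained sketch of the version-map cleanup idea (the map contents and versions here are made up for illustration; the actual one-line change is in the diff below):

```scala
import java.util.concurrent.ConcurrentHashMap

object VersionMapCleanupExample {
  def main(args: Array[String]): Unit = {
    // Illustrative stand-in for the manager's version -> uploaded SST files map.
    val versionToFiles = new ConcurrentHashMap[Long, Seq[String]]()
    versionToFiles.put(1L, Seq("001.sst"))
    versionToFiles.put(2L, Seq("002.sst"))
    versionToFiles.put(3L, Seq("003.sst"))

    // Rolling back and reloading version 2: entries for versions >= 2 must be
    // dropped, because those SST files belong to a different RocksDB instance and
    // must not be matched by name/size when uploading the next snapshot.
    val loadVersion = 2L
    versionToFiles.keySet().removeIf(_ >= loadVersion)

    println(versionToFiles) // {1=List(001.sst)}
  }
}
```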

### Why are the changes needed?
Avoid a RocksDB file mismatch exception that may occur at runtime.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add rocksdb unit test.

Closes #43174 from chaoqin-li1123/rocksdb_mismatch.

Authored-by: Chaoqin Li 
Signed-off-by: Jungtaek Lim 
---
 .../streaming/state/RocksDBFileManager.scala   |  4 +++
 .../execution/streaming/state/RocksDBSuite.scala   | 29 ++
 2 files changed, 33 insertions(+)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
index 0891d773713..faf9cd701ae 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
@@ -207,6 +207,10 @@ class RocksDBFileManager(
    */
   def loadCheckpointFromDfs(version: Long, localDir: File): RocksDBCheckpointMetadata = {
     logInfo(s"Loading checkpoint files for version $version")
+    // The unique ids of SST files are checked when opening a rocksdb instance. The SST files
+    // in larger versions can't be reused even if they have the same size and name because
+    // they belong to another rocksdb instance.
+    versionToRocksDBFiles.keySet().removeIf(_ >= version)
     val metadata = if (version == 0) {
       if (localDir.exists) Utils.deleteRecursively(localDir)
       localDir.mkdirs()
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
index e31b05c362f..91dd8582207 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
@@ -214,6 +214,35 @@ class RocksDBSuite extends 
AlsoTestWithChangelogCheckpointingEnabled with Shared
 }
   }
 
+  testWithChangelogCheckpointingEnabled("SPARK-45419: Do not reuse SST files" +
+" in different RocksDB instances") {
+val remoteDir = Utils.createTempDir().toString
+val conf = dbConf.copy(minDeltasForSnapshot = 0, compactOnCommit = false)
+new File(remoteDir).delete()  // to make sure that the directory gets 
created
+withDB(remoteDir, conf = conf) { db =>
+  for (version <- 0 to 2) {
+db.load(version)
+db.put(version.toString, version.toString)
+db.commit()
+  }
+  // upload snapshot 3.zip
+  db.doMaintenance()
+  // Roll back to version 1 and start to process data.
+  for (version <- 1 to 3) {
+db.load(version)
+db.put(version.toString, version.toString)
+db.commit()
+  }
+  // Upload snapshot 4.zip, should not reuse the SST files in 3.zip
+  db.doMaintenance()
+}
+
+withDB(remoteDir, conf = conf) { db =>
+  // Open the db to verify that the state in 4.zip is not corrupted.
+  db.load(4)
+}
+  }
+
   // A rocksdb instance with changelog checkpointing enabled should be able to 
load
   // an existing checkpoint without changelog.
   testWithChangelogCheckpointingEnabled(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance

2023-10-09 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ff87c0958f7 [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance
ff87c0958f7 is described below

commit ff87c0958f79b16c2f276e0dc53a855fca558347
Author: Chaoqin Li 
AuthorDate: Tue Oct 10 11:03:19 2023 +0900

[SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance

### What changes were proposed in this pull request?
When loading a RocksDB instance, remove the file version map entries for larger
versions to avoid a RocksDB SST file unique-id mismatch exception. The SST files
in larger versions can't be reused even if they have the same size and name,
because they belong to another RocksDB instance.

### Why are the changes needed?
Avoid a RocksDB file mismatch exception that may occur at runtime.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add rocksdb unit test.

Closes #43174 from chaoqin-li1123/rocksdb_mismatch.

Authored-by: Chaoqin Li 
Signed-off-by: Jungtaek Lim 
---
 .../streaming/state/RocksDBFileManager.scala   |  4 +++
 .../execution/streaming/state/RocksDBSuite.scala   | 29 ++
 2 files changed, 33 insertions(+)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
index 05fd845accb..eae9aac3c0a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
@@ -208,6 +208,10 @@ class RocksDBFileManager(
    */
   def loadCheckpointFromDfs(version: Long, localDir: File): RocksDBCheckpointMetadata = {
     logInfo(s"Loading checkpoint files for version $version")
+    // The unique ids of SST files are checked when opening a rocksdb instance. The SST files
+    // in larger versions can't be reused even if they have the same size and name because
+    // they belong to another rocksdb instance.
+    versionToRocksDBFiles.keySet().removeIf(_ >= version)
     val metadata = if (version == 0) {
       if (localDir.exists) Utils.deleteRecursively(localDir)
       localDir.mkdirs()
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
index b4b67f381d2..764358dc1f0 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
@@ -240,6 +240,35 @@ class RocksDBSuite extends 
AlsoTestWithChangelogCheckpointingEnabled with Shared
 }
   }
 
+  testWithChangelogCheckpointingEnabled("SPARK-45419: Do not reuse SST files" +
+" in different RocksDB instances") {
+val remoteDir = Utils.createTempDir().toString
+val conf = dbConf.copy(minDeltasForSnapshot = 0, compactOnCommit = false)
+new File(remoteDir).delete()  // to make sure that the directory gets 
created
+withDB(remoteDir, conf = conf) { db =>
+  for (version <- 0 to 2) {
+db.load(version)
+db.put(version.toString, version.toString)
+db.commit()
+  }
+  // upload snapshot 3.zip
+  db.doMaintenance()
+  // Roll back to version 1 and start to process data.
+  for (version <- 1 to 3) {
+db.load(version)
+db.put(version.toString, version.toString)
+db.commit()
+  }
+  // Upload snapshot 4.zip, should not reuse the SST files in 3.zip
+  db.doMaintenance()
+}
+
+withDB(remoteDir, conf = conf) { db =>
+  // Open the db to verify that the state in 4.zip is not corrupted.
+  db.load(4)
+}
+  }
+
   // A rocksdb instance with changelog checkpointing enabled should be able to 
load
   // an existing checkpoint without changelog.
   testWithChangelogCheckpointingEnabled(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]

2023-10-09 Thread via GitHub


panbingkun commented on PR #482:
URL: https://github.com/apache/spark-website/pull/482#issuecomment-1754195368

   I will perform similar operations on other versions of HTML documents.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client

2023-10-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 26527c43e65 [SPARK-43299][SS][CONNECT] Convert StreamingQueryException 
in Scala Client
26527c43e65 is described below

commit 26527c43e652718c6d6be8f2ae2f92e835e1b328
Author: Yihong He 
AuthorDate: Tue Oct 10 08:41:07 2023 +0900

[SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client

### What changes were proposed in this pull request?

- Convert StreamingQueryException in Scala Client
- Move StreamingQueryException to common/utils module
- Implement (message, cause) constructor and getMessage() for 
StreamingQueryException
- Get StreamingQueryException in StreamingQuery.exception by catching the thrown exception instead of reading it from the GRPC response (see the usage sketch after this list)
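
A hedged usage sketch of the client-side contract after this change (the `StreamingQuery` handle and session setup are assumed and not shown):

```scala
import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryException}

// Sketch only: how Connect client code can now observe a failed query through the
// same StreamingQueryException type as the classic (non-Connect) API.
def reportFailure(query: StreamingQuery): Unit = {
  query.exception match {
    case Some(e: StreamingQueryException) =>
      // getMessage now combines the error message with the query debug string.
      println(e.getMessage)
    case None =>
      println("query has not failed")
  }
}
```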

### Why are the changes needed?

- Compatibility with the existing behavior

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing tests

### Was this patch authored or co-authored using generative AI tooling?

Closes #42859 from heyihong/SPARK-43299.

Authored-by: Yihong He 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/streaming/StreamingQueryException.scala| 10 +
 .../spark/sql/streaming/StreamingQuery.scala   | 18 -
 .../sql/streaming/StreamingQueryException.scala| 47 --
 .../CheckConnectJvmClientCompatibility.scala   | 12 --
 .../sql/streaming/ClientStreamingQuerySuite.scala  |  4 +-
 .../connect/client/GrpcExceptionConverter.scala|  7 
 .../sql/connect/planner/SparkConnectPlanner.scala  |  6 +++
 project/MimaExcludes.scala |  3 ++
 8 files changed, 37 insertions(+), 70 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
 
b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
similarity index 89%
rename from 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
rename to 
common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
index 738c79769bb..77415fb4759 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
+++ 
b/common/utils/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
@@ -42,6 +42,14 @@ class StreamingQueryException private[sql](
 messageParameters: Map[String, String])
   extends Exception(message, cause) with SparkThrowable {
 
+  private[spark] def this(
+  message: String,
+  cause: Throwable,
+  errorClass: String,
+  messageParameters: Map[String, String]) = {
+this("", message, cause, null, null, errorClass, messageParameters)
+  }
+
   def this(
   queryDebugString: String,
   cause: Throwable,
@@ -62,6 +70,8 @@ class StreamingQueryException private[sql](
   /** Time when the exception occurred */
   val time: Long = System.currentTimeMillis
 
+  override def getMessage: String = s"${message}\n${queryDebugString}"
+
   override def toString(): String =
 s"""${classOf[StreamingQueryException].getName}: ${cause.getMessage}
|$queryDebugString""".stripMargin
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala
index a48367b468d..48ef0a907b5 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala
@@ -242,17 +242,15 @@ class RemoteStreamingQuery(
   }
 
   override def exception: Option[StreamingQueryException] = {
-val exception = executeQueryCmd(_.setException(true)).getException
-if (exception.hasExceptionMessage) {
-  Some(
-new StreamingQueryException(
-  // message maps to the return value of original 
StreamingQueryException's toString method
-  message = exception.getExceptionMessage,
-  errorClass = exception.getErrorClass,
-  stackTrace = exception.getStackTrace))
-} else {
-  None
+try {
+  // When exception field is set to false, the server throws a 
StreamingQueryException
+  // to the client.
+  executeQueryCmd(_.setException(false))
+} catch {
+  case e: StreamingQueryException => return Some(e)
 }
+
+None
   }
 
   private def executeQueryCmd(
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
deleted file mode 

[spark] branch master updated: [SPARK-45456][BUILD] Upgrade maven to 3.9.5

2023-10-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e8021b6f52 [SPARK-45456][BUILD] Upgrade maven to 3.9.5
7e8021b6f52 is described below

commit 7e8021b6f52cbc50cd3e072972e613159177253f
Author: YangJie 
AuthorDate: Mon Oct 9 13:54:48 2023 -0700

[SPARK-45456][BUILD] Upgrade maven to 3.9.5

### What changes were proposed in this pull request?
This PR aims to upgrade Maven to 3.9.4 from 3.9.5.

### Why are the changes needed?
The full release notes are as follows:

- https://maven.apache.org/docs/3.9.5/release-notes.html | 
https://github.com/apache/maven/releases/tag/maven-3.9.5

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manual test:

Running `build/mvn -version` will trigger downloading `apache-maven-3.9.5-bin.tar.gz`:

```
exec: curl --silent --show-error -L 
https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.5/binaries/apache-maven-3.9.5-bin.tar.gz?action=download
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43267 from LuciferYang/SPARK-45456.

Lead-authored-by: YangJie 
Co-authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 dev/appveyor-install-dependencies.ps1 | 2 +-
 docs/building-spark.md| 2 +-
 pom.xml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/appveyor-install-dependencies.ps1 
b/dev/appveyor-install-dependencies.ps1
index c07405e01e9..f410e8774d9 100644
--- a/dev/appveyor-install-dependencies.ps1
+++ b/dev/appveyor-install-dependencies.ps1
@@ -81,7 +81,7 @@ if (!(Test-Path $tools)) {
 # == Maven
 # Push-Location $tools
 #
-# $mavenVer = "3.9.4"
+# $mavenVer = "3.9.5"
 # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip" "maven.zip"
 #
 # # extract
diff --git a/docs/building-spark.md b/docs/building-spark.md
index 95416f1e011..90a520a62a9 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -27,7 +27,7 @@ license: |
 ## Apache Maven
 
 The Maven-based build is the build of reference for Apache Spark.
-Building Spark using Maven requires Maven 3.9.4 and Java 17/21.
+Building Spark using Maven requires Maven 3.9.5 and Java 17/21.
 Spark requires Scala 2.13; support for Scala 2.12 was removed in Spark 4.0.0.
 
 ### Setting up Maven's Memory Usage
diff --git a/pom.xml b/pom.xml
index a98e1c6fd99..3e43bc04707 100644
--- a/pom.xml
+++ b/pom.xml
@@ -115,7 +115,7 @@
 17
 ${java.version}
 ${java.version}
-3.9.4
+3.9.5
 3.1.0
 spark
 9.5


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page

2023-10-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 454abbfd66b [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to 
the PySpark docs page
454abbfd66b is described below

commit 454abbfd66be20ab7e8b8342b4e9c25c1332be14
Author: panbingkun 
AuthorDate: Mon Oct 9 13:53:34 2023 -0700

[SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs 
page

### What changes were proposed in this pull request?
The PR aims to add canonical links to the PySpark docs page, backporting this
to branch 3.3.
Master branch PR: https://github.com/apache/spark/pull/42425.

### Why are the changes needed?
Backport this to branch 3.3.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43286 from panbingkun/branch-3.3_SPARK-44729.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 python/docs/source/conf.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index e1bc4006393..2e2b5201262 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -248,6 +248,8 @@ html_use_index = False
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'pysparkdoc'
 
+# The base URL which points to the root of the HTML documentation.
+html_baseurl = 'https://spark.apache.org/docs/latest/api/python'
 
 # -- Options for LaTeX output -
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.4 updated: [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs page

2023-10-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new b7850204170 [SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to 
the PySpark docs page
b7850204170 is described below

commit b7850204170f0759f7fc60ddb597644370b5ecce
Author: panbingkun 
AuthorDate: Mon Oct 9 13:48:37 2023 -0700

[SPARK-44729][PYTHON][DOCS][3.4] Add canonical links to the PySpark docs 
page

### What changes were proposed in this pull request?
The PR aims to add canonical links to the PySpark docs page, backporting this
to branch 3.4.
Master branch PR: https://github.com/apache/spark/pull/42425.

### Why are the changes needed?
Backport this to branch 3.4.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43285 from panbingkun/branch-3.4_SPARK-44729.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 python/docs/source/conf.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 8203a802053..38c331048e7 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -259,6 +259,8 @@ html_use_index = False
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'pysparkdoc'
 
+# The base URL which points to the root of the HTML documentation.
+html_baseurl = 'https://spark.apache.org/docs/latest/api/python'
 
 # -- Options for LaTeX output -
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ced321c8b5a -> 1e4797e88aa)

2023-10-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with 
non-existing table
 add 1e4797e88aa [SPARK-45452][SQL][FOLLOWUP] Simplify path check logic

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/util/HadoopFSUtils.scala  | 27 +-
 .../org/apache/spark/util/HadoopFSUtilsSuite.scala | 32 ++
 2 files changed, 52 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8bf5a5bca3f [SPARK-45383][SQL] Fix error message for time travel with 
non-existing table
8bf5a5bca3f is described below

commit 8bf5a5bca3f9f7db78182d14e56476d384f442fa
Author: Wenchen Fan 
AuthorDate: Mon Oct 9 22:15:45 2023 +0300

[SPARK-45383][SQL] Fix error message for time travel with non-existing table

### What changes were proposed in this pull request?

Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for 
time travel. It was missed before because `RelationTimeTravel` is a leaf node 
but it may contain `UnresolvedRelation`.
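
For illustration, the kind of statement exercised by the new test; `non_exist` is just the placeholder table name used there, and the `SparkSession` is assumed:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Sketch mirroring the new test case: time travel over a non-existing table should
// now surface the TABLE_OR_VIEW_NOT_FOUND error class instead of a confusing error.
def demo(spark: SparkSession): Unit = {
  try {
    spark.sql("SELECT * FROM non_exist VERSION AS OF 1").show()
  } catch {
    case e: AnalysisException =>
      println(e.getErrorClass) // expected: TABLE_OR_VIEW_NOT_FOUND
  }
}
```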

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes, the error message becomes reasonable

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43298 from cloud-fan/time-travel.

Authored-by: Wenchen Fan 
Signed-off-by: Max Gekk 
(cherry picked from commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038)
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala|  4 
 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++
 2 files changed, 15 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 511f3622e7e..533ea8a2b79 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -365,6 +365,9 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
 })
 
 operator match {
+  case RelationTimeTravel(u: UnresolvedRelation, _, _) =>
+u.tableNotFound(u.multipartIdentifier)
+
   case etw: EventTimeWatermark =>
 etw.eventTime.dataType match {
   case s: StructType
@@ -377,6 +380,7 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
 "eventName" -> toSQLId(etw.eventTime.name),
 "eventType" -> toSQLType(etw.eventTime.dataType)))
 }
+
   case f: Filter if f.condition.dataType != BooleanType =>
 f.failAnalysis(
   errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN",
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
index 06f5600e0d1..7745e9c0a4e 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
@@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter
 sqlState = None,
 parameters = Map("relationId" -> "`x`"))
 
+  checkError(
+exception = intercept[AnalysisException] {
+  sql("SELECT * FROM non_exist VERSION AS OF 1")
+},
+errorClass = "TABLE_OR_VIEW_NOT_FOUND",
+parameters = Map("relationName" -> "`non_exist`"),
+context = ExpectedContext(
+  fragment = "non_exist",
+  start = 14,
+  stop = 22))
+
   val subquery1 = "SELECT 1 FROM non_exist"
   checkError(
 exception = intercept[AnalysisException] {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45383][SQL] Fix error message for time travel with non-existing table

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ced321c8b5a [SPARK-45383][SQL] Fix error message for time travel with 
non-existing table
ced321c8b5a is described below

commit ced321c8b5a32c69dfb2841d4bec8a03f21b8038
Author: Wenchen Fan 
AuthorDate: Mon Oct 9 22:15:45 2023 +0300

[SPARK-45383][SQL] Fix error message for time travel with non-existing table

### What changes were proposed in this pull request?

Fixes a small bug to report `TABLE_OR_VIEW_NOT_FOUND` error correctly for 
time travel. It was missed before because `RelationTimeTravel` is a leaf node 
but it may contain `UnresolvedRelation`.

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes, the error message becomes reasonable

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43298 from cloud-fan/time-travel.

Authored-by: Wenchen Fan 
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala|  4 
 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 +++
 2 files changed, 15 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index e140625f47a..611dd7b3009 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -384,6 +384,9 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
 })
 
 operator match {
+  case RelationTimeTravel(u: UnresolvedRelation, _, _) =>
+u.tableNotFound(u.multipartIdentifier)
+
   case etw: EventTimeWatermark =>
 etw.eventTime.dataType match {
   case s: StructType
@@ -396,6 +399,7 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
 "eventName" -> toSQLId(etw.eventTime.name),
 "eventType" -> toSQLType(etw.eventTime.dataType)))
 }
+
   case f: Filter if f.condition.dataType != BooleanType =>
 f.failAnalysis(
   errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN",
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
index ae639b272a2..047bc8de739 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
@@ -3014,6 +3014,17 @@ class DataSourceV2SQLSuiteV1Filter
 sqlState = None,
 parameters = Map("relationId" -> "`x`"))
 
+  checkError(
+exception = intercept[AnalysisException] {
+  sql("SELECT * FROM non_exist VERSION AS OF 1")
+},
+errorClass = "TABLE_OR_VIEW_NOT_FOUND",
+parameters = Map("relationName" -> "`non_exist`"),
+context = ExpectedContext(
+  fragment = "non_exist",
+  start = 14,
+  stop = 22))
+
   val subquery1 = "SELECT 1 FROM non_exist"
   checkError(
 exception = intercept[AnalysisException] {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove unnecessary check

2023-10-09 Thread beliefer
This is an automated email from the ASF dual-hosted git repository.

beliefer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f5cf3101991 [SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove 
unnecessary check
f5cf3101991 is described below

commit f5cf310199132f62b779e0244d15f7680e2ba856
Author: Ruifeng Zheng 
AuthorDate: Mon Oct 9 18:07:59 2023 +0800

[SPARK-45412][PYTHON][CONNECT][FOLLOW-UP] Remove unnecessary check

### What changes were proposed in this pull request?
Remove unnecessary check

### Why are the changes needed?
https://github.com/apache/spark/pull/43215 already validates the plan in 
`__init__`

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #43287 from zhengruifeng/SPARK-45412-followup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Jiaan Geng 
---
 python/pyspark/sql/connect/dataframe.py | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index 2c0a75fad46..4044fab3bb3 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -169,7 +169,6 @@ class DataFrame:
 
 @property
 def write(self) -> "DataFrameWriter":
-assert self._plan is not None
 return DataFrameWriter(self._plan, self._session)
 
 write.__doc__ = PySparkDataFrame.write.__doc__
@@ -1096,11 +1095,6 @@ class DataFrame:
 union.__doc__ = PySparkDataFrame.union.__doc__
 
 def unionAll(self, other: "DataFrame") -> "DataFrame":
-if other._plan is None:
-raise PySparkValueError(
-error_class="MISSING_VALID_PLAN",
-message_parameters={"operator": "Union"},
-)
 self._check_same_session(other)
 return DataFrame.withPlan(
 plan.SetOperation(self._plan, other._plan, "union", is_all=True), 
session=self._session
@@ -2030,8 +2024,6 @@ class DataFrame:
 mapInArrow.__doc__ = PySparkDataFrame.mapInArrow.__doc__
 
 def foreach(self, f: Callable[[Row], None]) -> None:
-assert self._plan is not None
-
 def foreach_func(row: Any) -> None:
 f(row)
 
@@ -2042,8 +2034,6 @@ class DataFrame:
 foreach.__doc__ = PySparkDataFrame.foreach.__doc__
 
 def foreachPartition(self, f: Callable[[Iterator[Row]], None]) -> None:
-assert self._plan is not None
-
 schema = self.schema
 field_converters = [
 ArrowTableToRowsConversion._create_converter(f.dataType) for f in 
schema.fields
@@ -2069,14 +2059,12 @@ class DataFrame:
 
 @property
 def writeStream(self) -> DataStreamWriter:
-assert self._plan is not None
 return DataStreamWriter(plan=self._plan, session=self._session)
 
 writeStream.__doc__ = PySparkDataFrame.writeStream.__doc__
 
 def sameSemantics(self, other: "DataFrame") -> bool:
-assert self._plan is not None
-assert other._plan is not None
+self._check_same_session(other)
 return self._session.client.same_semantics(
 plan=self._plan.to_proto(self._session.client),
 other=other._plan.to_proto(other._session.client),
@@ -2085,7 +2073,6 @@ class DataFrame:
 sameSemantics.__doc__ = PySparkDataFrame.sameSemantics.__doc__
 
 def semanticHash(self) -> int:
-assert self._plan is not None
 return self._session.client.semantic_hash(
 plan=self._plan.to_proto(self._session.client),
 )
@@ -2093,7 +2080,6 @@ class DataFrame:
 semanticHash.__doc__ = PySparkDataFrame.semanticHash.__doc__
 
 def writeTo(self, table: str) -> "DataFrameWriterV2":
-assert self._plan is not None
 return DataFrameWriterV2(self._plan, self._session, table)
 
 writeTo.__doc__ = PySparkDataFrame.writeTo.__doc__


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 4841a404be3 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra 
spaces in the automatically generated `sql-error-conditions.md` file
4841a404be3 is described below

commit 4841a404be3c37fc16031a0119b321eefcb2faab
Author: panbingkun 
AuthorDate: Mon Oct 9 12:32:14 2023 +0300

[SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra spaces in the 
automatically generated `sql-error-conditions.md` file

### What changes were proposed in this pull request?
The PR aims to remove the last 2 extra spaces in the automatically
generated `sql-error-conditions.md` file.

### Why are the changes needed?
- When I was working on another PR, I used the following command:
```
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt \
"core/testOnly *SparkThrowableSuite -- -t \"Error classes match 
with document\""
```
  I found that the automatically generated `sql-error-conditions.md` file has
2 extra spaces added at the end. Obviously, this is not what we expect;
otherwise we would have to remove them manually, which defeats the automation
(see the sketch after this list).

- Git shows us this difference, as follows (screenshot):
https://github.com/apache/spark/assets/15246973/a68b657f-3a00-4405-9623-1f7ab9d44d82
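
For illustration, a minimal sketch of where the two trailing characters can come from: the closing `|` line inside the interpolated string adds a trailing newline after the content, and the old write path appended one more `lineSeparator` on top (names and content here are illustrative, not the suite's actual strings):

```scala
object TrailingNewlineExample {
  def main(args: Array[String]): Unit = {
    val lineSeparator = System.lineSeparator()
    val content = "### SOME_ERROR_CLASS"

    // Old style: the final `|` line leaves a trailing newline after the content...
    val oldDoc =
      s"""|# Error Conditions
          |
          |$content
          |""".stripMargin
    // ...and the old write path appended one more separator when writing the file.
    val oldFileText = oldDoc + lineSeparator

    // New style: the generated text ends right after the interpolated content.
    val newFileText =
      s"""|# Error Conditions
          |
          |$content""".stripMargin

    // 2 extra trailing characters on Linux/macOS (newline + lineSeparator).
    println(oldFileText.length - newFileText.length)
  }
}
```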

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43274 from panbingkun/SPARK-45459.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
(cherry picked from commit af800b505956ff26e03c5fc56b6cb4ac5c0efe2f)
Signed-off-by: Max Gekk 
---
 core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
index 0249cde5488..299bcea3f9e 100644
--- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -253,8 +253,7 @@ class SparkThrowableSuite extends SparkFunSuite {
  |
  |Also see [SQLSTATE Codes](sql-error-conditions-sqlstates.html).
  |
- |$sqlErrorParentDocContent
- |""".stripMargin
+ |$sqlErrorParentDocContent""".stripMargin
 
 errors.filter(_._2.subClass.isDefined).foreach(error => {
   val name = error._1
@@ -316,7 +315,7 @@ class SparkThrowableSuite extends SparkFunSuite {
 }
 FileUtils.writeStringToFile(
   parentDocPath.toFile,
-  sqlErrorParentDoc + lineSeparator,
+  sqlErrorParentDoc,
   StandardCharsets.UTF_8)
   }
 } else {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4493b431192 -> af800b50595)

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional 
parse results when only prefix match
 add af800b50595 [SPARK-45459][SQL][TESTS][DOCS] Remove the last 2 extra 
spaces in the automatically generated `sql-error-conditions.md` file

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 5f8ae9a3dbd [SPARK-45424][SQL] Fix TimestampFormatter return optional 
parse results when only prefix match
5f8ae9a3dbd is described below

commit 5f8ae9a3dbd2c7624bffd588483c9916c302c081
Author: Jia Fan 
AuthorDate: Mon Oct 9 12:30:20 2023 +0300

[SPARK-45424][SQL] Fix TimestampFormatter return optional parse results 
when only prefix match

### What changes were proposed in this pull request?
When using a custom pattern to parse a timestamp, the input may match only a
prefix of the pattern rather than the whole value. In that case
`Iso8601TimestampFormatter::parseOptional` and
`Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result.
E.g.: pattern = `yyyy-MM-dd HH:mm:ss`, value = `9999-12-31 23:59:59.999`. In
fact, `yyyy-MM-dd HH:mm:ss` can parse `9999-12-31 23:59:59` normally, but the
value has the suffix `.999`, so we can't return a non-empty result.
This bug affects schema inference for CSV/JSON.
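
A standalone sketch of the full-consumption check this fix introduces, using the JDK formatter directly (illustrative only, not Spark's internal `Iso8601TimestampFormatter`):

```scala
import java.text.ParsePosition
import java.time.format.DateTimeFormatter

object PrefixMatchExample {
  def main(args: Array[String]): Unit = {
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
    val input = "9999-12-31 23:59:59.999"

    val pos = new ParsePosition(0)
    val parsed = fmt.parseUnresolved(input, pos)

    // The prefix "9999-12-31 23:59:59" parses, so `parsed` is non-null, but the
    // formatter stopped before the trailing ".999"; such a parse must be rejected
    // unless the whole input was consumed.
    val accepted = parsed != null && pos.getIndex == input.length
    println(accepted) // false
  }
}
```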

### Why are the changes needed?
Fix a schema inference bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved.

Authored-by: Jia Fan 
Signed-off-by: Max Gekk 
(cherry picked from commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1)
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala  | 10 ++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index 8a288d0e9f3..55eee41c14c 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -167,8 +167,9 @@ class Iso8601TimestampFormatter(
 
   override def parseOptional(s: String): Option[Long] = {
 try {
-  val parsed = formatter.parseUnresolved(s, new ParsePosition(0))
-  if (parsed != null) {
+  val parsePosition = new ParsePosition(0)
+  val parsed = formatter.parseUnresolved(s, parsePosition)
+  if (parsed != null && s.length == parsePosition.getIndex) {
 Some(extractMicros(parsed))
   } else {
 None
@@ -196,8 +197,9 @@ class Iso8601TimestampFormatter(
 
   override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: 
Boolean): Option[Long] = {
 try {
-  val parsed = formatter.parseUnresolved(s, new ParsePosition(0))
-  if (parsed != null) {
+  val parsePosition = new ParsePosition(0)
+  val parsed = formatter.parseUnresolved(s, parsePosition)
+  if (parsed != null && s.length == parsePosition.getIndex) {
 Some(extractMicrosNTZ(s, parsed, allowTimeZone))
   } else {
 None
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index eb173bc7f8c..2134a0d6ecd 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -507,4 +507,14 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
 assert(simpleFormatter.parseOptional("abc").isEmpty)
 
   }
+
+  test("SPARK-45424: do not return optional parse results when only prefix 
match") {
+val formatter = new Iso8601TimestampFormatter(
+  "-MM-dd HH:mm:ss",
+  locale = DateFormatter.defaultLocale,
+  legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+  isParsing = true, zoneId = DateTimeTestUtils.LA)
+assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty)
+assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match

2023-10-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4493b431192 [SPARK-45424][SQL] Fix TimestampFormatter return optional 
parse results when only prefix match
4493b431192 is described below

commit 4493b431192fcdbab1379b7ffb89eea0cdaa19f1
Author: Jia Fan 
AuthorDate: Mon Oct 9 12:30:20 2023 +0300

[SPARK-45424][SQL] Fix TimestampFormatter return optional parse results 
when only prefix match

### What changes were proposed in this pull request?
When using a custom pattern to parse a timestamp, the input may match only a
prefix of the pattern rather than the whole value. In that case
`Iso8601TimestampFormatter::parseOptional` and
`Iso8601TimestampFormatter::parseWithoutTimeZoneOptional` should not return a non-empty result.
E.g.: pattern = `yyyy-MM-dd HH:mm:ss`, value = `9999-12-31 23:59:59.999`. In
fact, `yyyy-MM-dd HH:mm:ss` can parse `9999-12-31 23:59:59` normally, but the
value has the suffix `.999`, so we can't return a non-empty result.
This bug affects schema inference for CSV/JSON.

### Why are the changes needed?
Fix a schema inference bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43245 from Hisoka-X/SPARK-45424-inference-schema-unresolved.

Authored-by: Jia Fan 
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/catalyst/util/TimestampFormatter.scala| 10 ++
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala  | 10 ++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index 8a288d0e9f3..55eee41c14c 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -167,8 +167,9 @@ class Iso8601TimestampFormatter(
 
   override def parseOptional(s: String): Option[Long] = {
 try {
-  val parsed = formatter.parseUnresolved(s, new ParsePosition(0))
-  if (parsed != null) {
+  val parsePosition = new ParsePosition(0)
+  val parsed = formatter.parseUnresolved(s, parsePosition)
+  if (parsed != null && s.length == parsePosition.getIndex) {
 Some(extractMicros(parsed))
   } else {
 None
@@ -196,8 +197,9 @@ class Iso8601TimestampFormatter(
 
   override def parseWithoutTimeZoneOptional(s: String, allowTimeZone: 
Boolean): Option[Long] = {
 try {
-  val parsed = formatter.parseUnresolved(s, new ParsePosition(0))
-  if (parsed != null) {
+  val parsePosition = new ParsePosition(0)
+  val parsed = formatter.parseUnresolved(s, parsePosition)
+  if (parsed != null && s.length == parsePosition.getIndex) {
 Some(extractMicrosNTZ(s, parsed, allowTimeZone))
   } else {
 None
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
index ecd849dd3af..d2fc89a034f 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala
@@ -491,4 +491,14 @@ class TimestampFormatterSuite extends 
DatetimeFormatterSuite {
 assert(simpleFormatter.parseOptional("abc").isEmpty)
 
   }
+
+  test("SPARK-45424: do not return optional parse results when only prefix 
match") {
+val formatter = new Iso8601TimestampFormatter(
+  "-MM-dd HH:mm:ss",
+  locale = DateFormatter.defaultLocale,
+  legacyFormat = LegacyDateFormats.SIMPLE_DATE_FORMAT,
+  isParsing = true, zoneId = DateTimeTestUtils.LA)
+assert(formatter.parseOptional("9999-12-31 23:59:59.999").isEmpty)
+assert(formatter.parseWithoutTimeZoneOptional("9999-12-31 23:59:59.999", true).isEmpty)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Re: [PR] Add canonical links to the PySpark docs page for published docs [spark-website]

2023-10-09 Thread via GitHub


panbingkun commented on PR #482:
URL: https://github.com/apache/spark-website/pull/482#issuecomment-1752580071

   At present, this PR only changes the historical documents for version 3.1.1. If there are no issues with these modifications, I will use tooling to apply the same modifications to the other versions.
   
   @zhengruifeng @allisonport-db @MrPowers @HyukjinKwon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[PR] Add canonical links to the PySpark docs page for published docs [spark-website]

2023-10-09 Thread via GitHub


panbingkun opened a new pull request, #482:
URL: https://github.com/apache/spark-website/pull/482

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org