[spark] branch master updated: [SPARK-43030][SQL][FOLLOWUP] CTE ref should keep the output attributes duplicated when renew
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 84431a6 [SPARK-43030][SQL][FOLLOWUP] CTE ref should keep the output attributes duplicated when renew 84431a6 is described below commit 84431a6e4631afca22b3a4763eabeeef1398 Author: Wenchen Fan AuthorDate: Tue May 30 13:51:38 2023 +0800 [SPARK-43030][SQL][FOLLOWUP] CTE ref should keep the output attributes duplicated when renew ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/40662 to fix a regression. `CTERelationRef` inherits the output attributes from a query, which may contain duplicated attributes, for queries like SELECT a, a FROM t. It's important to keep the duplicated attributes to have the same id in the new instance, as column resolution allows more than one matching attribute if their ids are the same. For example, `Project('a, CTERelationRef(a#1, a#1))` can be resolved properly as the matching attributes a have the same id, but `Project('a, CTERelationRef(a#2, a#3))` can't be resolved. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? the bug is not released yet. ### How was this patch tested? new test Closes #41363 from cloud-fan/fix. 
Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../sql/catalyst/plans/logical/basicLogicalOperators.scala | 12 +++- .../apache/spark/sql/catalyst/analysis/AnalysisSuite.scala | 11 +++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index ceed7b0cc54..4bde26a7d6e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -844,7 +844,17 @@ case class CTERelationRef( override lazy val resolved: Boolean = _resolved - override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance())) + override def newInstance(): LogicalPlan = { +// CTERelationRef inherits the output attributes from a query, which may contain duplicated +// attributes, for queries like `SELECT a, a FROM t`. It's important to keep the duplicated +// attributes to have the same id in the new instance, as column resolution allows more than one +// matching attributes if their ids are the same. +// For example, `Project('a, CTERelationRef(a#1, a#1))` can be resolved properly as the matching +// attributes `a` have the same id, but `Project('a, CTERelationRef(a#2, a#3))` can't be +// resolved. 
+    val oldAttrToNewAttr = AttributeMap(output.zip(output.map(_.newInstance())))
+    copy(output = output.map(attr => oldAttrToNewAttr(attr)))
+  }

   def withNewStats(statsOpt: Option[Statistics]): CTERelationRef = copy(statsOpt = statsOpt)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
index 1029f7f8fab..e1050e91e59 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
@@ -1500,6 +1500,17 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       assert(refs.map(_.output).distinct.length == 2)
     }

+    withClue("CTE relation has duplicated attributes") {
+      val cteDef = CTERelationDef(testRelation.select($"a", $"a"))
+      val cteRef = CTERelationRef(cteDef.id, false, Nil)
+      val plan = WithCTE(cteRef.join(cteRef.select($"a")), Seq(cteDef)).analyze
+      val refs = plan.collect {
+        case r: CTERelationRef => r
+      }
+      assert(refs.length == 2)
+      assert(refs.map(_.output).distinct.length == 2)
+    }
+
     withClue("references in both CTE relation definition and main query") {
       val cteDef2 = CTERelationDef(cteRef.where($"a" > 2))
       val cteRef2 = CTERelationRef(cteDef2.id, false, Nil)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
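The one-line change above is subtle: calling `output.map(_.newInstance())` alone would give the two copies of `a` two different ids, while zipping the old output against the new instances and resolving each attribute through an `AttributeMap` keeps duplicated attributes shared. A minimal Python sketch of that idea, with a hypothetical `Attr` class standing in for Catalyst's `AttributeReference` (none of these names are Spark's API):

```python
import itertools

_next_id = itertools.count()


class Attr:
    """Tiny stand-in for an attribute: a name plus a unique expression id."""

    def __init__(self, name, expr_id=None):
        self.name = name
        self.expr_id = next(_next_id) if expr_id is None else expr_id

    def new_instance(self):
        # A fresh copy with a brand-new expression id.
        return Attr(self.name)


def renew_output(output):
    # Map each *distinct* old attribute (keyed by expr_id) to one new
    # instance, so duplicates keep a single shared id — mirroring
    # `AttributeMap(output.zip(output.map(_.newInstance())))`.
    old_to_new = {}
    for attr in output:
        if attr.expr_id not in old_to_new:
            old_to_new[attr.expr_id] = attr.new_instance()
    return [old_to_new[a.expr_id] for a in output]


a = Attr("a")                   # e.g. a#1
renewed = renew_output([a, a])  # output of `SELECT a, a FROM t`
assert renewed[0].expr_id == renewed[1].expr_id  # duplicates stay duplicated
assert renewed[0].expr_id != a.expr_id           # but under a fresh id
```

With the naive `[a.new_instance() for a in output]` instead, the two columns would come back as `a#2, a#3` and, per the commit message, column resolution of `Project('a, ...)` would fail as ambiguous.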
[spark] branch master updated: [SPARK-43353][PYTHON] Migrate remaining session errors into error class
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff2b9c2ddc0 [SPARK-43353][PYTHON] Migrate remaining session errors into error class ff2b9c2ddc0 is described below commit ff2b9c2ddc07c02f6ba4c68a2fd66243919acfb6 Author: itholic AuthorDate: Tue May 30 13:49:45 2023 +0900 [SPARK-43353][PYTHON] Migrate remaining session errors into error class ### What changes were proposed in this pull request? This PR proposes to migrate remaining Spark session errors into error class ### Why are the changes needed? To leverage PySpark error framework. ### Does this PR introduce _any_ user-facing change? No API changes. ### How was this patch tested? The existing CI should pass. Closes #41031 from itholic/error_session. Authored-by: itholic Signed-off-by: Hyukjin Kwon --- python/pyspark/errors/error_classes.py | 20 ++ python/pyspark/sql/session.py | 48 +++--- 2 files changed, 58 insertions(+), 10 deletions(-) diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py index 817b8ce60db..2d82d03eb6d 100644 --- a/python/pyspark/errors/error_classes.py +++ b/python/pyspark/errors/error_classes.py @@ -89,6 +89,11 @@ ERROR_CLASSES_JSON = """ "Cannot convert into ." ] }, + "CANNOT_DETERMINE_TYPE": { +"message": [ + "Some of types cannot be determined after inferring." +] + }, "CANNOT_GET_BATCH_ID": { "message": [ "Could not get batch id from ." @@ -470,6 +475,11 @@ ERROR_CLASSES_JSON = """ "Argument `` should be a list[str], got ." ] }, + "NOT_LIST_OR_NONE_OR_STRUCT" : { +"message" : [ + "Argument `` should be a list, None or StructType, got ." +] + }, "NOT_LIST_OR_STR_OR_TUPLE" : { "message" : [ "Argument `` should be a list, str or tuple, got ." 
@@ -576,6 +586,11 @@ ERROR_CLASSES_JSON = """ "Result vector from pandas_udf was not the required length: expected , got ." ] }, + "SESSION_ALREADY_EXIST" : { +"message" : [ + "Cannot start a remote Spark session because there is a regular Spark session already running." +] + }, "SESSION_NOT_SAME" : { "message" : [ "Both Datasets must belong to the same SparkSession." @@ -586,6 +601,11 @@ ERROR_CLASSES_JSON = """ "There should not be an existing Spark Session or Spark Context." ] }, + "SHOULD_NOT_DATAFRAME": { +"message": [ + "Argument `` should not be a DataFrame." +] + }, "SLICE_WITH_STEP" : { "message" : [ "Slice with step is not supported." diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index df970a0bf37..e96dc9cee3f 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -64,6 +64,7 @@ from pyspark.sql.types import ( ) from pyspark.errors.exceptions.captured import install_exception_handler from pyspark.sql.utils import is_timestamp_ntz_preferred, to_str +from pyspark.errors import PySparkValueError, PySparkTypeError if TYPE_CHECKING: from pyspark.sql._typing import AtomicValue, RowLike, OptionalPrimitiveType @@ -873,7 +874,10 @@ class SparkSession(SparkConversionMixin): :class:`pyspark.sql.types.StructType` """ if not data: -raise ValueError("can not infer schema from empty dataset") +raise PySparkValueError( +error_class="CANNOT_INFER_EMPTY_SCHEMA", +message_parameters={}, +) infer_dict_as_struct = self._jconf.inferDictAsStruct() infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement() prefer_timestamp_ntz = is_timestamp_ntz_preferred() @@ -891,7 +895,10 @@ class SparkSession(SparkConversionMixin): ), ) if _has_nulltype(schema): -raise ValueError("Some of types cannot be determined after inferring") +raise PySparkValueError( +error_class="CANNOT_DETERMINE_TYPE", +message_parameters={}, +) return schema def _inferSchema( @@ -917,7 +924,10 @@ class 
SparkSession(SparkConversionMixin): """ first = rdd.first() if isinstance(first, Sized) and len(first) == 0: -raise ValueError("The first row in RDD is empty, can not infer schema") +raise PySparkValueError( +error_class="CANNOT_INFER_EMPTY_SCHEMA", +message_parameters={}, +) infer_dict_as_struct = self._jconf.inferDictAsStruct() infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement() @@ -944,9 +954,9
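The pattern in this diff is uniform: each bare `ValueError(...)` becomes a `PySparkValueError` whose `error_class` keys into the JSON message templates added to `error_classes.py`, with `<placeholders>` filled from `message_parameters`. A rough sketch of how such an error-class framework can work (the `SketchPySparkValueError` class and the dict below are illustrative, not PySpark's actual implementation):

```python
# Minimal stand-in for the ERROR_CLASSES_JSON registry: class name -> message
# template lines containing <placeholder> tokens.
ERROR_CLASSES = {
    "NOT_LIST_OR_NONE_OR_STRUCT": [
        "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
    ],
    "CANNOT_DETERMINE_TYPE": [
        "Some of types cannot be determined after inferring."
    ],
}


class SketchPySparkValueError(ValueError):
    """Sketch of pyspark.errors.PySparkValueError: resolves a template by class."""

    def __init__(self, error_class, message_parameters):
        template = " ".join(ERROR_CLASSES[error_class])
        message = template
        for name, value in message_parameters.items():
            message = message.replace(f"<{name}>", value)
        super().__init__(message)
        self.error_class = error_class


try:
    raise SketchPySparkValueError(
        error_class="NOT_LIST_OR_NONE_OR_STRUCT",
        message_parameters={"arg_name": "schema", "arg_type": "int"},
    )
except SketchPySparkValueError as e:
    assert e.error_class == "NOT_LIST_OR_NONE_OR_STRUCT"
    assert "should be a list, None or StructType" in str(e)
```

The payoff of the migration is that callers and tests can match on the stable `error_class` identifier instead of brittle message strings.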
[spark] branch master updated: [SPARK-43603][PS][CONNECT][TESTS][FOLLOW-UP] Delete unused `test_parity_dataframe.py`
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dada63c24ec [SPARK-43603][PS][CONNECT][TESTS][FOLLOW-UP] Delete unused `test_parity_dataframe.py` dada63c24ec is described below commit dada63c24ecbe1b65dc4dac6632b34db05fbef98 Author: Ruifeng Zheng AuthorDate: Tue May 30 12:05:16 2023 +0900 [SPARK-43603][PS][CONNECT][TESTS][FOLLOW-UP] Delete unused `test_parity_dataframe.py` ### What changes were proposed in this pull request? `pyspark.pandas.tests.connect.test_parity_dataframe` had been split in https://github.com/apache/spark/commit/cb5bd57aac40c06921321929df00d2086e37ba34, but I forgot to remove the test file. this PR removes unused `test_parity_dataframe.py` ### Why are the changes needed? `test_parity_dataframe.py` is no longer needed ### Does this PR introduce _any_ user-facing change? No, test-only ### How was this patch tested? CI Closes #41372 from zhengruifeng/reorg_ps_df_tests_followup. Authored-by: Ruifeng Zheng Signed-off-by: Hyukjin Kwon --- .../pandas/tests/connect/test_parity_dataframe.py | 154 - 1 file changed, 154 deletions(-) diff --git a/python/pyspark/pandas/tests/connect/test_parity_dataframe.py b/python/pyspark/pandas/tests/connect/test_parity_dataframe.py deleted file mode 100644 index c1b9ae2ee11..000 --- a/python/pyspark/pandas/tests/connect/test_parity_dataframe.py +++ /dev/null @@ -1,154 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. 
You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -import unittest - -from pyspark import pandas as ps -from pyspark.pandas.tests.test_dataframe import DataFrameTestsMixin -from pyspark.testing.connectutils import ReusedConnectTestCase -from pyspark.testing.pandasutils import PandasOnSparkTestUtils - - -class DataFrameParityTests(DataFrameTestsMixin, PandasOnSparkTestUtils, ReusedConnectTestCase): -@property -def psdf(self): -return ps.from_pandas(self.pdf) - -@unittest.skip( -"TODO(SPARK-43610): Enable `InternalFrame.attach_distributed_column` in Spark Connect." -) -def test_aggregate(self): -super().test_aggregate() - -@unittest.skip("TODO(SPARK-41876): Implement DataFrame `toLocalIterator`") -def test_iterrows(self): -super().test_iterrows() - -@unittest.skip("TODO(SPARK-41876): Implement DataFrame `toLocalIterator`") -def test_itertuples(self): -super().test_itertuples() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cummax(self): -super().test_cummax() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cummax_multiindex_columns(self): -super().test_cummax_multiindex_columns() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cummin(self): -super().test_cummin() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." 
-) -def test_cummin_multiindex_columns(self): -super().test_cummin_multiindex_columns() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cumprod(self): -super().test_cumprod() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cumprod_multiindex_columns(self): -super().test_cumprod_multiindex_columns() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cumsum(self): -super().test_cumsum() - -@unittest.skip( -"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark Connect client." -) -def test_cumsum_multiindex_columns(self): -
[spark] branch master updated (1da7b7f9c21 -> 014dd357656)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1da7b7f9c21 [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0 add 014dd357656 [SPARK-43863][CONNECT] Remove redundant `toSeq` from `SparkConnectPlanner` for Scala 2.13 No new revisions were added by this update. Summary of changes: .../sql/connect/planner/SparkConnectPlanner.scala | 33 +++--- 1 file changed, 17 insertions(+), 16 deletions(-)
[spark] branch master updated: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1da7b7f9c21 [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0 1da7b7f9c21 is described below commit 1da7b7f9c21f4b1981e9c52ed88d71a6b317f104 Author: itholic AuthorDate: Tue May 30 09:02:54 2023 +0900 [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0 ### What changes were proposed in this pull request? This PR proposes to upgrade pandas to 2.0.0. ### Why are the changes needed? To support latest pandas. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Addressed the existing UTs. Closes #41211 from itholic/pandas_2. Authored-by: itholic Signed-off-by: Hyukjin Kwon --- dev/infra/Dockerfile | 4 +- python/pyspark/mlv2/tests/test_feature.py | 10 + python/pyspark/mlv2/tests/test_summarizer.py | 6 + python/pyspark/pandas/base.py | 14 +- python/pyspark/pandas/frame.py | 11 +- python/pyspark/pandas/generic.py | 6 +- python/pyspark/pandas/groupby.py | 8 +- python/pyspark/pandas/indexes/base.py | 63 +++--- python/pyspark/pandas/indexes/category.py | 2 +- python/pyspark/pandas/indexes/datetimes.py | 63 +++--- python/pyspark/pandas/indexes/numeric.py | 12 +- python/pyspark/pandas/namespace.py | 22 +- python/pyspark/pandas/series.py| 10 +- python/pyspark/pandas/spark/accessors.py | 9 +- python/pyspark/pandas/strings.py | 30 +-- python/pyspark/pandas/supported_api_gen.py | 2 +- .../pandas/tests/computation/test_any_all.py | 4 + .../pandas/tests/computation/test_combine.py | 4 + .../pandas/tests/computation/test_compute.py | 13 ++ .../pyspark/pandas/tests/computation/test_cov.py | 4 + .../pandas/tests/computation/test_describe.py | 8 + .../pandas/tests/data_type_ops/test_date_ops.py| 10 + .../pyspark/pandas/tests/frame/test_reindexing.py | 4 + python/pyspark/pandas/tests/indexes/test_base.py | 237 + 
.../pyspark/pandas/tests/indexes/test_category.py | 13 ++ .../pyspark/pandas/tests/indexes/test_datetime.py | 10 + .../pyspark/pandas/tests/indexes/test_indexing.py | 5 + .../pyspark/pandas/tests/indexes/test_reindex.py | 5 + .../pyspark/pandas/tests/indexes/test_timedelta.py | 6 + .../tests/plot/test_frame_plot_matplotlib.py | 56 + python/pyspark/pandas/tests/test_categorical.py| 22 ++ python/pyspark/pandas/tests/test_csv.py| 6 + .../pandas/tests/test_dataframe_conversion.py | 5 + python/pyspark/pandas/tests/test_groupby.py| 38 python/pyspark/pandas/tests/test_groupby_slow.py | 9 + python/pyspark/pandas/tests/test_namespace.py | 5 + .../pandas/tests/test_ops_on_diff_frames.py| 5 + .../tests/test_ops_on_diff_frames_groupby.py | 11 + .../test_ops_on_diff_frames_groupby_rolling.py | 5 + python/pyspark/pandas/tests/test_rolling.py| 9 + python/pyspark/pandas/tests/test_series.py | 44 .../pyspark/pandas/tests/test_series_conversion.py | 5 + .../pyspark/pandas/tests/test_series_datetime.py | 65 ++ python/pyspark/pandas/tests/test_series_string.py | 14 ++ python/pyspark/pandas/tests/test_stats.py | 15 ++ .../pyspark/sql/tests/connect/test_parity_arrow.py | 6 + python/pyspark/sql/tests/test_arrow.py | 4 + 47 files changed, 746 insertions(+), 173 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 189bd606499..888b4e00b39 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht # See more in SPARK-39735 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library" -RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib -RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' +RUN pypy3 -m pip install numpy 'pandas<=2.0.0' scipy coverage matplotlib +RUN python3.9 -m pip install 
numpy pyarrow 'pandas<=2.0.0' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' # Add Python deps for Spark Connect. RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status diff --git a/python/pyspark/mlv2/tests/test_feature.py
[spark] branch master updated: [SPARK-43799][PYTHON] Add descriptor binary option to Pyspark Protobuf API
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff9d41abaff [SPARK-43799][PYTHON] Add descriptor binary option to Pyspark Protobuf API ff9d41abaff is described below commit ff9d41abaffcbd6f0c26ce5be9d2324fe9f01d5c Author: Raghu Angadi AuthorDate: Tue May 30 09:01:32 2023 +0900 [SPARK-43799][PYTHON] Add descriptor binary option to Pyspark Protobuf API ### What changes were proposed in this pull request? This updated Protobuf Pyspark API to allow passing binary FileDescriptorSet rather than a file name. This is a Python follow up to feature implemented in Scala in #41192. ### Why are the changes needed? - This allows flexibility for Pyspark users to provide binary descriptor set directly. - Even if users are using file path, Pyspark avoids passing file name to Scala and reads the descriptor file in Python. This avoids having to read the file in Scala. ### Does this PR introduce _any_ user-facing change? - This adds extra arg to `from_protobuf()` and `to_protobuf()` API. ### How was this patch tested? - Doc tests - Manual tests Closes #41343 from rangadi/py-proto-file-buffer. Authored-by: Raghu Angadi Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/protobuf/functions.py | 74 +++- 1 file changed, 64 insertions(+), 10 deletions(-) diff --git a/python/pyspark/sql/protobuf/functions.py b/python/pyspark/sql/protobuf/functions.py index a303cf91493..42165938eb7 100644 --- a/python/pyspark/sql/protobuf/functions.py +++ b/python/pyspark/sql/protobuf/functions.py @@ -37,13 +37,17 @@ def from_protobuf( messageName: str, descFilePath: Optional[str] = None, options: Optional[Dict[str, str]] = None, +binaryDescriptorSet: Optional[bytes] = None, ) -> Column: """ Converts a binary column of Protobuf format into its corresponding catalyst value. 
-The Protobuf definition is provided in one of these two ways: +The Protobuf definition is provided in one of these ways: - Protobuf descriptor file: E.g. a descriptor file created with `protoc --include_imports --descriptor_set_out=abc.desc abc.proto` + - Protobuf descriptor as binary: Rather than file path as in previous option, + we can provide the binary content of the file. This allows flexibility in how the + descriptor set is created and fetched. - Jar containing Protobuf Java class: The jar containing Java class should be shaded. Specifically, `com.google.protobuf.*` should be shaded to `org.sparkproject.spark_protobuf.protobuf.*`. @@ -52,6 +56,9 @@ def from_protobuf( .. versionadded:: 3.4.0 +.. versionchanged:: 3.5.0 +Supports `binaryDescriptorSet` arg to pass binary descriptor directly. + Parameters -- data : :class:`~pyspark.sql.Column` or str @@ -61,9 +68,11 @@ def from_protobuf( The Protobuf class name when descFilePath parameter is not set. E.g. `com.example.protos.ExampleEvent`. descFilePath : str, optional -The protobuf descriptor file. +The Protobuf descriptor file. options : dict, optional options to control how the protobuf record is parsed. +binaryDescriptorSet: bytes, optional +The Protobuf `FileDescriptorSet` serialized as binary. Notes - @@ -92,9 +101,14 @@ def from_protobuf( ... proto_df = df.select( ... to_protobuf(df.value, message_name, desc_file_path).alias("value")) ... proto_df.show(truncate=False) -... proto_df = proto_df.select( +... proto_df_1 = proto_df.select( # With file name for descriptor ... from_protobuf(proto_df.value, message_name, desc_file_path).alias("value")) -... proto_df.show(truncate=False) +... proto_df_1.show(truncate=False) +... proto_df_2 = proto_df.select( # With binary for descriptor +... from_protobuf(proto_df.value, message_name, +... binaryDescriptorSet = bytearray.fromhex(desc_hex)) +... .alias("value")) +... 
proto_df_2.show(truncate=False) ++ |value | ++ @@ -105,6 +119,11 @@ def from_protobuf( +--+ |{2, Alice, 109200}| +--+ ++--+ +|value | ++--+ +|{2, Alice, 109200}| ++--+ >>> data = [([(1668035962, 2020)])] >>> ddl_schema = "value struct" >>> df = spark.createDataFrame(data, ddl_schema)
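As the commit message notes, even when a file path is given, the descriptor file is now read on the Python side, so the two call styles converge on handing Spark a block of bytes. A sketch of both options (the file name and descriptor bytes are made up, and the `from_protobuf` calls are left as comments because they require a running Spark session):

```python
import pathlib
import tempfile

# Stand-in for the output of:
#   protoc --include_imports --descriptor_set_out=abc.desc abc.proto
# Real FileDescriptorSet bytes would come from protoc; these are placeholders.
fake_descriptor = b"\x0a\x07example"

with tempfile.TemporaryDirectory() as d:
    desc_path = pathlib.Path(d) / "abc.desc"
    desc_path.write_bytes(fake_descriptor)

    # Option 1: pass the file path; from_protobuf reads it in Python.
    # proto_df.select(from_protobuf(proto_df.value, message_name,
    #                               descFilePath=str(desc_path)))

    # Option 2: load the bytes yourself (from a file, a registry service,
    # an embedded constant, ...) and pass them directly.
    binary_set = desc_path.read_bytes()
    # proto_df.select(from_protobuf(proto_df.value, message_name,
    #                               binaryDescriptorSet=binary_set))

assert binary_set == fake_descriptor
```

Option 2 is what gives the flexibility the PR description mentions: the descriptor set no longer has to exist as a file reachable from the driver.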
[spark] branch master updated: [SPARK-43858][INFRA][FOLLOWUP] Restore the Scala version to 2.13 after run benchmark
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ce568b7b56b [SPARK-43858][INFRA][FOLLOWUP] Restore the Scala version to 2.13 after run benchmark ce568b7b56b is described below commit ce568b7b56b0f8ee89045f2de7a1eb6816d8156b Author: yangjie01 AuthorDate: Mon May 29 10:15:53 2023 -0700 [SPARK-43858][INFRA][FOLLOWUP] Restore the Scala version to 2.13 after run benchmark ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/41358, this pr change to restore Scala version to 2.13 after running benchmark to avoid containing changed `pom.xml` in the results tarball. ### Why are the changes needed? After running benchmark, should restore Scala to default version to results tarball includes changed `pom.xml`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes #41371 from LuciferYang/SPARK-43858-FOLLOWUP. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .github/workflows/benchmark.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml index 055f238bc61..f9699ac80f2 100644 --- a/.github/workflows/benchmark.yml +++ b/.github/workflows/benchmark.yml @@ -182,7 +182,7 @@ jobs: "`find . 
-name 'spark-core*-SNAPSHOT-tests.jar'`" \ "${{ github.event.inputs.class }}" # Revert to default Scala version to clean up unnecessary git diff -dev/change-scala-version.sh 2.12 +dev/change-scala-version.sh 2.13 # To keep the directory structure and file permissions, tar them # See also https://github.com/actions/upload-artifact#maintaining-file-permissions-and-case-sensitive-files echo "Preparing the benchmark results:"

[spark] branch master updated (8e18d687bd0 -> 15ffe91f447)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8e18d687bd0 [SPARK-43858][INFRA] Make benchmark Github Action task use Scala 2.13 as default add 15ffe91f447 [SPARK-43860][SQL] Enable tail-recursion wherever possible No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala | 1 + .../scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala | 1 + .../spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala | 1 + 3 files changed, 3 insertions(+)
[spark] branch master updated (a9f13e1e54d -> 8e18d687bd0)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a9f13e1e54d [SPARK-43849][BUILD] Enable unused imports check for Scala 2.13 add 8e18d687bd0 [SPARK-43858][INFRA] Make benchmark Github Action task use Scala 2.13 as default No new revisions were added by this update. Summary of changes: .github/workflows/benchmark.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (383d9fb4df8 -> a9f13e1e54d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 383d9fb4df8 [SPARK-43680][SPARK-43681][SPARK-43682][SPARK-43683][PS] Fix `NullOps` for Spark Connect add a9f13e1e54d [SPARK-43849][BUILD] Enable unused imports check for Scala 2.13 No new revisions were added by this update. Summary of changes: .../apache/spark/sql/connect/planner/SparkConnectPlanner.scala | 4 pom.xml | 8 +++- project/SparkBuild.scala | 9 ++--- .../apache/spark/sql/catalyst/expressions/ExpressionSet.scala | 2 +- 4 files changed, 14 insertions(+), 9 deletions(-)
[spark] branch master updated: [SPARK-43680][SPARK-43681][SPARK-43682][SPARK-43683][PS] Fix `NullOps` for Spark Connect
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 383d9fb4df8 [SPARK-43680][SPARK-43681][SPARK-43682][SPARK-43683][PS] Fix `NullOps` for Spark Connect 383d9fb4df8 is described below commit 383d9fb4df81429d2d7d31a35f200ff152bf77f6 Author: itholic AuthorDate: Mon May 29 19:38:27 2023 +0800 [SPARK-43680][SPARK-43681][SPARK-43682][SPARK-43683][PS] Fix `NullOps` for Spark Connect ### What changes were proposed in this pull request? This PR proposes to fix `NullOps` test for pandas API on Spark with Spark Connect. This includes SPARK-43680, SPARK-43681, SPARK-43682, SPARK-43683 at once, because they are all related similar modifications in single file. ### Why are the changes needed? To support all features for pandas API on Spark with Spark Connect. ### Does this PR introduce _any_ user-facing change? Yes, `NullOps.lt`, `NullOps.le`, `NullOps.ge`, `NullOps.gt` are now working as expected on Spark Connect. ### How was this patch tested? Uncomment the UTs, and tested manually. Closes #41361 from itholic/SPARK-43680-3. 
Authored-by: itholic Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/data_type_ops/null_ops.py| 34 +- .../connect/data_type_ops/test_parity_null_ops.py | 16 -- 2 files changed, 21 insertions(+), 29 deletions(-) diff --git a/python/pyspark/pandas/data_type_ops/null_ops.py b/python/pyspark/pandas/data_type_ops/null_ops.py index 9205d5e2407..ddd7bddcfbd 100644 --- a/python/pyspark/pandas/data_type_ops/null_ops.py +++ b/python/pyspark/pandas/data_type_ops/null_ops.py @@ -30,8 +30,8 @@ from pyspark.pandas.data_type_ops.base import ( ) from pyspark.pandas._typing import SeriesOrIndex from pyspark.pandas.typedef import pandas_on_spark_type -from pyspark.sql import Column from pyspark.sql.types import BooleanType, StringType +from pyspark.sql.utils import pyspark_column_op, is_remote class NullOps(DataTypeOps): @@ -44,28 +44,36 @@ class NullOps(DataTypeOps): return "nulls" def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex: -from pyspark.pandas.base import column_op - _sanitize_list_like(right) -return column_op(Column.__lt__)(left, right) +result = pyspark_column_op("__lt__")(left, right) +if is_remote: +# In Spark Connect, it returns None instead of False, so we manually cast it. +result = result.fillna(False) +return result def le(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex: -from pyspark.pandas.base import column_op - _sanitize_list_like(right) -return column_op(Column.__le__)(left, right) +result = pyspark_column_op("__le__")(left, right) +if is_remote: +# In Spark Connect, it returns None instead of False, so we manually cast it. +result = result.fillna(False) +return result def ge(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex: -from pyspark.pandas.base import column_op - _sanitize_list_like(right) -return column_op(Column.__ge__)(left, right) +result = pyspark_column_op("__ge__")(left, right) +if is_remote: +# In Spark Connect, it returns None instead of False, so we manually cast it. 
+result = result.fillna(False) +return result def gt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex: -from pyspark.pandas.base import column_op - _sanitize_list_like(right) -return column_op(Column.__gt__)(left, right) +result = pyspark_column_op("__gt__")(left, right) +if is_remote: +# In Spark Connect, it returns None instead of False, so we manually cast it. +result = result.fillna(False) +return result def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) -> IndexOpsLike: dtype, spark_type = pandas_on_spark_type(dtype) diff --git a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_null_ops.py b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_null_ops.py index 00bfb75087a..1b53a064971 100644 --- a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_null_ops.py +++ b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_null_ops.py @@ -33,22 +33,6 @@ class NullOpsParityTests( def test_eq(self): super().test_eq() -@unittest.skip("TODO(SPARK-43680): Fix NullOps.ge to work with Spark Connect Column.") -def test_ge(self): -super().test_ge() - -@unittest.skip("TODO(SPARK-43681): Fix NullOps.gt to work with Spark Connect Column.") -def test_gt(self): -
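The `fillna(False)` calls above compensate for SQL's three-valued logic: a comparison involving NULL evaluates to NULL, which Spark Connect surfaces as `None` where pandas semantics require `False`. A small pure-Python illustration of the mismatch and of the fix (the helper names are invented for this sketch):

```python
def sql_lt(left, right):
    """SQL-style `<`: any comparison involving NULL (None) yields NULL."""
    if left is None or right is None:
        return None
    return left < right


def fillna(values, fill):
    """Replace None entries with a fill value, like Series.fillna."""
    return [fill if v is None else v for v in values]


lefts = [None, None, 1]
rights = [None, 2, 2]

raw = [sql_lt(left, right) for left, right in zip(lefts, rights)]
assert raw == [None, None, True]  # Connect-style result: None, not False

# pandas expects comparisons against missing values to be False,
# which is exactly what the post-comparison fillna(False) restores.
assert fillna(raw, False) == [False, False, True]
```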
[spark] branch master updated: [SPARK-43676][SPARK-43677][SPARK-43678][SPARK-43679][PS] Fix `DatetimeOps` for Spark Connect
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 373975b8ef3 [SPARK-43676][SPARK-43677][SPARK-43678][SPARK-43679][PS] Fix `DatetimeOps` for Spark Connect
373975b8ef3 is described below

commit 373975b8ef35afc80d396e69081173875022a970
Author: itholic
AuthorDate: Mon May 29 19:36:38 2023 +0800

    [SPARK-43676][SPARK-43677][SPARK-43678][SPARK-43679][PS] Fix `DatetimeOps` for Spark Connect

    ### What changes were proposed in this pull request?
    This PR proposes to fix the `DatetimeOps` tests for pandas API on Spark with Spark Connect. It covers SPARK-43676, SPARK-43677, SPARK-43678 and SPARK-43679 at once, because they all involve similar modifications to a single file.

    ### Why are the changes needed?
    To support all features for pandas API on Spark with Spark Connect.

    ### Does this PR introduce _any_ user-facing change?
    Yes, `DatetimeOps.lt`, `DatetimeOps.le`, `DatetimeOps.ge` and `DatetimeOps.gt` now work as expected on Spark Connect.

    ### How was this patch tested?
    Uncommented the UTs and tested manually.

    Closes #41306 from itholic/SPARK-43676-9.
Authored-by: itholic
Signed-off-by: Ruifeng Zheng
---
 python/pyspark/pandas/data_type_ops/datetime_ops.py    | 17 +
 .../connect/data_type_ops/test_parity_datetime_ops.py  | 16 -
 2 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/python/pyspark/pandas/data_type_ops/datetime_ops.py b/python/pyspark/pandas/data_type_ops/datetime_ops.py
index 5069598..c5f4df96bde 100644
--- a/python/pyspark/pandas/data_type_ops/datetime_ops.py
+++ b/python/pyspark/pandas/data_type_ops/datetime_ops.py
@@ -33,6 +33,7 @@ from pyspark.sql.types import (
     TimestampNTZType,
     NumericType,
 )
+from pyspark.sql.utils import pyspark_column_op
 
 from pyspark.pandas._typing import Dtype, IndexOpsLike, SeriesOrIndex
 from pyspark.pandas.base import IndexOpsMixin
@@ -109,28 +110,20 @@ class DatetimeOps(DataTypeOps):
         raise TypeError("Datetime subtraction can only be applied to datetime series.")
 
     def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
-        from pyspark.pandas.base import column_op
-
         _sanitize_list_like(right)
-        return column_op(Column.__lt__)(left, right)
+        return pyspark_column_op("__lt__")(left, right)
 
     def le(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
-        from pyspark.pandas.base import column_op
-
         _sanitize_list_like(right)
-        return column_op(Column.__le__)(left, right)
+        return pyspark_column_op("__le__")(left, right)
 
     def ge(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
-        from pyspark.pandas.base import column_op
-
         _sanitize_list_like(right)
-        return column_op(Column.__ge__)(left, right)
+        return pyspark_column_op("__ge__")(left, right)
 
     def gt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
-        from pyspark.pandas.base import column_op
-
         _sanitize_list_like(right)
-        return column_op(Column.__gt__)(left, right)
+        return pyspark_column_op("__gt__")(left, right)
 
     def prepare(self, col: pd.Series) -> pd.Series:
         """Prepare column when from_pandas."""
diff --git a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_datetime_ops.py b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_datetime_ops.py
index 697c191b743..6d081b10aba 100644
--- a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_datetime_ops.py
+++ b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_datetime_ops.py
@@ -34,22 +34,6 @@ class DatetimeOpsParityTests(
     def test_astype(self):
         super().test_astype()
 
-    @unittest.skip("TODO(SPARK-43676): Fix DatetimeOps.ge to work with Spark Connect Column.")
-    def test_ge(self):
-        super().test_ge()
-
-    @unittest.skip("TODO(SPARK-43677): Fix DatetimeOps.gt to work with Spark Connect Column.")
-    def test_gt(self):
-        super().test_gt()
-
-    @unittest.skip("TODO(SPARK-43678): Fix DatetimeOps.le to work with Spark Connect Column.")
-    def test_le(self):
-        super().test_le()
-
-    @unittest.skip("TODO(SPARK-43679): Fix DatetimeOps.lt to work with Spark Connect Column.")
-    def test_lt(self):
-        super().test_lt()
-
 
 if __name__ == "__main__":
     from pyspark.pandas.tests.connect.data_type_ops.test_parity_datetime_ops import *  # noqa: F401

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
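`pyspark_column_op` selects the comparison by its dunder name instead of referencing `Column.__lt__` directly, which keeps one code path for both classic and Connect Columns. The name-based dispatch can be sketched in plain Python (a hypothetical helper, not the real PySpark API):

```python
from typing import Any, Callable

def column_op(op_name: str) -> Callable[[Any, Any], Any]:
    """Resolve a binary operator by its dunder name, the way the patch
    replaces column_op(Column.__lt__) with pyspark_column_op("__lt__")."""
    def apply(left: Any, right: Any) -> Any:
        # Look the method up on the runtime type of the left operand.
        return getattr(type(left), op_name)(left, right)
    return apply

print(column_op("__lt__")(3, 5))  # True
print(column_op("__ge__")(3, 5))  # False
```

Because the operator is named rather than bound at import time, the same call works whatever Column class the session hands back.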
[spark] branch master updated: [SPARK-43779][SQL] ParseToDate should load the EvalMode in main thread
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 88f69d6f928 [SPARK-43779][SQL] ParseToDate should load the EvalMode in main thread
88f69d6f928 is described below

commit 88f69d6f92860823b1a90bc162ebca2b7c8132fc
Author: Rui Wang
AuthorDate: Mon May 29 16:57:16 2023 +0800

    [SPARK-43779][SQL] ParseToDate should load the EvalMode in main thread

    ### What changes were proposed in this pull request?
    ParseToDate should load the EvalMode on the main thread instead of in a lazy val.

    ### Why are the changes needed?
    It is hard to predict when a lazy val is evaluated, while the SQLConf from which the EvalMode is loaded is thread-local, so a lazy val may read the configuration of the wrong thread.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Existing UT

    Closes #41298 from amaliujia/SPARK-43779.

    Authored-by: Rui Wang
    Signed-off-by: Wenchen Fan
---
 .../catalyst/expressions/datetimeExpressions.scala |  9 +---
 .../sql-tests/analyzer-results/ansi/date.sql.out   |  8 +++
 .../ansi/datetime-parsing-invalid.sql.out          |  4 ++--
 .../sql-tests/analyzer-results/date.sql.out        |  8 +++
 .../analyzer-results/datetime-legacy.sql.out       |  8 +++
 .../datetime-parsing-invalid.sql.out               |  4 ++--
 .../analyzer-results/group-by-filter.sql.out       |  8 +++
 .../analyzer-results/postgreSQL/text.sql.out       |  4 ++--
 .../analyzer-results/predicate-functions.sql.out   | 26 +++---
 .../analyzer-results/timestamp-ltz.sql.out         |  2 +-
 .../analyzer-results/timestamp-ntz.sql.out         |  2 +-
 11 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 57c70f4d9bd..51ddf2b85f8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2044,12 +2044,15 @@ case class MonthsBetween(
 case class ParseToDate(
     left: Expression,
     format: Option[Expression],
-    timeZoneId: Option[String] = None)
+    timeZoneId: Option[String] = None,
+    ansiEnabled: Boolean = SQLConf.get.ansiEnabled)
   extends RuntimeReplaceable with ImplicitCastInputTypes with TimeZoneAwareExpression {
 
   override lazy val replacement: Expression = format.map { f =>
-    Cast(GetTimestamp(left, f, TimestampType, timeZoneId), DateType, timeZoneId)
-  }.getOrElse(Cast(left, DateType, timeZoneId)) // backwards compatibility
+    Cast(GetTimestamp(left, f, TimestampType, timeZoneId, ansiEnabled), DateType, timeZoneId,
+      EvalMode.fromBoolean(ansiEnabled))
+  }.getOrElse(Cast(left, DateType, timeZoneId,
+    EvalMode.fromBoolean(ansiEnabled))) // backwards compatibility
 
   def this(left: Expression, format: Expression) = {
     this(left, Option(format))
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/date.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/date.sql.out
index 28fe86d930f..3765d65ec3b 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/date.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/ansi/date.sql.out
@@ -148,21 +148,21 @@ select UNIX_DATE(DATE('1970-01-01')), UNIX_DATE(DATE('2020-12-04')), UNIX_DATE(null)
 -- !query
 select to_date(null), to_date('2016-12-31'), to_date('2016-12-31', 'yyyy-MM-dd')
 -- !query analysis
-Project [to_date(cast(null as string), None, Some(America/Los_Angeles)) AS to_date(NULL)#x, to_date(2016-12-31, None, Some(America/Los_Angeles)) AS to_date(2016-12-31)#x, to_date(2016-12-31, Some(yyyy-MM-dd), Some(America/Los_Angeles)) AS to_date(2016-12-31, yyyy-MM-dd)#x]
+Project [to_date(cast(null as string), None, Some(America/Los_Angeles), true) AS to_date(NULL)#x, to_date(2016-12-31, None, Some(America/Los_Angeles), true) AS to_date(2016-12-31)#x, to_date(2016-12-31, Some(yyyy-MM-dd), Some(America/Los_Angeles), true) AS to_date(2016-12-31, yyyy-MM-dd)#x]
 +- OneRowRelation
 
 
 -- !query
 select to_date("16", "dd")
 -- !query analysis
-Project [to_date(16, Some(dd), Some(America/Los_Angeles)) AS to_date(16, dd)#x]
+Project [to_date(16, Some(dd), Some(America/Los_Angeles), true) AS to_date(16, dd)#x]
 +- OneRowRelation
 
 
 -- !query
 select to_date("02-29", "MM-dd")
 -- !query analysis
-Project [to_date(02-29, Some(MM-dd), Some(America/Los_Angeles)) AS to_date(02-29, MM-dd)#x]
+Project [to_date(02-29, Some(MM-dd), Some(America/Los_Angeles), true) AS to_date(02-29, MM-dd)#x]
 +-
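The hazard this patch fixes is general: a lazily evaluated field that reads a thread-local value observes whatever thread happens to trigger it, not the thread that built the expression. A toy Python sketch of the idea (Spark's `SQLConf` is Scala; the class and attribute names here are illustrative only):

```python
import threading

_conf = threading.local()  # stands in for the thread-local SQLConf

class ParseToDateLazy:
    """Reads the thread-local conf only on first use (the old behavior)."""
    def __init__(self):
        self._ansi = None
    @property
    def ansi(self):
        if self._ansi is None:
            self._ansi = getattr(_conf, "ansi", False)  # evaluated lazily
        return self._ansi

class ParseToDateEager:
    """Captures the conf on the constructing (main) thread (the fix)."""
    def __init__(self):
        self.ansi = getattr(_conf, "ansi", False)

_conf.ansi = True  # the main thread enables ANSI mode
lazy, eager = ParseToDateLazy(), ParseToDateEager()

seen = {}
def worker():  # another thread, where the lazy field fires first
    seen["lazy"] = lazy.ansi
    seen["eager"] = eager.ansi

t = threading.Thread(target=worker)
t.start(); t.join()
print(seen)  # {'lazy': False, 'eager': True}
```

Capturing `ansiEnabled` as a constructor default (`SQLConf.get.ansiEnabled`) is the moral equivalent of `ParseToDateEager` here: the value is pinned when the expression is created on the main thread.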
[spark] branch master updated (27bb384947e -> 8b464df9fcf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` add 8b464df9fcf [SPARK-43846][SQL][TESTS] Use checkError() to check Exception in SessionCatalogSuite No new revisions were added by this update. Summary of changes: .../sql/catalyst/catalog/SessionCatalogSuite.scala | 462 - 1 file changed, 277 insertions(+), 185 deletions(-)
[spark] branch master updated (31a8ef803a8 -> 27bb384947e)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 31a8ef803a8 [SPARK-43821][CONNECT][TESTS] Make the prompt for `findJar` method in IntegrationTestUtils clearer add 27bb384947e [SPARK-43841][SQL] Handle candidate attributes with no prefix in `StringUtils#orderSuggestedIdentifiersBySimilarity` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/StringUtils.scala | 2 +- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 7 ++ .../sql/errors/QueryCompilationErrorsSuite.scala | 27 ++ 3 files changed, 35 insertions(+), 1 deletion(-)