(spark) branch master updated: [SPARK-46323][PYTHON] Fix the output name of pyspark.sql.functions.now

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e76eb9fa4c42 [SPARK-46323][PYTHON] Fix the output name of 
pyspark.sql.functions.now
e76eb9fa4c42 is described below

commit e76eb9fa4c42da4237e282ce4da8e3e8d1be38ba
Author: Hyukjin Kwon 
AuthorDate: Fri Dec 8 16:58:52 2023 +0900

[SPARK-46323][PYTHON] Fix the output name of pyspark.sql.functions.now

### What changes were proposed in this pull request?

This PR proposes to change the output column name from `current_timestamp()` to `now()` when you invoke `pyspark.sql.functions.now`.

### Why are the changes needed?

To show the correct name of the function being used.

### Does this PR introduce _any_ user-facing change?

Yes.

```python
from pyspark.sql import functions as sf
df.select(sf.now()).show(truncate=False)
```

Before:

```
+--------------------------+
|current_timestamp()       |
+--------------------------+
|2023-12-08 15:15:58.767781|
+--------------------------+
```

After:

```
+--------------------------+
|now()                     |
+--------------------------+
|2023-12-08 15:18:18.482269|
+--------------------------+
```
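
For reference, a minimal sketch (assuming an active SparkSession named `spark`): with this change the output column is named `now()`, and an explicit alias can still be used when a version-independent column name is needed.

```python
from pyspark.sql import functions as sf

df = spark.range(1)                              # assumes an existing SparkSession `spark`
df.select(sf.now()).show(truncate=False)         # column is now named `now()`
df.select(sf.now().alias("event_time")).show()   # explicit alias for a stable name
```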

### How was this patch tested?

Manually tested, and unittests were added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44252 from HyukjinKwon/now-name.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/functions/builtin.py |  2 +-
 python/pyspark/sql/functions/builtin.py         | 15 ++++++++-------
 python/pyspark/sql/tests/test_functions.py      |  8 ++++++++
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/sql/connect/functions/builtin.py 
b/python/pyspark/sql/connect/functions/builtin.py
index 48a7a223e6e4..cf2e2e0c7344 100644
--- a/python/pyspark/sql/connect/functions/builtin.py
+++ b/python/pyspark/sql/connect/functions/builtin.py
@@ -2864,7 +2864,7 @@ current_timestamp.__doc__ = 
pysparkfuncs.current_timestamp.__doc__
 
 
 def now() -> Column:
-return _invoke_function("current_timestamp")
+return _invoke_function("now")
 
 
 now.__doc__ = pysparkfuncs.now.__doc__
diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index 87ae84c4e2d4..4f8e6a8e1d14 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -6838,15 +6838,16 @@ def now() -> Column:
 
 Examples
 
+>>> from pyspark.sql import functions as sf
 >>> df = spark.range(1)
->>> df.select(now()).show(truncate=False) # doctest: +SKIP
-+-----------------------+
-|now()                  |
-+-----------------------+
-|2022-08-26 21:23:22.716|
-+-----------------------+
+>>> df.select(sf.now()).show(truncate=False) # doctest: +SKIP
++--------------------------+
+|now()                     |
++--------------------------+
+|2023-12-08 15:18:18.482269|
++--------------------------+
 """
-return _invoke_function("current_timestamp")
+return _invoke_function("now")
 
 
 @_try_remote_functions
diff --git a/python/pyspark/sql/tests/test_functions.py 
b/python/pyspark/sql/tests/test_functions.py
index 2ac7ddbcba59..8586fac4e86d 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -1376,6 +1376,14 @@ class FunctionsTestsMixin:
 for i in range(3):
 self.assertEqual(res[0][i * 2], res[0][i * 2 + 1])
 
+def test_current_timestamp(self):
+df = self.spark.range(1).select(F.current_timestamp())
+self.assertIsInstance(df.first()[0], datetime.datetime)
+self.assertEqual(df.schema.names[0], "current_timestamp()")
+df = self.spark.range(1).select(F.now())
+self.assertIsInstance(df.first()[0], datetime.datetime)
+self.assertEqual(df.schema.names[0], "now()")
+
 
 class FunctionsTests(ReusedSQLTestCase, FunctionsTestsMixin):
 pass





(spark) branch branch-3.5 updated: [SPARK-46275] Protobuf: Return null in permissive mode when deserialization fails

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new ab1443052347 [SPARK-46275] Protobuf: Return null in permissive mode 
when deserialization fails
ab1443052347 is described below

commit ab14430523473528bafa41d8f10bc33efbb74493
Author: Raghu Angadi 
AuthorDate: Fri Dec 8 16:40:27 2023 +0900

[SPARK-46275] Protobuf: Return null in permissive mode when deserialization 
fails

### What changes were proposed in this pull request?
This updates the behavior of the `from_protobuf()` built-in function when the underlying record fails to deserialize.

  * **Current behavior**:
    * By default, this throws an error and the query fails. [This part is not changed in the PR]
    * When `mode` is set to 'PERMISSIVE', it returns a non-null struct with each of the inner fields set to null, e.g. `{ "field_a": null, "field_b": null }`.
      * This is not very convenient for users. They cannot tell whether the record was malformed or the input itself was null, and checking every field for null in a SQL query is tedious (imagine a query over a struct with 10 fields).

  * **New behavior**
    * When `mode` is set to 'PERMISSIVE', it simply returns `null`.

### Why are the changes needed?
This makes it easier for users to detect and handle malformed records.
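
As an illustration (not part of the patch), a hedged PySpark sketch of how the new PERMISSIVE behaviour can be consumed; the descriptor path, message name, and column names below are hypothetical placeholders.

```python
from pyspark.sql.functions import col
from pyspark.sql.protobuf.functions import from_protobuf

# Hypothetical input: `df` has a binary column "value" holding protobuf payloads.
parsed = df.select(
    from_protobuf(
        col("value"),
        "ExampleEvent",                     # hypothetical message name
        descFilePath="/tmp/example.desc",   # hypothetical descriptor file
        options={"mode": "PERMISSIVE"},
    ).alias("event")
)

# With this change, a malformed record yields a null struct rather than a
# struct full of nulls, so a single null check isolates bad records.
bad_records = parsed.filter(col("event").isNull())
good_records = parsed.filter(col("event").isNotNull())
```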

### Does this PR introduce _any_ user-facing change?
Yes, but this does not change the contract. In fact, it clarifies it.

### How was this patch tested?
 - Unit tests are updated.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44214 from rangadi/protobuf-null.

Authored-by: Raghu Angadi 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 309c796876f310f8604292d84acc12e711ba7031)
Signed-off-by: Hyukjin Kwon 
---
 .../sql/protobuf/ProtobufDataToCatalyst.scala  | 31 --
 .../ProtobufCatalystDataConversionSuite.scala  | 13 +
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
index 5c4a5ff06896..d2417674837b 100644
--- 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
+++ 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
@@ -22,12 +22,12 @@ import scala.util.control.NonFatal
 import com.google.protobuf.DynamicMessage
 import com.google.protobuf.TypeRegistry
 
-import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, 
Expression, SpecificInternalRow, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, 
Expression, UnaryExpression}
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
 import org.apache.spark.sql.catalyst.util.{FailFastMode, ParseMode, 
PermissiveMode}
 import org.apache.spark.sql.errors.{QueryCompilationErrors, 
QueryExecutionErrors}
 import org.apache.spark.sql.protobuf.utils.{ProtobufOptions, ProtobufUtils, 
SchemaConverters}
-import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType, 
StructType}
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
 
 private[sql] case class ProtobufDataToCatalyst(
 child: Expression,
@@ -39,16 +39,8 @@ private[sql] case class ProtobufDataToCatalyst(
 
   override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
 
-  override lazy val dataType: DataType = {
-val dt = SchemaConverters.toSqlType(messageDescriptor, 
protobufOptions).dataType
-parseMode match {
-  // With PermissiveMode, the output Catalyst row might contain columns of 
null values for
-  // corrupt records, even if some of the columns are not nullable in the 
user-provided schema.
-  // Therefore we force the schema to be all nullable here.
-  case PermissiveMode => dt.asNullable
-  case _ => dt
-}
-  }
+  override lazy val dataType: DataType =
+SchemaConverters.toSqlType(messageDescriptor, protobufOptions).dataType
 
   override def nullable: Boolean = true
 
@@ -87,22 +79,9 @@ private[sql] case class ProtobufDataToCatalyst(
 mode
   }
 
-  @transient private lazy val nullResultRow: Any = dataType match {
-case st: StructType =>
-  val resultRow = new SpecificInternalRow(st.map(_.dataType))
-  for (i <- 0 until st.length) {
-resultRow.setNullAt(i)
-  }
-  resultRow
-
-case _ =>
-  null
-  }
-
   private def handleException(e: Throwable): Any = {
 parseMode match {
-  case PermissiveMode =>
-nullResultRow

(spark) branch master updated: [SPARK-46275] Protobuf: Return null in permissive mode when deserialization fails

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 309c796876f3 [SPARK-46275] Protobuf: Return null in permissive mode 
when deserialization fails
309c796876f3 is described below

commit 309c796876f310f8604292d84acc12e711ba7031
Author: Raghu Angadi 
AuthorDate: Fri Dec 8 16:40:27 2023 +0900

[SPARK-46275] Protobuf: Return null in permissive mode when deserialization 
fails

### What changes were proposed in this pull request?
This updates the behavior of the `from_protobuf()` built-in function when the underlying record fails to deserialize.

  * **Current behavior**:
    * By default, this throws an error and the query fails. [This part is not changed in the PR]
    * When `mode` is set to 'PERMISSIVE', it returns a non-null struct with each of the inner fields set to null, e.g. `{ "field_a": null, "field_b": null }`.
      * This is not very convenient for users. They cannot tell whether the record was malformed or the input itself was null, and checking every field for null in a SQL query is tedious (imagine a query over a struct with 10 fields).

  * **New behavior**
    * When `mode` is set to 'PERMISSIVE', it simply returns `null`.

### Why are the changes needed?
This makes it easier for users to detect and handle malformed records.

### Does this PR introduce _any_ user-facing change?
Yes, but this does not change the contract. In fact, it clarifies it.

### How was this patch tested?
 - Unit tests are updated.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44214 from rangadi/protobuf-null.

Authored-by: Raghu Angadi 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/protobuf/ProtobufDataToCatalyst.scala  | 31 --
 .../ProtobufCatalystDataConversionSuite.scala  | 13 +
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
index a239a627125d..a182ac854b28 100644
--- 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
+++ 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
@@ -22,12 +22,12 @@ import scala.util.control.NonFatal
 import com.google.protobuf.DynamicMessage
 import com.google.protobuf.TypeRegistry
 
-import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, 
Expression, SpecificInternalRow, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, 
Expression, UnaryExpression}
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
 import org.apache.spark.sql.catalyst.util.{FailFastMode, ParseMode, 
PermissiveMode}
 import org.apache.spark.sql.errors.{QueryCompilationErrors, 
QueryExecutionErrors}
 import org.apache.spark.sql.protobuf.utils.{ProtobufOptions, ProtobufUtils, 
SchemaConverters}
-import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType, 
StructType}
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
 
 private[sql] case class ProtobufDataToCatalyst(
 child: Expression,
@@ -39,16 +39,8 @@ private[sql] case class ProtobufDataToCatalyst(
 
   override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
 
-  override lazy val dataType: DataType = {
-val dt = SchemaConverters.toSqlType(messageDescriptor, 
protobufOptions).dataType
-parseMode match {
-  // With PermissiveMode, the output Catalyst row might contain columns of 
null values for
-  // corrupt records, even if some of the columns are not nullable in the 
user-provided schema.
-  // Therefore we force the schema to be all nullable here.
-  case PermissiveMode => dt.asNullable
-  case _ => dt
-}
-  }
+  override lazy val dataType: DataType =
+SchemaConverters.toSqlType(messageDescriptor, protobufOptions).dataType
 
   override def nullable: Boolean = true
 
@@ -87,22 +79,9 @@ private[sql] case class ProtobufDataToCatalyst(
 mode
   }
 
-  @transient private lazy val nullResultRow: Any = dataType match {
-case st: StructType =>
-  val resultRow = new SpecificInternalRow(st.map(_.dataType))
-  for (i <- 0 until st.length) {
-resultRow.setNullAt(i)
-  }
-  resultRow
-
-case _ =>
-  null
-  }
-
   private def handleException(e: Throwable): Any = {
 parseMode match {
-  case PermissiveMode =>
-nullResultRow
+  case PermissiveMode => null
   case FailFastMode =>
 throw 
QueryExecutionErrors.malformedProt

(spark) branch master updated (c06d41859f08 -> 9ffdcc398ed5)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c06d41859f08 [SPARK-46320][CORE] Support `spark.master.rest.host`
 add 9ffdcc398ed5 [SPARK-46321][PS][TESTS] Re-enable `IndexesTests.test_asof` that was skipped due to a Pandas bug

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/tests/indexes/test_base.py | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)
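
For context, a brief hedged sketch of the `Index.asof` API exercised by the re-enabled test (assuming pandas-on-Spark is importable):

```python
import pyspark.pandas as ps

idx = ps.Index([1, 2, 3])
print(idx.asof(2.5))  # latest index value <= 2.5, i.e. 2
print(idx.asof(0.5))  # no value <= 0.5, so this returns NaN
```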





(spark) branch master updated (b6b450927ec8 -> c06d41859f08)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b6b450927ec8 [SPARK-46317][PYTHON][CONNECT] Match minor behaviour 
matching in SparkSession with full test coverage
 add c06d41859f08 [SPARK-46320][CORE] Support `spark.master.rest.host`

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala   | 3 ++-
 .../src/main/scala/org/apache/spark/internal/config/package.scala | 6 ++
 docs/spark-standalone.md  | 8 
 3 files changed, 16 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-46317][PYTHON][CONNECT] Match minor behaviour matching in SparkSession with full test coverage

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b6b450927ec8 [SPARK-46317][PYTHON][CONNECT] Match minor behaviour 
matching in SparkSession with full test coverage
b6b450927ec8 is described below

commit b6b450927ec8139ab9b19442023178f308ada9cb
Author: Hyukjin Kwon 
AuthorDate: Fri Dec 8 15:10:52 2023 +0900

[SPARK-46317][PYTHON][CONNECT] Match minor behaviour matching in 
SparkSession with full test coverage

### What changes were proposed in this pull request?

This PR matches corner-case behaviours in `SparkSession` between Spark Connect and non-Spark Connect, and adds unit tests for full test coverage of `pyspark.sql.session`.
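
One of the corner cases covered, as a hedged sketch (assuming an active SparkSession named `spark`; the exact error class can differ between code paths):

```python
from pyspark.errors import PySparkTypeError

try:
    spark.createDataFrame([(1, 2)], schema=123)   # invalid schema type
except PySparkTypeError as e:
    # On Spark Connect this surfaces as NOT_LIST_OR_NONE_OR_STRUCT; the
    # classic session also raises a PySparkTypeError for this input.
    print(e.getErrorClass())
```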

### Why are the changes needed?

- For feature parity.
- To improve the test coverage.
See 
https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fsql%2Fsession.py
 - this is not being tested.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually ran the new unittest.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44247 from HyukjinKwon/SPARK-46317.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py |  5 --
 python/pyspark/sql/connect/session.py  |  9 
 python/pyspark/sql/session.py  |  4 +-
 python/pyspark/sql/tests/test_dataframe.py | 29 +++
 python/pyspark/sql/tests/test_session.py   | 84 +-
 python/pyspark/sql/tests/test_types.py | 10 
 6 files changed, 133 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index cc8400270967..d2d7f3148f4c 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -848,11 +848,6 @@ ERROR_CLASSES_JSON = """
   "SparkContext or SparkSession should be created first.."
 ]
   },
-  "SHOULD_NOT_DATAFRAME": {
-"message": [
-  "Argument `` should not be a DataFrame."
-]
-  },
   "SLICE_WITH_STEP" : {
 "message" : [
   "Slice with step is not supported."
diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index 0fcd85c033cf..a27e6fa4b729 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -370,6 +370,15 @@ class SparkSession:
 _cols = [x.encode("utf-8") if not isinstance(x, str) else x for x 
in schema]
 _num_cols = len(_cols)
 
+elif schema is not None:
+raise PySparkTypeError(
+error_class="NOT_LIST_OR_NONE_OR_STRUCT",
+message_parameters={
+"arg_name": "schema",
+"arg_type": type(schema).__name__,
+},
+)
+
 if isinstance(data, np.ndarray) and data.ndim not in [1, 2]:
 raise PySparkValueError(
 error_class="INVALID_NDARRAY_DIMENSION",
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 86aacfa54c6e..7615491a1778 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -1417,8 +1417,8 @@ class SparkSession(SparkConversionMixin):
 self._jvm.SparkSession.setActiveSession(self._jsparkSession)
 if isinstance(data, DataFrame):
 raise PySparkTypeError(
-error_class="SHOULD_NOT_DATAFRAME",
-message_parameters={"arg_name": "data"},
+error_class="INVALID_TYPE",
+message_parameters={"arg_name": "data", "data_type": 
"DataFrame"},
 )
 
 if isinstance(schema, str):
diff --git a/python/pyspark/sql/tests/test_dataframe.py 
b/python/pyspark/sql/tests/test_dataframe.py
index c25fe60ad174..e1df01116e18 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -1913,6 +1913,35 @@ class DataFrameTestsMixin:
 self.assertEqual(df.schema, schema)
 self.assertEqual(df.collect(), data)
 
+def test_partial_inference_failure(self):
+with self.assertRaises(PySparkValueError) as pe:
+self.spark.createDataFrame([(None, 1)])
+
+self.check_error(
+exception=pe.exception,
+error_class="CANNOT_DETERMINE_TYPE",
+message_parameters={},
+)
+
+def test_invalid_argument_create_dataframe(self):
+with self.assertRaises(PySparkTypeError) as pe:
+self.spark.createDataFrame([(1, 2)], schema=123)
+
+self.check_error(
+exception=pe.exception,
+error_cl

(spark) branch master updated: [SPARK-46315][PYTHON][TESTS] Test invalid key for spark.conf.get (pyspark.sql.conf)

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 19193854c759 [SPARK-46315][PYTHON][TESTS] Test invalid key for 
spark.conf.get (pyspark.sql.conf)
19193854c759 is described below

commit 19193854c759c4f7c90aad191906dc799c7a7341
Author: Hyukjin Kwon 
AuthorDate: Fri Dec 8 15:10:17 2023 +0900

[SPARK-46315][PYTHON][TESTS] Test invalid key for spark.conf.get 
(pyspark.sql.conf)

### What changes were proposed in this pull request?

This PR adds tests for negative cases for `spark.conf.get` 
(`pyspark.sql.conf`)
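
The negative case being added, as a small hedged sketch (assuming an active SparkSession named `spark`):

```python
from pyspark.errors import PySparkTypeError

try:
    spark.conf.get(123)   # keys must be strings
except PySparkTypeError as e:
    print(e.getErrorClass())  # NOT_STR
```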

### Why are the changes needed?

To improve the test coverage.


https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fsql%2Fconf.py

### Does this PR introduce _any_ user-facing change?

No, test-only

### How was this patch tested?

Manually ran the new unittest.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44245 from HyukjinKwon/SPARK-46315.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_conf.py | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_conf.py 
b/python/pyspark/sql/tests/test_conf.py
index 15722c2c57a4..9b939205b1d1 100644
--- a/python/pyspark/sql/tests/test_conf.py
+++ b/python/pyspark/sql/tests/test_conf.py
@@ -16,7 +16,7 @@
 #
 from decimal import Decimal
 
-from pyspark.errors import IllegalArgumentException
+from pyspark.errors import IllegalArgumentException, PySparkTypeError
 from pyspark.testing.sqlutils import ReusedSQLTestCase
 
 
@@ -63,6 +63,18 @@ class ConfTestsMixin:
 with self.assertRaises(Exception):
 spark.conf.set("foo", Decimal(1))
 
+with self.assertRaises(PySparkTypeError) as pe:
+spark.conf.get(123)
+
+self.check_error(
+exception=pe.exception,
+error_class="NOT_STR",
+message_parameters={
+"arg_name": "key",
+"arg_type": "int",
+},
+)
+
 spark.conf.unset("foo")
 
 





(spark) branch master updated: [SPARK-46318][PYTHON][INFRA] Exclude ported pyspark.loose_version from the code coverage report

2023-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cc254e6e9c2e [SPARK-46318][PYTHON][INFRA] Exclude ported 
pyspark.loose_version from the code coverage report
cc254e6e9c2e is described below

commit cc254e6e9c2e690c98ece8b399aa3763d01893a6
Author: Hyukjin Kwon 
AuthorDate: Fri Dec 8 15:09:38 2023 +0900

[SPARK-46318][PYTHON][INFRA] Exclude ported pyspark.loose_version from the 
code coverage report

### What changes were proposed in this pull request?

This PR proposes to exclude `pyspark.loose_version` (ported from Python) from the code coverage report.

### Why are the changes needed?

For a correct test coverage report, and to make it easier to read.


https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Floose_version.py

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44248 from HyukjinKwon/SPARK-46318.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/run-tests-with-coverage | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index 339964096188..f6b6d965254d 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -60,10 +60,10 @@ find $COVERAGE_DIR/coverage_data -size 0 -print0 | xargs -0 
rm -fr
 echo "Combining collected coverage data under $COVERAGE_DIR/coverage_data"
 $COV_EXEC combine
 echo "Creating XML report file at python/coverage.xml"
-$COV_EXEC xml --ignore-errors --include "pyspark/*" --omit 
"pyspark/cloudpickle/*" --omit "pyspark/sql/connect/proto/*"
+$COV_EXEC xml --ignore-errors --include "pyspark/*" --omit 
"pyspark/cloudpickle/*" --omit "pyspark/sql/connect/proto/*" --omit 
"python/pyspark/loose_version.py"
 echo "Reporting the coverage data at $COVERAGE_DIR/coverage_data/coverage"
-$COV_EXEC report --include "pyspark/*" --omit "pyspark/cloudpickle/*" --omit 
"pyspark/sql/connect/proto/*"
+$COV_EXEC report --include "pyspark/*" --omit "pyspark/cloudpickle/*" --omit 
"pyspark/sql/connect/proto/*" --omit "python/pyspark/loose_version.py"
 echo "Generating HTML files for PySpark coverage under $COVERAGE_DIR/htmlcov"
-$COV_EXEC html --ignore-errors --include "pyspark/*" --directory 
"$COVERAGE_DIR/htmlcov" --omit "pyspark/cloudpickle/*" --omit 
"pyspark/sql/connect/proto/*"
+$COV_EXEC html --ignore-errors --include "pyspark/*" --directory 
"$COVERAGE_DIR/htmlcov" --omit "pyspark/cloudpickle/*" --omit 
"pyspark/sql/connect/proto/*" --omit "python/pyspark/loose_version.py"
 
 popd





(spark) branch master updated: [SPARK-46316][CORE] Enable `buf-lint-action` on `core` module

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e4a63a0588f [SPARK-46316][CORE] Enable `buf-lint-action` on `core` 
module
7e4a63a0588f is described below

commit 7e4a63a0588f1b4b16e76d4d7d1add19cb2f0a82
Author: Dongjoon Hyun 
AuthorDate: Thu Dec 7 19:57:08 2023 -0800

[SPARK-46316][CORE] Enable `buf-lint-action` on `core` module

### What changes were proposed in this pull request?

This PR aims to enable `buf-lint-action` on `core` module.

### Why are the changes needed?

To enforce the community guideline.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

![Screenshot 2023-12-07 at 7 40 54 
PM](https://github.com/apache/spark/assets/9700541/b23f-d8be-410a-bc61-88f8b477a3b0)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44246 from dongjoon-hyun/SPARK-46316.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml |  4 
 core/src/main/protobuf/buf.yaml  | 23 +++
 2 files changed, 27 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 0e1a4a810f8a..e54883552920 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -583,6 +583,10 @@ jobs:
   uses: bufbuild/buf-setup-action@v1
   with:
 github_token: ${{ secrets.GITHUB_TOKEN }}
+- name: Protocol Buffers Linter
+  uses: bufbuild/buf-lint-action@v1
+  with:
+input: core/src/main/protobuf
 # Change 'branch-3.5' to 'branch-4.0' in master branch after cutting 
branch-4.0 branch.
 - name: Breaking change detection against branch-3.5
   uses: bufbuild/buf-breaking-action@v1
diff --git a/core/src/main/protobuf/buf.yaml b/core/src/main/protobuf/buf.yaml
new file mode 100644
index ..47f69191a5c7
--- /dev/null
+++ b/core/src/main/protobuf/buf.yaml
@@ -0,0 +1,23 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+version: v1
+breaking:
+  use:
+- FILE
+lint:
+  use:
+- BASIC





(spark) branch master updated: Revert "[SPARK-46316][CORE] Enable `buf-lint-action` on `core` module"

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 80dc64a573e1 Revert "[SPARK-46316][CORE] Enable `buf-lint-action` on 
`core` module"
80dc64a573e1 is described below

commit 80dc64a573e1c7678f92f8690f09a52329f7d30b
Author: Dongjoon Hyun 
AuthorDate: Thu Dec 7 20:03:01 2023 -0800

Revert "[SPARK-46316][CORE] Enable `buf-lint-action` on `core` module"

This reverts commit dcbae0643ce145df6cd0a7a68af3fdd1a062587b.
---
 .github/workflows/build_and_test.yml |  4 
 core/src/main/protobuf/buf.yaml  | 23 ---
 2 files changed, 27 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index e54883552920..0e1a4a810f8a 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -583,10 +583,6 @@ jobs:
   uses: bufbuild/buf-setup-action@v1
   with:
 github_token: ${{ secrets.GITHUB_TOKEN }}
-- name: Protocol Buffers Linter
-  uses: bufbuild/buf-lint-action@v1
-  with:
-input: core/src/main/protobuf
 # Change 'branch-3.5' to 'branch-4.0' in master branch after cutting 
branch-4.0 branch.
 - name: Breaking change detection against branch-3.5
   uses: bufbuild/buf-breaking-action@v1
diff --git a/core/src/main/protobuf/buf.yaml b/core/src/main/protobuf/buf.yaml
deleted file mode 100644
index 47f69191a5c7..
--- a/core/src/main/protobuf/buf.yaml
+++ /dev/null
@@ -1,23 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-version: v1
-breaking:
-  use:
-- FILE
-lint:
-  use:
-- BASIC





(spark) branch master updated (105eee73cfa0 -> dcbae0643ce1)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 105eee73cfa0 [SPARK-46313][CORE] Log `Spark HA` recovery duration
 add dcbae0643ce1 [SPARK-46316][CORE] Enable `buf-lint-action` on `core` 
module

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml  | 4 
 {connector/connect/common => core}/src/main/protobuf/buf.yaml | 4 +---
 2 files changed, 5 insertions(+), 3 deletions(-)
 copy {connector/connect/common => core}/src/main/protobuf/buf.yaml (94%)





(spark) branch master updated (61a3e0587df6 -> 105eee73cfa0)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 61a3e0587df6 [SPARK-46312][CORE] Use `lower_camel_case` in 
`store_types.proto`
 add 105eee73cfa0 [SPARK-46313][CORE] Log `Spark HA` recovery duration

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-46312][CORE] Use `lower_camel_case` in `store_types.proto`

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61a3e0587df6 [SPARK-46312][CORE] Use `lower_camel_case` in 
`store_types.proto`
61a3e0587df6 is described below

commit 61a3e0587df6be881cdc115fefb77482fa446b5c
Author: Dongjoon Hyun 
AuthorDate: Thu Dec 7 18:16:21 2023 -0800

[SPARK-46312][CORE] Use `lower_camel_case` in `store_types.proto`

### What changes were proposed in this pull request?

This PR aims to use `lower_camel_case` in `store_types.proto`.

### Why are the changes needed?

According to our guideline, we had better follow 
[FIELD_LOWER_SNAKE_CASE](https://buf.build/docs/lint/rules#field_lower_snake_case)


https://github.com/apache/spark/blob/9585cf6d56e3af37142609668dda1eeda3ec876f/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto#L23

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44242 from dongjoon-hyun/SPARK-46312.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/protobuf/org/apache/spark/status/protobuf/store_types.proto  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto 
b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
index 93365add3a64..386c660b16de 100644
--- a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
+++ b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto
@@ -164,7 +164,7 @@ message ExecutorStageSummaryWrapper {
 message ExecutorResourceRequest {
   optional string resource_name = 1;
   int64 amount = 2;
-  optional string discoveryScript = 3;
+  optional string discovery_script = 3;
   optional string vendor = 4;
 }
 
@@ -277,7 +277,7 @@ message RDDStorageInfoWrapper {
 }
 
 message ResourceProfileWrapper {
-  ResourceProfileInfo rpInfo = 1;
+  ResourceProfileInfo rp_info = 1;
 }
 
 message CachedQuantile {





(spark) branch master updated: [SPARK-46309][PS][TESTS] Remove unused code in `pyspark.pandas.tests.indexes.* `

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9585cf6d56e3 [SPARK-46309][PS][TESTS] Remove unused code in 
`pyspark.pandas.tests.indexes.* `
9585cf6d56e3 is described below

commit 9585cf6d56e3af37142609668dda1eeda3ec876f
Author: Ruifeng Zheng 
AuthorDate: Thu Dec 7 15:04:07 2023 -0800

[SPARK-46309][PS][TESTS] Remove unused code in 
`pyspark.pandas.tests.indexes.* `

### What changes were proposed in this pull request?
Remove unused code in `pyspark.pandas.tests.indexes.* `

### Why are the changes needed?
clean up the code

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44239 from zhengruifeng/ps_index_cleanup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 .../pyspark/pandas/tests/data_type_ops/test_string_ops.py  |  4 
 python/pyspark/pandas/tests/indexes/test_align.py  | 14 --
 python/pyspark/pandas/tests/indexes/test_base_slow.py  |  7 ---
 python/pyspark/pandas/tests/indexes/test_reindex.py| 14 --
 python/pyspark/pandas/tests/indexes/test_rename.py | 14 --
 5 files changed, 53 deletions(-)

diff --git a/python/pyspark/pandas/tests/data_type_ops/test_string_ops.py 
b/python/pyspark/pandas/tests/data_type_ops/test_string_ops.py
index 2870aed8e75e..340153b06335 100644
--- a/python/pyspark/pandas/tests/data_type_ops/test_string_ops.py
+++ b/python/pyspark/pandas/tests/data_type_ops/test_string_ops.py
@@ -35,10 +35,6 @@ class StringOpsTestsMixin:
 def bool_pdf(self):
 return pd.DataFrame({"this": ["x", "y", "z"], "that": ["z", "y", "x"]})
 
-@property
-def bool_psdf(self):
-return ps.from_pandas(self.bool_pdf)
-
 @property
 def bool_non_numeric_pdf(self):
 return pd.concat([self.bool_pdf, self.non_numeric_pdf], axis=1)
diff --git a/python/pyspark/pandas/tests/indexes/test_align.py 
b/python/pyspark/pandas/tests/indexes/test_align.py
index 56fde9b4f28b..73e10d441078 100644
--- a/python/pyspark/pandas/tests/indexes/test_align.py
+++ b/python/pyspark/pandas/tests/indexes/test_align.py
@@ -16,7 +16,6 @@
 #
 import unittest
 
-import numpy as np
 import pandas as pd
 
 from pyspark import pandas as ps
@@ -25,19 +24,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class FrameAlignMixin:
-@property
-def pdf(self):
-return pd.DataFrame(
-{"a": [1, 2, 3, 4, 5, 6, 7, 8, 9], "b": [4, 5, 6, 3, 2, 1, 0, 0, 
0]},
-index=np.random.rand(9),
-)
-
-@property
-def df_pair(self):
-pdf = self.pdf
-psdf = ps.from_pandas(pdf)
-return pdf, psdf
-
 def test_align(self):
 pdf1 = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]}, index=[10, 
20, 30])
 psdf1 = ps.from_pandas(pdf1)
diff --git a/python/pyspark/pandas/tests/indexes/test_base_slow.py 
b/python/pyspark/pandas/tests/indexes/test_base_slow.py
index c890f5004b43..eb417fe47ef8 100644
--- a/python/pyspark/pandas/tests/indexes/test_base_slow.py
+++ b/python/pyspark/pandas/tests/indexes/test_base_slow.py
@@ -24,13 +24,6 @@ from pyspark.testing.pandasutils import ComparisonTestBase, 
TestUtils
 
 
 class IndexesSlowTestsMixin:
-@property
-def pdf(self):
-return pd.DataFrame(
-{"a": [1, 2, 3, 4, 5, 6, 7, 8, 9], "b": [4, 5, 6, 3, 2, 1, 0, 0, 
0]},
-index=[0, 1, 3, 5, 6, 8, 9, 9, 9],
-)
-
 def test_append(self):
 # Index
 pidx = pd.Index(range(1))
diff --git a/python/pyspark/pandas/tests/indexes/test_reindex.py 
b/python/pyspark/pandas/tests/indexes/test_reindex.py
index 1d544ea221bf..1229a613846b 100644
--- a/python/pyspark/pandas/tests/indexes/test_reindex.py
+++ b/python/pyspark/pandas/tests/indexes/test_reindex.py
@@ -16,7 +16,6 @@
 #
 import unittest
 
-import numpy as np
 import pandas as pd
 
 from pyspark import pandas as ps
@@ -25,19 +24,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class FrameReindexMixin:
-@property
-def pdf(self):
-return pd.DataFrame(
-{"a": [1, 2, 3, 4, 5, 6, 7, 8, 9], "b": [4, 5, 6, 3, 2, 1, 0, 0, 
0]},
-index=np.random.rand(9),
-)
-
-@property
-def df_pair(self):
-pdf = self.pdf
-psdf = ps.from_pandas(pdf)
-return pdf, psdf
-
 def test_reindex(self):
 index = pd.Index(["A", "B", "C", "D", "E"])
 columns = pd.Index(["numbers"])
diff --git a/python/pyspark/pandas/tests/indexes/test_rename.py 
b/python/pyspark/pandas/tests/indexes/test_rename.py
index b59408952143..662071f420e9 100644
--- a/python/pyspark/pandas/test

(spark) branch master updated (027aeb1764a8 -> 82e67461511e)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 027aeb1764a8 [SPARK-46277][PYTHON] Validate startup urls with the 
config being set
 add 82e67461511e [SPARK-46311][CORE] Log the final state of drivers during 
`Master.removeDriver`

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/master/Master.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated: [SPARK-46277][PYTHON] Validate startup urls with the config being set

2023-12-07 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 027aeb1764a8 [SPARK-46277][PYTHON] Validate startup urls with the 
config being set
027aeb1764a8 is described below

commit 027aeb1764a816858b7ea071cd2b620f02a6a525
Author: Xinrong Meng 
AuthorDate: Thu Dec 7 13:45:31 2023 -0800

[SPARK-46277][PYTHON] Validate startup urls with the config being set

### What changes were proposed in this pull request?
Validate startup URLs as each config is set; see the example in the "Does this PR introduce _any_ user-facing change?" section.

### Why are the changes needed?
Clear and user-friendly error messages.

### Does this PR introduce _any_ user-facing change?
Yes.

FROM
```py
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"}).config("x", "z")  # Only raises the error when adding new configs
Traceback (most recent call last):
...
RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
```

TO
```py
>>> SparkSession.builder.config(map={"spark.master": "x", "spark.remote": "y"})
Traceback (most recent call last):
...
RuntimeError: Spark master cannot be configured with Spark Connect server; however, found URL for Spark Connect [y]
```

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44194 from xinrong-meng/fix_session.

Authored-by: Xinrong Meng 
Signed-off-by: Xinrong Meng 
---
 python/pyspark/errors/error_classes.py   |  6 +++---
 python/pyspark/sql/session.py| 28 +++-
 python/pyspark/sql/tests/test_session.py | 30 --
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index 965fd04a9135..cc8400270967 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -86,12 +86,12 @@ ERROR_CLASSES_JSON = """
   },
   "CANNOT_CONFIGURE_SPARK_CONNECT": {
 "message": [
-  "Spark Connect server cannot be configured with Spark master; however, 
found URL for Spark master []."
+  "Spark Connect server cannot be configured: Existing [], 
New []."
 ]
   },
-  "CANNOT_CONFIGURE_SPARK_MASTER": {
+  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
 "message": [
-  "Spark master cannot be configured with Spark Connect server; however, 
found URL for Spark Connect []."
+  "Spark Connect server and Spark master cannot be configured together: 
Spark master [], Spark Connect []."
 ]
   },
   "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 7f4589557cd2..86aacfa54c6e 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -286,17 +286,17 @@ class SparkSession(SparkConversionMixin):
 with self._lock:
 if conf is not None:
 for k, v in conf.getAll():
-self._validate_startup_urls()
 self._options[k] = v
+self._validate_startup_urls()
 elif map is not None:
 for k, v in map.items():  # type: ignore[assignment]
 v = to_str(v)  # type: ignore[assignment]
-self._validate_startup_urls()
 self._options[k] = v
+self._validate_startup_urls()
 else:
 value = to_str(value)
-self._validate_startup_urls()
 self._options[cast(str, key)] = value
+self._validate_startup_urls()
 return self
 
 def _validate_startup_urls(
@@ -306,22 +306,16 @@ class SparkSession(SparkConversionMixin):
 Helper function that validates the combination of startup URLs and 
raises an exception
 if incompatible options are selected.
 """
-if "spark.master" in self._options and (
+if ("spark.master" in self._options or "MASTER" in os.environ) and 
(
 "spark.remote" in self._options or "SPARK_REMOTE" in os.environ
 ):
 raise PySparkRuntimeError(
-error_class="CANNOT_CONFIGURE_SPARK_MASTER",
+error_class="CANNOT_CONFIGURE_SPARK_CONNECT_MASTER",
 message_parameters={
-"url": self._options.get("spark.remote", 
os.environ.get("SPARK_REMO

(spark) branch master updated: [SPARK-46293][CONNECT][PYTHON] Use `protobuf` transitive dependency

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e359318c4493 [SPARK-46293][CONNECT][PYTHON] Use `protobuf` transitive 
dependency
e359318c4493 is described below

commit e359318c4493e16a7546d70c9340ffc5015aacff
Author: Haejoon Lee 
AuthorDate: Thu Dec 7 10:28:27 2023 -0800

[SPARK-46293][CONNECT][PYTHON] Use `protobuf` transitive dependency

### What changes were proposed in this pull request?

This PR proposes to remove `protobuf` from the required packages.

### Why are the changes needed?

`protobuf` is automatically installed when installing `grpcio` and `grpcio-status`, so we don't need to pin a specific version explicitly.

### Does this PR introduce _any_ user-facing change?

No API changes.

### How was this patch tested?

The existing CI should pass

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44221 from itholic/protobuf_docs.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 dev/requirements.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index 0f1f1aee5b63..51facfeb5088 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -53,7 +53,6 @@ py
 # Spark Connect (required)
 grpcio>=1.59.3
 grpcio-status>=1.59.3
-protobuf==4.25.1
 googleapis-common-protos>=1.56.4
 
 # Spark Connect python proto generation plugin (optional)





(spark) branch master updated (0692856bb124 -> 8132e1700c81)

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0692856bb124 [SPARK-46307][PS][TESTS] Enable `fill_value` tests for 
`GroupByTests.test_shift`
 add 8132e1700c81 [SPARK-46261][CONNECT] `DataFrame.withColumnsRenamed` 
should keep the dict/map ordering

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/Dataset.scala  |   8 ++-
 .../main/protobuf/spark/connect/relations.proto|  14 +++-
 .../queries/withColumnRenamed_java_map.json|  11 +--
 .../queries/withColumnRenamed_java_map.proto.bin   | Bin 72 -> 72 bytes
 .../queries/withColumnRenamed_scala_map.json   |  11 +--
 .../queries/withColumnRenamed_scala_map.proto.bin  | Bin 72 -> 72 bytes
 .../queries/withColumnRenamed_single.json  |   7 +-
 .../queries/withColumnRenamed_single.proto.bin | Bin 60 -> 60 bytes
 .../sql/connect/planner/SparkConnectPlanner.scala  |  19 +++--
 python/pyspark/sql/connect/plan.py |   8 ++-
 python/pyspark/sql/connect/proto/relations_pb2.py  |  80 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  34 -
 .../sql/tests/connect/test_parity_dataframe.py |   5 --
 .../main/scala/org/apache/spark/sql/Dataset.scala  |   2 +-
 14 files changed, 133 insertions(+), 66 deletions(-)
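
As a hedged illustration of the API touched here (assuming an active SparkSession named `spark`): `DataFrame.withColumnsRenamed` takes a dict of renames, and with this change the dict/map ordering is preserved when the plan is sent through Spark Connect.

```python
df = spark.createDataFrame([(1, 2, 3)], ["a", "b", "c"])
renamed = df.withColumnsRenamed({"a": "x", "c": "z"})
print(renamed.columns)  # ['x', 'b', 'z']
```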





(spark) branch master updated: [SPARK-46304][INFRA] Upgrade setup-java action to v4 and setup-python to v5

2023-12-07 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b4d90dd2a622 [SPARK-46304][INFRA] Upgrade setup-java action to v4 and 
setup-python to v5
b4d90dd2a622 is described below

commit b4d90dd2a622beda542a7ce1ee15af9f312f9724
Author: panbingkun 
AuthorDate: Thu Dec 7 20:48:34 2023 +0800

[SPARK-46304][INFRA] Upgrade setup-java action to v4 and setup-python to v5

### What changes were proposed in this pull request?
The PR aims to upgrade the `setup-java` action from `v3` to `v4` and the `setup-python` action from `v4` to `v5`.

### Why are the changes needed?
1. `setup-java` action
v4.0.0 release notes: https://github.com/actions/setup-java/releases/tag/v4.0.0
The Node.js runtime was updated to 20. From now on, the code for setup-java will run on Node.js 20 instead of Node.js 16.

2. `setup-python` action
v5.0.0 release notes: https://github.com/actions/setup-python/releases/tag/v5.0.0
The Node.js runtime was updated from Node.js 16 to Node.js 20.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44235 from panbingkun/SPARK-46304.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
---
 .github/workflows/benchmark.yml|  4 ++--
 .github/workflows/build_and_test.yml   | 20 ++--
 .github/workflows/maven_test.yml   |  4 ++--
 .github/workflows/publish_snapshot.yml |  4 ++--
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 3cb63404bcac..20ea952ff0a2 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -105,7 +105,7 @@ jobs:
 run: cd tpcds-kit/tools && make OS=LINUX
   - name: Install Java ${{ github.event.inputs.jdk }}
 if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
-uses: actions/setup-java@v3
+uses: actions/setup-java@v4
 with:
   distribution: zulu
   java-version: ${{ github.event.inputs.jdk }}
@@ -157,7 +157,7 @@ jobs:
 restore-keys: |
   benchmark-coursier-${{ github.event.inputs.jdk }}
 - name: Install Java ${{ github.event.inputs.jdk }}
-  uses: actions/setup-java@v3
+  uses: actions/setup-java@v4
   with:
 distribution: zulu
 java-version: ${{ github.event.inputs.jdk }}
diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 920ac6a38c87..0e1a4a810f8a 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -241,12 +241,12 @@ jobs:
   ./dev/free_disk_space
 fi
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v3
+  uses: actions/setup-java@v4
   with:
 distribution: zulu
 java-version: ${{ matrix.java }}
 - name: Install Python 3.9
-  uses: actions/setup-python@v4
+  uses: actions/setup-python@v5
   # We should install one Python that is higher than 3+ for SQL and Yarn 
because:
   # - SQL component also has Python related tests, for example, 
IntegratedUDFTestUtils.
   # - Yarn has a Python specific test too, for example, YarnClusterSuite.
@@ -432,7 +432,7 @@ jobs:
   ./dev/free_disk_space_container
 fi
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v3
+  uses: actions/setup-java@v4
   with:
 distribution: zulu
 java-version: ${{ matrix.java }}
@@ -542,7 +542,7 @@ jobs:
   ./dev/free_disk_space_container
 fi
 - name: Install Java ${{ inputs.java }}
-  uses: actions/setup-java@v3
+  uses: actions/setup-java@v4
   with:
 distribution: zulu
 java-version: ${{ inputs.java }}
@@ -590,7 +590,7 @@ jobs:
 input: connector/connect/common/src/main
 against: 
'https://github.com/apache/spark.git#branch=branch-3.5,subdir=connector/connect/common/src/main'
 - name: Install Python 3.9
-  uses: actions/setup-python@v4
+  uses: actions/setup-python@v5
   with:
 python-version: '3.9'
 - name: Install dependencies for Python CodeGen check
@@ -665,7 +665,7 @@ jobs:
   ./dev/free_disk_space_container
 fi
 - name: Install Java ${{ inputs.java }}
-  uses: actions/setup-java@v3
+  uses: actions/setup-java@v4
   with:
 distribution: zulu
 java-version: ${{ inputs.java }}
@@ -831,7 +831,7 @@ jobs:
 restore-keys: |
   java${{ matrix.java }}-maven-
 - name: Install Java ${{ matrix.java }}
-  uses: actions/setup-java@v3
+  uses: action

(spark) branch master updated: [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries`

2023-12-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b92d64d6ef0c [SPARK-46301][CORE] Support 
`spark.worker.(initial|max)RegistrationRetries`
b92d64d6ef0c is described below

commit b92d64d6ef0c99b6b444f41ebdfe95f3260312aa
Author: Dongjoon Hyun 
AuthorDate: Thu Dec 7 01:01:56 2023 -0800

[SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries`

### What changes were proposed in this pull request?

This PR aims to support `spark.worker.(initial|max)RegistrationRetries` to 
parameterize the hard-coded magic numbers.
```scala
- private val INITIAL_REGISTRATION_RETRIES = 6
- private val TOTAL_REGISTRATION_RETRIES = INITIAL_REGISTRATION_RETRIES + 10
+ private val INITIAL_REGISTRATION_RETRIES = 
conf.get(WORKER_INITIAL_REGISTRATION_RETRIES)
+ private val TOTAL_REGISTRATION_RETRIES = 
conf.get(WORKER_MAX_REGISTRATION_RETRIES)
```

### Why are the changes needed?

To allow users to control these.

### Does this PR introduce _any_ user-facing change?

No. The default values are consistent with the existing behavior.

### How was this patch tested?

Pass the CIs.

![Screenshot 2023-12-06 at 8 58 05 
PM](https://github.com/apache/spark/assets/9700541/985ff3f7-d8c9-4803-a207-a6c16388e4d0)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44229 from dongjoon-hyun/SPARK-46301.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/deploy/worker/Worker.scala  | 14 ++
 .../org/apache/spark/internal/config/Worker.scala  | 17 +
 docs/spark-standalone.md   | 18 ++
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index eae12648b95a..1422a1484f8d 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -96,12 +96,18 @@ private[deploy] class Worker(
   private val HEARTBEAT_MILLIS = conf.get(WORKER_TIMEOUT) * 1000 / 4
 
   // Model retries to connect to the master, after Hadoop's model.
-  // The first six attempts to reconnect are in shorter intervals (between 5 
and 15 seconds)
-  // Afterwards, the next 10 attempts are between 30 and 90 seconds.
+  // The total number of retries are less than or equal to 
WORKER_MAX_REGISTRATION_RETRIES.
+  // Within the upper limit, WORKER_MAX_REGISTRATION_RETRIES,
+  // the first WORKER_INITIAL_REGISTRATION_RETRIES attempts to reconnect are 
in shorter intervals
+  // (between 5 and 15 seconds). Afterwards, the next attempts are between 30 
and 90 seconds while
   // A bit of randomness is introduced so that not all of the workers attempt 
to reconnect at
   // the same time.
-  private val INITIAL_REGISTRATION_RETRIES = 6
-  private val TOTAL_REGISTRATION_RETRIES = INITIAL_REGISTRATION_RETRIES + 10
+  private val INITIAL_REGISTRATION_RETRIES = 
conf.get(WORKER_INITIAL_REGISTRATION_RETRIES)
+  private val TOTAL_REGISTRATION_RETRIES = 
conf.get(WORKER_MAX_REGISTRATION_RETRIES)
+  if (INITIAL_REGISTRATION_RETRIES > TOTAL_REGISTRATION_RETRIES) {
+logInfo(s"${WORKER_INITIAL_REGISTRATION_RETRIES.key} 
($INITIAL_REGISTRATION_RETRIES) is " +
+  s"capped by ${WORKER_MAX_REGISTRATION_RETRIES.key} 
($TOTAL_REGISTRATION_RETRIES)")
+  }
   private val FUZZ_MULTIPLIER_INTERVAL_LOWER_BOUND = 0.500
   private val REGISTRATION_RETRY_FUZZ_MULTIPLIER = {
 val randomNumberGenerator = new 
Random(UUID.randomUUID.getMostSignificantBits)
diff --git a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala 
b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
index f160470edd8f..c53e181df002 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/Worker.scala
@@ -37,6 +37,23 @@ private[spark] object Worker {
 .longConf
 .createWithDefault(60)
 
+  val WORKER_INITIAL_REGISTRATION_RETRIES = 
ConfigBuilder("spark.worker.initialRegistrationRetries")
+.version("4.0.0")
+.internal()
+.doc("The number of retries to reconnect in short intervals (between 5 and 
15 seconds).")
+.intConf
+.checkValue(_ > 0, "The number of initial registration retries should be 
positive")
+.createWithDefault(6)
+
+  val WORKER_MAX_REGISTRATION_RETRIES = 
ConfigBuilder("spark.worker.maxRegistrationRetries")
+.version("4.0.0")
+.internal()
+.doc("The max number of retries to reconnect. After 
spark.worker.initialRegistrationRetries " +
+  "attempts

(spark) branch master updated: [SPARK-46303][PS][TESTS] Remove unused code in `pyspark.pandas.tests.series.* `

2023-12-07 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 12c075c9d6cb [SPARK-46303][PS][TESTS] Remove unused code in 
`pyspark.pandas.tests.series.* `
12c075c9d6cb is described below

commit 12c075c9d6cb254e31a003c554d8e2dbce0ef6a9
Author: Ruifeng Zheng 
AuthorDate: Thu Dec 7 16:56:54 2023 +0800

[SPARK-46303][PS][TESTS] Remove unused code in 
`pyspark.pandas.tests.series.* `

### What changes were proposed in this pull request?
 Remove unused code in `pyspark.pandas.tests.series.* `

### Why are the changes needed?
clean up the code

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44232 from zhengruifeng/ps_series_cleanup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/pandas/tests/series/test_all_any.py  | 8 
 python/pyspark/pandas/tests/series/test_arg_ops.py  | 8 
 python/pyspark/pandas/tests/series/test_as_of.py| 8 
 python/pyspark/pandas/tests/series/test_as_type.py  | 8 
 python/pyspark/pandas/tests/series/test_compute.py  | 8 
 python/pyspark/pandas/tests/series/test_cumulative.py   | 8 
 python/pyspark/pandas/tests/series/test_index.py| 8 
 python/pyspark/pandas/tests/series/test_missing_data.py | 8 
 python/pyspark/pandas/tests/series/test_sort.py | 8 
 python/pyspark/pandas/tests/series/test_stat.py | 8 
 10 files changed, 80 deletions(-)

diff --git a/python/pyspark/pandas/tests/series/test_all_any.py 
b/python/pyspark/pandas/tests/series/test_all_any.py
index 6663675c6b9b..52ef5a26f4df 100644
--- a/python/pyspark/pandas/tests/series/test_all_any.py
+++ b/python/pyspark/pandas/tests/series/test_all_any.py
@@ -25,14 +25,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class SeriesAllAnyMixin:
-@property
-def pser(self):
-return pd.Series([1, 2, 3, 4, 5, 6, 7], name="x")
-
-@property
-def psser(self):
-return ps.from_pandas(self.pser)
-
 def test_all(self):
 for pser in [
 pd.Series([True, True], name="x"),
diff --git a/python/pyspark/pandas/tests/series/test_arg_ops.py 
b/python/pyspark/pandas/tests/series/test_arg_ops.py
index 5b1aa246246a..134296462c1b 100644
--- a/python/pyspark/pandas/tests/series/test_arg_ops.py
+++ b/python/pyspark/pandas/tests/series/test_arg_ops.py
@@ -25,14 +25,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class SeriesArgOpsMixin:
-@property
-def pser(self):
-return pd.Series([1, 2, 3, 4, 5, 6, 7], name="x")
-
-@property
-def psser(self):
-return ps.from_pandas(self.pser)
-
 def test_argsort(self):
 # Without null values
 pser = pd.Series([0, -100, 50, 100, 20], index=["A", "B", "C", "D", 
"E"])
diff --git a/python/pyspark/pandas/tests/series/test_as_of.py 
b/python/pyspark/pandas/tests/series/test_as_of.py
index 552176ad656d..ad3e2522b652 100644
--- a/python/pyspark/pandas/tests/series/test_as_of.py
+++ b/python/pyspark/pandas/tests/series/test_as_of.py
@@ -25,14 +25,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class SeriesAsOfMixin:
-@property
-def pser(self):
-return pd.Series([1, 2, 3, 4, 5, 6, 7], name="x")
-
-@property
-def psser(self):
-return ps.from_pandas(self.pser)
-
 def test_asof(self):
 pser = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40], 
name="Koalas")
 psser = ps.from_pandas(pser)
diff --git a/python/pyspark/pandas/tests/series/test_as_type.py 
b/python/pyspark/pandas/tests/series/test_as_type.py
index 2f66d19d63fc..70352c34879f 100644
--- a/python/pyspark/pandas/tests/series/test_as_type.py
+++ b/python/pyspark/pandas/tests/series/test_as_type.py
@@ -30,14 +30,6 @@ from pyspark.pandas.typedef.typehints import (
 
 
 class SeriesAsTypeMixin:
-@property
-def pser(self):
-return pd.Series([1, 2, 3, 4, 5, 6, 7], name="x")
-
-@property
-def psser(self):
-return ps.from_pandas(self.pser)
-
 def test_astype(self):
 psers = [pd.Series([10, 20, 15, 30, 45], name="x")]
 
diff --git a/python/pyspark/pandas/tests/series/test_compute.py 
b/python/pyspark/pandas/tests/series/test_compute.py
index 9e48de893a13..05cd42fe4ed1 100644
--- a/python/pyspark/pandas/tests/series/test_compute.py
+++ b/python/pyspark/pandas/tests/series/test_compute.py
@@ -27,14 +27,6 @@ from pyspark.testing.sqlutils import SQLTestUtils
 
 
 class SeriesComputeMixin:
-@property
-def pser(self):
-return pd.Series([1, 2, 3, 4, 5, 6, 7], name="x")
-
-@property
-