This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2f2abd5584e0 [SPARK-53719][SQL][PYTHON][CONNECT] Enhance type checking in `_to_col` function
2f2abd5584e0 is described below
commit 2f2abd5584e0c50eb32edcf3d8bd1c859088fe8c
Author: Yicong-Huang <[email protected]>
AuthorDate: Fri Sep 26 08:34:14 2025 +0800
[SPARK-53719][SQL][PYTHON][CONNECT] Enhance type checking in `_to_col` function
### What changes were proposed in this pull request?
Change the `AssertionError` raised in `pyspark.sql.connect._to_col` to a
`PySparkTypeError`.
### Why are the changes needed?
Currently, `pyspark.sql.connect._invoke_function_over_columns` raises an
`AssertionError` when a `None` value is passed, while
`pyspark.sql.functions._invoke_function_over_columns` raises a `PySparkTypeError`
instead. We want to align them and use `PySparkTypeError` in both cases.
The root cause is that the `_to_col` method checks for `None` values and
raises an `AssertionError`.
### Does this PR introduce _any_ user-facing change?
Users will see a `PySparkTypeError` with a more explicit error message instead
of a generic `AssertionError`.
### How was this patch tested?
Existing unit tests already cover this functionality.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52459 from Yicong-Huang/SPARK-53719/fix/use-pyspark-value-error.
Authored-by: Yicong-Huang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/sql/connect/functions/builtin.py | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/python/pyspark/sql/connect/functions/builtin.py b/python/pyspark/sql/connect/functions/builtin.py
index 391469765f62..71865816b49a 100644
--- a/python/pyspark/sql/connect/functions/builtin.py
+++ b/python/pyspark/sql/connect/functions/builtin.py
@@ -87,8 +87,15 @@ if TYPE_CHECKING:
def _to_col(col: "ColumnOrName") -> Column:
- assert isinstance(col, (Column, str))
- return col if isinstance(col, Column) else column(col)
+ if isinstance(col, Column):
+ return col
+ elif isinstance(col, str):
+ return column(col)
+ else:
+ raise PySparkTypeError(
+ errorClass="NOT_COLUMN_OR_STR",
+ messageParameters={"arg_name": "col", "arg_type": type(col).__name__},
+ )
def _sort_col(col: "ColumnOrName") -> Column:
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]