This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2f2abd5584e0 [SPARK-53719][SQL][PYTHON][CONNECT] Enhance type checking in `_to_col` function
2f2abd5584e0 is described below

commit 2f2abd5584e0c50eb32edcf3d8bd1c859088fe8c
Author: Yicong-Huang <[email protected]>
AuthorDate: Fri Sep 26 08:34:14 2025 +0800

    [SPARK-53719][SQL][PYTHON][CONNECT] Enhance type checking in `_to_col` function
    
    ### What changes were proposed in this pull request?
    
    Replace the bare `assert` in `_to_col`
    (`python/pyspark/sql/connect/functions/builtin.py`) with an explicit type
    check that raises `PySparkTypeError`.
    
    ### Why are the changes needed?
    
    Currently, `pyspark.sql.connect._invoke_function_over_columns` raises an
    `AssertionError` when it is passed `None`, whereas
    `pyspark.sql.functions._invoke_function_over_columns` raises
    `PySparkTypeError`. We want to align them and use `PySparkTypeError` in
    both cases.
    
    The root cause is that `_to_col` validates its argument with
    `assert isinstance(col, (Column, str))`, so `None` (or any other
    unsupported type) surfaces as an `AssertionError`.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Users will now see a `PySparkTypeError` with a more explicit error message
    instead of a generic `AssertionError`.
    
    ### How was this patch tested?
    
    Existing unit tests already cover this functionality.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #52459 from Yicong-Huang/SPARK-53719/fix/use-pyspark-value-error.
    
    Authored-by: Yicong-Huang <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 python/pyspark/sql/connect/functions/builtin.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/connect/functions/builtin.py b/python/pyspark/sql/connect/functions/builtin.py
index 391469765f62..71865816b49a 100644
--- a/python/pyspark/sql/connect/functions/builtin.py
+++ b/python/pyspark/sql/connect/functions/builtin.py
@@ -87,8 +87,15 @@ if TYPE_CHECKING:
 
 
 def _to_col(col: "ColumnOrName") -> Column:
-    assert isinstance(col, (Column, str))
-    return col if isinstance(col, Column) else column(col)
+    if isinstance(col, Column):
+        return col
+    elif isinstance(col, str):
+        return column(col)
+    else:
+        raise PySparkTypeError(
+            errorClass="NOT_COLUMN_OR_STR",
+            messageParameters={"arg_name": "col", "arg_type": type(col).__name__},
+        )
 
 
 def _sort_col(col: "ColumnOrName") -> Column:
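
The behavior change in the hunk above can be sketched without a Spark
installation. In the sketch below, `Column`, `column`, and `PySparkTypeError`
are simplified stand-ins (assumptions, not the real PySpark classes), and
`to_col_old`, `to_col_new`, and `describe_failure` are hypothetical names used
only for illustration. It shows which exception type each version raises for
an unsupported argument such as `None`.

```python
# Stand-ins so the sketch runs without PySpark. These are assumptions for
# illustration only; the real classes are provided by PySpark.
class Column:
    def __init__(self, name: str) -> None:
        self.name = name


class PySparkTypeError(TypeError):
    """Simplified stand-in for the PySpark error class of the same name."""

    def __init__(self, errorClass: str, messageParameters: dict) -> None:
        self.errorClass = errorClass
        self.messageParameters = messageParameters
        super().__init__(f"[{errorClass}] {messageParameters}")


def column(name: str) -> Column:
    """Stand-in for the helper that builds a Column from a name."""
    return Column(name)


def to_col_old(col):
    # Pre-patch logic: a bare assert, so bad input raises AssertionError
    # with no message (when assertions are enabled).
    assert isinstance(col, (Column, str))
    return col if isinstance(col, Column) else column(col)


def to_col_new(col):
    # Post-patch logic: explicit branches and a structured error.
    if isinstance(col, Column):
        return col
    elif isinstance(col, str):
        return column(col)
    else:
        raise PySparkTypeError(
            errorClass="NOT_COLUMN_OR_STR",
            messageParameters={"arg_name": "col", "arg_type": type(col).__name__},
        )


def describe_failure(fn, value) -> str:
    """Return the name of the exception type raised by fn(value), if any."""
    try:
        fn(value)
    except Exception as exc:
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(describe_failure(to_col_old, None))
    print(describe_failure(to_col_new, None))
```

Because the stand-in error subclasses `TypeError`, callers in this sketch can
catch either the specific error or a plain `TypeError`, and the
`messageParameters` dictionary carries the offending type name.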

