CTCC1 commented on code in PR #46045:
URL: https://github.com/apache/spark/pull/46045#discussion_r1566772319


##########
python/pyspark/sql/connect/functions/builtin.py:
##########
@@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column:
 repeat.__doc__ = pysparkfuncs.repeat.__doc__
 
 
-def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column:
-    return _invoke_function("split", _to_col(str), lit(pattern), lit(limit))
+def split(
+    str: "ColumnOrName",
+    pattern: Union[Column, str],
+    limit: Union["ColumnOrName", int] = -1,
+) -> Column:
+    # work around shadowing of str in the input variable name
+    from builtins import str as py_str
+
+    if isinstance(pattern, py_str):
+        _pattern = lit(pattern)
+    elif isinstance(pattern, Column):
+        _pattern = pattern
+    else:
+        raise PySparkTypeError(
+            error_class="NOT_COLUMN_OR_STR",
+            message_parameters={"arg_name": "pattern", "arg_type": type(pattern).__name__},
+        )
+
+    limit = lit(limit) if isinstance(limit, int) else _to_col(limit)
+    return _invoke_function("split", _to_col(str), _pattern, limit)

Review Comment:
   Thanks for the suggestion, this is definitely simpler! My only concern is
that we would no longer raise `PySparkTypeError` when `pattern` is passed a
value of a type other than `Column` or `str`; the call would instead form an
`UnresolvedFunction`. Is raising this error early a requirement for Connect?
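
   To make the concern concrete, here is a minimal sketch using stand-in
classes, not the real pyspark API. `Column`, `lit`, `NotColumnOrStrError`,
`split_lenient`, and `split_strict` are all hypothetical names for
illustration, and the sketch assumes the simpler variant just wraps `pattern`
with `lit`:

```python
# Stand-ins for illustration only; these are NOT the real pyspark classes.
class Column:
    def __init__(self, expr):
        self.expr = expr


class NotColumnOrStrError(TypeError):
    """Hypothetical stand-in for PySparkTypeError with NOT_COLUMN_OR_STR."""


def lit(value):
    # Simplified: pass Columns through, wrap any other Python value.
    return value if isinstance(value, Column) else Column(("literal", value))


def split_lenient(src, pattern, limit=-1):
    # Simpler variant (assumed): no type check on `pattern`, so an
    # unexpected type is silently wrapped into the expression.
    return Column(("split", src, lit(pattern), lit(limit)))


def split_strict(src, pattern, limit=-1):
    # Variant from the diff: reject anything other than Column or str
    # before the expression is built.
    if not isinstance(pattern, (Column, str)):
        raise NotColumnOrStrError(
            f"pattern should be a Column or str, got {type(pattern).__name__}"
        )
    return Column(("split", src, lit(pattern), lit(limit)))


if __name__ == "__main__":
    source = Column("s")
    split_lenient(source, 3.5)  # builds an expression without complaint
    try:
        split_strict(source, 3.5)
    except NotColumnOrStrError as exc:
        print(f"rejected early: {exc}")
```

   In the lenient form, any problem with the argument type would only show
up later, when the expression is processed elsewhere, rather than at the call
site.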



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

