zhengruifeng commented on code in PR #46045:
URL: https://github.com/apache/spark/pull/46045#discussion_r1571744450

##########
python/pyspark/sql/connect/functions/builtin.py:
##########
@@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column:
 repeat.__doc__ = pysparkfuncs.repeat.__doc__


-def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column:
-    return _invoke_function("split", _to_col(str), lit(pattern), lit(limit))
+def split(
+    str: "ColumnOrName",
+    pattern: Union[Column, str],
+    limit: Union["ColumnOrName", int] = -1,
+) -> Column:
+    # work around shadowing of str in the input variable name
+    from builtins import str as py_str
+
+    if isinstance(pattern, py_str):
+        _pattern = lit(pattern)
+    elif isinstance(pattern, Column):
+        _pattern = pattern
+    else:
+        raise PySparkTypeError(
+            error_class="NOT_COLUMN_OR_STR",
+            message_parameters={"arg_name": "pattern", "arg_type": type(pattern).__name__},
+        )
+
+    limit = lit(limit) if isinstance(limit, int) else _to_col(limit)
+    return _invoke_function("split", _to_col(str), _pattern, limit)

Review Comment:
   Only a few functions have such a check; most functions don't check argument types at all. We might need to figure out an easy way to do type checking across functions. As for this function, let's keep it simpler for now.
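
To illustrate what an "easy way" of type checking could look like, here is a minimal standalone sketch of a shared argument check. It is only an assumption about one possible shape, not existing PySpark code: `Column` and `lit` are simplified stand-ins so the snippet runs without Spark, `_check_arg` and `pattern_to_col` are hypothetical names, and the real code would raise `PySparkTypeError` rather than the built-in `TypeError`.

```python
from typing import Any, Tuple, Union


class Column:
    """Simplified stand-in for the Spark Connect Column class (assumption)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


def lit(value: Any) -> Column:
    """Simplified stand-in for lit(): wrap a Python value as a literal column."""
    return Column(f"lit({value!r})")


def _check_arg(value: Any, arg_name: str, allowed: Tuple[type, ...]) -> None:
    """Hypothetical shared helper: reject values that are not of an allowed type."""
    if not isinstance(value, allowed):
        expected = " or ".join(t.__name__ for t in allowed)
        raise TypeError(
            f"Argument `{arg_name}` should be {expected}, "
            f"got {type(value).__name__}."
        )


def pattern_to_col(pattern: Union[Column, str]) -> Column:
    """Hypothetical use of the helper for the `pattern` argument of split()."""
    _check_arg(pattern, "pattern", (Column, str))
    # Plain strings become literal columns; Column inputs pass through unchanged.
    return lit(pattern) if isinstance(pattern, str) else pattern
```

With a helper like this, each function body would need only one extra line per checked argument, and unchecked functions could adopt it incrementally.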