allisonwang-db commented on code in PR #47253:
URL: https://github.com/apache/spark/pull/47253#discussion_r1676188548
##########
python/pyspark/sql/types.py:
##########
@@ -194,16 +194,7 @@ def fromDDL(cls, ddl: str) -> "DataType":
         >>> DataType.fromDDL("b: string, a: int")
         StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
         """
-        from pyspark.sql import SparkSession
-        from pyspark.sql.functions import udf
-
-        # Intentionally uses SparkSession so one implementation can be shared with/without
-        # Spark Connect.
-        schema = (
-            SparkSession.active().range(0).select(udf(lambda x: x, returnType=ddl)("id")).schema
-        )
-        assert len(schema) == 1
-        return schema[0].dataType
+        return _parse_datatype_string(ddl)

Review Comment:
   Can we make sure the behavior of `_parse_datatype_string` is the same as that of the original `fromDDL`? My concern is that this might introduce an unintentional behavior change in a public API.
   What's the error message if we do `fromDDL(a variant)` without this change?
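One way to answer the parity question is a small differential check: run the old and new code paths over the same DDL strings and compare both the returned value and any raised exception (class name and message). The sketch below is an assumed, generic harness, not part of PySpark: `run_outcome` and `find_behavior_diffs` are hypothetical names, and the parsers in the demo are toy stand-ins. With a live SparkSession, `old` could wrap the removed udf-based body shown in the diff and `new` could be `_parse_datatype_string`.

```python
from typing import Any, Callable, Iterable, List, Tuple

# ("ok", value) on success, or ("error", (exception class name, message)).
Outcome = Tuple[str, Any]


def run_outcome(fn: Callable[[str], Any], ddl: str) -> Outcome:
    """Call fn(ddl) and record either its result or the exception it raised."""
    try:
        return ("ok", fn(ddl))
    except Exception as exc:  # capture any failure so it can be compared
        return ("error", (type(exc).__name__, str(exc)))


def find_behavior_diffs(
    old: Callable[[str], Any],
    new: Callable[[str], Any],
    inputs: Iterable[str],
) -> List[Tuple[str, Outcome, Outcome]]:
    """Return (input, old outcome, new outcome) for every input where they differ."""
    diffs = []
    for ddl in inputs:
        before = run_outcome(old, ddl)
        after = run_outcome(new, ddl)
        if before != after:
            diffs.append((ddl, before, after))
    return diffs


if __name__ == "__main__":
    # Hypothetical stand-ins for the two code paths; a real check would wrap
    # the removed udf-based body of fromDDL and _parse_datatype_string.
    def old_parse(ddl: str) -> str:
        if "variant" in ddl:
            raise ValueError(f"unsupported DDL: {ddl!r}")
        return ddl.strip().lower()

    def new_parse(ddl: str) -> str:
        return ddl.strip().lower()

    cases = ["int", "a: string", "a variant"]
    for ddl, before, after in find_behavior_diffs(old_parse, new_parse, cases):
        print(f"{ddl!r}: before={before} after={after}")
```

Comparing exception messages verbatim is deliberately strict, because the question is specifically about the error text for inputs such as `a variant`; if only the failure mode needs to match, comparing the exception class alone would be a looser variant of the same check.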