Nikola Mandic created SPARK-47211:
-------------------------------------

             Summary: Fix ignored PySpark Connect string collation
                 Key: SPARK-47211
                 URL: https://issues.apache.org/jira/browse/SPARK-47211
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Nikola Mandic
             Fix For: 4.0.0

When using Spark Connect with PySpark, string collation is silently dropped:
{code:java}
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> spark.sql("select 'abc' collate 'UNICODE'")
DataFrame[collate(abc): string]
>>> from pyspark.sql.types import StructType, StringType, StructField
>>> spark.createDataFrame([], StructType([StructField('id', StringType(2))]))
DataFrame[id: string]
{code}
Instead of the plain "string" type, the DataFrame should report "string COLLATE 'UNICODE'".
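
The symptom above could be checked mechanically with something like the following minimal sketch. It works only on the DataFrame repr string shown in the transcript, so it runs without a Spark session. The helper names column_types and has_collation are hypothetical (they are not part of PySpark), and the expected type string "string COLLATE '<name>'" is taken from this report rather than from any documented output format.

```python
import re


def column_types(df_repr: str) -> dict:
    """Parse a repr such as "DataFrame[id: string]" into {column: type}.

    Hypothetical helper for illustration; assumes columns are separated by
    ", " and that each column is rendered as "name: type".
    """
    match = re.fullmatch(r"DataFrame\[(.*)\]", df_repr.strip())
    if match is None:
        raise ValueError(f"not a DataFrame repr: {df_repr!r}")
    inner = match.group(1)
    if not inner:
        return {}
    columns = {}
    for field in inner.split(", "):
        # Split on the last ": " so names like "collate(abc)" stay intact.
        name, _, type_name = field.rpartition(": ")
        columns[name] = type_name
    return columns


def has_collation(df_repr: str, column: str, collation: str) -> bool:
    """Return True if the column's type carries the expected collation.

    The expected rendering is an assumption based on the wording of this issue.
    """
    expected = f"string COLLATE '{collation}'"
    return column_types(df_repr)[column] == expected


# The repr reported in this issue has lost the collation.
print(has_collation("DataFrame[id: string]", "id", "UNICODE"))
```

With a live session, the same check could be applied to repr(df) for the DataFrames created in the transcript.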