[ https://issues.apache.org/jira/browse/SPARK-47211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-47211.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 45316
[https://github.com/apache/spark/pull/45316]

> Fix ignored PySpark Connect string collation
> --------------------------------------------
>
>                 Key: SPARK-47211
>                 URL: https://issues.apache.org/jira/browse/SPARK-47211
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Nikola Mandic
>            Assignee: Nikola Mandic
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When using Connect with PySpark, string collation silently gets dropped:
> {code:java}
> Client connected to the Spark Connect server at localhost
> SparkSession available as 'spark'.
> >>> spark.sql("select 'abc' collate 'UNICODE'")
> DataFrame[collate(abc): string]
> >>> from pyspark.sql.types import StructType, StringType, StructField
> >>> spark.createDataFrame([], StructType([StructField('id', StringType(2))]))
> DataFrame[id: string]
> {code}
> Instead of "string" type in dataframe, we should be seeing "string COLLATE 'UNICODE'".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org