Nikola Mandic created SPARK-47211:
-------------------------------------

             Summary: Fix ignored PySpark Connect string collation
                 Key: SPARK-47211
                 URL: https://issues.apache.org/jira/browse/SPARK-47211
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Nikola Mandic
             Fix For: 4.0.0


When using Spark Connect with PySpark, string collation is silently dropped:
{code:python}
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.
>>> spark.sql("select 'abc' collate 'UNICODE'")
DataFrame[collate(abc): string]
>>> from pyspark.sql.types import StructType, StringType, StructField
>>> spark.createDataFrame([], StructType([StructField('id', StringType(2))]))
DataFrame[id: string]
{code}
In both cases the DataFrame reports the plain "string" type; it should instead 
report "string COLLATE 'UNICODE'".
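
The mismatch above could be checked mechanically. The following is only a minimal sketch: `column_types` and `is_collated` are hypothetical helpers (not part of PySpark) that parse the DataFrame repr strings shown in this report, and the expected type string is taken verbatim from the description above. It assumes flat columns only, so nested types containing ", " are not handled.

```python
import re


def column_types(df_repr: str) -> dict:
    """Hypothetical helper: map column name -> type from a DataFrame repr."""
    match = re.fullmatch(r"DataFrame\[(.*)\]", df_repr.strip())
    if match is None:
        raise ValueError(f"not a DataFrame repr: {df_repr!r}")
    columns = {}
    for field in match.group(1).split(", "):
        if not field:
            continue  # empty schema, e.g. "DataFrame[]"
        # Split on the last ": " so names containing ": " stay intact.
        name, _, type_name = field.rpartition(": ")
        columns[name] = type_name
    return columns


def is_collated(type_name: str, collation: str) -> bool:
    """True if the type string carries the collation in the form this report expects."""
    return type_name == f"string COLLATE '{collation}'"


# Repr currently returned over Connect, as shown in this report.
observed = column_types("DataFrame[id: string]")
# Repr the report says should be returned.
expected = column_types("DataFrame[id: string COLLATE 'UNICODE']")

print(is_collated(observed["id"], "UNICODE"))  # False: collation was dropped
print(is_collated(expected["id"], "UNICODE"))  # True
```

In a live session the input would be `repr(df)` for the DataFrame under test rather than a hard-coded string.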



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

