[ https://issues.apache.org/jira/browse/SPARK-33415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33415:
------------------------------------

    Assignee: Apache Spark

> Column.__repr__ shouldn't encode JVM response
> ---------------------------------------------
>
>                 Key: SPARK-33415
>                 URL: https://issues.apache.org/jira/browse/SPARK-33415
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.1.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Apache Spark
>            Priority: Minor
>
> At the moment PySpark {{Column}} {{encodes}} the JVM response in its {{__repr__}} method.
> As a result, column names using only ASCII characters get a {{b}} prefix
> {code:python}
> >>> from pyspark.sql.functions import col
> >>> col("abc")
> Column<b'abc'>
> {code}
> and all other names are rendered as an ugly byte string
> {code:python}
> >>> col("wąż")
> Column<b'w\xc4\x85\xc5\xbc'>
> {code}
> This behaviour is inconsistent with other parts of the API, for example:
> {code:python}
> >>> spark.createDataFrame([], "`wąż` long")
> DataFrame[wąż: bigint]
> {code}
> and Scala
> {code:scala}
> scala> col("wąż")
> res0: org.apache.spark.sql.Column = wąż
> {code}
> and R
> {code:r}
> > column("wąż")
> Column wąż
> {code}
> Encoding was originally introduced with SPARK-5859, but it doesn't seem to be really required.
> Desired behaviour
> {code:python}
> >>> col("wąż")
> Column<'wąż'>
> {code}
> or
> {code:python}
> >>> col("wąż")
> Column<wąż>
> {code}
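The effect described in the issue can be reproduced without a Spark session. The following is a minimal sketch, assuming the repr is built by %-formatting the UTF-8 encoded string; {{EncodedColumn}} and {{PlainColumn}} are hypothetical stand-ins written for illustration, not PySpark classes, and the plain string passed in takes the place of the JVM response.

```python
class EncodedColumn:
    """Hypothetical stand-in: encodes the name before formatting it."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        # In Python 3, %s applied to a bytes object inserts its repr,
        # which is where the b'...' prefix and \x escapes come from.
        return "Column<%s>" % self._name.encode("utf8")


class PlainColumn:
    """Hypothetical stand-in: formats the str directly, without encoding."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return "Column<'%s'>" % self._name


print(repr(EncodedColumn("abc")))  # Column<b'abc'>
print(repr(EncodedColumn("wąż")))  # Column<b'w\xc4\x85\xc5\xbc'>
print(repr(PlainColumn("wąż")))    # Column<'wąż'>
```

The third line matches the first of the two desired forms listed in the issue; dropping the quotes from the format string would give the second.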