[ 
https://issues.apache.org/jira/browse/SPARK-33415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33415:
------------------------------------

    Assignee: Apache Spark

> Column.__repr__ shouldn't encode JVM response
> ---------------------------------------------
>
>                 Key: SPARK-33415
>                 URL: https://issues.apache.org/jira/browse/SPARK-33415
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.1.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Apache Spark
>            Priority: Minor
>
> At the moment, the PySpark {{Column}} {{__repr__}} method encodes the JVM response.
> As a result, column names using only ASCII characters get a {{b}} prefix:
> {code:python}
> >>> from pyspark.sql.functions import col
> >>> col("abc")
> Column<b'abc'>
> {code}
> and names containing non-ASCII characters are shown as an ugly byte string:
> {code:python}
> >>> col("wąż")
> Column<b'w\xc4\x85\xc5\xbc'>
> {code}
> This behaviour is inconsistent with other parts of the API, for example:
> {code:python}
> >>> spark.createDataFrame([], "`wąż` long")
> DataFrame[wąż: bigint]
> {code}
> and Scala
> {code:scala}
> scala> col("wąż")
> res0: org.apache.spark.sql.Column = wąż
> {code}
> and R
> {code:r}
> > column("wąż")
> Column wąż 
> {code}
> The encoding was originally introduced with SPARK-5859, but it doesn't seem 
> to be required.
> Desired behaviour:
> {code:python}
> >>> col("wąż")
> Column<'wąż'>
> {code}
> or
> {code:python}
> >>> col("wąż")
> Column<wąż>
> {code}
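
For illustration, here is a minimal pure-Python sketch of the difference described above. It does not use PySpark: EncodingColumn and PlainColumn are hypothetical stand-in classes, and the name passed to them stands in for the string returned by the JVM. The sketch only assumes that the current output comes from formatting a UTF-8 encoded bytes object, which is what the b'...' prefix suggests.

```python
class EncodingColumn:
    """Hypothetical stand-in: __repr__ encodes the name before formatting."""

    def __init__(self, name):
        # Stands in for the string the JVM would return (an assumption).
        self._name = name

    def __repr__(self):
        # str.encode returns bytes, so the formatted result carries the
        # b'...' prefix and escapes every non-ASCII byte.
        return "Column<%r>" % self._name.encode("utf8")


class PlainColumn:
    """Hypothetical stand-in: __repr__ formats the string as-is."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        # No encoding step, so non-ASCII characters are kept readable.
        return "Column<'%s'>" % self._name


print(repr(EncodingColumn("abc")))  # Column<b'abc'>
print(repr(EncodingColumn("wąż")))  # Column<b'w\xc4\x85\xc5\xbc'>
print(repr(PlainColumn("wąż")))     # Column<'wąż'>
```

The first two lines match the output reported in this issue, and the last matches the first of the two desired forms. Dropping the quotes in PlainColumn's format string would give the second form.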



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
