[ https://issues.apache.org/jira/browse/SPARK-20367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976541#comment-15976541 ]
Apache Spark commented on SPARK-20367:
--------------------------------------

User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/17703

> Spark silently escapes partition column names
> ---------------------------------------------
>
>                 Key: SPARK-20367
>                 URL: https://issues.apache.org/jira/browse/SPARK-20367
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Juliusz Sompolski
>            Priority: Minor
>
> CSV files can have arbitrary column names:
> {code}
> scala> spark.range(1).select(col("id").as("Column?"), col("id")).write.option("header", true).csv("/tmp/foo")
> scala> spark.read.option("header", true).csv("/tmp/foo").schema
> res1: org.apache.spark.sql.types.StructType = StructType(StructField(Column?,StringType,true), StructField(id,StringType,true))
> {code}
> However, once a column with characters like "?" in its name is used as a partitioning column, the column name gets silently escaped, and reading the schema back renders the column name with "?" turned into "%3F":
> {code}
> scala> spark.range(1).select(col("id").as("Column?"), col("id")).write.partitionBy("Column?").option("header", true).csv("/tmp/bar")
> scala> spark.read.option("header", true).csv("/tmp/bar").schema
> res3: org.apache.spark.sql.types.StructType = StructType(StructField(id,StringType,true), StructField(Column%3F,IntegerType,true))
> {code}
> The same happens for other formats, but I encountered it while working with CSV, since these more often contain ugly schemas.
> Not sure if it's a bug or a feature, but it might be more intuitive to fail queries with invalid characters in the partitioning column name, rather than silently escaping the name?
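The renaming seen above is percent-encoding of characters that are unsafe in partition directory paths: the partition folder is written as `Column%3F=0` rather than `Column?=0`, and the escaped name then leaks back into the inferred schema. As a rough illustration of the mechanism (not Spark's actual code path, which lives around `ExternalCatalogUtils.escapePathName` on the JVM side; the use of `urllib.parse` here is purely an assumption-free stand-in for generic percent-encoding):

```python
from urllib.parse import quote, unquote

# Percent-encode a column name the way a path-escaping scheme would:
# '?' is unsafe in a partition directory name and becomes '%3F'.
escaped = quote("Column?", safe="")
print(escaped)           # -> Column%3F

# Decoding recovers the original name, which is why a round-tripping
# reader could in principle unescape it instead of exposing '%3F'.
print(unquote(escaped))  # -> Column?
```

This also shows why the behavior is "silent": the escaping is lossless on disk, but Spark reports the on-disk (escaped) name when it reads the partition layout back.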
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org