[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Gekk updated SPARK-34314:
-------------------------------
    Affects Version/s: 3.1.0
                       3.0.2
                       2.4.8

> Wrong discovered partition value
> --------------------------------
>
>                 Key: SPARK-34314
>                 URL: https://issues.apache.org/jira/browse/SPARK-34314
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> The example below illustrates the issue:
> {code:scala}
> val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
> df.write
>   .partitionBy("part")
>   .format("parquet")
>   .save(path)
> val readback = spark.read.parquet(path)
> readback.printSchema()
> readback.show(false)
> {code}
> Spark writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
>     └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> The partition values on disk are *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" to "0":
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
>
> +---+----+
> |id |part|
> +---+----+
> |0  |AA  |
> |1  |0   |
> +---+----+
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
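
One plausible reading of the report, offered as an assumption rather than a confirmed root cause: each partition directory value may first be parsed on its own (so "-0" becomes the integer 0), and only afterwards widened to string because "AA" is not numeric, at which point the original text is gone. The sketch below models that two-step behaviour in plain Scala without Spark. The names `Inferred`, `IntValue`, `StrValue`, `infer`, and `widenToString` are hypothetical and are not Spark APIs.

```scala
import scala.util.Try

// Hypothetical stand-ins for typed partition values; not Spark classes.
sealed trait Inferred
final case class IntValue(v: Int) extends Inferred
final case class StrValue(v: String) extends Inferred

object PartitionInferenceSketch {
  // Step 1 (assumed): parse each raw directory value independently,
  // preferring an integer and falling back to a string.
  def infer(raw: String): Inferred =
    Try(raw.toInt).map(IntValue(_)).getOrElse(StrValue(raw))

  // Step 2 (assumed): when the inferred types disagree, widen to string by
  // rendering the already-parsed value, which discards the original text.
  def widenToString(values: Seq[Inferred]): Seq[String] = values.map {
    case IntValue(v) => v.toString
    case StrValue(v) => v
  }

  def main(args: Array[String]): Unit = {
    val raw = Seq("AA", "-0")
    // "-0" parses to the integer 0, so it is rendered back as "0".
    println(widenToString(raw.map(infer))) // prints List(AA, 0)
  }
}
```

Under this model, a partition column that ends up typed as string would need to keep the raw directory text instead of a re-rendered parsed value in order to preserve "-0".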