[jira] [Assigned] (SPARK-34314) Wrong discovered partition value

2021-02-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34314:
---

Assignee: Maxim Gekk

> Wrong discovered partition value
> 
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The example below portraits the issue:
> {code:scala}
>   val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>   df.write
> .partitionBy("part")
> .format("parquet")
> .save(path)
>   val readback = spark.read.parquet(path)
>   readback.printSchema()
>   readback.show(false)
> {code}
> It write the partition value as string:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
> └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> *"-0"* and "AA".
> but when Spark reads data back, it transforms "-0" to "0"
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---++
> |id |part|
> +---++
> |0  |AA  |
> |1  |0   |
> +---++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34314) Wrong discovered partition value

2021-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34314:


Assignee: Apache Spark

> Wrong discovered partition value
> 
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The example below portraits the issue:
> {code:scala}
>   val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>   df.write
> .partitionBy("part")
> .format("parquet")
> .save(path)
>   val readback = spark.read.parquet(path)
>   readback.printSchema()
>   readback.show(false)
> {code}
> It write the partition value as string:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
> └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> *"-0"* and "AA".
> but when Spark reads data back, it transforms "-0" to "0"
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---++
> |id |part|
> +---++
> |0  |AA  |
> |1  |0   |
> +---++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34314) Wrong discovered partition value

2021-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34314:


Assignee: (was: Apache Spark)

> Wrong discovered partition value
> 
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The example below portraits the issue:
> {code:scala}
>   val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>   df.write
> .partitionBy("part")
> .format("parquet")
> .save(path)
>   val readback = spark.read.parquet(path)
>   readback.printSchema()
>   readback.show(false)
> {code}
> It write the partition value as string:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
> └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> *"-0"* and "AA".
> but when Spark reads data back, it transforms "-0" to "0"
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---++
> |id |part|
> +---++
> |0  |AA  |
> |1  |0   |
> +---++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org