[ 
https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821954#comment-16821954
 ] 

koert kuipers commented on SPARK-27512:
---------------------------------------

{code:bash}
$ hadoop fs -cat test.bsv
x|y
1|1,2,3
2|4,5,6
3|7,8,9

scala> data.printSchema
root
 |-- x: integer (nullable = true)
 |-- y: decimal(3,0) (nullable = true)

scala> data.show
+---+---+
|  x|  y|
+---+---+
|  1|123|
|  2|456|
|  3|789|
+---+---+
{code}

It's great that we can provide a locale, but that also means there is a 
default locale. Somehow the default locale interprets 1,2,3 as a decimal. I 
cannot think of any locale where this would be correct, and if one existed it 
should not be our default.
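
One plausible explanation, sketched with plain java.text rather than Spark's 
actual inference code: in a US-style locale the comma is a grouping separator, 
and lenient parsing skips grouping separators wherever they appear. The helper 
parseDecimal below is hypothetical, and the assumption that the default locale 
behaves like Locale.US is not confirmed in this ticket.

{code:scala}
import java.text.{DecimalFormat, NumberFormat, ParsePosition}
import java.util.Locale

// Hypothetical helper: returns a value only if the whole string parses as a
// number in the given locale. This is NOT Spark's inference code.
def parseDecimal(s: String, locale: Locale): Option[java.math.BigDecimal] =
  NumberFormat.getInstance(locale) match {
    case fmt: DecimalFormat =>
      fmt.setParseBigDecimal(true)
      val pos = new ParsePosition(0)
      val parsed = fmt.parse(s, pos)
      // Reject partial matches: the parser must consume the full input.
      if (parsed != null && pos.getIndex == s.length)
        Some(parsed.asInstanceOf[java.math.BigDecimal])
      else None
    case _ => None
  }

// Assumed default: a locale where ',' is the grouping separator.
val us = parseDecimal("1,2,3", Locale.US)

// For comparison: a locale where ',' is the decimal separator.
val de = parseDecimal("1,2,3", Locale.GERMANY)
{code}

Under that assumption, "1,2,3" is accepted in Locale.US and comes back as 123, 
which would be consistent with the decimal(3,0) column shown above.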


> Decimal parsing leads to unexpected type inference
> --------------------------------------------------
>
>                 Key: SPARK-27512
>                 URL: https://issues.apache.org/jira/browse/SPARK-27512
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: spark 3.0.0-SNAPSHOT from this commit:
> {code:bash}
> commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed
> Author: Dilip Biswal <dbis...@us.ibm.com>
> Date:   Mon Apr 15 21:26:45 2019 +0800
> {code}
>            Reporter: koert kuipers
>            Priority: Minor
>
> {code:bash}
> $ hadoop fs -text test.bsv
> x|y
> 1|1,2
> 2|2,3
> 3|3,4
> {code}
> in spark 2.4.1:
> {code:bash}
> scala> val data = spark.read.format("csv").option("header", true).option("delimiter", "|").option("inferSchema", true).load("test.bsv")
> scala> data.printSchema
> root
>  |-- x: integer (nullable = true)
>  |-- y: string (nullable = true)
> scala> data.show
> +---+---+
> |  x|  y|
> +---+---+
> |  1|1,2|
> |  2|2,3|
> |  3|3,4|
> +---+---+
> {code}
> in spark 3.0.0-SNAPSHOT:
> {code:bash}
> scala> val data = spark.read.format("csv").option("header", true).option("delimiter", "|").option("inferSchema", true).load("test.bsv")
> scala> data.printSchema
> root
>  |-- x: integer (nullable = true)
>  |-- y: decimal(2,0) (nullable = true)
> scala> data.show
> +---+---+
> |  x|  y|
> +---+---+
> |  1| 12|
> |  2| 23|
> |  3| 34|
> +---+---+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

