[ https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057410#comment-16057410 ]

Arkadiusz Bicz commented on SPARK-18484:
----------------------------------------

DecimalType should be avoided with this implementation, as there are too many 
issues with it. In my experience you never know which precision you will end up 
with in the Parquet files, and if Parquet files with different precisions land 
in the same directory, Spark cannot read them.
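
Until that is fixed, the safest route builds on the workaround from the 
description below: cast to one agreed precision before every Parquet write, so 
all files under a path share the same schema. A minimal sketch 
(writeWithFixedPrecision is a hypothetical helper, not Spark API):

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper: force "money" to decimal(10,2) before every write,
// so all Parquet files under `path` carry the same decimal schema.
def writeWithFixedPrecision(df: DataFrame, path: String): Unit = {
  df.withColumn("money", df("money").cast(DecimalType(10, 2)))
    .write
    .mode("append")
    .parquet(path)
}
{code}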

> case class datasets - ability to specify decimal precision and scale
> --------------------------------------------------------------------
>
>                 Key: SPARK-18484
>                 URL: https://issues.apache.org/jira/browse/SPARK-18484
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Damian Momot
>
> Currently, when using a decimal type (BigDecimal in a Scala case class) there is 
> no way to enforce precision and scale. This is quite critical when saving data, 
> both for space usage and for compatibility with external systems (for example a 
> Hive table), because Spark saves the data as decimal(38,18).
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
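> The decimal(38,18) here is Spark's system-wide default decimal type, which the 
> case class encoder falls back to when no precision is given; a quick check 
> (sketch):
> {code}
> import org.apache.spark.sql.types.DecimalType
> 
> // The encoder's fallback for BigDecimal fields is DecimalType.SYSTEM_DEFAULT,
> // i.e. DecimalType(38, 18).
> println(DecimalType.SYSTEM_DEFAULT)  // prints: DecimalType(38,18)
> {code}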
> A workaround is to convert the dataset to a dataframe before saving and manually 
> cast to a specific decimal precision and scale:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}
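> Another way to avoid the post-hoc cast entirely (a sketch, not part of the 
> original report) is to skip the case class encoder and build the dataframe from 
> an explicit schema:
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> 
> // Declare decimal(10,2) up front instead of inheriting the
> // encoder's decimal(38,18) default.
> val schema = StructType(Seq(
>   StructField("id", StringType),
>   StructField("money", DecimalType(10, 2))
> ))
> 
> val rows = spark.sparkContext.parallelize(Seq(
>   Row("1", BigDecimal("22.50")),
>   Row("2", BigDecimal("500.66"))
> ))
> 
> spark.createDataFrame(rows, schema).printSchema()
> {code}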


