[ 
https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-18484:
---------------------------------
    Description: 
Currently, when using a decimal type (BigDecimal in a Scala case class) there is 
no way to enforce precision and scale. This is quite critical when saving data, 
both for space usage and for compatibility with external systems (for example a 
Hive table), because Spark saves the data as decimal(38,18):

{code:scala}
import spark.implicits._  // assumes an existing SparkSession named `spark`

case class TestClass(id: String, money: BigDecimal)

val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)
{code}

A workaround is to convert the Dataset to a DataFrame before saving and manually 
cast the column to the desired decimal precision and scale:

{code:scala}
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)
{code}
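
Since the motivation is the persisted output, here is a minimal sketch of applying the cast right before writing; the table name {{target_table}} and the overwrite mode are illustrative assumptions, not part of the original report:

{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Cast before writing so the persisted column is decimal(10,2)
// instead of the default decimal(38,18).
testDs.toDF()
  .withColumn("money", col("money").cast(DecimalType(10, 2)))
  .write
  .mode("overwrite")
  .saveAsTable("target_table")  // hypothetical Hive table name
{code}

The stored table then carries the narrower type, but having to drop back to the untyped DataFrame API for this is exactly the inconvenience this ticket asks to remove.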

  was:
Currently, when using a decimal type (BigDecimal in a Scala case class) there is 
no way to enforce precision and scale. This is quite critical when saving data, 
both for space usage and for compatibility with external systems (for example a 
Hive table), because Spark saves the data as decimal(38,18):

{code:scala}
val spark: SparkSession = ???

case class TestClass(id: String, money: BigDecimal)

val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)
{code}

A workaround is to convert the Dataset to a DataFrame before saving and manually 
cast the column to the desired decimal precision and scale:

{code:scala}
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)
{code}


> case class datasets - ability to specify decimal precision and scale
> --------------------------------------------------------------------
>
>                 Key: SPARK-18484
>                 URL: https://issues.apache.org/jira/browse/SPARK-18484
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Damian Momot
>
> Currently, when using a decimal type (BigDecimal in a Scala case class) there is 
> no way to enforce precision and scale. This is quite critical when saving data, 
> both for space usage and for compatibility with external systems (for example 
> a Hive table), because Spark saves the data as decimal(38,18)
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
> A workaround is to convert the Dataset to a DataFrame before saving and manually 
> cast the column to the desired decimal precision and scale:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}


