Looks like what you observed is due to the following code in Decimal.scala:

  def set(decimal: BigDecimal, precision: Int, scale: Int): Decimal = {
    this.decimalVal = decimal.setScale(scale, ROUND_HALF_UP)
    require(
      decimalVal.precision <= precision,
      s"Decimal precision ${decimalVal.precision} exceeds max precision
$precision")

You can construct a BigDecimal, call the following method on the instance:
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html#setScale(int,%20java.math.RoundingMode)

and pass the result to set(decimal: BigDecimal).
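
For example (a rough sketch, untested; it uses Scala's BigDecimal wrapper,
whose setScale(scale, roundingMode) mirrors the java.math.BigDecimal method
linked above):

  import org.apache.spark.sql.types.Decimal

  // Apply the desired scale and rounding mode up front, then hand the
  // already-scaled value to the single-argument set(decimal: BigDecimal),
  // which takes its precision and scale from the BigDecimal itself.
  val bd = BigDecimal(50).setScale(4, BigDecimal.RoundingMode.HALF_UP)
  val a  = new Decimal().set(bd)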

FYI

On Thu, Dec 3, 2015 at 9:46 AM, Philip Dodds <philip.do...@gmail.com> wrote:

> I'm not sure if there is a way around this; just looking for advice.
>
> I create a dataframe from some decimals with a specific precision and
> scale, then when I look at the dataframe the precision and scale have
> defaulted back again.
>
> Is there a way to retain the precision and scale when doing a toDF()?
>
> example code:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import org.apache.spark.sql.types.Decimal
>
> val a = new Decimal().set(BigDecimal(50),14,4)
> val b = new Decimal().set(BigDecimal(50),14,4)
>
> val data = Seq.fill(5) {
>   (a, b)
> }
>
> val trades = data.toDF()
>
> trades.printSchema()
>
> the result of this code would show
>
> root
>  |-- _1: decimal(38,18) (nullable = true)
>  |-- _2: decimal(38,18) (nullable = true)
>
> sqlContext: org.apache.spark.sql.SQLContext = 
> org.apache.spark.sql.SQLContext@3a1a48f7
> import sqlContext.implicits._
> import org.apache.spark.sql.types.Decimal
> a: org.apache.spark.sql.types.Decimal = 50.000000000000000000
> b: org.apache.spark.sql.types.Decimal = 50.000000000000000000
> data: Seq[(org.apache.spark.sql.types.Decimal, 
> org.apache.spark.sql.types.Decimal)] = 
> List((50.000000000000000000,50.000000000000000000), 
> (50.000000000000000000,50.000000000000000000), 
> (50.000000000000000000,50.000000000000000000), 
> (50.000000000000000000,50.000000000000000000), 
> (50.000000000000000000,50.000000000000000000))
> trades: org.apache.spark.sql.DataFrame = [_1: decimal(38,18), _2: 
> decimal(38,18)]
>
>
> Any advice would be brilliant
>
>
> Thanks
>
> P
>
>
> --
> Philip Dodds
>
> philip.do...@gmail.com
> @philipdodds
>
>
