[ 
https://issues.apache.org/jira/browse/SPARK-40032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-40032:
-------------------------------
    Description: 
Spark SQL today supports the DECIMAL data type. Its implementation, Decimal, can hold either a BigDecimal or a Long, and it provides operators such as +, -, * and /.
Taking + as an example, the implementation is shown below.

{code:java}
  def + (that: Decimal): Decimal = {
    if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
      Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
    } else {
      Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
    }
  }
{code}

We can see that both branches perform an addition and then call Decimal.apply. The add operator of BigDecimal constructs a new BigDecimal instance, and the implementation of Decimal.apply calls new to construct a new Decimal instance that holds the new BigDecimal instance.
If a large table has a Decimal field called 'colA, executing SUM('colA) will create a large number of Decimal instances and BigDecimal instances, and these short-lived instances cause garbage collection to occur frequently.
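To make the cost concrete, here is a minimal, self-contained sketch (hypothetical driver code, but it uses the real org.apache.spark.sql.types.Decimal API shown above) of the per-row allocation pattern such an aggregation produces:

{code:java}
import org.apache.spark.sql.types.Decimal

// Simulates the per-row work of SUM over a Decimal column: every +
// returns a freshly allocated Decimal (and, on the slow path, a fresh
// BigDecimal), so the old accumulator becomes garbage on each iteration.
object DecimalAllocationDemo {
  def main(args: Array[String]): Unit = {
    val row = Decimal(123L, 18, 2)   // unscaled 123 at scale 2, i.e. 1.23
    var sum = Decimal(0L, 18, 2)
    var i = 0
    while (i < 1000000) {
      sum = sum + row                // allocates a new Decimal per row
      i += 1
    }
    println(sum)                     // 1230000.00
  }
}
{code}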

Decimal128 is a high-performance decimal that is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128-bit) precision and can handle values up to decimal(38, X). It is also "mutable", so the contents of an existing object can be changed in place; this reduces the cost of new() and of garbage collection.
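Spark does not ship a Decimal128 today, so the class below is purely a hypothetical sketch of the idea: keep the unscaled 128-bit value in two longs and add in place, so an aggregation loop allocates no per-row objects.

{code:java}
// Hypothetical sketch only: Spark has no Decimal128 yet. The unscaled
// value is a signed 128-bit integer split across two longs (high holds
// the sign), and += mutates this object instead of allocating a new one.
final class Decimal128(var high: Long, var low: Long, val scale: Int) {

  def += (that: Decimal128): Unit = {
    require(scale == that.scale, "operands must share a common scale")
    val newLow = low + that.low
    // An unsigned overflow of the low word produces a carry into the high word.
    val carry = if (java.lang.Long.compareUnsigned(newLow, low) < 0) 1L else 0L
    high = high + that.high + carry
    low = newLow
  }
}
{code}

Because += rewrites the two fields in place, a SUM-style loop can reuse a single accumulator object for the whole column.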

In this new feature, we will introduce a DECIMAL128 type to accelerate decimal calculations.

h3. Milestone 1 – Spark Decimal equivalency (the new Decimal type Decimal128 meets or exceeds all functionality of the existing SQL Decimal):
Add a new DataType implementation for Decimal128.
Support Decimal128 in Dataset/UDF.
Decimal128 literals.
Decimal128 arithmetic (e.g. Decimal128 + Decimal128, Decimal128 * Decimal128).
Numeric functions/operators: sum, avg, round, etc.
Cast to and from Decimal128: cast String/Decimal to Decimal128, and cast Decimal128 to String (pretty printing)/Decimal, with the SQL syntax to specify the types.
Support sorting Decimal128 (see the comparison sketch below).
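As a hedged illustration of the sorting item, an Ordering over the hypothetical Decimal128 sketched above (assuming both operands already share the same scale) can compare the high words as signed longs and the low words as unsigned:

{code:java}
// Hypothetical: orders two's-complement 128-bit values of equal scale.
object Decimal128Ordering extends Ordering[Decimal128] {
  override def compare(a: Decimal128, b: Decimal128): Int = {
    val h = java.lang.Long.compare(a.high, b.high)      // signed high word
    if (h != 0) h
    else java.lang.Long.compareUnsigned(a.low, b.low)   // unsigned low word
  }
}
{code}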



> Support Decimal128 type
> -----------------------
>
>                 Key: SPARK-40032
>                 URL: https://issues.apache.org/jira/browse/SPARK-40032
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>