[ https://issues.apache.org/jira/browse/SPARK-40032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-40032:
-------------------------------
    Description: 
Spark SQL today supports the DECIMAL data type, implemented by the Decimal class, which can hold either a BigDecimal or a Long. Decimal provides operators such as +, -, * and /.
Taking + as an example, the implementation is shown below.

{code:java}
  def + (that: Decimal): Decimal = {
    if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
      Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
    } else {
      Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
    }
  }
{code}

We can see that each call performs an addition and then calls Decimal.apply. BigDecimal's add operator constructs a new BigDecimal instance, and Decimal.apply then calls new to construct a new Decimal instance that holds that new BigDecimal.
If a large table has a Decimal field called 'colA, executing SUM('colA) therefore creates a large number of Decimal and BigDecimal instances, and these short-lived objects cause garbage collection to occur frequently.
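
To make the cost concrete, the following simplified sketch (not Spark's actual aggregation code; BoxedDecimal is an illustrative stand-in for Decimal) shows what summing such a column effectively does per row:

{code:java}
import java.math.BigDecimal

// Stand-in for Spark's Decimal wrapper; illustrative only.
final case class BoxedDecimal(value: BigDecimal)

def sumColumn(rows: Seq[BoxedDecimal]): BoxedDecimal =
  rows.foldLeft(BoxedDecimal(BigDecimal.ZERO)) { (acc, row) =>
    // BigDecimal.add returns a fresh BigDecimal, and the wrapper is a fresh
    // object too -- two allocations per row, all garbage after the fold.
    BoxedDecimal(acc.value.add(row.value))
  }
{code}

Over a table with millions of rows, every row contributes short-lived objects that the collector must reclaim.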

Decimal128 is a high-performance decimal representation, roughly 8x more efficient than Java BigDecimal for typical operations. It uses a fixed 128-bit precision and can handle up to decimal(38, X). It is also "mutable", so the contents of an existing object can be changed in place, which reduces the cost of new() and garbage collection.
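
As an illustration only (the class name and fields below are hypothetical, not an existing Spark API), a mutable 128-bit decimal can keep its unscaled value in two longs and add in place:

{code:java}
// Hypothetical sketch of a mutable 128-bit decimal: the unscaled value is
// split across two longs (high/low words) and updated in place, so repeated
// additions reuse one object instead of allocating per operation.
final class Decimal128(var high: Long, var low: Long, val scale: Int) {
  // In-place addition; assumes both operands share the same scale and the
  // result fits in 128 bits (overflow and scale alignment are omitted).
  def plusInPlace(that: Decimal128): this.type = {
    val newLow = low + that.low
    // A carry out of the low word occurred iff the unsigned sum wrapped.
    val carry = if (java.lang.Long.compareUnsigned(newLow, low) < 0) 1L else 0L
    high += that.high + carry
    low = newLow
    this
  }
}
{code}

A SUM can then keep one mutable accumulator for the whole scan, which is where the reduction in allocation and GC pressure comes from.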

In this new feature, we will introduce DECIMAL128 to accelerate decimal calculations.

h3. Milestone 1 – Spark Decimal equivalency (the new Decimal128 type meets or exceeds all functionality of the existing SQL Decimal):
* Add a new DataType implementation for Decimal128 (see the sketch after this list).
* Support Decimal128 in Dataset/UDF.
* Decimal128 literals
* Decimal128 arithmetic (e.g. Decimal128 + Decimal128, Decimal128 - Decimal)
* Decimal or Math functions/operators: POWER, LOG, ROUND, etc.
* Cast to and from Decimal128: cast String/Decimal to Decimal128, and cast Decimal128 to String (pretty printing)/Decimal, with SQL syntax to specify the types
* Support sorting Decimal128.
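
For the first item, a minimal sketch of what the new DataType could look like, modeled on DecimalType (the name Decimal128Type and its exact members are assumptions of this proposal, not existing Spark code):

{code:java}
// Placement in this package is needed to override private[spark] members.
package org.apache.spark.sql.types

case class Decimal128Type(precision: Int, scale: Int) extends DataType {
  require(precision <= 38, "Decimal128 can handle at most decimal(38, x)")
  // 16 bytes = 128 bits per value.
  override def defaultSize: Int = 16
  override def simpleString: String = s"decimal128($precision,$scale)"
  override private[spark] def asNullable: Decimal128Type = this
}
{code}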

h3. Milestone 2 – Persistence:
 * Ability to create tables of type Decimal128 (see the usage sketch after this list)
 * Ability to write to common file formats such as Parquet and JSON.
 * INSERT, SELECT, UPDATE, MERGE
 * Discovery
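
If these land, end-to-end usage could look like the following; the DECIMAL128 type name is proposed SQL syntax, not something Spark accepts today (assume spark is an active SparkSession):

{code:java}
// Proposed usage only; DECIMAL128 is not yet valid Spark SQL syntax.
spark.sql("CREATE TABLE t (colA DECIMAL128(38, 2)) USING parquet")
spark.sql("INSERT INTO t VALUES (12.34)")
spark.sql("SELECT SUM(colA) FROM t").show()
{code}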

h3. Milestone 3 – Client support
 * JDBC support
 * Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration
 * Python UDF can take and return Decimal128
 * DataFrame support

> Support Decimal128 type
> -----------------------
>
>                 Key: SPARK-40032
>                 URL: https://issues.apache.org/jira/browse/SPARK-40032
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>


