[ https://issues.apache.org/jira/browse/SPARK-40032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jiaan.geng updated SPARK-40032:
-------------------------------
    Description: 
Spark SQL today supports the DECIMAL data type. The Decimal implementation can hold either a BigDecimal or a Long, and it provides operators such as +, -, *, and /.

Take + as an example; the implementation is shown below.
{code:java}
def + (that: Decimal): Decimal = {
  if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
    Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
  } else {
    Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
  }
}
{code}
We can see there are two additions and a call to Decimal.apply. The add operator of BigDecimal constructs a new BigDecimal instance, and Decimal.apply then calls new to construct a new Decimal instance that holds the new BigDecimal instance.

If a large table has a Decimal field called 'colA, executing SUM('colA) will create a large number of Decimal and BigDecimal instances, which causes garbage collection to occur frequently.

Decimal128 is a high-performance decimal, about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128-bit) precision and can handle up to decimal(38, X). It is also "mutable", so the contents of an existing object can be changed in place. This helps reduce the cost of new() and of garbage collection.

In this new feature, we will introduce DECIMAL128 to accelerate decimal calculation.

h3. Milestone 1 – Spark Decimal equivalency (the new Decimal type Decimal128 meets or exceeds all functions of the existing SQL Decimal):
Add a new DataType implementation for Decimal128.
Support Decimal128 in Dataset/UDF.
Decimal128 literals
Decimal128 arithmetic (e.g. Decimal128 + Decimal128, Decimal128 - Decimal)
Aggregate functions/operators: sum, avg, etc.
Cast to and from Decimal128, cast String/Decimal to Decimal128, cast Decimal128 to string (pretty printing)/Decimal, with the SQL syntax to specify the types
Support sorting Decimal128.

  was:
Spark SQL today supports the DECIMAL data type. The implementation of Decimal that can hold a BigDecimal or Long. Decimal provides some operators like +, -, *, / and so on.
Take the + as example, the implementation show below.
{code:java}
def + (that: Decimal): Decimal = {
  if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
    Decimal(longVal + that.longVal, Math.max(precision, that.precision) + 1, scale)
  } else {
    Decimal(toBigDecimal.bigDecimal.add(that.toBigDecimal.bigDecimal))
  }
}
{code}
We can see there exists two addition and call Decimal.apply. The add operator of BigDecimal will construct a new BigDecimal instance. The implementation of Decimal.apply will call new to construct a new Decimal instance with the new BigDecimal instance. As we know, Decimal instance will hold the new BigDecimal instance.
If a large table has a Decimal field called 'colA, the execution of SUM('colA) will involve the creation of a large number of Decimal instances and BigDecimal instances. These Decimal instances and BigDecimal instances will lead to garbage collection to occur frequently.
Decimal128 is a high-performance decimal about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also "mutable" so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection.
In this new feature, we will introduce DECIMAL128 to accelerate decimal calculation.
h3. Milestone 1 – Spark Timestamp equivalency ( The new Timestamp type TimestampWithoutTZ meets or exceeds all function of the existing SQL Timestamp):
Add a new DataType implementation for TimestampWithoutTZ.
Support TimestampWithoutTZ in Dataset/UDF.
TimestampWithoutTZ literals
TimestampWithoutTZ arithmetic(e.g. TimestampWithoutTZ - TimestampWithoutTZ, TimestampWithoutTZ - Date)
Datetime functions/operators: dayofweek, weekofyear, year, etc
Cast to and from TimestampWithoutTZ, cast String/Timestamp to TimestampWithoutTZ, cast TimestampWithoutTZ to string (pretty printing)/Timestamp, with the SQL syntax to specify the types
Support sorting TimestampWithoutTZ.


> Support Decimal128 type
> -----------------------
>
>                 Key: SPARK-40032
>                 URL: https://issues.apache.org/jira/browse/SPARK-40032
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>
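The GC pressure described above comes from allocating a fresh BigDecimal plus a fresh Decimal per row. As a rough illustration of the mutability idea only (a hypothetical Java sketch, not Spark's actual Decimal128 API), a 128-bit accumulator can add unscaled values in place, so a SUM over many rows allocates no per-row objects:

{code:java}
import java.math.BigInteger;

/**
 * Hypothetical sketch (NOT Spark's actual Decimal128): a mutable 128-bit
 * accumulator that keeps the unscaled value as two 64-bit words and adds
 * in place, so summing many rows allocates no per-row objects.
 */
final class MutableInt128 {
    private long hi; // high 64 bits (two's-complement sign word)
    private long lo; // low 64 bits (treated as unsigned)

    /** Add a signed 64-bit unscaled value in place; no allocation. */
    void add(long v) {
        long oldLo = lo;
        lo = oldLo + v;
        // Carry out of the low word for the unsigned addition oldLo + v.
        long carry = ((oldLo & v) | ((oldLo | v) & ~lo)) >>> 63;
        // Sign-extend v to 128 bits (v >> 63 is 0 or -1), then add the carry.
        hi += (v >> 63) + carry;
    }

    /** Materialize the 128-bit value; called once at the end, not per row. */
    BigInteger toBigInteger() {
        BigInteger unsignedLo = BigInteger.valueOf(lo)
                .and(BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE));
        return BigInteger.valueOf(hi).shiftLeft(64).add(unsignedLo);
    }
}
{code}

A SUM built this way calls add once per row and builds a single result object at the end, instead of one BigDecimal and one Decimal per row as in the + implementation quoted in the description.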
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org