Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

via GitHub Tue, 14 Nov 2023 14:25:08 -0800


viirya commented on code in PR #43781:
URL: https://github.com/apache/spark/pull/43781#discussion_r1393383832



##########
docs/sql-ref-ansi-compliance.md:
##########
@@ -240,6 +240,25 @@ The least common type resolution is used to:
 - Derive the result type for expressions such as the case expression.
 - Derive the element, key, or value types for array and map constructors.
 Special rules are applied if the least common type resolves to FLOAT. With 
float type values, if any of the types is INT, BIGINT, or DECIMAL the least 
common type is pushed to DOUBLE to avoid potential loss of digits.
+
+Decimal type is a bit more complicated here, as it's not a simple type but has 
parameters: precision and scale.
+A `decimal(precision, scale)` means the value can has at most `precision - 
scale` digits in the integral part and `scale` digits in the fractional part.
+A least common type between decimal types should have enough digits in both 
integral and fractional parts to represent all values.
+More precisely, a least common type between `decimal(p1, s1)` and `decimal(p2, 
s2)` has the scale of `max(s1, s2)` and precision of `max(s1, s2) + max(p1 - 
s1, p2 - s2)`.
+However, decimal types in Spark has a maximum precision: 38. If the final 
decimal type needs more precision, we must do truncation.
+Since the digits in the integral part are more significant, Spark truncates 
the digits in the fractional part first. For example, `decimal(48, 20)` will be 
reduced to `decimal(38, 10)`.
+
+Note, arithmetic operations have special rules to calculate the least common 
type for decimal inputs:
+
+| Operation  | Result precision                         | Result scale        |
+|------------|------------------------------------------|---------------------|
+| e1 + e2    | max(s1, s2) + max(p1 - s1, p2 - s2) + 1  | max(s1, s2)         |
+| e1 - e2    | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2)         |
+| e1 * e2    | p1 + p2 + 1                             | s1 + s2             |
+| e1 / e2    | p1 - s1 + s2 + max(6, s1 + p2 + 1)       | max(6, s1 + p2 + 1) |
+| e1 % e2    | min(p1 - s1, p2 - s2) + max(s1, s2)      | max(s1, s2)         |
+
+The truncation rule is also different for arithmetic operations: they retain 
at least 6 digits in the fractional part, which means we can only reduce 
`scale` to 6.

Review Comment:
   Should we mention what happens if we cannot truncate fractional part to make 
it fit into maximum precision? Overflow?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

Reply via email to