viirya commented on code in PR #43781:
URL: https://github.com/apache/spark/pull/43781#discussion_r1393383832
########## docs/sql-ref-ansi-compliance.md: ##########
@@ -240,6 +240,25 @@ The least common type resolution is used to:
 - Derive the result type for expressions such as the case expression.
 - Derive the element, key, or value types for array and map constructors.
 Special rules are applied if the least common type resolves to FLOAT. With float type values, if any of the types is INT, BIGINT, or DECIMAL the least common type is pushed to DOUBLE to avoid potential loss of digits.
+
+Decimal type is a bit more complicated here, as it's not a simple type but has parameters: precision and scale.
+A `decimal(precision, scale)` means the value can have at most `precision - scale` digits in the integral part and `scale` digits in the fractional part.
+A least common type between decimal types should have enough digits in both integral and fractional parts to represent all values.
+More precisely, a least common type between `decimal(p1, s1)` and `decimal(p2, s2)` has the scale of `max(s1, s2)` and precision of `max(s1, s2) + max(p1 - s1, p2 - s2)`.
+However, decimal types in Spark have a maximum precision: 38. If the final decimal type needs more precision, we must do truncation.
+Since the digits in the integral part are more significant, Spark truncates the digits in the fractional part first. For example, `decimal(48, 20)` will be reduced to `decimal(38, 10)`.
+
+Note, arithmetic operations have special rules to calculate the least common type for decimal inputs:
+
+| Operation  | Result precision                         | Result scale        |
+|------------|------------------------------------------|---------------------|
+| e1 + e2    | max(s1, s2) + max(p1 - s1, p2 - s2) + 1  | max(s1, s2)         |
+| e1 - e2    | max(s1, s2) + max(p1 - s1, p2 - s2) + 1  | max(s1, s2)         |
+| e1 * e2    | p1 + p2 + 1                              | s1 + s2             |
+| e1 / e2    | p1 - s1 + s2 + max(6, s1 + p2 + 1)       | max(6, s1 + p2 + 1) |
+| e1 % e2    | min(p1 - s1, p2 - s2) + max(s1, s2)      | max(s1, s2)         |
+
+The truncation rule is also different for arithmetic operations: they retain at least 6 digits in the fractional part, which means we can only reduce `scale` to 6.

Review Comment:
Should we mention what happens if we cannot truncate the fractional part enough to fit into the maximum precision? Overflow?
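A minimal Python sketch of the decimal type rules described in the diff, to make the formulas and the truncation behavior concrete. It assumes the maximum precision of 38 and the 6-digit minimum arithmetic scale stated in the text. The floor of 0 fractional digits for the non-arithmetic case, and the names `fit_precision`, `least_common_type`, and `arithmetic_result_type`, are hypothetical choices for illustration; this is not Spark's implementation. The sketch also returns an `integral_fits` flag for the case the review comment asks about, where even the minimum scale leaves too few integral digits; the diff does not say what happens to values then.

```python
MAX_PRECISION = 38   # maximum decimal precision stated in the diff
MIN_ARITH_SCALE = 6  # arithmetic results keep at least this many fractional digits


def fit_precision(precision, scale, min_scale):
    """Cap (precision, scale) at MAX_PRECISION, dropping fractional digits first.

    Returns (precision, scale, integral_fits). integral_fits is False when even
    reducing the scale to its floor cannot make room for every integral digit.
    """
    if precision <= MAX_PRECISION:
        return precision, scale, True
    int_digits = precision - scale
    # The scale never drops below min_scale (or below the original scale, if smaller).
    floor = min(scale, min_scale)
    new_scale = max(MAX_PRECISION - int_digits, floor)
    return MAX_PRECISION, new_scale, int_digits + new_scale <= MAX_PRECISION


def least_common_type(p1, s1, p2, s2):
    """Least common type of decimal(p1, s1) and decimal(p2, s2)."""
    scale = max(s1, s2)
    precision = scale + max(p1 - s1, p2 - s2)
    # Assumption: outside arithmetic, fractional digits may be truncated down to 0.
    return fit_precision(precision, scale, min_scale=0)


def arithmetic_result_type(op, p1, s1, p2, s2):
    """Result type of `e1 <op> e2` for decimal inputs, following the table."""
    if op in ("+", "-"):
        scale = max(s1, s2)
        precision = scale + max(p1 - s1, p2 - s2) + 1
    elif op == "*":
        precision, scale = p1 + p2 + 1, s1 + s2
    elif op == "/":
        scale = max(6, s1 + p2 + 1)
        precision = p1 - s1 + s2 + scale
    elif op == "%":
        scale = max(s1, s2)
        precision = min(p1 - s1, p2 - s2) + scale
    else:
        raise ValueError(f"unsupported operator: {op}")
    return fit_precision(precision, scale, min_scale=MIN_ARITH_SCALE)


if __name__ == "__main__":
    # decimal(38, 20) vs decimal(38, 10) needs decimal(48, 20), reduced to decimal(38, 10)
    print(least_common_type(38, 20, 38, 10))            # (38, 10, True)
    print(arithmetic_result_type("+", 38, 10, 38, 10))  # (38, 9, True)
    print(arithmetic_result_type("*", 38, 18, 38, 18))  # (38, 6, False)
    print(arithmetic_result_type("/", 10, 2, 5, 1))     # (17, 8, True)
```

For example, `decimal(38, 18) * decimal(38, 18)` needs 41 integral digits, so with the 6-digit scale floor the sketch yields `decimal(38, 6)` with `integral_fits=False`: the situation where documenting the overflow behavior would help.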