[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643984#comment-17643984 ] Apache Arrow JIRA Bot commented on ARROW-17601: --- This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per [project policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment]. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon. > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601408#comment-17601408 ] Weston Pace commented on ARROW-17601: - The exact rules specified by substrait can be found here: https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic_decimal.yaml > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601404#comment-17601404 ] Weston Pace commented on ARROW-17601: - I think query engines and planners really like to know the output type before the plan is not executed. Determining the output type based on the input is not something that is traditionally done (though maybe it is time to break with tradition :). That being said, we could perhaps model ourselves after Substrait (which itself, modeled its rules after Calcite which is maybe based on Spark's rules which is probably modeled after something else...) In Substrait, if a limit is hit when determining the output, then precision is dropped (instead of outright rejection). It is first dropped from the decimal places. Once the decimal places have been reduced to 6 digits it begins to drop precision from the non-decimal digits. So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields DECIMAL<49, 6> which truncates to DECIMAL<38, 6> Here is how it is explained in TSQL https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16 {quote} In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to store integral part of the decimal number. If there isn't enough space to store them that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is reduced to provide enough space for integral part. Resulting scale is MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be rounded to fit into the resulting scale. In multiplication and division operations, we need precision - scale places to store the integral part of the result. The scale might be reduced using the following rules: The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the integral part is less than 32, because it can't be greater than 38 - (precision-scale). Result might be rounded in this case. The scale won't be changed if it's less than 6 and if the integral part is greater than 32. In this case, overflow error might be raised if it can't fit into decimal(38, scale) The scale will be set to 6 if it's greater than 6 and if the integral part is greater than 32. In this case, both integral part and scale would be reduced and resulting type is decimal(38,6). Result might be rounded to 6 decimal places or the overflow error will be thrown if the integral part can't fit into 32 digits. {quote} > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601120#comment-17601120 ] Yibo Cai commented on ARROW-17601: -- For arrays, if we reduce the maximal possible precision, we don't know ahead if threre will be some elements lead to overflow. I think we must check every result and raise overflow error even for non-checked operation, probably not a problem as decimal operations are quite expensive already. Looks better than simply reject the input. Will look into how other dbs handle this situation. > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600769#comment-17600769 ] Neal Richardson commented on ARROW-17601: - The trouble with the status quo is that we fail even if there is no overflow or required digits > 38: {code} > one <- Array$create(1)$cast(decimal128(33, 0)) > one Array [ 1 ] > two <- Array$create(2)$cast(decimal128(33, 0)) > one + two Array [ 3 ] > one * two Error: Invalid: Decimal precision out of range [1, 38]: 67 {code} > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600547#comment-17600547 ] Yibo Cai commented on ARROW-17601: -- Hmm, there are some difficulties: - If the total digits of the two input decimal128 types to be multiplied is larger than 38, the output type cannot be represented by decimal128 without losing precision. It has to fail. - Decimal overlow, like floating points, should lead to {{Inf}}, whether checked or non-checked. Currently we don't support decimal {{Inf}}. > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Yibo Cai >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600137#comment-17600137 ] Yibo Cai commented on ARROW-17601: -- Will look into this issue. Thanks [~npr] for the report. > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600095#comment-17600095 ] Neal Richardson commented on ARROW-17601: - cc [~yibocai] since you added the decimal arithmetic in ARROW-12074. > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > We don't have this problem integers and floats (see comment below). For > consistency with the other arithmetic functions, what I would expect would be > that we would expand the precision as much as we could within Decimal128–in > this case, Decimal128(38, 6)–and the compute function would either error _if_ > there is an overflow (in the _checked version) or just overflow in the > non-checked version. But it wouldn't error on determining the output type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599693#comment-17599693 ] Neal Richardson commented on ARROW-17601: - Also, in case it's relevant: we don't have any problems like this with integer or floating-point types: {code} > expr <- Expression$create("multiply_checked", Expression$field_ref("col1"), > Expression$field_ref("col2")) > > expr$type(schema(col1=int64(), col2=int64())) Int64 int64 > expr$type(schema(col1=int32(), col2=int32())) Int32 int32 > expr$type(schema(col1=float32(), col2=float32())) Float32 float {code} > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > With integers and floats, we upcast to a wider size in some compute functions > like multiplication. Should this go up to Decimal256? Or is there a better > way to determine the size required? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range
[ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599604#comment-17599604 ] Neal Richardson commented on ARROW-17601: - Here's a reproducer just using Expressions, no dplyr: {code} expr <- Expression$create("multiply_checked", Expression$field_ref("col1"), Expression$field_ref("col2")) expr # Expression # multiply_checked(col1, col2) expr$type(schema(col1=decimal128(33, 4), col2=decimal128(15, 2))) # Error: Invalid: Decimal precision out of range [1, 38]: 49 {code} Note that the different kernels have different promotion logic. Subtraction, for example, doesn't go out of range with these types. {code} expr <- Expression$create("subtract_checked", Expression$field_ref("col1"), Expression$field_ref("col2")) expr$type(schema(col1=decimal128(33, 4), col2=decimal128(15, 2))) # Decimal128Type # decimal128(34, 4) {code} > [C++] Error when creating Expression on Decimal128 types: precision out of > range > > > Key: ARROW-17601 > URL: https://issues.apache.org/jira/browse/ARROW-17601 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Priority: Major > > Reproducer in R: > {code} > library(arrow) > library(dplyr) > tab <- Table$create(col1 = 1:4, col2 = 5:8) > tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2))) > tab %>% mutate(col1 * col2) > # Error: Invalid: Decimal precision out of range [1, 38]: 49 > # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078 > DecimalType::Make(left_type.id(), precision, scale) > # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413 > call.kernel->signature->out_type().Resolve(_context, types) > {code} > With integers and floats, we upcast to a wider size in some compute functions > like multiplication. Should this go up to Decimal256? Or is there a better > way to determine the size required? -- This message was sent by Atlassian Jira (v8.20.10#820010)