[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-12-06 Thread Apache Arrow JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643984#comment-17643984
 ] 

Apache Arrow JIRA Bot commented on ARROW-17601:
---

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project 
policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
 Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-07 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601408#comment-17601408
 ] 

Weston Pace commented on ARROW-17601:
-

The exact rules specified by substrait can be found here: 
https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic_decimal.yaml

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-07 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601404#comment-17601404
 ] 

Weston Pace commented on ARROW-17601:
-

I think query engines and planners really like to know the output type before 
the plan is not executed.  Determining the output type based on the input is 
not something that is traditionally done (though maybe it is time to break with 
tradition :).

That being said, we could perhaps model ourselves after Substrait (which 
itself, modeled its rules after Calcite which is maybe based on Spark's rules 
which is probably modeled after something else...)

In Substrait, if a limit is hit when determining the output, then precision is 
dropped (instead of outright rejection).  It is first dropped from the decimal 
places.  Once the decimal places have been reduced to 6 digits it begins to 
drop precision from the non-decimal digits.

So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields 
DECIMAL<49, 6> which truncates to DECIMAL<38, 6>

Here is how it is explained in TSQL 
https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16

{quote}
In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to 
store integral part of the decimal number. If there isn't enough space to store 
them that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is 
reduced to provide enough space for integral part. Resulting scale is 
MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be 
rounded to fit into the resulting scale.

In multiplication and division operations, we need precision - scale places to 
store the integral part of the result. The scale might be reduced using the 
following rules:

The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the 
integral part is less than 32, because it can't be greater than 38 - 
(precision-scale). Result might be rounded in this case.
The scale won't be changed if it's less than 6 and if the integral part is 
greater than 32. In this case, overflow error might be raised if it can't fit 
into decimal(38, scale)
The scale will be set to 6 if it's greater than 6 and if the integral part 
is greater than 32. In this case, both integral part and scale would be reduced 
and resulting type is decimal(38,6). Result might be rounded to 6 decimal 
places or the overflow error will be thrown if the integral part can't fit into 
32 digits.
{quote}

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-06 Thread Yibo Cai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601120#comment-17601120
 ] 

Yibo Cai commented on ARROW-17601:
--

For arrays, if we reduce the maximal possible precision, we don't know ahead if 
threre will be some elements lead to overflow. I think we must check every 
result and raise overflow error even for non-checked operation, probably not a 
problem as decimal operations are quite expensive already.
Looks better than simply reject the input. Will look into how other dbs handle 
this situation.

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-06 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600769#comment-17600769
 ] 

Neal Richardson commented on ARROW-17601:
-

The trouble with the status quo is that we fail even if there is no overflow or 
required digits > 38:

{code}
> one <- Array$create(1)$cast(decimal128(33, 0))
> one
Array

[
  1
]
> two <- Array$create(2)$cast(decimal128(33, 0))
> one + two
Array

[
  3
]
> one * two
Error: Invalid: Decimal precision out of range [1, 38]: 67
{code}

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-05 Thread Yibo Cai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600547#comment-17600547
 ] 

Yibo Cai commented on ARROW-17601:
--

Hmm, there are some difficulties:
- If the total digits of the two input decimal128 types to be multiplied is 
larger than 38, the output type cannot be represented by decimal128 without 
losing precision. It has to fail.
- Decimal overlow, like floating points, should lead to {{Inf}}, whether 
checked or non-checked. Currently we don't support decimal {{Inf}}.

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Yibo Cai
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-04 Thread Yibo Cai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600137#comment-17600137
 ] 

Yibo Cai commented on ARROW-17601:
--

Will look into this issue. Thanks [~npr] for the report.

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-04 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600095#comment-17600095
 ] 

Neal Richardson commented on ARROW-17601:
-

cc [~yibocai] since you added the decimal arithmetic in ARROW-12074.

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For 
> consistency with the other arithmetic functions, what I would expect would be 
> that we would expand the precision as much as we could within Decimal128–in 
> this case, Decimal128(38, 6)–and the compute function would either error _if_ 
> there is an overflow (in the _checked version) or just overflow in the 
> non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-02 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599693#comment-17599693
 ] 

Neal Richardson commented on ARROW-17601:
-

Also, in case it's relevant: we don't have any problems like this with integer 
or floating-point types:

{code}
> expr <- Expression$create("multiply_checked", Expression$field_ref("col1"), 
> Expression$field_ref("col2"))
> 
> expr$type(schema(col1=int64(), col2=int64()))
Int64
int64
> expr$type(schema(col1=int32(), col2=int32()))
Int32
int32
> expr$type(schema(col1=float32(), col2=float32()))
Float32
float
{code}

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> With integers and floats, we upcast to a wider size in some compute functions 
> like multiplication. Should this go up to Decimal256? Or is there a better 
> way to determine the size required?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

2022-09-02 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599604#comment-17599604
 ] 

Neal Richardson commented on ARROW-17601:
-

Here's a reproducer just using Expressions, no dplyr:

{code}
expr <- Expression$create("multiply_checked", Expression$field_ref("col1"), 
Expression$field_ref("col2"))
expr

# Expression
# multiply_checked(col1, col2)

expr$type(schema(col1=decimal128(33, 4), col2=decimal128(15, 2)))

# Error: Invalid: Decimal precision out of range [1, 38]: 49
{code}

Note that the different kernels have different promotion logic. Subtraction, 
for example, doesn't go out of range with these types.

{code}
expr <- Expression$create("subtract_checked", Expression$field_ref("col1"), 
Expression$field_ref("col2"))
expr$type(schema(col1=decimal128(33, 4), col2=decimal128(15, 2)))

# Decimal128Type
# decimal128(34, 4)
{code}

> [C++] Error when creating Expression on Decimal128 types: precision out of 
> range
> 
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  
> call.kernel->signature->out_type().Resolve(_context, types)
> {code}
> With integers and floats, we upcast to a wider size in some compute functions 
> like multiplication. Should this go up to Decimal256? Or is there a better 
> way to determine the size required?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)