This is an automated email from the ASF dual-hosted git repository.
zeroshade pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-go.git
The following commit(s) were added to refs/heads/main by this push:
new 2c14a75 perf(overflow): Speed up overflow checks for small integers
(#303)
2c14a75 is described below
commit 2c14a759e611122ab254eca45fdcacb4c405d857
Author: Chris Bandy <[email protected]>
AuthorDate: Mon Mar 10 19:42:58 2025 +0000
perf(overflow): Speed up overflow checks for small integers (#303)
### Rationale for this change
We can check the bounds of the arguments to avoid an int64 division. The
`Mul` and `Mul64` functions perform better on the systems I tested.
This is an older machine:
```
goos: linux
goarch: amd64
pkg: github.com/apache/arrow-go/v18/internal/utils
cpu: Intel(R) Core(TM) i5-7Y57 CPU @ 1.20GHz
│ bench_amd64_old.txt │
bench_amd64_new.txt │
│ sec/op │ sec/op vs
base │
Add/int/8192_+_8192-4 1.990n ± 14% 2.001n ± 18%
~ (p=1.000 n=6)
Add/int/MaxInt16_+_1-4 2.248n ± 1% 2.271n ± 4%
+1.07% (p=0.035 n=6)
Add/int/MaxInt16_+_5-4 2.215n ± 2% 2.270n ± 3%
+2.48% (p=0.013 n=6)
Add/int/MaxInt16_+_MaxInt16-4 2.293n ± 4% 2.331n ± 9%
~ (p=0.195 n=6)
Add/int/MaxInt32_+_1-4 2.339n ± 16% 2.345n ± 67%
~ (p=0.937 n=6)
Add/int/MaxInt32_+_5-4 2.363n ± 4% 2.330n ± 2%
~ (p=0.331 n=6)
Add/int/MaxInt32_+_MaxInt32-4 2.402n ± 23% 2.343n ± 3%
~ (p=0.310 n=6)
Add/int/MaxInt_+_MaxInt-4 2.334n ± 3% 2.364n ± 2%
~ (p=0.180 n=6)
Mul/int/8192_×_8192-4 14.335n ± 3% 3.578n ± 5%
-75.04% (p=0.002 n=6)
Mul/int/MaxInt16_×_1-4 14.015n ± 90% 3.718n ± 20%
-73.47% (p=0.002 n=6)
Mul/int/MaxInt16_×_5-4 14.480n ± 25% 3.647n ± 3%
-74.82% (p=0.002 n=6)
Mul/int/MaxInt16_×_MaxInt16-4 14.450n ± 30% 3.627n ± 35%
-74.90% (p=0.002 n=6)
Mul/int/MaxInt32_×_1-4 14.725n ± 2% 3.597n ± 12%
-75.57% (p=0.002 n=6)
Mul/int/MaxInt32_×_5-4 17.485n ± 31% 3.667n ± 5%
-79.03% (p=0.002 n=6)
Mul/int/MaxInt32_×_MaxInt32-4 14.385n ± 31% 5.704n ± 48%
-60.35% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt-4 14.34n ± 3% 15.68n ± 4%
+9.34% (p=0.002 n=6)
Mul64/int64/8192_×_8192-4 14.495n ± 6% 3.730n ± 4%
-74.27% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_1-4 14.300n ± 3% 3.612n ± 1%
-74.74% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_5-4 14.020n ± 20% 3.657n ± 3%
-73.91% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_MaxInt16-4 14.110n ± 2% 3.611n ± 12%
-74.41% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_1-4 14.330n ± 2% 3.609n ± 4%
-74.82% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_5-4 14.250n ± 14% 3.670n ± 2%
-74.25% (p=0.002 n=6)
Mul64/int64/MaxInt32_×_MaxInt32-4 14.070n ± 2% 3.571n ± 3%
-74.62% (p=0.002 n=6)
Mul64/int64/MaxInt_×_MaxInt-4 14.17n ± 2% 15.77n ± 3%
+11.25% (p=0.002 n=6)
geomean 7.807n 3.583n
-54.10%
```
The following is inside a `docker.io/i386/ubuntu` container on that same
machine. I don't have any 32-bit hardware.
```
goos: linux
goarch: 386
pkg: github.com/apache/arrow-go/v18/internal/utils
cpu: Intel(R) Core(TM) i5-7Y57 CPU @ 1.20GHz
│ bench_386_old.txt
│ bench_386_new.txt │
│ sec/op
│ sec/op vs base │
Add/int/8192_+_8192-4 3.113n ± 22%
2.848n ± 17% ~ (p=0.180 n=6)
Add/int/MaxInt16_+_1-4 3.344n ± 4%
3.294n ± 18% ~ (p=0.818 n=6)
Add/int/MaxInt16_+_5-4 3.420n ± 7%
3.411n ± 5% ~ (p=0.937 n=6)
Add/int/MaxInt16_+_MaxInt16-4 3.360n ± 3%
3.402n ± 30% ~ (p=0.240 n=6)
Add/int/MaxInt_+_1-4 3.329n ± 2%
3.433n ± 3% +3.09% (p=0.039 n=6)
Add/int/MaxInt_+_5-4 3.452n ± 5%
3.436n ± 2% ~ (p=0.937 n=6)
Add/int/MaxInt_+_MaxInt-4 3.353n ± 6%
3.454n ± 3% ~ (p=0.132 n=6)
Add/int/MaxInt_+_MaxInt#01-4 3.346n ± 13%
3.488n ± 3% ~ (p=0.132 n=6)
Mul/int/8192_×_8192-4 11.250n ± 2%
6.238n ± 6% -44.55% (p=0.002 n=6)
Mul/int/MaxInt16_×_1-4 11.565n ± 4%
6.261n ± 1% -45.86% (p=0.002 n=6)
Mul/int/MaxInt16_×_5-4 11.300n ± 3%
6.505n ± 3% -42.44% (p=0.002 n=6)
Mul/int/MaxInt16_×_MaxInt16-4 11.370n ± 4%
6.394n ± 40% -43.76% (p=0.002 n=6)
Mul/int/MaxInt_×_1-4 11.245n ± 3%
5.798n ± 2% -48.44% (p=0.002 n=6)
Mul/int/MaxInt_×_5-4 11.210n ± 18%
9.941n ± 2% -11.32% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt-4 10.975n ± 4%
9.895n ± 2% -9.84% (p=0.002 n=6)
Mul/int/MaxInt_×_MaxInt#01-4 11.40n ± 18%
10.17n ± 2% -10.83% (p=0.002 n=6)
Mul64/int64/8192_×_8192-4 21.74n ± 70%
23.75n ± 2% ~ (p=0.065 n=6)
Mul64/int64/MaxInt16_×_1-4 22.37n ± 28%
23.87n ± 12% ~ (p=0.065 n=6)
Mul64/int64/MaxInt16_×_5-4 22.09n ± 2%
23.96n ± 23% +8.44% (p=0.002 n=6)
Mul64/int64/MaxInt16_×_MaxInt16-4 21.06n ± 3%
24.27n ± 3% +15.24% (p=0.002 n=6)
Mul64/int64/MaxInt_×_1-4 21.42n ± 8%
23.91n ± 64% +11.62% (p=0.002 n=6)
Mul64/int64/MaxInt_×_5-4 29.87n ± 4%
24.16n ± 22% -19.09% (p=0.009 n=6)
Mul64/int64/MaxInt_×_MaxInt-4 28.10n ± 2%
24.40n ± 2% -13.17% (p=0.002 n=6)
Mul64/int64/9223372036854775807_×_9223372036854775807-4 23.22n ± 6%
31.90n ± 35% +37.38% (p=0.002 n=6)
geomean 9.609n
8.523n -11.30%
```
### What changes are included in this PR?
A new generic `Add` function in `internal/utils` returns the sum of any
two integers and whether or not it overflowed.
New `Mul` and `Mul64` functions in `internal/utils` return the product
of two `int` or `int64`, respectively, and whether or not it overflowed.
These functions perform better on the systems I tested.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No effect on the API. This drops one dependency.
---------
Signed-off-by: Chris Bandy <[email protected]>
---
arrow/compute/internal/kernels/base_arithmetic.go | 4 +-
go.mod | 1 -
go.sum | 2 -
internal/utils/math.go | 95 +++++++++-
internal/utils/math_32bit_test.go | 67 +++++++
internal/utils/math_64bit_test.go | 67 +++++++
internal/utils/math_test.go | 204 ++++++++++++++++++++++
parquet/file/page_reader.go | 3 +-
parquet/file/record_reader.go | 7 +-
parquet/internal/encoding/levels.go | 3 +-
parquet/metadata/page_index.go | 4 +-
11 files changed, 441 insertions(+), 16 deletions(-)
diff --git a/arrow/compute/internal/kernels/base_arithmetic.go
b/arrow/compute/internal/kernels/base_arithmetic.go
index d225404..3e12663 100644
--- a/arrow/compute/internal/kernels/base_arithmetic.go
+++ b/arrow/compute/internal/kernels/base_arithmetic.go
@@ -23,12 +23,12 @@ import (
"math"
"math/bits"
- "github.com/JohnCGriffin/overflow"
"github.com/apache/arrow-go/v18/arrow"
"github.com/apache/arrow-go/v18/arrow/compute/exec"
"github.com/apache/arrow-go/v18/arrow/decimal128"
"github.com/apache/arrow-go/v18/arrow/decimal256"
"github.com/apache/arrow-go/v18/arrow/internal/debug"
+ "github.com/apache/arrow-go/v18/internal/utils"
"golang.org/x/exp/constraints"
)
@@ -709,7 +709,7 @@ func SubtractDate32(op ArithmeticOp) exec.ArrayKernelExec {
case OpSubChecked:
return ScalarBinary(getGoArithmeticBinary(func(a, b
arrow.Time32, e *error) (result arrow.Duration) {
result = arrow.Duration(a) - arrow.Duration(b)
- val, ok := overflow.Mul64(int64(result), secondsPerDay)
+ val, ok := utils.Mul64(int64(result), secondsPerDay)
if !ok {
*e = errOverflow
}
diff --git a/go.mod b/go.mod
index ade1b93..9932bab 100644
--- a/go.mod
+++ b/go.mod
@@ -21,7 +21,6 @@ go 1.23.0
toolchain go1.23.2
require (
- github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c
github.com/andybalholm/brotli v1.1.1
github.com/apache/thrift v0.21.0
github.com/docopt/docopt-go v0.0.0-20180111231733-ee0de3bc6815
diff --git a/go.sum b/go.sum
index 147a3bd..dcfadf8 100644
--- a/go.sum
+++ b/go.sum
@@ -8,8 +8,6 @@ atomicgo.dev/schedule v0.1.0
h1:nTthAbhZS5YZmgYbb2+DH8uQIZcTlIrd4eYr3UQxEjs=
atomicgo.dev/schedule v0.1.0/go.mod
h1:xeUa3oAkiuHYh8bKiQBRojqAMq3PXXbJujjb0hw8pEU=
cloud.google.com/go v0.118.0 h1:tvZe1mgqRxpiVa3XlIGMiPcEUbP1gNXELgD4y/IXmeQ=
cloud.google.com/go v0.118.0/go.mod
h1:zIt2pkedt/mo+DQjcT4/L3NDxzHPR29j5HcclNH+9PM=
-github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c
h1:RGWPOewvKIROun94nF7v2cua9qP+thov/7M50KEoeSU=
-github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c/go.mod
h1:X0CRv0ky0k6m906ixxpzmDRLvX58TFUKS2eePweuyxk=
github.com/MarvinJWendt/testza v0.1.0/go.mod
h1:7AxNvlfeHP7Z/hDQ5JtE3OKYT3XFUeLCDE2DQninSqs=
github.com/MarvinJWendt/testza v0.2.1/go.mod
h1:God7bhG8n6uQxwdScay+gjm9/LnO4D3kkcZX4hv9Rp8=
github.com/MarvinJWendt/testza v0.2.8/go.mod
h1:nwIcjmr0Zz+Rcwfh3/4UhBp7ePKVhuBExvZqnKYWlII=
diff --git a/internal/utils/math.go b/internal/utils/math.go
index c831175..10acf36 100644
--- a/internal/utils/math.go
+++ b/internal/utils/math.go
@@ -16,7 +16,12 @@
package utils
-import "golang.org/x/exp/constraints"
+import (
+ "math"
+ "math/bits"
+
+ "golang.org/x/exp/constraints"
+)
func Min[T constraints.Ordered](a, b T) T {
if a < b {
@@ -31,3 +36,91 @@ func Max[T constraints.Ordered](a, b T) T {
}
return b
}
+
+// Add returns the sum of two integers while checking for [overflow].
+// It returns false (not ok) when the operation overflows.
+//
+// [overflow]: https://go.dev/ref/spec#Integer_overflow
+func Add[T constraints.Signed](a, b T) (T, bool) {
+ // Overflow occurs when a and b are too positive or too negative.
+ // That is, when: (a > 0) && (b > 0) && (a > math.Max[T] - b)
+ // or when: (a < 0) && (b < 0) && (a < math.Min[T] - b)
+ result := a + b
+
+ // No overflow occurred if the result is larger exactly when b is
positive.
+ return result, (result > a) == (b > 0)
+}
+
+const (
+ sqrtMaxInt = 1<<((bits.UintSize>>1)-1) - 1
+ sqrtMinInt = -1 << ((bits.UintSize >> 1) - 1)
+)
+
+// Mul returns the product of two integers while checking for [overflow].
+// It returns false (not ok) when the operation overflows.
+//
+// [overflow]: https://go.dev/ref/spec#Integer_overflow
+func Mul(a, b int) (int, bool) {
+ // Avoid division by zero and calculate nothing when a or b is zero.
+ if a == 0 || b == 0 {
+ return 0, true
+ }
+
+ result := a * b
+
+ // Overflow occurred if the result is positive when exactly one input
+ // is negative.
+ if result > 0 == ((a < 0) != (b < 0)) {
+ return result, false
+ }
+
+ // Overflow cannot occur when a or b is zero or one.
+ // Overflow cannot occur when a and b are less positive than
sqrt(MaxInt).
+ // Overflow cannot occur when a and b are less negative than
sqrt(MinInt).
+ if (sqrtMinInt <= a && a <= sqrtMaxInt &&
+ sqrtMinInt <= b && b <= sqrtMaxInt) || a == 1 || b == 1 {
+ return result, true
+ }
+
+ // Finally, no overflow occurred if division produces the input. This is
+ // last because division can be expensive. Dividing by -1 can overflow,
+ // but we returned early in that case above.
+ return result, (result/a == b)
+}
+
+const (
+ sqrtMaxInt64 = math.MaxInt32
+ sqrtMinInt64 = math.MinInt32
+)
+
+// Mul64 returns the product of two integers while checking for [overflow].
+// It returns false (not ok) when the operation overflows.
+//
+// [overflow]: https://go.dev/ref/spec#Integer_overflow
+func Mul64(a, b int64) (int64, bool) {
+ // Avoid division by zero and calculate nothing when a or b is zero.
+ if a == 0 || b == 0 {
+ return 0, true
+ }
+
+ result := a * b
+
+ // Overflow occurred if the result is positive when exactly one input
+ // is negative.
+ if result > 0 == ((a < 0) != (b < 0)) {
+ return result, false
+ }
+
+ // Overflow cannot occur when a or b is zero or one.
+ // Overflow cannot occur when a and b are less positive than
sqrt(MaxInt64).
+ // Overflow cannot occur when a and b are less negative than
sqrt(MinInt64).
+ if (sqrtMinInt64 <= a && a <= sqrtMaxInt64 &&
+ sqrtMinInt64 <= b && b <= sqrtMaxInt64) || a == 1 || b == 1 {
+ return result, true
+ }
+
+ // Finally, no overflow occurred if division produces the input. This is
+ // last because division can be expensive. Dividing by -1 can overflow,
+ // but we returned early in that case above.
+ return result, (result/a == b)
+}
diff --git a/internal/utils/math_32bit_test.go
b/internal/utils/math_32bit_test.go
new file mode 100644
index 0000000..d9b5866
--- /dev/null
+++ b/internal/utils/math_32bit_test.go
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build 386 || arm || mips || ppc
+
+package utils
+
+import (
+ "math"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+)
+
+func TestMul_32bit(t *testing.T) {
+ // These constants are smaller in magnitude than the exact square root.
+ assert.Greater(t, sqrtMinInt, int(math.Sqrt(math.MinInt32)))
+ assert.Less(t, sqrtMaxInt, int(math.Sqrt(math.MaxInt32)))
+
+ for _, tt := range []struct {
+ a, b, expected int
+ ok bool
+ }{
+ {ok: true, a: math.MinInt16, b: 0, expected: 0},
+ {ok: true, a: math.MinInt16, b: 1, expected: math.MinInt16},
+ {ok: true, a: math.MinInt16, b: -1, expected: -math.MinInt16},
+
+ {ok: true, a: math.MaxInt16, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt16, b: 1, expected: math.MaxInt16},
+ {ok: true, a: math.MaxInt16, b: -1, expected: -math.MaxInt16},
+
+ {ok: true, a: math.MinInt16, b: math.MinInt16, expected:
1073741824},
+ {ok: true, a: math.MaxInt16, b: math.MaxInt16, expected:
1073676289},
+ {ok: true, a: math.MinInt16, b: math.MaxInt16, expected:
-1073709056},
+ {ok: true, a: math.MaxInt16, b: math.MinInt16, expected:
-1073709056},
+
+ {ok: true, a: math.MinInt32, b: 0, expected: 0},
+ {ok: true, a: math.MinInt32, b: 1, expected: math.MinInt32},
+ {ok: false, a: math.MinInt32, b: -1, expected: math.MinInt32},
+ {ok: false, a: math.MinInt32, b: -2, expected: 0},
+ {ok: false, a: math.MinInt32, b: 2, expected: 0},
+
+ {ok: true, a: math.MaxInt32, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt32, b: 1, expected: math.MaxInt32},
+ {ok: true, a: math.MaxInt32, b: -1, expected: -math.MaxInt32},
+ {ok: false, a: math.MaxInt32, b: -2, expected: 2},
+ {ok: false, a: math.MaxInt32, b: 2, expected: -2},
+ } {
+ actual, ok := Mul(tt.a, tt.b)
+ assert.Equal(t, tt.ok, ok, "(%v + %v)", tt.a, tt.b)
+ require.Equal(t, tt.expected, actual, "(%v + %v)", tt.a, tt.b)
+ }
+}
diff --git a/internal/utils/math_64bit_test.go
b/internal/utils/math_64bit_test.go
new file mode 100644
index 0000000..0d9f987
--- /dev/null
+++ b/internal/utils/math_64bit_test.go
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build amd64 || arm64 || mips64 || ppc64 || s390x
+
+package utils
+
+import (
+ "math"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+)
+
+func TestMul_64bit(t *testing.T) {
+ // These constants are smaller in magnitude than the exact square root.
+ assert.Greater(t, sqrtMinInt, int(math.Sqrt(math.MinInt64)))
+ assert.Less(t, sqrtMaxInt, int(math.Sqrt(math.MaxInt64)))
+
+ for _, tt := range []struct {
+ a, b, expected int
+ ok bool
+ }{
+ {ok: true, a: math.MinInt32, b: 0, expected: 0},
+ {ok: true, a: math.MinInt32, b: 1, expected: math.MinInt32},
+ {ok: true, a: math.MinInt32, b: -1, expected: -math.MinInt32},
+
+ {ok: true, a: math.MaxInt32, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt32, b: 1, expected: math.MaxInt32},
+ {ok: true, a: math.MaxInt32, b: -1, expected: -math.MaxInt32},
+
+ {ok: true, a: math.MinInt32, b: math.MinInt32, expected:
4611686018427387904},
+ {ok: true, a: math.MaxInt32, b: math.MaxInt32, expected:
4611686014132420609},
+ {ok: true, a: math.MinInt32, b: math.MaxInt32, expected:
-4611686016279904256},
+ {ok: true, a: math.MaxInt32, b: math.MinInt32, expected:
-4611686016279904256},
+
+ {ok: true, a: math.MinInt64, b: 0, expected: 0},
+ {ok: true, a: math.MinInt64, b: 1, expected: math.MinInt64},
+ {ok: false, a: math.MinInt64, b: -1, expected: math.MinInt64},
+ {ok: false, a: math.MinInt64, b: -2, expected: 0},
+ {ok: false, a: math.MinInt64, b: 2, expected: 0},
+
+ {ok: true, a: math.MaxInt64, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt64, b: 1, expected: math.MaxInt64},
+ {ok: true, a: math.MaxInt64, b: -1, expected: -math.MaxInt64},
+ {ok: false, a: math.MaxInt64, b: -2, expected: 2},
+ {ok: false, a: math.MaxInt64, b: 2, expected: -2},
+ } {
+ actual, ok := Mul(tt.a, tt.b)
+ assert.Equal(t, tt.ok, ok, "(%v * %v)", tt.a, tt.b)
+ require.Equal(t, tt.expected, actual, "(%v * %v)", tt.a, tt.b)
+ }
+}
diff --git a/internal/utils/math_test.go b/internal/utils/math_test.go
new file mode 100644
index 0000000..b790f48
--- /dev/null
+++ b/internal/utils/math_test.go
@@ -0,0 +1,204 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build go1.24
+
+// Benchmarks use [testing.B.Loop], introduced in Go 1.24.
+
+package utils
+
+import (
+ "fmt"
+ "math"
+ "math/bits"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+)
+
+func name[T int | int64](i T) string {
+ if i == math.MaxInt {
+ return "MaxInt"
+ }
+ if i == math.MaxInt32 {
+ return "MaxInt32"
+ }
+ if i == math.MaxInt16 {
+ return "MaxInt16"
+ }
+ return fmt.Sprint(i)
+}
+
+func BenchmarkAdd(b *testing.B) {
+ b.Run("int", func(b *testing.B) {
+ for _, bb := range [][]int{
+ {8192, 8192},
+ {math.MaxInt16, 1},
+ {math.MaxInt16, 5},
+ {math.MaxInt16, math.MaxInt16},
+ {math.MaxInt32, 1},
+ {math.MaxInt32, 5},
+ {math.MaxInt32, math.MaxInt32},
+ {math.MaxInt, math.MaxInt},
+ } {
+ x, y := bb[0], bb[1]
+
+ b.Run(fmt.Sprintf("%s + %s", name(x), name(y)), func(b
*testing.B) {
+ for b.Loop() {
+ Add(x, y)
+ }
+ })
+ }
+ })
+}
+
+func TestAdd(t *testing.T) {
+ t.Run("int32", func(t *testing.T) {
+ for _, tt := range []struct {
+ a, b, expected int32
+ ok bool
+ }{
+ {ok: true, a: math.MinInt16, b: math.MaxInt16,
expected: -1},
+ {ok: true, a: math.MinInt16, b: math.MinInt16,
expected: -65536},
+ {ok: true, a: math.MaxInt16, b: math.MaxInt16,
expected: 65534},
+
+ {ok: true, a: math.MinInt32, b: math.MaxInt32,
expected: -1},
+ {ok: true, a: math.MinInt32, b: 0, expected:
math.MinInt32},
+ {ok: false, a: math.MinInt32, b: -1, expected:
math.MaxInt32},
+
+ {ok: true, a: math.MaxInt32, b: math.MinInt32,
expected: -1},
+ {ok: true, a: math.MaxInt32, b: 0, expected:
math.MaxInt32},
+ {ok: false, a: math.MaxInt32, b: 1, expected:
math.MinInt32},
+ } {
+ actual, ok := Add(tt.a, tt.b)
+ assert.Equal(t, tt.ok, ok, "(%v + %v)", tt.a, tt.b)
+ require.Equal(t, tt.expected, actual, "(%v + %v)",
tt.a, tt.b)
+ }
+ })
+
+ t.Run("int64", func(t *testing.T) {
+ for _, tt := range []struct {
+ a, b, expected int64
+ ok bool
+ }{
+ {ok: true, a: math.MinInt32, b: math.MaxInt32,
expected: -1},
+ {ok: true, a: math.MinInt32, b: math.MinInt32,
expected: -4294967296},
+ {ok: true, a: math.MaxInt32, b: math.MaxInt32,
expected: 4294967294},
+
+ {ok: true, a: math.MinInt64, b: math.MaxInt64,
expected: -1},
+ {ok: true, a: math.MinInt64, b: 0, expected:
math.MinInt64},
+ {ok: false, a: math.MinInt64, b: -1, expected:
math.MaxInt64},
+
+ {ok: true, a: math.MaxInt64, b: math.MinInt64,
expected: -1},
+ {ok: true, a: math.MaxInt64, b: 0, expected:
math.MaxInt64},
+ {ok: false, a: math.MaxInt64, b: 1, expected:
math.MinInt64},
+ } {
+ actual, ok := Add(tt.a, tt.b)
+ assert.Equal(t, tt.ok, ok, "(%v + %v)", tt.a, tt.b)
+ require.Equal(t, tt.expected, actual, "(%v + %v)",
tt.a, tt.b)
+ }
+ })
+}
+
+func BenchmarkMul(b *testing.B) {
+ b.Run("int", func(b *testing.B) {
+ for _, bb := range [][]int{
+ {8192, 8192},
+ {math.MaxInt16, 1},
+ {math.MaxInt16, 5},
+ {math.MaxInt16, math.MaxInt16},
+ {math.MaxInt32, 1},
+ {math.MaxInt32, 5},
+ {math.MaxInt32, math.MaxInt32},
+ {math.MaxInt, math.MaxInt},
+ } {
+ x, y := bb[0], bb[1]
+
+ b.Run(fmt.Sprintf("%s × %s", name(x), name(y)), func(b
*testing.B) {
+ for b.Loop() {
+ Mul(x, y)
+ }
+ })
+ }
+ })
+}
+
+// See [TestMul_32bit] and [TestMul_64bit] tests.
+func TestMul(t *testing.T) {
+ require.Truef(t,
+ bits.UintSize == 32 || bits.UintSize == 64,
+ "Expected a 32-bit or 64-bit system, got %d-bit", bits.UintSize)
+}
+
+func BenchmarkMul64(b *testing.B) {
+ b.Run("int64", func(b *testing.B) {
+ for _, bb := range [][]int64{
+ {8192, 8192},
+ {math.MaxInt16, 1},
+ {math.MaxInt16, 5},
+ {math.MaxInt16, math.MaxInt16},
+ {math.MaxInt32, 1},
+ {math.MaxInt32, 5},
+ {math.MaxInt32, math.MaxInt32},
+ {math.MaxInt64, math.MaxInt64},
+ } {
+ x, y := bb[0], bb[1]
+
+ b.Run(fmt.Sprintf("%s × %s", name(x), name(y)), func(b
*testing.B) {
+ for b.Loop() {
+ Mul64(x, y)
+ }
+ })
+ }
+ })
+}
+
+func TestMul64(t *testing.T) {
+ for _, tt := range []struct {
+ a, b, expected int64
+ ok bool
+ }{
+ {ok: true, a: math.MinInt32, b: 0, expected: 0},
+ {ok: true, a: math.MinInt32, b: 1, expected: math.MinInt32},
+ {ok: true, a: math.MinInt32, b: -1, expected: -math.MinInt32},
+
+ {ok: true, a: math.MaxInt32, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt32, b: 1, expected: math.MaxInt32},
+ {ok: true, a: math.MaxInt32, b: -1, expected: -math.MaxInt32},
+
+ {ok: true, a: math.MinInt32, b: math.MinInt32, expected:
4611686018427387904},
+ {ok: true, a: math.MaxInt32, b: math.MaxInt32, expected:
4611686014132420609},
+ {ok: true, a: math.MinInt32, b: math.MaxInt32, expected:
-4611686016279904256},
+ {ok: true, a: math.MaxInt32, b: math.MinInt32, expected:
-4611686016279904256},
+
+ {ok: true, a: math.MinInt64, b: 0, expected: 0},
+ {ok: true, a: math.MinInt64, b: 1, expected: math.MinInt64},
+ {ok: false, a: math.MinInt64, b: -1, expected: math.MinInt64},
+ {ok: false, a: math.MinInt64, b: -2, expected: 0},
+ {ok: false, a: math.MinInt64, b: 2, expected: 0},
+
+ {ok: true, a: math.MaxInt64, b: 0, expected: 0},
+ {ok: true, a: math.MaxInt64, b: 1, expected: math.MaxInt64},
+ {ok: true, a: math.MaxInt64, b: -1, expected: -math.MaxInt64},
+ {ok: false, a: math.MaxInt64, b: -2, expected: 2},
+ {ok: false, a: math.MaxInt64, b: 2, expected: -2},
+ } {
+ actual, ok := Mul64(tt.a, tt.b)
+ assert.Equal(t, tt.ok, ok, "(%v * %v)", tt.a, tt.b)
+ require.Equal(t, tt.expected, actual, "(%v * %v)", tt.a, tt.b)
+ }
+}
diff --git a/parquet/file/page_reader.go b/parquet/file/page_reader.go
index 11f43c1..92cc2f0 100644
--- a/parquet/file/page_reader.go
+++ b/parquet/file/page_reader.go
@@ -24,7 +24,6 @@ import (
"sort"
"sync"
- "github.com/JohnCGriffin/overflow"
"github.com/apache/arrow-go/v18/arrow/memory"
"github.com/apache/arrow-go/v18/internal/utils"
"github.com/apache/arrow-go/v18/parquet"
@@ -788,7 +787,7 @@ func (p *serializedPageReader) Next() bool {
// extract stats
firstRowIdx := p.rowsSeen
p.rowsSeen += int64(dataHeader.GetNumRows())
- levelsBytelen, ok :=
overflow.Add(int(dataHeader.GetDefinitionLevelsByteLength()),
int(dataHeader.GetRepetitionLevelsByteLength()))
+ levelsBytelen, ok :=
utils.Add(int(dataHeader.GetDefinitionLevelsByteLength()),
int(dataHeader.GetRepetitionLevelsByteLength()))
if !ok {
p.err = xerrors.New("parquet: levels size too
large (corrupt file?)")
return false
diff --git a/parquet/file/record_reader.go b/parquet/file/record_reader.go
index 87c2713..20d68f4 100644
--- a/parquet/file/record_reader.go
+++ b/parquet/file/record_reader.go
@@ -22,7 +22,6 @@ import (
"sync/atomic"
"unsafe"
- "github.com/JohnCGriffin/overflow"
"github.com/apache/arrow-go/v18/arrow"
"github.com/apache/arrow-go/v18/arrow/array"
"github.com/apache/arrow-go/v18/arrow/bitutil"
@@ -208,7 +207,7 @@ func (pr *primitiveRecordReader) ResetValues() {
func (pr *primitiveRecordReader) numBytesForValues(nitems int64) (num int64,
err error) {
typeSize := int64(pr.Descriptor().PhysicalType().ByteSize())
var ok bool
- if num, ok = overflow.Mul64(nitems, typeSize); !ok {
+ if num, ok = utils.Mul64(nitems, typeSize); !ok {
err = xerrors.New("total size of items too large")
}
return
@@ -392,7 +391,7 @@ func updateCapacity(cap, size, extra int64) (int64, error) {
if extra < 0 {
return 0, xerrors.New("negative size (corrupt file?)")
}
- target, ok := overflow.Add64(size, extra)
+ target, ok := utils.Add(size, extra)
if !ok {
return 0, xerrors.New("allocation size too large (corrupt
file?)")
}
@@ -423,7 +422,7 @@ func (rr *recordReader) reserveLevels(extra int64) error {
}
if newCap > rr.levelsCap {
- capBytes, ok := overflow.Mul(int(newCap),
arrow.Int16SizeBytes)
+ capBytes, ok := utils.Mul(int(newCap),
arrow.Int16SizeBytes)
if !ok {
return fmt.Errorf("allocation size too large
(corrupt file?)")
}
diff --git a/parquet/internal/encoding/levels.go
b/parquet/internal/encoding/levels.go
index 8b2d766..a63e9ab 100644
--- a/parquet/internal/encoding/levels.go
+++ b/parquet/internal/encoding/levels.go
@@ -23,7 +23,6 @@ import (
"fmt"
"math/bits"
- "github.com/JohnCGriffin/overflow"
"github.com/apache/arrow-go/v18/arrow/bitutil"
shared_utils "github.com/apache/arrow-go/v18/internal/utils"
"github.com/apache/arrow-go/v18/parquet"
@@ -210,7 +209,7 @@ func (l *LevelDecoder) SetData(encoding parquet.Encoding,
maxLvl int16, nbuffere
}
return int(nbytes) + 4, nil
case parquet.Encodings.BitPacked:
- nbits, ok := overflow.Mul(nbuffered, l.bitWidth)
+ nbits, ok := shared_utils.Mul(nbuffered, l.bitWidth)
if !ok {
return 0, errors.New("parquet: number of buffered
values too large (corrupt data page?)")
}
diff --git a/parquet/metadata/page_index.go b/parquet/metadata/page_index.go
index 497a260..020ab4c 100644
--- a/parquet/metadata/page_index.go
+++ b/parquet/metadata/page_index.go
@@ -22,8 +22,8 @@ import (
"math"
"sync"
- "github.com/JohnCGriffin/overflow"
"github.com/apache/arrow-go/v18/arrow"
+ shared_utils "github.com/apache/arrow-go/v18/internal/utils"
"github.com/apache/arrow-go/v18/parquet"
"github.com/apache/arrow-go/v18/parquet/internal/debug"
"github.com/apache/arrow-go/v18/parquet/internal/encoding"
@@ -433,7 +433,7 @@ func determinePageIndexRangesInRowGroup(rgMeta
*RowGroupMetaData, cols []int32)
return nil
}
- indexEnd, ok := overflow.Add64(idxLocation.Offset,
int64(idxLocation.Length))
+ indexEnd, ok := shared_utils.Add(idxLocation.Offset,
int64(idxLocation.Length))
if idxLocation.Offset < 0 || idxLocation.Length <= 0 || !ok {
return fmt.Errorf("%w: invalid page index location:
offset %d length %d",
arrow.ErrIndex, idxLocation.Offset,
idxLocation.Length)