This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 4b64398aae GH-50111: [C++][Gandiva] Improve function error messages
(#50112)
4b64398aae is described below
commit 4b64398aae88bae3b4efd86ea1d5a2221a0d61bf
Author: Logan Riggs <[email protected]>
AuthorDate: Mon Jun 15 17:43:50 2026 -0700
GH-50111: [C++][Gandiva] Improve function error messages (#50112)
### Rationale for this change
Gandiva's runtime errors are surfaced to SQL users via
`ExecutionContext::set_error_msg`. An audit of all ~90 call sites in
`cpp/src/gandiva/` turned up two recurring problems:
1. **No SQL function name in the message.** Users see `"divide by zero
error"` or `"Output buffer length can't be negative"` with no indication of
which SQL function produced it, making errors hard to localize in long queries.
2. **The offending value is not echoed.** Many messages reject a value
(invalid weekday, bad boolean string, out-of-range index, …) without telling
the user what was supplied.
A good runtime error should answer four questions: *which function*, *what
went wrong*, *what value triggered it*, *what's the valid range*. This PR moves
the highest-impact messages toward that bar.
### What changes are included in this PR?
All edits follow the same pattern:
```
<SQL_FUNCTION_NAME>: <what failed>; <offending value>; <valid range or hint>
```
Where the old message contained a load-bearing substring (e.g. `divide by
zero error`, `Output buffer length can't be negative`), the substring is
preserved so existing substring-based tests still match.
#### REGEXP_EXTRACT (the original case study)
[cpp/src/gandiva/regex_functions_holder.cc:239-247](cpp/src/gandiva/regex_functions_holder.cc#L239-L247)
— the message `"Index to extract out of range"` now reads `"REGEXP_EXTRACT:
invalid group_index '<N>'; must be between 0 and <max> (the number of capture
groups in the pattern)"`, matching the level of detail of the Java path.
#### Divide-by-zero family
Eight call sites in `arithmetic_ops.cc`, `decimal_ops.cc`, and
`extended_math_ops.cc` (covering `DIVIDE`, `DIV`, `MOD`, `PMOD`, decimal
`Divide`/`Mod`, `LOG`) now prefix the SQL function name while preserving the
original `divide by zero error` substring. `mod_float64_float64` additionally
echoes the dividend.
#### Value-echoing
The following sites now include both the SQL function name and the
offending value:
- `CAST_BIT` — echoes the invalid input string, lists expected values
- `CAST_VARCHAR` / `CAST_VARBINARY` length checks — echoes requested length
- `REPEAT` — echoes count (and on overflow: count × input length)
- `CONVERT_REPLACE_INVALID_FROM_UTF8` — echoes byte count
- `LOCATE` — echoes start position
- `FACTORIAL` — echoes input value (both negative and >20 paths)
- `NEGATIVE` (integer overflow on `INT_MIN`, and `negative_daytimeinterval`
out-of-bounds)
- `CRC32` — echoes input length
- `BASE64` / `UNBASE64` — echoes input length
- `AES_ENCRYPT` / `AES_DECRYPT` — echoes data length on negative input;
`AES_ENCRYPT` OOM message also rewritten to name the function clearly
- `NEXT_DAY` — echoes the unrecognized weekday string, lists expected values
- `CAST_INTERVAL_YEAR` — echoes the overflowing source value
- `CAST_*_FROM_HEX` (3 error paths in the macro) — echoes the input hex
string
- `REPLACE` — buffer-overflow message prefixed with `REPLACE:`
### Are these changes tested?
- Strict `EXPECT_EQ(error, "exact string")` assertions converted to
`HasSubstr` / `.find()` so the test contract is the SQL function name plus a
stable phrase, not the exact wording.
- Updated assertions where the new message no longer contains the old
free-text fragment (e.g. `"Factorial of negative"` → `HasSubstr("FACTORIAL") +
HasSubstr("non-negative")`).
- Files touched: `arithmetic_ops_test.cc`, `decimal_ops_test.cc`,
`extended_math_ops_test.cc`, `time_test.cc`, `gdv_function_stubs_test.cc`.
```bash
cd cpp/debug
cmake --build . --target gandiva_shared gandiva-precompiled-test
gandiva-internals-test gandiva-projector-test -j4
./debug/gandiva-precompiled-test # 130/130 pass
./debug/gandiva-internals-test # 156/156 pass
./debug/gandiva-projector-test # 218/218 pass
```
All affected error paths are exercised by the existing unit tests — that is
how each site was located in the audit.
### Are there any user-facing changes?
Yes, improved error messages.
#### Example before / after
```
-- before
SELECT REGEXP_EXTRACT('100-500', '(\d+)-(\d+)', -1);
-- FUNCTION ERROR: Index to extract out of range
-- after
-- FUNCTION ERROR: REGEXP_EXTRACT: invalid group_index '-1';
-- must be between 0 and 2
-- (the number of capture groups in the pattern)
```
```
-- before
SELECT CAST('maybe' AS BOOLEAN);
-- FUNCTION ERROR: Invalid value for boolean.
-- after
-- FUNCTION ERROR: CAST_BIT: Invalid value for boolean: 'maybe'
-- (expected 0, 1, true, false; case-insensitive)
```
```
-- before
SELECT NEXT_DAY(TIMESTAMP '2025-01-01 00:00:00', 'frusday');
-- FUNCTION ERROR: The weekday in this entry is invalid
-- after
-- FUNCTION ERROR: NEXT_DAY: 'frusday' is not a recognized weekday
-- (expected MON|TUE|WED|THU|FRI|SAT|SUN)
```
* GitHub Issue: #50111
Authored-by: [email protected] <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
---
cpp/src/gandiva/gdv_function_stubs.cc | 29 +-
cpp/src/gandiva/gdv_function_stubs_test.cc | 6 +-
cpp/src/gandiva/precompiled/arithmetic_ops.cc | 42 ++-
cpp/src/gandiva/precompiled/arithmetic_ops_test.cc | 20 +-
cpp/src/gandiva/precompiled/decimal_ops.cc | 4 +-
cpp/src/gandiva/precompiled/decimal_ops_test.cc | 5 +-
cpp/src/gandiva/precompiled/extended_math_ops.cc | 39 ++-
.../gandiva/precompiled/extended_math_ops_test.cc | 13 +-
cpp/src/gandiva/precompiled/string_ops.cc | 345 ++++++++++++---------
cpp/src/gandiva/precompiled/time.cc | 13 +-
cpp/src/gandiva/precompiled/time_test.cc | 75 ++++-
cpp/src/gandiva/regex_functions_holder.cc | 6 +-
12 files changed, 385 insertions(+), 212 deletions(-)
diff --git a/cpp/src/gandiva/gdv_function_stubs.cc
b/cpp/src/gandiva/gdv_function_stubs.cc
index ddd5df6d39..94c5454cf2 100644
--- a/cpp/src/gandiva/gdv_function_stubs.cc
+++ b/cpp/src/gandiva/gdv_function_stubs.cc
@@ -20,6 +20,7 @@
#include <utf8proc.h>
#include <boost/crc.hpp>
+#include <cstdio>
#include <sstream>
#include <string>
#include <vector>
@@ -189,7 +190,10 @@ int32_t gdv_fn_populate_varlen_vector(int64_t context_ptr,
int8_t* data_ptr,
GANDIVA_EXPORT
\
int64_t gdv_fn_crc_32_##TYPE(int64_t ctx, const char* input, int32_t
input_len) { \
if (input_len < 0) {
\
- gdv_fn_context_set_error_msg(ctx, "Input length can't be negative");
\
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "CRC32: Input length can't be negative, got %d", input_len);
\
+ gdv_fn_context_set_error_msg(ctx, err_msg);
\
return 0;
\
}
\
boost::crc_32_type result;
\
@@ -233,7 +237,10 @@ GANDIVA_EXPORT
const char* gdv_fn_base64_encode_binary(int64_t context, const char* in,
int32_t in_len,
int32_t* out_len) {
if (in_len < 0) {
- gdv_fn_context_set_error_msg(context, "Buffer length cannot be negative");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "BASE64: input length must be non-negative, got %d", in_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
@@ -260,7 +267,10 @@ GANDIVA_EXPORT
const char* gdv_fn_base64_decode_utf8(int64_t context, const char* in, int32_t
in_len,
int32_t* out_len) {
if (in_len < 0) {
- gdv_fn_context_set_error_msg(context, "Buffer length cannot be negative");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "UNBASE64: input length must be non-negative, got %d", in_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
@@ -343,7 +353,10 @@ const char* gdv_fn_aes_encrypt(int64_t context, const
char* data, int32_t data_l
const char* key_data, int32_t key_data_len,
int32_t* out_len) {
if (data_len < 0) {
- gdv_fn_context_set_error_msg(context, "Invalid data length to be
encrypted");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "AES_ENCRYPT: data length can't be negative, got %d", data_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
@@ -363,8 +376,7 @@ const char* gdv_fn_aes_encrypt(int64_t context, const char*
data, int32_t data_l
static_cast<int32_t>(arrow::bit_util::RoundUpToPowerOf2(data_len,
kAesBlockSize));
char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context,
*out_len));
if (ret == nullptr) {
- std::string err_msg =
- "Could not allocate memory for returning aes encrypt cypher text";
+ std::string err_msg = "AES_ENCRYPT: could not allocate memory for
ciphertext output";
gdv_fn_context_set_error_msg(context, err_msg.data());
*out_len = 0;
return nullptr;
@@ -387,7 +399,10 @@ const char* gdv_fn_aes_decrypt(int64_t context, const
char* data, int32_t data_l
const char* key_data, int32_t key_data_len,
int32_t* out_len) {
if (data_len < 0) {
- gdv_fn_context_set_error_msg(context, "Invalid data length to be
decrypted");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "AES_DECRYPT: data length can't be negative, got %d", data_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
diff --git a/cpp/src/gandiva/gdv_function_stubs_test.cc
b/cpp/src/gandiva/gdv_function_stubs_test.cc
index 0b8d14914e..2eb43689d8 100644
--- a/cpp/src/gandiva/gdv_function_stubs_test.cc
+++ b/cpp/src/gandiva/gdv_function_stubs_test.cc
@@ -124,7 +124,8 @@ TEST(TestGdvFnStubs, TestBase64Encode) {
value = gdv_fn_base64_encode_binary(ctx_ptr, "test", -5, &out_len);
out_value = std::string(value, out_len);
EXPECT_EQ(out_value, "");
- EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("Buffer length cannot be
negative"));
+ EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("BASE64"));
+ EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("non-negative"));
ctx.Reset();
}
@@ -153,7 +154,8 @@ TEST(TestGdvFnStubs, TestBase64Decode) {
value = gdv_fn_base64_decode_utf8(ctx_ptr, "test", -5, &out_len);
out_value = std::string(value, out_len);
EXPECT_EQ(out_value, "");
- EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("Buffer length cannot be
negative"));
+ EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("UNBASE64"));
+ EXPECT_THAT(ctx.get_error(), ::testing::HasSubstr("non-negative"));
ctx.Reset();
}
diff --git a/cpp/src/gandiva/precompiled/arithmetic_ops.cc
b/cpp/src/gandiva/precompiled/arithmetic_ops.cc
index 2bd31bd788..bebeb191a0 100644
--- a/cpp/src/gandiva/precompiled/arithmetic_ops.cc
+++ b/cpp/src/gandiva/precompiled/arithmetic_ops.cc
@@ -15,8 +15,10 @@
// specific language governing permissions and limitations
// under the License.
+#include <cinttypes>
#include <cmath>
#include <cstdint>
+#include <cstdio>
#include "arrow/util/basic_decimal.h"
extern "C" {
@@ -65,7 +67,7 @@ extern "C" {
gdv_##OUT_TYPE NAME##_##IN_TYPE1##_##IN_TYPE2(int64_t context,
gdv_##IN_TYPE1 left, \
gdv_##IN_TYPE2 right) {
\
if (right == static_cast<gdv_##IN_TYPE2>(0)) {
\
- gdv_fn_context_set_error_msg(context, "divide by zero error");
\
+ gdv_fn_context_set_error_msg(context, "PMOD: divide by zero error");
\
return static_cast<gdv_##IN_TYPE1>(0);
\
}
\
double mod = fmod(static_cast<double>(left), static_cast<double>(right));
\
@@ -109,7 +111,8 @@ PMOD_OP(pmod, float64, float64, float64)
gdv_float64 mod_float64_float64(int64_t context, gdv_float64 x, gdv_float64 y)
{
if (y == 0.0) {
- const char* err_msg = "divide by zero error";
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg), "MOD: divide by zero error (dividend:
%g)", x);
gdv_fn_context_set_error_msg(context, err_msg);
return 0.0;
}
@@ -351,7 +354,7 @@ NUMERIC_BOOL_DATE_FUNCTION(IS_NOT_DISTINCT_FROM)
FORCE_INLINE
\
gdv_##TYPE divide_##TYPE##_##TYPE(gdv_int64 context, gdv_##TYPE in1,
gdv_##TYPE in2) { \
if (in2 == 0) {
\
- const char* err_msg = "divide by zero error";
\
+ const char* err_msg = "DIVIDE: divide by zero error";
\
gdv_fn_context_set_error_msg(context, err_msg);
\
return 0;
\
}
\
@@ -376,14 +379,19 @@ NUMERIC_FUNCTION(POSITIVE)
NUMERIC_FUNCTION_FOR_REAL(NEGATIVE)
-#define NEGATIVE_INTEGER(TYPE, SIZE)
\
- FORCE_INLINE
\
- gdv_##TYPE negative_##TYPE(gdv_int64 context, gdv_##TYPE in) {
\
- if (in <= INT##SIZE##_MIN) {
\
- gdv_fn_context_set_error_msg(context, "Overflow in negative execution");
\
- return 0;
\
- }
\
- return -1 * in;
\
+#define NEGATIVE_INTEGER(TYPE, SIZE) \
+ FORCE_INLINE \
+ gdv_##TYPE negative_##TYPE(gdv_int64 context, gdv_##TYPE in) { \
+ if (in <= INT##SIZE##_MIN) { \
+ char err_msg[96]; \
+ snprintf(err_msg, sizeof(err_msg), \
+ "NEGATIVE: Overflow in negative execution " \
+ "(cannot negate INT" #SIZE "_MIN: %" PRId64 ")", \
+ static_cast<int64_t>(in)); \
+ gdv_fn_context_set_error_msg(context, err_msg); \
+ return 0; \
+ } \
+ return -1 * in; \
}
NEGATIVE_INTEGER(int32, 32)
@@ -396,8 +404,12 @@ const int64_t INT_MIN_TO_NEGATIVE_INTERVAL_DAY_TIME =
-9223372030412324863;
gdv_int64 negative_daytimeinterval(gdv_int64 context, gdv_day_time_interval
interval) {
if (interval > INT_MAX_TO_NEGATIVE_INTERVAL_DAY_TIME ||
interval < INT_MIN_TO_NEGATIVE_INTERVAL_DAY_TIME) {
- gdv_fn_context_set_error_msg(
- context, "Interval day time is out of boundaries for the negative
function");
+ char err_msg[128];
+ snprintf(err_msg, sizeof(err_msg),
+ "NEGATIVE: Interval day time is out of boundaries for the
negative "
+ "function (value: %" PRId64 ")",
+ static_cast<int64_t>(interval));
+ gdv_fn_context_set_error_msg(context, err_msg);
return 0;
}
@@ -430,7 +442,7 @@ void negative_decimal(gdv_int64 context, int64_t high_bits,
uint64_t low_bits,
FORCE_INLINE
\
gdv_##TYPE div_##TYPE##_##TYPE(gdv_int64 context, gdv_##TYPE in1, gdv_##TYPE
in2) { \
if (in2 == 0) {
\
- const char* err_msg = "divide by zero error";
\
+ const char* err_msg = "DIV: divide by zero error";
\
gdv_fn_context_set_error_msg(context, err_msg);
\
return 0;
\
}
\
@@ -448,7 +460,7 @@ DIV(uint64)
FORCE_INLINE
\
gdv_##TYPE div_##TYPE##_##TYPE(gdv_int64 context, gdv_##TYPE in1, gdv_##TYPE
in2) { \
if (in2 == 0) {
\
- const char* err_msg = "divide by zero error";
\
+ const char* err_msg = "DIV: divide by zero error";
\
gdv_fn_context_set_error_msg(context, err_msg);
\
return 0;
\
}
\
diff --git a/cpp/src/gandiva/precompiled/arithmetic_ops_test.cc
b/cpp/src/gandiva/precompiled/arithmetic_ops_test.cc
index 02fc68713b..8753e36faf 100644
--- a/cpp/src/gandiva/precompiled/arithmetic_ops_test.cc
+++ b/cpp/src/gandiva/precompiled/arithmetic_ops_test.cc
@@ -52,7 +52,7 @@ TEST(TestArithmeticOps, TestPmod) {
EXPECT_EQ(pmod_int64_int64(ctx, 3, 0), 0);
EXPECT_TRUE(context.has_error());
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
context.Reset();
}
@@ -65,7 +65,7 @@ TEST(TestArithmeticOps, TestMod) {
EXPECT_DOUBLE_EQ(mod_float64_float64(reinterpret_cast<gdv_int64>(&context),
2.5, 0.0),
0.0);
EXPECT_TRUE(context.has_error());
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
context.Reset();
EXPECT_NEAR(mod_float64_float64(reinterpret_cast<gdv_int64>(&context), 2.5,
1.2), 0.1,
@@ -219,8 +219,9 @@ TEST(TestArithmeticOps, TestNegativeIntervalTypes) {
result = negative_daytimeinterval(ctx_ptr, INT64_MAX);
EXPECT_EQ(ctx.has_error(), true);
- EXPECT_EQ(ctx.get_error(),
- "Interval day time is out of boundaries for the negative
function");
+ EXPECT_THAT(ctx.get_error(),
+ ::testing::HasSubstr(
+ "Interval day time is out of boundaries for the negative
function"));
ctx.Reset();
const int64_t INT_MIN_TO_NEGATIVE_INTERVAL_DAY_TIME = -9223372030412324863;
@@ -229,8 +230,9 @@ TEST(TestArithmeticOps, TestNegativeIntervalTypes) {
result = negative_daytimeinterval(ctx_ptr, INT64_MIN);
EXPECT_EQ(ctx.has_error(), true);
- EXPECT_EQ(ctx.get_error(),
- "Interval day time is out of boundaries for the negative
function");
+ EXPECT_THAT(ctx.get_error(),
+ ::testing::HasSubstr(
+ "Interval day time is out of boundaries for the negative
function"));
ctx.Reset();
// Month interval
@@ -251,7 +253,7 @@ TEST(TestArithmeticOps, TestDivide) {
gandiva::ExecutionContext context;
EXPECT_EQ(divide_int64_int64(reinterpret_cast<gdv_int64>(&context), 10, 0),
0);
EXPECT_EQ(context.has_error(), true);
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
context.Reset();
EXPECT_EQ(divide_int64_int64(reinterpret_cast<gdv_int64>(&context), 10, 2),
5);
@@ -262,7 +264,7 @@ TEST(TestArithmeticOps, TestDiv) {
gandiva::ExecutionContext context;
EXPECT_EQ(div_int64_int64(reinterpret_cast<gdv_int64>(&context), 101, 0), 0);
EXPECT_EQ(context.has_error(), true);
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
context.Reset();
EXPECT_EQ(div_int64_int64(reinterpret_cast<gdv_int64>(&context), 101, 111),
0);
@@ -278,7 +280,7 @@ TEST(TestArithmeticOps, TestDiv) {
div_float64_float64(reinterpret_cast<gdv_int64>(&context), 1010.1010,
0.00000),
0.0);
EXPECT_EQ(context.has_error(), true);
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
context.Reset();
EXPECT_EQ(div_float32_float32(reinterpret_cast<gdv_int64>(&context),
1010.1010f, 2.1f),
diff --git a/cpp/src/gandiva/precompiled/decimal_ops.cc
b/cpp/src/gandiva/precompiled/decimal_ops.cc
index 68949680b7..9332fbfaa7 100644
--- a/cpp/src/gandiva/precompiled/decimal_ops.cc
+++ b/cpp/src/gandiva/precompiled/decimal_ops.cc
@@ -351,7 +351,7 @@ BasicDecimal128 Divide(int64_t context, const
BasicDecimalScalar128& x,
const BasicDecimalScalar128& y, int32_t out_precision,
int32_t out_scale, bool* overflow) {
if (y.value() == 0) {
- const char* err_msg = "divide by zero error";
+ const char* err_msg = "DIVIDE: divide by zero error (decimal)";
gdv_fn_context_set_error_msg(context, err_msg);
return 0;
}
@@ -396,7 +396,7 @@ BasicDecimal128 Mod(int64_t context, const
BasicDecimalScalar128& x,
const BasicDecimalScalar128& y, int32_t out_precision,
int32_t out_scale, bool* overflow) {
if (y.value() == 0) {
- const char* err_msg = "divide by zero error";
+ const char* err_msg = "MOD: divide by zero error (decimal)";
gdv_fn_context_set_error_msg(context, err_msg);
return 0;
}
diff --git a/cpp/src/gandiva/precompiled/decimal_ops_test.cc
b/cpp/src/gandiva/precompiled/decimal_ops_test.cc
index be8a1fe8ad..95b0a874b9 100644
--- a/cpp/src/gandiva/precompiled/decimal_ops_test.cc
+++ b/cpp/src/gandiva/precompiled/decimal_ops_test.cc
@@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.
+#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <algorithm>
#include <limits>
@@ -455,7 +456,7 @@ TEST_F(TestDecimalSql, DivideByZero) {
DecimalScalar128{"201", 20, 3}, DecimalScalar128{"0", 20,
2},
result_precision, result_scale, &overflow);
EXPECT_TRUE(context.has_error());
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
// divide-by-nonzero should not cause an error.
context.Reset();
@@ -472,7 +473,7 @@ TEST_F(TestDecimalSql, DivideByZero) {
DecimalScalar128{"0", 20, 2}, result_precision, result_scale,
&overflow);
EXPECT_TRUE(context.has_error());
- EXPECT_EQ(context.get_error(), "divide by zero error");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"));
// mod-by-nonzero should not cause an error.
context.Reset();
diff --git a/cpp/src/gandiva/precompiled/extended_math_ops.cc
b/cpp/src/gandiva/precompiled/extended_math_ops.cc
index c29f8f2a86..348d595faf 100644
--- a/cpp/src/gandiva/precompiled/extended_math_ops.cc
+++ b/cpp/src/gandiva/precompiled/extended_math_ops.cc
@@ -24,6 +24,7 @@
extern "C" {
+#include <inttypes.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
@@ -80,7 +81,7 @@ ENUMERIC_TYPES_UNARY(LOG10, float64)
FORCE_INLINE
void set_error_for_logbase(int64_t execution_context, double base) {
- const char* prefix = "divide by zero error with log of base";
+ const char* prefix = "LOG: divide by zero error with log of base";
int size = static_cast<int>(strlen(prefix)) + 64;
char* error = reinterpret_cast<char*>(malloc(size));
snprintf(error, size, "%s %f", prefix, base);
@@ -251,20 +252,28 @@ static const int64_t kFactorialLookupTable[] = {1,
121645100408832000,
2432902008176640000};
-#define FACTORIAL(IN_TYPE)
\
- FORCE_INLINE
\
- gdv_int64 factorial_##IN_TYPE(gdv_int64 ctx, gdv_##IN_TYPE value) {
\
- if (value < 0) {
\
- gdv_fn_context_set_error_msg(ctx, "Factorial of negative number not
exist!"); \
- return 0;
\
- }
\
- /* For numbers greater than 20 causes an overflow. */
\
- if (value > 20) {
\
- gdv_fn_context_set_error_msg(ctx, "Numbers greater than 20 cause
overflow!"); \
- return 0;
\
- }
\
-
\
- return kFactorialLookupTable[static_cast<int32_t>(value)];
\
+#define FACTORIAL(IN_TYPE)
\
+ FORCE_INLINE
\
+ gdv_int64 factorial_##IN_TYPE(gdv_int64 ctx, gdv_##IN_TYPE value) {
\
+ if (value < 0) {
\
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "FACTORIAL: input must be non-negative, got %" PRId64,
\
+ static_cast<int64_t>(value));
\
+ gdv_fn_context_set_error_msg(ctx, err_msg);
\
+ return 0;
\
+ }
\
+ /* For numbers greater than 20 causes an overflow. */
\
+ if (value > 20) {
\
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "FACTORIAL: input %" PRId64 " exceeds maximum 20 (would
overflow int64)", \
+ static_cast<int64_t>(value));
\
+ gdv_fn_context_set_error_msg(ctx, err_msg);
\
+ return 0;
\
+ }
\
+
\
+ return kFactorialLookupTable[static_cast<int32_t>(value)];
\
}
FACTORIAL(int32)
diff --git a/cpp/src/gandiva/precompiled/extended_math_ops_test.cc
b/cpp/src/gandiva/precompiled/extended_math_ops_test.cc
index ad0cb78188..e7a0f74c1b 100644
--- a/cpp/src/gandiva/precompiled/extended_math_ops_test.cc
+++ b/cpp/src/gandiva/precompiled/extended_math_ops_test.cc
@@ -19,6 +19,7 @@
# define M_PI 3.14159265358979323846
#endif
+#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <cmath>
@@ -61,11 +62,12 @@ TEST(TestExtendedMathOps, TestFactorial) {
}
factorial_int32(ctx, 21);
- EXPECT_TRUE(context.get_error().find("overflow") != std::string::npos);
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("overflow"));
context.Reset();
factorial_int32(ctx, -5);
- EXPECT_TRUE(context.get_error().find("Factorial of negative") !=
std::string::npos);
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("FACTORIAL"));
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("non-negative"));
context.Reset();
for (int64_t i = 0; i <= 20; ++i) {
@@ -79,11 +81,12 @@ TEST(TestExtendedMathOps, TestFactorial) {
}
factorial_int64(ctx, 21);
- EXPECT_TRUE(context.get_error().find("overflow") != std::string::npos);
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("overflow"));
context.Reset();
factorial_int64(ctx, -5);
- EXPECT_TRUE(context.get_error().find("Factorial of negative") !=
std::string::npos);
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("FACTORIAL"));
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("non-negative"));
context.Reset();
}
@@ -125,7 +128,7 @@ TEST(TestExtendedMathOps, TestLogWithBase) {
log_int32_int32(reinterpret_cast<gdv_int64>(&context), 1 /*base*/, 10
/*value*/);
VerifyFuzzyEquals(out, 0);
EXPECT_EQ(context.has_error(), true);
- EXPECT_TRUE(context.get_error().find("divide by zero error") !=
std::string::npos)
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("divide by zero
error"))
<< context.get_error();
gandiva::ExecutionContext context1;
diff --git a/cpp/src/gandiva/precompiled/string_ops.cc
b/cpp/src/gandiva/precompiled/string_ops.cc
index 9a2d8c8415..ff0fa026c8 100644
--- a/cpp/src/gandiva/precompiled/string_ops.cc
+++ b/cpp/src/gandiva/precompiled/string_ops.cc
@@ -27,6 +27,7 @@ extern "C" {
#include <cstdio>
#include <cstdlib>
#include <cstring>
+#include <string>
#include "./types.h"
@@ -540,7 +541,9 @@ gdv_boolean compare_lower_strings(const char* base_str,
gdv_int32 base_str_len,
FORCE_INLINE
gdv_boolean castBIT_utf8(gdv_int64 context, const char* data, gdv_int32
data_len) {
if (data_len <= 0) {
- gdv_fn_context_set_error_msg(context, "Invalid value for boolean.");
+ gdv_fn_context_set_error_msg(context,
+ "CAST_BIT: Invalid value for boolean: empty
string "
+ "(expected 0, 1, true, false;
case-insensitive)");
return false;
}
@@ -569,7 +572,10 @@ gdv_boolean castBIT_utf8(gdv_int64 context, const char*
data, gdv_int32 data_len
if (compare_lower_strings("false", 5, trimmed_data, trimmed_len)) return
false;
}
// if no 'true', 'false', '0' or '1' value is found, set an error
- gdv_fn_context_set_error_msg(context, "Invalid value for boolean.");
+ std::string err_msg = "CAST_BIT: Invalid value for boolean: '" +
+ std::string(data, data_len) +
+ "' (expected 0, 1, true, false; case-insensitive)";
+ gdv_fn_context_set_error_msg(context, err_msg.c_str());
return false;
}
@@ -578,7 +584,10 @@ const char* castVARCHAR_bool_int64(gdv_int64 context,
gdv_boolean value,
gdv_int64 out_len, gdv_int32* out_length) {
gdv_int32 len = static_cast<gdv_int32>(out_len);
if (len < 0) {
- gdv_fn_context_set_error_msg(context, "Output buffer length can't be
negative");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "CAST_VARCHAR: Output buffer length can't be negative, got %d",
len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_length = 0;
return "";
}
@@ -592,90 +601,93 @@ const char* castVARCHAR_bool_int64(gdv_int64 context,
gdv_boolean value,
}
// Truncates the string to given length
-#define CAST_VARCHAR_FROM_VARLEN_TYPE(TYPE)
\
- FORCE_INLINE
\
- const char* castVARCHAR_##TYPE##_int64(gdv_int64 context, const char* data,
\
- gdv_int32 data_len, int64_t out_len,
\
- int32_t* out_length) {
\
- int32_t len = static_cast<int32_t>(out_len);
\
-
\
- if (len < 0) {
\
- gdv_fn_context_set_error_msg(context, "Output buffer length can't be
negative"); \
- *out_length = 0;
\
- return "";
\
- }
\
-
\
- if (len >= data_len || len == 0) {
\
- *out_length = data_len;
\
- return data;
\
- }
\
-
\
- int32_t remaining = len;
\
- int32_t index = 0;
\
- bool is_multibyte = false;
\
- do {
\
- /* In utf8, MSB of a single byte unicode char is always 0,
\
- * whereas for a multibyte character the MSB of each byte is 1.
\
- * So for a single byte char, a bitwise-and with x80 (10000000) will be
0 \
- * and it won't be 0 for bytes of a multibyte char.
\
- */
\
- char* data_ptr = const_cast<char*>(data);
\
-
\
- /* advance byte by byte till the 8-byte boundary then advance 8 bytes */
\
- auto num_bytes = reinterpret_cast<uintptr_t>(data_ptr) & 0x07;
\
- num_bytes = (8 - num_bytes) & 0x07;
\
- while (num_bytes > 0) {
\
- uint8_t* ptr = reinterpret_cast<uint8_t*>(data_ptr + index);
\
- if ((*ptr & 0x80) != 0) {
\
- is_multibyte = true;
\
- break;
\
- }
\
- index++;
\
- remaining--;
\
- num_bytes--;
\
- }
\
- if (is_multibyte) break;
\
- while (remaining >= 8) {
\
- uint64_t* ptr = reinterpret_cast<uint64_t*>(data_ptr + index);
\
- if ((*ptr & 0x8080808080808080) != 0) {
\
- is_multibyte = true;
\
- break;
\
- }
\
- index += 8;
\
- remaining -= 8;
\
- }
\
- if (is_multibyte) break;
\
- if (remaining >= 4) {
\
- uint32_t* ptr = reinterpret_cast<uint32_t*>(data_ptr + index);
\
- if ((*ptr & 0x80808080) != 0) break;
\
- index += 4;
\
- remaining -= 4;
\
- }
\
- while (remaining > 0) {
\
- uint8_t* ptr = reinterpret_cast<uint8_t*>(data_ptr + index);
\
- if ((*ptr & 0x80) != 0) {
\
- is_multibyte = true;
\
- break;
\
- }
\
- index++;
\
- remaining--;
\
- }
\
- if (is_multibyte) break;
\
- /* reached here; all are single byte characters */
\
- *out_length = len;
\
- return data;
\
- } while (false);
\
-
\
- /* detected multibyte utf8 characters; slow path */
\
- int32_t byte_pos =
\
- utf8_byte_pos(context, data + index, data_len - index, len - index);
\
- if (byte_pos < 0) {
\
- *out_length = 0;
\
- return "";
\
- }
\
-
\
- *out_length = index + byte_pos;
\
- return data;
\
+#define CAST_VARCHAR_FROM_VARLEN_TYPE(TYPE)
\
+ FORCE_INLINE
\
+ const char* castVARCHAR_##TYPE##_int64(gdv_int64 context, const char* data,
\
+ gdv_int32 data_len, int64_t out_len,
\
+ int32_t* out_length) {
\
+ int32_t len = static_cast<int32_t>(out_len);
\
+
\
+ if (len < 0) {
\
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "CAST_VARCHAR: Output buffer length can't be negative, got %d",
len); \
+ gdv_fn_context_set_error_msg(context, err_msg);
\
+ *out_length = 0;
\
+ return "";
\
+ }
\
+
\
+ if (len >= data_len || len == 0) {
\
+ *out_length = data_len;
\
+ return data;
\
+ }
\
+
\
+ int32_t remaining = len;
\
+ int32_t index = 0;
\
+ bool is_multibyte = false;
\
+ do {
\
+ /* In utf8, MSB of a single byte unicode char is always 0,
\
+ * whereas for a multibyte character the MSB of each byte is 1.
\
+ * So for a single byte char, a bitwise-and with x80 (10000000) will be
0 \
+ * and it won't be 0 for bytes of a multibyte char.
\
+ */
\
+ char* data_ptr = const_cast<char*>(data);
\
+
\
+ /* advance byte by byte till the 8-byte boundary then advance 8 bytes */
\
+ auto num_bytes = reinterpret_cast<uintptr_t>(data_ptr) & 0x07;
\
+ num_bytes = (8 - num_bytes) & 0x07;
\
+ while (num_bytes > 0) {
\
+ uint8_t* ptr = reinterpret_cast<uint8_t*>(data_ptr + index);
\
+ if ((*ptr & 0x80) != 0) {
\
+ is_multibyte = true;
\
+ break;
\
+ }
\
+ index++;
\
+ remaining--;
\
+ num_bytes--;
\
+ }
\
+ if (is_multibyte) break;
\
+ while (remaining >= 8) {
\
+ uint64_t* ptr = reinterpret_cast<uint64_t*>(data_ptr + index);
\
+ if ((*ptr & 0x8080808080808080) != 0) {
\
+ is_multibyte = true;
\
+ break;
\
+ }
\
+ index += 8;
\
+ remaining -= 8;
\
+ }
\
+ if (is_multibyte) break;
\
+ if (remaining >= 4) {
\
+ uint32_t* ptr = reinterpret_cast<uint32_t*>(data_ptr + index);
\
+ if ((*ptr & 0x80808080) != 0) break;
\
+ index += 4;
\
+ remaining -= 4;
\
+ }
\
+ while (remaining > 0) {
\
+ uint8_t* ptr = reinterpret_cast<uint8_t*>(data_ptr + index);
\
+ if ((*ptr & 0x80) != 0) {
\
+ is_multibyte = true;
\
+ break;
\
+ }
\
+ index++;
\
+ remaining--;
\
+ }
\
+ if (is_multibyte) break;
\
+ /* reached here; all are single byte characters */
\
+ *out_length = len;
\
+ return data;
\
+ } while (false);
\
+
\
+ /* detected multibyte utf8 characters; slow path */
\
+ int32_t byte_pos =
\
+ utf8_byte_pos(context, data + index, data_len - index, len - index);
\
+ if (byte_pos < 0) {
\
+ *out_length = 0;
\
+ return "";
\
+ }
\
+
\
+ *out_length = index + byte_pos;
\
+ return data;
\
}
CAST_VARCHAR_FROM_VARLEN_TYPE(utf8)
@@ -691,7 +703,10 @@ CAST_VARCHAR_FROM_VARLEN_TYPE(binary)
int32_t* out_length) {
\
int32_t len = static_cast<int32_t>(out_len);
\
if (len < 0) {
\
- gdv_fn_context_set_error_msg(context, "Output buffer length can't be
negative"); \
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "CAST_VARBINARY: Output buffer length can't be negative, got
%d", len); \
+ gdv_fn_context_set_error_msg(context, err_msg);
\
*out_length = 0;
\
return "";
\
}
\
@@ -839,13 +854,21 @@ const char* repeat_utf8_int32(gdv_int64 context, const
char* in, gdv_int32 in_le
}
// if the repeat number is a negative number, an error is set on context
if (repeat_number < 0) {
- gdv_fn_context_set_error_msg(context, "Repeat number can't be negative");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg), "REPEAT: Repeat number can't be
negative, got %d",
+ repeat_number);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
if (ARROW_PREDICT_FALSE(
arrow::internal::MultiplyWithOverflow(repeat_number, in_len,
out_len))) {
- gdv_fn_context_set_error_msg(context, "Would overflow maximum output
size");
+ char err_msg[128];
+ snprintf(err_msg, sizeof(err_msg),
+ "REPEAT: Would overflow maximum output size "
+ "(repeat count %d * input length %d)",
+ repeat_number, in_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
@@ -1435,7 +1458,12 @@ const char*
convert_replace_invalid_fromUTF8_binary(int64_t context, const char*
int32_t
char_to_replace_len,
int32_t* out_len) {
if (char_to_replace_len > 1) {
- gdv_fn_context_set_error_msg(context, "Replacement of multiple bytes not
supported");
+ char err_msg[128];
+ snprintf(err_msg, sizeof(err_msg),
+ "CONVERT_REPLACE_INVALID_FROM_UTF8: replacement must be a single
byte, "
+ "got %d bytes",
+ char_to_replace_len);
+ gdv_fn_context_set_error_msg(context, err_msg);
*out_len = 0;
return "";
}
@@ -1820,7 +1848,10 @@ gdv_int32 locate_utf8_utf8_int32(gdv_int64 context,
const char* sub_str,
gdv_int32 sub_str_len, const char* str,
gdv_int32 str_len, gdv_int32 start_pos) {
if (start_pos < 1) {
- gdv_fn_context_set_error_msg(context, "Start position must be greater than
0");
+ char err_msg[96];
+ snprintf(err_msg, sizeof(err_msg),
+ "LOCATE: Start position must be greater than 0, got %d",
start_pos);
+ gdv_fn_context_set_error_msg(context, err_msg);
return 0;
}
@@ -1864,7 +1895,8 @@ const char* replace_with_max_len_utf8_utf8_utf8(gdv_int64
context, const char* t
for (; text_index <= text_len - from_str_len;) {
if (memcmp(text + text_index, from_str, from_str_len) == 0) {
if (out_index + text_index - last_match_index + to_str_len > max_length)
{
- gdv_fn_context_set_error_msg(context, "Buffer overflow for output
string");
+ gdv_fn_context_set_error_msg(context,
+ "REPLACE: Buffer overflow for output
string");
*out_len = 0;
return "";
}
@@ -1899,7 +1931,7 @@ const char* replace_with_max_len_utf8_utf8_utf8(gdv_int64
context, const char* t
}
if (out_index + text_len - last_match_index > max_length) {
- gdv_fn_context_set_error_msg(context, "Buffer overflow for output string");
+ gdv_fn_context_set_error_msg(context, "REPLACE: Buffer overflow for output
string");
*out_len = 0;
return "";
}
@@ -2320,62 +2352,75 @@ const char* binary_string(gdv_int64 context, const
char* text, gdv_int32 text_le
return ret;
}
-#define CAST_INT_BIGINT_VARBINARY(OUT_TYPE, TYPE_NAME)
\
- FORCE_INLINE
\
- OUT_TYPE
\
- cast##TYPE_NAME##_varbinary(gdv_int64 context, const char* in, int32_t
in_len) { \
- if (in_len == 0) {
\
- gdv_fn_context_set_error_msg(context, "Can't cast an empty string.");
\
- return -1;
\
- }
\
- char sign = in[0];
\
-
\
- bool negative = false;
\
- if (sign == '-') {
\
- negative = true;
\
- /* Ignores the sign char in the hexadecimal string */
\
- in++;
\
- in_len--;
\
- }
\
-
\
- if (negative && in_len == 0) {
\
- gdv_fn_context_set_error_msg(context,
\
- "Can't cast hexadecimal with only a minus
sign."); \
- return -1;
\
- }
\
-
\
- OUT_TYPE result = 0;
\
- int digit;
\
-
\
- int read_index = 0;
\
- while (read_index < in_len) {
\
- char c1 = in[read_index];
\
- if (isxdigit(c1)) {
\
- digit = to_binary_from_hex(c1);
\
-
\
- OUT_TYPE next = result * 16 - digit;
\
-
\
- if (next > result) {
\
- gdv_fn_context_set_error_msg(context, "Integer overflow.");
\
- return -1;
\
- }
\
- result = next;
\
- read_index++;
\
- } else {
\
- gdv_fn_context_set_error_msg(context,
\
- "The hexadecimal given has invalid
characters."); \
- return -1;
\
- }
\
- }
\
- if (!negative) {
\
- result *= -1;
\
-
\
- if (result < 0) {
\
- gdv_fn_context_set_error_msg(context, "Integer overflow.");
\
- return -1;
\
- }
\
- }
\
- return result;
\
+#define CAST_INT_BIGINT_VARBINARY(OUT_TYPE, TYPE_NAME)
\
+ FORCE_INLINE
\
+ OUT_TYPE
\
+ cast##TYPE_NAME##_varbinary(gdv_int64 context, const char* in, int32_t
in_len) { \
+ const char* in_original = in;
\
+ int32_t in_len_original = in_len;
\
+ if (in_len == 0) {
\
+ gdv_fn_context_set_error_msg(
\
+ context, "CAST_" #TYPE_NAME "_FROM_HEX: can't cast an empty
string"); \
+ return -1;
\
+ }
\
+ char sign = in[0];
\
+
\
+ bool negative = false;
\
+ if (sign == '-') {
\
+ negative = true;
\
+ /* Ignores the sign char in the hexadecimal string */
\
+ in++;
\
+ in_len--;
\
+ }
\
+
\
+ if (negative && in_len == 0) {
\
+ gdv_fn_context_set_error_msg(
\
+ context, "CAST_" #TYPE_NAME
\
+ "_FROM_HEX: can't cast hexadecimal with only a minus
sign"); \
+ return -1;
\
+ }
\
+
\
+ OUT_TYPE result = 0;
\
+ int digit;
\
+
\
+ int read_index = 0;
\
+ while (read_index < in_len) {
\
+ char c1 = in[read_index];
\
+ if (isxdigit(c1)) {
\
+ digit = to_binary_from_hex(c1);
\
+
\
+ OUT_TYPE next = result * 16 - digit;
\
+
\
+ if (next > result) {
\
+ std::string err_msg =
\
+ "CAST_" #TYPE_NAME
\
+ "_FROM_HEX: integer overflow while reading hex value '" +
\
+ std::string(in_original, in_len_original) + "'";
\
+ gdv_fn_context_set_error_msg(context, err_msg.c_str());
\
+ return -1;
\
+ }
\
+ result = next;
\
+ read_index++;
\
+ } else {
\
+ std::string err_msg = "CAST_" #TYPE_NAME
\
+ "_FROM_HEX: invalid character in hex value '" +
\
+ std::string(in_original, in_len_original) + "'";
\
+ gdv_fn_context_set_error_msg(context, err_msg.c_str());
\
+ return -1;
\
+ }
\
+ }
\
+ if (!negative) {
\
+ result *= -1;
\
+
\
+ if (result < 0) {
\
+ std::string err_msg = "CAST_" #TYPE_NAME
\
+ "_FROM_HEX: integer overflow while reading hex
value '" + \
+ std::string(in_original, in_len_original) + "'";
\
+ gdv_fn_context_set_error_msg(context, err_msg.c_str());
\
+ return -1;
\
+ }
\
+ }
\
+ return result;
\
}
CAST_INT_BIGINT_VARBINARY(int32_t, INT)
diff --git a/cpp/src/gandiva/precompiled/time.cc
b/cpp/src/gandiva/precompiled/time.cc
index 8414d0ed37..ff3b849ef4 100644
--- a/cpp/src/gandiva/precompiled/time.cc
+++ b/cpp/src/gandiva/precompiled/time.cc
@@ -17,6 +17,8 @@
#include "./epoch_time_point.h"
+#include <string>
+
extern "C" {
#define __STDC_FORMAT_MACROS
@@ -268,7 +270,9 @@ static const int WEEK_LEN[] = {6, 6, 7, 9, 8, 6, 8};
}
\
}
\
if (dateSearch == 0) {
\
- gdv_fn_context_set_error_msg(context, "The weekday in this entry is
invalid"); \
+ std::string err_msg = "NEXT_DAY: '" + std::string(in, in_len) +
\
+ "' is not a recognized day of the week";
\
+ gdv_fn_context_set_error_msg(context, err_msg.c_str());
\
return 0;
\
}
\
\
@@ -1047,7 +1051,12 @@ CAST_NULLABLE_INTERVAL_DAY(int64)
gdv_month_interval castNULLABLEINTERVALYEAR_##TYPE(int64_t context,
gdv_##TYPE in) { \
gdv_month_interval value = static_cast<gdv_month_interval>(in);
\
if (value != in) {
\
- gdv_fn_context_set_error_msg(context, "Integer overflow");
\
+ char err_msg[96];
\
+ snprintf(err_msg, sizeof(err_msg),
\
+ "CAST_INTERVAL_YEAR: Integer overflow casting %" PRId64
\
+ " to month interval",
\
+ static_cast<int64_t>(in));
\
+ gdv_fn_context_set_error_msg(context, err_msg);
\
}
\
return value;
\
}
diff --git a/cpp/src/gandiva/precompiled/time_test.cc
b/cpp/src/gandiva/precompiled/time_test.cc
index 6cfa6acf57..4310490a22 100644
--- a/cpp/src/gandiva/precompiled/time_test.cc
+++ b/cpp/src/gandiva/precompiled/time_test.cc
@@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.
+#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include "arrow/util/logging_internal.h"
@@ -962,10 +963,80 @@ TEST(TestTime, TestNextDay) {
ts = StringToTimestamp("2015-08-06 11:12:30");
out = next_day_from_timestamp(context_ptr, ts, "AHSRK", 5);
- EXPECT_EQ(context.get_error(), "The weekday in this entry is invalid");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("NEXT_DAY"));
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("AHSRK"));
context.Reset();
}
+// Document that next_day's weekday-name matching is case-sensitive:
+// the WEEK[] lookup table holds uppercase names ("MONDAY", "TUE", ...) and
+// is_substr_utf8_utf8 does a byte-exact memcmp, so lowercase or mixed-case
+// input does not match and produces a NEXT_DAY error.
+TEST(TestTime, TestNextDayCaseSensitive) {
+ ExecutionContext context;
+ int64_t context_ptr = reinterpret_cast<int64_t>(&context);
+
+ gdv_timestamp ts = StringToTimestamp("2021-11-08 10:20:34");
+
+ // Uppercase: matches.
+ auto out = next_day_from_timestamp(context_ptr, ts, "FRIDAY", 6);
+ EXPECT_EQ(StringToTimestamp("2021-11-12 00:00:00"), out);
+ EXPECT_FALSE(context.has_error());
+
+ // Lowercase: does NOT match (case-sensitive memcmp against uppercase
WEEK[]).
+ out = next_day_from_timestamp(context_ptr, ts, "friday", 6);
+ EXPECT_TRUE(context.has_error());
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("NEXT_DAY"));
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("friday"));
+ context.Reset();
+
+ // Mixed case: also does NOT match.
+ out = next_day_from_timestamp(context_ptr, ts, "Friday", 6);
+ EXPECT_TRUE(context.has_error());
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("NEXT_DAY"));
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("Friday"));
+ context.Reset();
+}
+
+// Document that next_day's weekday-name matching is loose: it uses
+// is_substr_utf8_utf8 to test whether the input is a *substring* of any
+// WEEK[] entry, walking the array in the order
+// SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY
+// and returning the FIRST match. This means single-letter prefixes 'S' and
+// 'T' are ambiguous and silently resolve to SUNDAY / TUESDAY respectively
+// (never SATURDAY / THURSDAY). Likewise, any substring shared across weekdays
+// (e.g. "DAY") matches SUNDAY because it is first in the array.
+TEST(TestTime, TestNextDayAmbiguousPrefix) {
+ ExecutionContext context;
+ int64_t context_ptr = reinterpret_cast<int64_t>(&context);
+
+ gdv_timestamp ts = StringToTimestamp("2021-11-08 10:20:34"); // Mon
+
+ // "S" -> Sunday (could have been Saturday).
+ auto out = next_day_from_timestamp(context_ptr, ts, "S", 1);
+ EXPECT_EQ(StringToTimestamp("2021-11-14 00:00:00"), out); // next Sunday
+ EXPECT_FALSE(context.has_error());
+
+ // "T" -> Tuesday (could have been Thursday).
+ out = next_day_from_timestamp(context_ptr, ts, "T", 1);
+ EXPECT_EQ(StringToTimestamp("2021-11-09 00:00:00"), out); // next Tuesday
+ EXPECT_FALSE(context.has_error());
+
+ // "DAY" appears in every weekday name -> matches SUNDAY (first in array).
+ out = next_day_from_timestamp(context_ptr, ts, "DAY", 3);
+ EXPECT_EQ(StringToTimestamp("2021-11-14 00:00:00"), out); // next Sunday
+ EXPECT_FALSE(context.has_error());
+
+ // Unambiguous 2-letter prefixes work as expected.
+ out = next_day_from_timestamp(context_ptr, ts, "SA", 2);
+ EXPECT_EQ(StringToTimestamp("2021-11-13 00:00:00"), out); // next Saturday
+ EXPECT_FALSE(context.has_error());
+
+ out = next_day_from_timestamp(context_ptr, ts, "TH", 2);
+ EXPECT_EQ(StringToTimestamp("2021-11-11 00:00:00"), out); // next Thursday
+ EXPECT_FALSE(context.has_error());
+}
+
TEST(TestTime, TestCastTimestampToTime) {
gdv_timestamp ts = StringToTimestamp("2000-05-01 10:20:34");
auto expected_response =
@@ -1172,7 +1243,7 @@ TEST(TestTime, TestCastNullableInterval) {
EXPECT_EQ(castNULLABLEINTERVALYEAR_int64(context_ptr, 1201), 1201);
// validate overflow error when using bigint as input
castNULLABLEINTERVALYEAR_int64(context_ptr, INT64_MAX);
- EXPECT_EQ(context.get_error(), "Integer overflow");
+ EXPECT_THAT(context.get_error(), ::testing::HasSubstr("Integer overflow"));
context.Reset();
}
diff --git a/cpp/src/gandiva/regex_functions_holder.cc
b/cpp/src/gandiva/regex_functions_holder.cc
index 6c0c3d40f1..e6bd09be37 100644
--- a/cpp/src/gandiva/regex_functions_holder.cc
+++ b/cpp/src/gandiva/regex_functions_holder.cc
@@ -237,7 +237,11 @@ const char* ExtractHolder::operator()(ExecutionContext*
ctx, const char* user_in
int32_t user_input_len, int32_t
extract_index,
int32_t* out_length) {
if (extract_index < 0 || extract_index >= num_groups_pattern_) {
- ctx->set_error_msg("Index to extract out of range");
+ std::string err_msg = "REGEXP_EXTRACT: invalid group_index '" +
+ std::to_string(extract_index) + "'; must be between
0 and " +
+ std::to_string(num_groups_pattern_ - 1) +
+ " (the number of capture groups in the pattern)";
+ ctx->set_error_msg(err_msg.c_str());
*out_length = 0;
return "";
}