Yibo Cai created ARROW-7404:
-------------------------------

    Summary: [C++][Gandiva] Fix utf8 char length error on Arm64
        Key: ARROW-7404
        URL: https://issues.apache.org/jira/browse/ARROW-7404
    Project: Apache Arrow
 Issue Type: Bug
 Components: C++ - Gandiva
   Reporter: Yibo Cai
   Assignee: Yibo Cai

The current code checks whether a UTF-8 eight-bit code unit is within 0x00~0x7F using "if (c >= 0)", where c is declared as "char". This check assumes char is always signed, which is not true [1]. On Arm64, char is unsigned by default, so the check always passes and some Gandiva unit tests fail. Fix it by casting to "signed char" explicitly.

[1] Cited from https://en.cppreference.com/w/cpp/language/types:
The signedness of char depends on the compiler and the target platform: the defaults for ARM and PowerPC are typically unsigned, the defaults for x86 and x64 are typically signed.
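
For illustration only, here is a minimal sketch of the check described above and of the proposed cast. The names IsAsciiByte, IsAsciiByteMasked, Utf8CharLengthSketch and CheckSketch are hypothetical and are not taken from the Gandiva sources; the actual patch may look different. The masked variant is an assumption on my part, shown as an alternative that avoids relying on signedness at all.

```cpp
#include <cassert>

// Hypothetical helper (not Gandiva's code): true if the code unit is a
// single-byte UTF-8 character, i.e. within 0x00~0x7F.
// A plain `c >= 0` is always true where char is unsigned (e.g. Arm64),
// so the byte is cast to signed char before comparing.
inline bool IsAsciiByte(char c) {
  return static_cast<signed char>(c) >= 0;
}

// Alternative sketch that does not depend on signedness: test the top bit.
inline bool IsAsciiByteMasked(char c) {
  return (static_cast<unsigned char>(c) & 0x80) == 0;
}

// Hypothetical sketch: byte length of a UTF-8 character, derived from its
// first code unit. Returns 0 for a continuation byte or an invalid lead byte.
inline int Utf8CharLengthSketch(char c) {
  if (IsAsciiByte(c)) return 1;                     // 0xxxxxxx
  const unsigned char u = static_cast<unsigned char>(c);
  if ((u & 0xE0) == 0xC0) return 2;                 // 110xxxxx
  if ((u & 0xF0) == 0xE0) return 3;                 // 1110xxxx
  if ((u & 0xF8) == 0xF0) return 4;                 // 11110xxx
  return 0;
}

// Small self-check; call it from a test or from main().
inline void CheckSketch() {
  assert(IsAsciiByte('a'));
  assert(!IsAsciiByte(static_cast<char>(0x80)));
  assert(Utf8CharLengthSketch(static_cast<char>(0xC3)) == 2);
}
```

With a plain "c >= 0" in place of the cast, the first branch of Utf8CharLengthSketch would be taken for every byte on a platform where char is unsigned, so every character would be reported as one byte long.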