chaokunyang commented on code in PR #2033:
URL: https://github.com/apache/fury/pull/2033#discussion_r1935521890
##########
cpp/fury/util/string_util.h:
##########
@@ -204,9 +163,15 @@ inline bool isAscii(const std::string &str) {
}
inline bool isLatin1(const std::u16string &str) {
- const std::uint16_t *data =
- reinterpret_cast<const std::uint16_t *>(str.data());
- return isLatin1(data, str.size());
+ // Try the conversion directly, and all characters are considered Latin1 if
+ // they are successfully converted
+ size_t latin1_len = simdutf::latin1_length_from_utf16(str.size());
Review Comment:
If we use `simdutf::latin1_length_from_utf16(str.size()) != str.size();` to
check whether string is latin1, it will needs to iterate the whole string, but
in many cases this is not necessary. Take non-latin1 string `"abcd\u1234abcda"
+ std::string(100000, "x")` as an example, current implementation will return
when check 5rd char, but in this PR, it must iterate the whole string, which is
inefficient
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]