pandalee99 commented on code in PR #2033:
URL: https://github.com/apache/fury/pull/2033#discussion_r1932505259
##########
cpp/fury/util/string_util.h:
##########
@@ -204,9 +163,15 @@ inline bool isAscii(const std::string &str) {
}
inline bool isLatin1(const std::u16string &str) {
- const std::uint16_t *data =
- reinterpret_cast<const std::uint16_t *>(str.data());
- return isLatin1(data, str.size());
+ // Try the conversion directly, and all characters are considered Latin1 if
+ // they are successfully converted
+ size_t latin1_len = simdutf::latin1_length_from_utf16(str.size());
Review Comment:
sure.
``` c++
static void BM_IsLatin1_SIMDUTF(benchmark::State &state) {
for (auto _ : state) {
for (const auto &str : test_latin1_strings) {
bool result = isLatin1_SIMDUTF(str);
benchmark::DoNotOptimize(result);
}
}
}
// here fury::isLatin1 is old api
static void BM_IsLatin1_FURY(benchmark::State &state) {
for (auto _ : state) {
for (const auto &str : test_latin1_strings) {
bool result = fury::isLatin1(str); // old api
benchmark::DoNotOptimize(result);
}
}
}
```
Run on (4 X 2688 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1280 KiB (x4)
L3 Unified 24576 KiB (x2)
Load Average: 0.26, 0.15, 0.10
Benchmark Time CPU Iterations
BM_IsLatin1_SIMDUTF   259315 ns  
259217 ns   2839
BM_IsLatin1_FURY       621085 ns
  621018 ns   1108
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]