pandalee99 commented on code in PR #2033:
URL: https://github.com/apache/fury/pull/2033#discussion_r1932505259


##########
cpp/fury/util/string_util.h:
##########
@@ -204,9 +163,15 @@ inline bool isAscii(const std::string &str) {
 }
 
 inline bool isLatin1(const std::u16string &str) {
-  const std::uint16_t *data =
-      reinterpret_cast<const std::uint16_t *>(str.data());
-  return isLatin1(data, str.size());
+  // Try the conversion directly, and all characters are considered Latin1 if
+  // they are successfully converted
+  size_t latin1_len = simdutf::latin1_length_from_utf16(str.size());

Review Comment:
   sure.
   
   ``` c++
   
   static void BM_IsLatin1_SIMDUTF(benchmark::State &state) {
     for (auto _ : state) {
       for (const auto &str : test_latin1_strings) {
         bool result = isLatin1_SIMDUTF(str);
         benchmark::DoNotOptimize(result); 
       }
     }
   }
   
   // here  fury::isLatin1   is old api 
   static void BM_IsLatin1_FURY(benchmark::State &state) {
     for (auto _ : state) {
       for (const auto &str : test_latin1_strings) {
         bool result = fury::isLatin1(str); // old api 
         benchmark::DoNotOptimize(result); 
       }
     }
   }
   ```
   Run on (4 X 2688 MHz CPU s)
   CPU Caches:
     L1 Data 48 KiB (x4)
     L1 Instruction 32 KiB (x4)
     L2 Unified 1280 KiB (x4)
     L3 Unified 24576 KiB (x2)
   Load Average: 0.26, 0.15, 0.10
   
   
   Benchmark          &ensp;  &ensp;     &ensp;  &ensp;       &ensp;            
       Time        &ensp;       CPU  &ensp;   Iterations
   
   BM_IsLatin1_SIMDUTF          &ensp;            259315 ns        &ensp; 
259217 ns      &ensp;     2839
   BM_IsLatin1_FURY                 &ensp;     &ensp;  &ensp;        621085 ns  
    &ensp;   621018 ns    &ensp;       1108
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to