pandalee99 commented on code in PR #2033:
URL: https://github.com/apache/fury/pull/2033#discussion_r1934184404


##########
cpp/fury/util/string_util.h:
##########
@@ -204,9 +163,15 @@ inline bool isAscii(const std::string &str) {
 }
 
 inline bool isLatin1(const std::u16string &str) {
-  const std::uint16_t *data =
-      reinterpret_cast<const std::uint16_t *>(str.data());
-  return isLatin1(data, str.size());
+  // Try the conversion directly, and all characters are considered Latin1 if
+  // they are successfully converted
+  size_t latin1_len = simdutf::latin1_length_from_utf16(str.size());

Review Comment:
   Thank you for your comments.
   
   I've thought about all these questions. 
   
   First, I looked up this library, there is no related direct API, so it can 
only be implemented in this way.
   
   
   Second, even if there is a direct implementation of the API, the effect is 
not necessarily better.
   
    For example,
   
   there is a direct implementation of acsii detection in simdutf, but the 
results are not good.
   ``` c++
   bool isAscii_SIMDUTF(const std::string &str) {
     // Call the API directly without validation
     return simdutf::validate_ascii(str.data(), str.size());
   }
   static void BM_IsAscii_SIMDUTF(benchmark::State &state) {
     for (auto _ : state) {
       for (const auto &str : test_ascii_strings) {
         bool result = isAscii_SIMDUTF(str);
       }
     }
   }
   static void BM_IsAscii_FURY(benchmark::State &state) {
     for (auto _ : state) {
       for (const auto &str : test_ascii_strings) {
         bool result = fury::isAscii(str); // self-implementation 
       }
     }
   }
   ```
   <img width="374" alt="image" 
src="https://github.com/user-attachments/assets/cd558d32-31f7-45d3-adbc-92e4f841f8c3";
 />
   
   As you can see, the implementation of acsii is not good (which is strange, 
but I don't know why).
   
   After testing, only the use of simdutf API to detect Latin1 is relatively 
good.
   
   To sum up, this is my opinion.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to