urlyy opened a new pull request, #1778:
URL: https://github.com/apache/fury/pull/1778

   ## What does this PR do?
   This PR adds a SIMD implementation of UTF-16 to UTF-8 conversion based on 
the AVX, SSE, and NEON instruction sets. It builds on #1730 and includes 
benchmarks.
   
   References:
   - https://github.com/simdutf/simdutf/blob/master/src/westmere/sse_convert_utf16_to_utf8.cpp
   - https://github.com/simdutf/simdutf/blob/master/src/haswell/avx2_convert_utf16_to_utf8.cpp
   - https://github.com/simdutf/simdutf/blob/5c1a86887010cd2b4d648049c4d73de81a026341/src/arm64/arm_convert_utf16_to_utf8.cpp
   - https://github.com/simdutf/simdutf/blob/master/src/tables/utf16_to_utf8_tables.h
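For readers unfamiliar with the conversion itself, here is a minimal scalar sketch of what any UTF-16 to UTF-8 converter has to compute. It is not the code in this PR: the function name `utf16_to_utf8_scalar` and its error convention (the index of the first unpaired surrogate) are assumptions made for illustration.

```rust
/// Hypothetical scalar reference, not the PR's implementation.
/// Converts UTF-16 code units to UTF-8 bytes, or returns the index of the
/// first unpaired surrogate.
fn utf16_to_utf8_scalar(units: &[u16]) -> Result<Vec<u8>, usize> {
    // One code unit yields at most 3 bytes; a surrogate pair (2 units) yields 4.
    let mut out = Vec::with_capacity(units.len() * 3);
    let mut i = 0;
    while i < units.len() {
        let u = units[i] as u32;
        let cp = if (0xD800..0xDC00).contains(&u) {
            // High surrogate: it must be followed by a low surrogate.
            match units.get(i + 1) {
                Some(&lo) if (0xDC00..0xE000).contains(&lo) => {
                    i += 1; // consume the low surrogate as well
                    0x10000 + ((u - 0xD800) << 10) + (lo as u32 - 0xDC00)
                }
                _ => return Err(i),
            }
        } else if (0xDC00..0xE000).contains(&u) {
            // Low surrogate without a preceding high surrogate.
            return Err(i);
        } else {
            u
        };
        match cp {
            0..=0x7F => out.push(cp as u8),
            0x80..=0x7FF => {
                out.push(0xC0 | (cp >> 6) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
            0x800..=0xFFFF => {
                out.push(0xE0 | (cp >> 12) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
            _ => {
                out.push(0xF0 | (cp >> 18) as u8);
                out.push(0x80 | ((cp >> 12) & 0x3F) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
        }
        i += 1;
    }
    Ok(out)
}
```

The 1, 2, and 3 byte cases depend only on the value of a single code unit, which is what makes them amenable to per-lane SIMD processing with lookup tables. The 4 byte case needs two adjacent code units.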
   
   Notes:
   - I use two precomputed tables, the same way `simdutf` does, but they take 
1600 lines.
   - I copied two UTF-8 encoded text files into the Rust project for the 
benchmarks.
   - `util.rs` might need to be merged with `string_util.rs`.
   
   
   ## Related issues
   - #1547 
   - #1730 
   
   
   
   ## Does this PR introduce any user-facing change?
   - [x] Does this PR introduce any public API change?
   - [ ] Does this PR introduce any binary protocol compatibility change?
   
   
   ## Benchmark
   Dataset: https://github.com/lemire/unicode_lipsum/tree/main/wikipedia_mars
   
   Both the SIMD and the non-SIMD approaches are faster than 
`String::from_utf16(bytes)`. In the benchmark on my Windows 11 x86 machine, the 
SIMD approach seems to be only slightly faster than the scalar approach, which 
is less than I expected. AVX seems better than SSE because AVX handles 256 bits 
at a time, whereas SSE handles only 128 bits. When the input contains a 
surrogate pair, the algorithm falls back to the scalar (non-SIMD) path, so in 
that case the SIMD approach might be slower than the scalar one.
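The fallback behaviour described above can be sketched in portable Rust without intrinsics. This is an illustration under stated assumptions, not the PR's code: `convert_chunked` and `LANES` are hypothetical names, the "vectorizable" branch is an ordinary loop standing in for a real SIMD kernel, and the second return value only exists to show how many code units took the scalar path.

```rust
use std::char::{decode_utf16, DecodeUtf16Error};

/// 8 x u16 = 128 bits, the width of one SSE or NEON register.
/// An AVX2 version would process 16 code units per step.
const LANES: usize = 8;

/// Hypothetical sketch: returns the UTF-8 string plus the number of code
/// units that went through the scalar path.
fn convert_chunked(units: &[u16]) -> Result<(String, usize), DecodeUtf16Error> {
    let mut out = String::with_capacity(units.len() * 3);
    let mut scalar_units = 0;
    let mut i = 0;
    while i + LANES <= units.len() {
        let chunk = &units[i..i + LANES];
        let has_surrogate = chunk.iter().any(|u| (0xD800..=0xDFFF).contains(u));
        if !has_surrogate {
            // Vectorizable case: every unit is a complete BMP scalar value.
            // A real SIMD kernel would emit 1 to 3 bytes per lane here.
            for &u in chunk {
                out.push(char::from_u32(u as u32).expect("non-surrogate BMP unit"));
            }
            i += LANES;
            continue;
        }
        // Fallback: decode this chunk one unit at a time. If it ends on a
        // high surrogate, pull in one more unit so the pair is not split.
        let mut end = i + LANES;
        if (0xD800..=0xDBFF).contains(&units[end - 1]) && end < units.len() {
            end += 1;
        }
        for c in decode_utf16(units[i..end].iter().copied()) {
            out.push(c?);
        }
        scalar_units += end - i;
        i = end;
    }
    // A tail shorter than one chunk is always handled by the scalar path.
    for c in decode_utf16(units[i..].iter().copied()) {
        out.push(c?);
    }
    scalar_units += units.len() - i;
    Ok((out, scalar_units))
}
```

In this sketch a single surrogate pair sends a whole chunk through the scalar path, on top of the cost of checking the chunk first. Text with many supplementary-plane characters therefore pays for both the check and the scalar decode, which is consistent with the SIMD path not beating the scalar one on such input.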
   
![image](https://github.com/user-attachments/assets/3011a6d6-b9de-4033-8b88-ea71fe9c9beb)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

