Re: [PR] Improve parquet BinaryView / StringView decoder performance (up to -35%) [arrow-rs]

via GitHub Thu, 22 Jan 2026 00:14:07 -0800


Dandandan commented on code in PR #9236:
URL: https://github.com/apache/arrow-rs/pull/9236#discussion_r2715780311



##########
parquet/src/arrow/array_reader/byte_view_array.rs:
##########
@@ -373,32 +383,33 @@ impl ByteViewArrayDecoderPlain {
                 // The implementation keeps a water mark 
`utf8_validation_begin` to track the beginning of the buffer that is not 
validated.
                 // If the length is smaller than 128, then we continue to next 
string.
                 // If the length is larger than 128, then we validate the 
buffer before the length bytes, and move the water mark to the beginning of 
next string.
-                if len < 128 {
-                    // fast path, move to next string.
-                    // the len bytes are valid utf8.
-                } else {
+                if len >= 128 {
                     // unfortunately, the len bytes may not be valid utf8, we 
need to wrap up and validate everything before it.
-                    check_valid_utf8(unsafe {
-                        buf.get_unchecked(utf8_validation_begin..self.offset)
-                    })?;
+                    check_valid_utf8(unsafe { 
buf.get_unchecked(utf8_validation_begin..offset) })?;
                     // move the cursor to skip the len bytes.
                     utf8_validation_begin = start_offset;
                 }
             }
 
+            let view = make_view(

Review Comment:
   I now went with the unsafe optimization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Improve parquet BinaryView / StringView decoder performance (up to -35%) [arrow-rs]

Reply via email to