friendlymatthew commented on code in PR #7908:
URL: https://github.com/apache/arrow-rs/pull/7908#discussion_r2202058456

##########
parquet-variant/src/utils.rs:
##########
@@ -84,6 +84,16 @@ pub(crate) fn string_from_slice(
         .map_err(|_| ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string()))
 }
 
+/// Extracts a byte slice from the given range and validates it as UTF-8.
+pub(crate) fn extract_and_validate_utf8_slice(
+    bytes: &[u8],
+    range: Range<usize>,
+) -> Result<&str, ArrowError> {
+    let offset_buffer = slice_from_slice(bytes, range)?;
+    simdutf8::basic::from_utf8(offset_buffer)
+        .map_err(|_| ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string()))
+}
+

Review Comment:
   This looks great. I think @viirya pointed out that if this function errs, the error message will contain the entire byte slice, which isn't the best error message.

   I wonder if we could use something like: https://github.com/apache/arrow-rs/blob/7b219f98c25fcd318a0c207f51a41398d1b23724/parquet/src/util/utf8.rs#L40-L57
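
To make the suggestion concrete, here is a minimal sketch of an error message that reports where validation failed instead of echoing the bytes. It is not the linked parquet helper and not the PR's code: it assumes `String` as the error type in place of `ArrowError`, uses `std::str::from_utf8` in place of `simdutf8` so it builds without extra crates, and inlines the bounds check that `slice_from_slice` performs in the PR.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the helper under review. Assumptions: the error
/// type is `String` rather than `ArrowError`, and validation uses the standard
/// library rather than `simdutf8`.
fn extract_and_validate_utf8_slice(bytes: &[u8], range: Range<usize>) -> Result<&str, String> {
    // Bounds check: `get` returns `None` for an out-of-bounds or inverted range.
    let slice = bytes
        .get(range.clone())
        .ok_or_else(|| format!("range {range:?} is out of bounds for {} bytes", bytes.len()))?;

    // On failure, report only the offset of the first invalid byte within the
    // slice, never the slice contents.
    std::str::from_utf8(slice).map_err(|e| {
        format!(
            "invalid UTF-8 string: first invalid byte at offset {}",
            e.valid_up_to()
        )
    })
}
```

With this shape, validating `b"ab\xffcd"` over `0..5` produces an error that mentions offset 2 and nothing else about the input. If the fast `simdutf8::basic` path is kept, the same detail is available by re-validating with `simdutf8::compat::from_utf8` only on the error path, since its error type exposes `valid_up_to()` like the standard library's `Utf8Error`.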