friendlymatthew opened a new issue, #7902: URL: https://github.com/apache/arrow-rs/issues/7902
This is a follow-up to https://github.com/apache/arrow-rs/pull/7878.

The [variant spec](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#:~:text=The%20last%20part%20of%20the%20metadata%20is%20bytes%2C%20which%20stores%20all%20the%20string%20values%20in%20the%20dictionary.%20All%20string%20values%20must%20be%20UTF%2D8%20encoded%20strings.) states that the string values in the metadata dictionary must be UTF-8 encoded strings. We perform this check here:

https://github.com/apache/arrow-rs/blob/387490a7a97a9ea6d2fcd0105e6a1abaf819a386/parquet-variant/src/variant/metadata.rs#L250-L252

Since we offer `simdutf8` as an optional dependency in other crates, we could do the same for the validation above. See @Dandandan's [comment](https://github.com/apache/arrow-rs/pull/7878#discussion_r2197556647).

The rough idea: if `simdutf8` is enabled, do

```rs
let value_str = simdutf8::basic::from_utf8(value_buffer)?;
```

otherwise, fall back to the existing implementation.
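The feature-gated fallback described in the rough idea could be sketched roughly as follows. This is only an illustration: the helper name `validate_utf8`, the `InvalidUtf8` error type, and the Cargo feature name `simdutf8` are assumptions made for the sketch, not the crate's actual API.

```rs
/// Hypothetical error type for this sketch; the real crate would map
/// the failure into its own error type instead.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8;

/// SIMD-accelerated path, compiled only when the (assumed) `simdutf8`
/// Cargo feature is enabled and the optional dependency is available.
#[cfg(feature = "simdutf8")]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    simdutf8::basic::from_utf8(bytes).map_err(|_| InvalidUtf8)
}

/// Default path: the standard library validator, used when the
/// feature is disabled.
#[cfg(not(feature = "simdutf8"))]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    std::str::from_utf8(bytes).map_err(|_| InvalidUtf8)
}
```

Both variants share one signature, so call sites would not need their own `cfg` attributes. One thing to keep in mind is that the error returned by `simdutf8::basic` carries no position information, unlike `std::str::Utf8Error`; if the byte offset of the invalid sequence matters for error messages, `simdutf8::compat::from_utf8` may be the better fit.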