alamb opened a new issue, #9157:
URL: https://github.com/apache/arrow-rs/issues/9157

   **Describe the bug**
   @jhorstmann points out in https://github.com/apache/arrow-rs/pull/9058/changes#r2683034659:
   
   > Unrelated to the performance improvement: I think this also needs to assert that data_type equals T::DATA_TYPE, otherwise it allows unchecked casting from binary to string without utf8 validation.
   
   Basically, by (mis)using safe APIs it is possible to convert a binary view array to a Utf8View array and bypass the UTF-8 validation check.
   
   **To Reproduce**
   ```rust
       #[test]
       #[should_panic(expected = "Invalid UTF-8")]
       fn invalid_casting_from_array_data() {
           let array = GenericByteViewArray::<BinaryViewType>::from(vec![
               b"aaaaaaaaaaaaaaaaaaaaaaaaaaa" as &[u8],
               &[
                   0xf0, 0x80, 0x80, 0x80, 0x00, 0x00, 0x00, 0x00,
                   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
               ],
               b"good",
           ]);
           assert!(String::from_utf8(array.value(0).to_vec()).is_ok());
           // value 1 is invalid utf8
           assert!(String::from_utf8(array.value(1).to_vec()).is_err());
           assert!(String::from_utf8(array.value(2).to_vec()).is_ok());
   
           // Should not be able to cast to StringViewArray due to invalid UTF-8
           let array_data: arrow_data::ArrayData = array.into();
           let _ = StringViewArray::from(array_data);
       }
   ```
   **Expected behavior**
   The conversion should panic, because the data type of the `ArrayData` does not match the target array type.
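
To illustrate the kind of check suggested in the quoted comment, here is a minimal, standard-library-only sketch. It does not use arrow-rs: `DataType`, `RawData`, `ViewType`, `ViewArray`, `try_new`, `from_raw`, and `into_raw` are all hypothetical stand-ins invented for this example, and the actual fix in arrow-rs may look different. The idea is that a conversion from untyped data which skips validation has to assert that the declared data type equals `T::DATA_TYPE`.

```rust
use std::marker::PhantomData;

// Hypothetical, simplified stand-ins for arrow-rs concepts; NOT the real API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    BinaryView,
    Utf8View,
}

/// Untyped array contents, loosely modelled on the role of `ArrayData`.
struct RawData {
    data_type: DataType,
    values: Vec<Vec<u8>>,
}

/// Ties a Rust marker type to a logical data type and its validation rule.
trait ViewType {
    const DATA_TYPE: DataType;
    fn validate(values: &[Vec<u8>]) -> Result<(), String>;
}

struct BinaryViewType;
struct StringViewType;

impl ViewType for BinaryViewType {
    const DATA_TYPE: DataType = DataType::BinaryView;
    // Arbitrary bytes are always acceptable for a binary array.
    fn validate(_values: &[Vec<u8>]) -> Result<(), String> {
        Ok(())
    }
}

impl ViewType for StringViewType {
    const DATA_TYPE: DataType = DataType::Utf8View;
    // Every value of a string array must be valid UTF-8.
    fn validate(values: &[Vec<u8>]) -> Result<(), String> {
        for (i, v) in values.iter().enumerate() {
            std::str::from_utf8(v)
                .map_err(|e| format!("Invalid UTF-8 at index {i}: {e}"))?;
        }
        Ok(())
    }
}

struct ViewArray<T: ViewType> {
    values: Vec<Vec<u8>>,
    _marker: PhantomData<T>,
}

impl<T: ViewType> ViewArray<T> {
    /// Validating constructor: the only way to build an array from bytes.
    fn try_new(values: Vec<Vec<u8>>) -> Result<Self, String> {
        T::validate(&values)?;
        Ok(Self { values, _marker: PhantomData })
    }

    /// Skips re-validation, so it must check that the declared type matches
    /// `T`. Without this assert, binary data could be relabelled as a string.
    fn from_raw(raw: RawData) -> Self {
        assert_eq!(
            raw.data_type,
            T::DATA_TYPE,
            "data type mismatch when converting untyped data"
        );
        Self { values: raw.values, _marker: PhantomData }
    }

    /// Erases the static type but records it in `data_type`.
    fn into_raw(self) -> RawData {
        RawData { data_type: T::DATA_TYPE, values: self.values }
    }
}
```

With this guard in place, `ViewArray::<StringViewType>::from_raw(binary.into_raw())` panics on the type mismatch instead of silently producing a string array that holds invalid UTF-8, while a round trip through the matching type still succeeds without re-validating.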
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
