kosiew opened a new issue, #8403:
URL: https://github.com/apache/arrow-rs/issues/8403
**Describe the bug**
Casting from `BinaryView` to `Utf8View` fails when the input contains invalid
UTF-8, even with `CastOptions.safe = true`. This is inconsistent with the
other binary types in Arrow, where a value that is not valid UTF-8 becomes
`null` when `safe=true`.
**To Reproduce**
```rust
#[test]
fn test_arrow_cast_binaryview_to_utf8view_fails_with_invalid_utf8() {
    use arrow::compute::kernels::cast::{cast_with_options, CastOptions};
    use arrow_array::{ArrayRef, BinaryViewArray};
    use arrow_schema::DataType;
    use std::sync::Arc;

    let binary_data = vec![
        Some("valid".as_bytes()),
        Some(&[0xf0, 0x28, 0x8c, 0x28]), // invalid UTF-8 sequence
        Some("also_valid".as_bytes()),
    ];
    let binary_view_array: ArrayRef = Arc::new(BinaryViewArray::from(binary_data));

    // Cast with safe=false (expected to fail).
    // `CastOptions::default()` sets `safe: true`, so request safe=false explicitly.
    let cast_options = CastOptions {
        safe: false,
        ..Default::default()
    };
    let result = cast_with_options(&binary_view_array, &DataType::Utf8View, &cast_options);
    assert!(
        result.is_err(),
        "Expected BinaryView->Utf8View cast to fail with safe=false"
    );
    assert!(
        result
            .unwrap_err()
            .to_string()
            .contains("Encountered non-UTF-8 data"),
        "Error should mention non-UTF-8 data"
    );

    // Cast with safe=true (currently also fails, which is the unexpected part)
    let mut safe_cast_options = CastOptions::default();
    safe_cast_options.safe = true;
    let safe_result = cast_with_options(
        &binary_view_array,
        &DataType::Utf8View,
        &safe_cast_options,
    );
    assert!(
        safe_result.is_err(),
        "BinaryView->Utf8View cast fails even with safe=true (unlike other binary types)"
    );
    assert!(
        safe_result
            .unwrap_err()
            .to_string()
            .contains("Encountered non-UTF-8 data"),
        "Safe cast error should also mention non-UTF-8 data"
    );
}
```
**Expected behavior**
When using `CastOptions.safe = true`, invalid UTF-8 in a `BinaryView` array
should result in `null` values in the resulting `Utf8View` array rather than a
hard failure, similar to how other binary array types behave.
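To make the expected semantics concrete, here is a minimal standard-library sketch of the two modes. It is only a model of the behavior described above, not the arrow-rs implementation: the function name `cast_bytes_to_utf8` and the use of `Vec<Option<...>>` in place of Arrow arrays are assumptions made for illustration, and the error text simply mirrors the message checked in the test above.

```rust
use std::str;

/// Hypothetical stand-in for a binary-to-string cast (not arrow-rs code).
///
/// * `safe = true`: a value that is not valid UTF-8 becomes `None` (null).
/// * `safe = false`: the first invalid value aborts the cast with an error.
fn cast_bytes_to_utf8<'a>(
    values: &[Option<&'a [u8]>],
    safe: bool,
) -> Result<Vec<Option<&'a str>>, String> {
    values
        .iter()
        .copied()
        .map(|value| match value {
            // Nulls in the input stay null in the output.
            None => Ok(None),
            Some(bytes) => match str::from_utf8(bytes) {
                Ok(s) => Ok(Some(s)),
                // Safe mode: degrade the invalid value to null.
                Err(_) if safe => Ok(None),
                // Unsafe mode: surface the failure to the caller.
                Err(e) => Err(format!("Encountered non-UTF-8 data: {e}")),
            },
        })
        // Collecting into `Result` stops at the first `Err`.
        .collect()
}
```

With the three values from the reproduction, this model would return the first and third strings with a null between them under `safe = true`, and an error under `safe = false`. That is the behavior the issue asks `BinaryView` to `Utf8View` casts to follow.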
**Additional context**
This behavior appears inconsistent and surprising. In other binary array
types, setting `safe=true` allows for graceful degradation (returning `null`s
for invalid entries). However, `BinaryView` does not follow this pattern and
fails even when safe casting is requested.
Let me know if this behavior is intentional, or if a fix would be welcomed!