TrevorADHD opened a new issue, #8380:
URL: https://github.com/apache/arrow-rs/issues/8380
#### Describe the bug
I tried to run the example that demonstrates Utf8View support in the
Arrow Avro reader, but it fails.
According to its description, the example reads an Avro file with string
data twice: once into a regular StringArray and once into a StringViewArray.
The root cause is the `try_clone` call:
```rust
let file_for_view = file.try_clone()?;
```
The Rust standard library documentation for `File::try_clone` says:
```
Creates a new File instance that shares the same underlying file handle as
the existing File instance. Reads, writes, and seeks will affect both File
instances simultaneously.
```
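So by the time the second reader tries to decode the Avro header, the first
reader has already advanced the shared offset (the first pass reads the file
to the end). The second reader therefore sees a zero-length header and fails.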
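The shared-offset behaviour can be reproduced with the standard library alone. The following is a minimal sketch, not code from arrow-rs: the helper name `bytes_seen_by_clone` and the scratch file name are hypothetical and exist only for illustration.

```rust
use std::fs::{self, File};
use std::io::{Read, Result, Seek, SeekFrom};

/// Reads a file to the end through one handle, then reports how many bytes a
/// `try_clone`d handle can still read, first as-is and then after rewinding.
fn bytes_seen_by_clone(payload: &[u8]) -> Result<(usize, usize)> {
    // Hypothetical scratch file, used only for this demonstration.
    let path = std::env::temp_dir()
        .join(format!("try_clone_demo_{}.bin", std::process::id()));
    fs::write(&path, payload)?;

    let mut original = File::open(&path)?;
    let mut clone = original.try_clone()?;

    // Drain the original handle; this moves the shared cursor to EOF.
    let mut sink = Vec::new();
    original.read_to_end(&mut sink)?;

    // The clone shares that cursor, so there is nothing left for it to read.
    let mut after_drain = Vec::new();
    clone.read_to_end(&mut after_drain)?;

    // Rewinding the clone makes the whole file readable again.
    clone.seek(SeekFrom::Start(0))?;
    let mut after_rewind = Vec::new();
    clone.read_to_end(&mut after_rewind)?;

    // Close both handles before deleting the scratch file.
    drop(original);
    drop(clone);
    fs::remove_file(&path)?;
    Ok((after_drain.len(), after_rewind.len()))
}
```

Calling `bytes_seen_by_clone(b"hello avro")` should return zero bytes for the first read through the clone and the full payload length after the rewind, which matches the empty header the second Avro reader runs into.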
```rust
let file = File::open(file_path)?;
let file_for_view = file.try_clone()?;
let start = Instant::now();
let reader = BufReader::new(file);
let avro_reader = ReaderBuilder::new().build(reader)?;
let schema = avro_reader.schema();
let batches: Vec<RecordBatch> = avro_reader.collect::<Result<_, _>>()?;
let regular_duration = start.elapsed();
let start = Instant::now();
// The cloned handle shares its offset with `file`, which has already been read to EOF, so building this reader fails.
let reader_view = BufReader::new(file_for_view);
let avro_reader_view = ReaderBuilder::new()
    .with_utf8_view(true)
    .build(reader_view)?;
let batches_view: Vec<RecordBatch> =
    avro_reader_view.collect::<Result<_, _>>()?;
let view_duration = start.elapsed();
```
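One possible way to avoid the problem, offered as a suggestion and not as the fix the maintainers will necessarily choose, is to open the file a second time instead of cloning the handle. Each `File::open` call gets its own cursor starting at offset 0. The sketch below uses only the standard library; `read_twice_independently` is a hypothetical helper, and the two `read_to_end` calls stand in for the two Avro reader passes.

```rust
use std::fs::File;
use std::io::{BufReader, Read, Result};
use std::path::Path;

/// Opens `path` twice and reads it fully through each independent handle.
fn read_twice_independently(path: &Path) -> Result<(Vec<u8>, Vec<u8>)> {
    // First pass: stands in for the regular StringArray read.
    let mut first = Vec::new();
    BufReader::new(File::open(path)?).read_to_end(&mut first)?;

    // Second pass: a fresh `File::open` has its own cursor at offset 0,
    // so it is unaffected by how far the first handle has advanced.
    let mut second = Vec::new();
    BufReader::new(File::open(path)?).read_to_end(&mut second)?;

    Ok((first, second))
}
```

In the example this would amount to replacing `file.try_clone()?` with a second `File::open(file_path)?`. Calling `seek(SeekFrom::Start(0))` on the cloned handle before building the second reader should work as well.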
#### To Reproduce
```shell
(base) ➜ arrow-rs git:(main) ✗ cd arrow-avro
(base) ➜ arrow-avro git:(main) ✗ cargo run --package arrow-avro --example read_with_utf8view -- test/data/nested_record_reuse.avro
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.33s
Running `/Users/trevor.wang/Workspace/rust/arrow-rs/target/debug/examples/read_with_utf8view test/data/nested_record_reuse.avro`
Error: ParseError("Unexpected EOF while reading Avro header")
```
#### Expected behavior
The example should run successfully and report the timing results for both reads.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.