Jane Lewis created AVRO-4063:
--------------------------------
Summary: `apache_avro::Writer::flush` does not call
`std::io::Write::flush` on the inner writer
Key: AVRO-4063
URL: https://issues.apache.org/jira/browse/AVRO-4063
Project: Apache Avro
Issue Type: Bug
Components: rust
Reporter: Jane Lewis
The Rust documentation for {{apache_avro::Writer::flush}} describes the
function as follows:
{quote}Flush the content appended to a {{Writer}}. Call this function to
make sure all the content has been written before releasing the {{Writer}}.
{quote}
However, this function does not guarantee that all the content has been
written out by the time {{flush()}} returns, because it does not call
{{std::io::Write::flush}} on the inner {{writer}}.
This can be a problem when the inner writer uses its own buffer. Here's an
example of how this can lead to misleading behavior:
{code:java}
fn main() {
    let buffered_writer =
        std::io::BufWriter::new(std::fs::File::create("test.avro").unwrap());
    let schema = apache_avro::Schema::parse_str(
        r#"
        {
            "type": "record",
            "name": "example_schema",
            "fields": [
                {"name": "example_field", "type": "string"}
            ]
        }
        "#,
    )
    .unwrap();
    let mut writer = apache_avro::Writer::new(&schema, buffered_writer);
    let mut record = apache_avro::types::Record::new(writer.schema()).unwrap();
    record.put("example_field", "value");
    writer.append(record).unwrap();
    writer.flush().unwrap();
    let test_file_contents = std::fs::read("test.avro").unwrap();
    assert_ne!(test_file_contents.len(), 0); // this will fail
}
{code}
In this example, the inner {{BufWriter}} has not yet flushed its own buffer
when {{writer.flush().unwrap()}} returns. In fact, the buffer is only written
out once {{writer}} is dropped.
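The buffering behavior itself can be reproduced with the standard library alone. The following is a minimal sketch, not taken from {{apache_avro}}: an in-memory {{Vec<u8>}} stands in for the file, and {{sink_len_before_and_after_flush}} is a hypothetical helper name used only for illustration. It assumes the payload is smaller than the default {{BufWriter}} capacity.

```rust
use std::io::{BufWriter, Write};

/// Hypothetical helper: writes `payload` through a `BufWriter` and reports how
/// many bytes have reached the underlying sink before and after `flush()`.
/// A `Vec<u8>` stands in for the file so the example has no side effects.
fn sink_len_before_and_after_flush(payload: &[u8]) -> (usize, usize) {
    let mut buffered = BufWriter::new(Vec::new());
    buffered.write_all(payload).unwrap();

    // Assumption: `payload` is smaller than the BufWriter's capacity, so the
    // bytes are still held in the BufWriter's own buffer at this point.
    let before = buffered.get_ref().len();

    // Only an explicit flush (or dropping the BufWriter) pushes them through.
    buffered.flush().unwrap();
    let after = buffered.get_ref().len();

    (before, after)
}

fn main() {
    let (before, after) = sink_len_before_and_after_flush(b"value");
    assert_eq!(before, 0);
    assert_eq!(after, 5);
}
```

This mirrors what happens to {{test.avro}} in the report: bytes handed to the {{BufWriter}} stay invisible to readers of the file until something flushes it.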
To fix this issue, I propose that {{.flush()}} should be called on the inner
writer at the end of {{apache_avro::Writer::flush}}.
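The shape of the proposed change might look like the sketch below. It is not the actual {{apache_avro}} implementation: {{StagingWriter}}, its fields, and {{flushed_len}} are hypothetical names, and the type is only assumed to resemble a writer that stages encoded bytes before handing them to an inner {{std::io::Write}}.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for an Avro-style writer: it stages encoded bytes in
/// its own buffer and owns an inner `std::io::Write`.
struct StagingWriter<W: Write> {
    inner: W,
    staged: Vec<u8>,
}

impl<W: Write> StagingWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, staged: Vec::new() }
    }

    /// Stage bytes without touching the inner writer.
    fn append(&mut self, bytes: &[u8]) {
        self.staged.extend_from_slice(bytes);
    }

    /// Hand the staged bytes to the inner writer, then flush the inner writer
    /// as well. The final `inner.flush()` is the step proposed in this issue.
    fn flush(&mut self) -> io::Result<usize> {
        let written = self.staged.len();
        self.inner.write_all(&self.staged)?;
        self.staged.clear();
        self.inner.flush()?;
        Ok(written)
    }

    fn get_ref(&self) -> &W {
        &self.inner
    }
}

/// Hypothetical helper: number of bytes that reached the innermost sink after
/// a single `flush()` on the outer writer.
fn flushed_len(payload: &[u8]) -> usize {
    let mut writer = StagingWriter::new(BufWriter::new(Vec::new()));
    writer.append(payload);
    writer.flush().unwrap();
    // First get_ref() yields the BufWriter, the second the Vec behind it.
    writer.get_ref().get_ref().len()
}

fn main() {
    assert_eq!(flushed_len(b"value"), 5);
}
```

With the inner flush in place, a caller who wraps a {{BufWriter}} sees the data in the sink as soon as the outer {{flush()}} returns, without having to drop the writer first.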