tustvold commented on code in PR #4885:
URL: https://github.com/apache/arrow-rs/pull/4885#discussion_r1342557037
##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -289,11 +293,12 @@ impl Sbbf {
// this match exists to future proof the singleton hash enum
}
}
- // length in bytes
- let length: usize = header.num_bytes.try_into().map_err(|_| {
- ParquetError::General("Bloom filter length is invalid".to_string())
- })?;
Review Comment:
We appear to have lost this check
##########
parquet/src/file/writer.rs:
##########
@@ -267,12 +267,19 @@ impl<W: Write + Send> SerializedFileWriter<W> {
Some(bloom_filter) => {
let start_offset = self.buf.bytes_written();
bloom_filter.write(&mut self.buf)?;
+ let end_offset = self.buf.bytes_written();
// set offset and index for bloom filter
column_chunk
.meta_data
.as_mut()
.expect("can't have bloom filter without column
metadata")
.bloom_filter_offset = Some(start_offset as i64);
+ column_chunk
+ .meta_data
+ .as_mut()
+ .expect("can't have bloom filter without column
metadata")
+ .bloom_filter_length =
+ Some((end_offset - start_offset) as i32)
Review Comment:
```suggestion
let meta =
column_chunk.meta_data.as_mut().expect("can't have bloom filter without column
metadata");
meta.bloom_filter_offset = Some(start_offset as i64);
meta.bloom_filter_length = Some((end_offset -
start_offset) as i32);
```
Or something to that effect
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]