etseidl commented on code in PR #6738:
URL: https://github.com/apache/arrow-rs/pull/6738#discussion_r1850842356
##########
parquet/src/file/statistics.rs:
##########
@@ -157,6 +157,32 @@ pub fn from_thrift(
stats.max_value
};
+ fn check_len(min: &Option<Vec<u8>>, max: &Option<Vec<u8>>, len: usize) -> Result<()> {
+ if let Some(min) = min {
+ if min.len() < len {
+ return Err(ParquetError::General(
+ "Insufficient bytes to parse max statistic".to_string(),
Review Comment:
```suggestion
"Insufficient bytes to parse min statistic".to_string(),
```
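For context, a minimal sketch of how the helper might read with the message fixed in both branches. The diff only shows the start of `check_len`, so the `max` branch here is an assumption that it mirrors the `min` branch; `StatsError` is a stand-in for `ParquetError` and `check_one` is a hypothetical helper, neither is taken from the PR.

```rust
// Stand-in for `ParquetError`, reduced to the single variant used here (assumption).
#[derive(Debug, PartialEq)]
enum StatsError {
    General(String),
}

// Hypothetical helper: reject a buffer shorter than `len` bytes.
// `which` is "min" or "max", so the message always names the statistic being checked.
fn check_one(bytes: &Option<Vec<u8>>, len: usize, which: &str) -> Result<(), StatsError> {
    match bytes {
        Some(b) if b.len() < len => Err(StatsError::General(format!(
            "Insufficient bytes to parse {} statistic",
            which
        ))),
        // Absent statistics and sufficiently long buffers both pass.
        _ => Ok(()),
    }
}

// Same signature shape as the helper in the diff, with the local error type swapped in.
fn check_len(
    min: &Option<Vec<u8>>,
    max: &Option<Vec<u8>>,
    len: usize,
) -> Result<(), StatsError> {
    check_one(min, len, "min")?;
    check_one(max, len, "max")
}
```

Routing both branches through one helper makes it harder for the two messages to drift apart again, though keeping the two explicit `if let` blocks from the PR works just as well.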
##########
parquet/src/file/metadata/reader.rs:
##########
@@ -617,7 +617,8 @@ impl ParquetMetaDataReader {
for rg in t_file_metadata.row_groups {
row_groups.push(RowGroupMetaData::from_thrift(schema_descr.clone(), rg)?);
}
- let column_orders = Self::parse_column_orders(t_file_metadata.column_orders, &schema_descr);
+ let column_orders =
+ Self::parse_column_orders(t_file_metadata.column_orders, &schema_descr)?;
Review Comment:
I realize this would currently panic, but would one ever prefer to just set
`column_orders` to `None` and continue? The only impact AFAIK would be
statistics being unusable, which would only matter if predicates were in use.
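To make the alternative concrete, here is a rough sketch of the lenient behaviour, using simplified stand-in types rather than the real reader. `ColumnOrder`, the length-mismatch check, and `parse_column_orders_lenient` are all assumptions for illustration; the actual `parse_column_orders` takes the thrift column orders and a schema descriptor.

```rust
// Simplified stand-in for the crate's column order type (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnOrder {
    TypeDefined,
    Undefined,
}

// Hypothetical strict parser: returns an error where the old code would panic,
// modelled here as a mismatch between the number of orders and the number of columns.
fn parse_column_orders(
    orders: Option<Vec<ColumnOrder>>,
    num_columns: usize,
) -> Result<Option<Vec<ColumnOrder>>, String> {
    if let Some(o) = &orders {
        if o.len() != num_columns {
            return Err(format!(
                "Column order length mismatch: got {}, expected {}",
                o.len(),
                num_columns
            ));
        }
    }
    Ok(orders)
}

// Hypothetical lenient wrapper: on error, drop the column orders and keep reading.
// The caller then sees `None` and has to treat statistics as unusable.
fn parse_column_orders_lenient(
    orders: Option<Vec<ColumnOrder>>,
    num_columns: usize,
) -> Option<Vec<ColumnOrder>> {
    parse_column_orders(orders, num_columns).unwrap_or(None)
}
```

The trade-off is that the lenient path silently hides a malformed footer, so a reader that does rely on predicates would lose pruning without any indication why.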