alamb commented on code in PR #7481:
URL: https://github.com/apache/arrow-rs/pull/7481#discussion_r2078209996
##########
parquet/src/arrow/arrow_reader/filter.rs:
##########
@@ -108,11 +109,18 @@ where
/// not contiguous.
///
/// [`RowSelection`]: crate::arrow::arrow_reader::RowSelection
+
pub struct RowFilter {
/// A list of [`ArrowPredicate`]
pub(crate) predicates: Vec<Box<dyn ArrowPredicate>>,
}
+impl Debug for RowFilter {
+ fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+ write!(f, "RowFilter {{ {} predicates: }}", self.predicates.len())
Review Comment:
ArrowPredicate doesn't implement `Debug`
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -506,37 +525,47 @@ impl ArrowReaderMetadata {
// parquet_to_arrow_field_levels is expected to throw an error if the
schemas have
// different lengths, but we check here to be safe.
if inferred_len != supplied_len {
- Err(arrow_err!(format!(
- "incompatible arrow schema, expected {} columns received {}",
+ return Err(arrow_err!(format!(
+ "Incompatible supplied Arrow schema: expected {} columns
received {}",
inferred_len, supplied_len
- )))
- } else {
- let diff_fields: Vec<_> = supplied_schema
- .fields()
- .iter()
- .zip(fields.iter())
- .filter_map(|(field1, field2)| {
- if field1 != field2 {
- Some(field1.name().clone())
- } else {
- None
- }
- })
- .collect();
+ )));
+ }
- if !diff_fields.is_empty() {
- Err(ParquetError::ArrowError(format!(
- "incompatible arrow schema, the following fields could not
be cast: [{}]",
- diff_fields.join(", ")
- )))
- } else {
- Ok(Self {
- metadata,
- schema: supplied_schema,
- fields: field_levels.levels.map(Arc::new),
- })
+ let mut errors = Vec::new();
Review Comment:
Improving this message is the point of the PR
I also relaxed the check slightly so this will now allow the fields to
differ in metadata where previously this would return an error. There is no
test coverage for mismatched schemas
FYI @paleolimbot in case you have any wisdom to share here
##########
arrow-array/src/record_batch.rs:
##########
@@ -1073,8 +1073,8 @@ mod tests {
let a = Int64Array::from(vec![1, 2, 3, 4, 5]);
- let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]);
- assert!(batch.is_err());
+ let err = RecordBatch::try_new(Arc::new(schema),
vec![Arc::new(a)]).unwrap_err();
Review Comment:
drive by cleanup
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -3462,7 +3497,7 @@ mod tests {
Field::new("col2_valid", ArrowDataType::Int32, false),
Field::new("col3_invalid", ArrowDataType::Int32, false),
])),
- "Arrow: incompatible arrow schema, the following fields could not
be cast: [col1_invalid, col3_invalid]",
+ "Arrow: Incompatible supplied Arrow schema: data type mismatch for
field col1_invalid: requested Int32 but found Int64, data type mismatch for
field col3_invalid: requested Int32 but found Int64",
Review Comment:
this is a pretty good example of the before / after error messages. I would
feel much better trying to debug the new message than the old
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]