alamb commented on code in PR #7481:
URL: https://github.com/apache/arrow-rs/pull/7481#discussion_r2078209996


##########
parquet/src/arrow/arrow_reader/filter.rs:
##########
@@ -108,11 +109,18 @@ where
 /// not contiguous.
 ///
 /// [`RowSelection`]: crate::arrow::arrow_reader::RowSelection
+
 pub struct RowFilter {
     /// A list of [`ArrowPredicate`]
     pub(crate) predicates: Vec<Box<dyn ArrowPredicate>>,
 }
 
+impl Debug for RowFilter {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "RowFilter {{ {} predicates: }}", self.predicates.len())

Review Comment:
   ArrowPredicate doesn't implement `Debug`



##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -506,37 +525,47 @@ impl ArrowReaderMetadata {
         // parquet_to_arrow_field_levels is expected to throw an error if the 
schemas have
         // different lengths, but we check here to be safe.
         if inferred_len != supplied_len {
-            Err(arrow_err!(format!(
-                "incompatible arrow schema, expected {} columns received {}",
+            return Err(arrow_err!(format!(
+                "Incompatible supplied Arrow schema: expected {} columns 
received {}",
                 inferred_len, supplied_len
-            )))
-        } else {
-            let diff_fields: Vec<_> = supplied_schema
-                .fields()
-                .iter()
-                .zip(fields.iter())
-                .filter_map(|(field1, field2)| {
-                    if field1 != field2 {
-                        Some(field1.name().clone())
-                    } else {
-                        None
-                    }
-                })
-                .collect();
+            )));
+        }
 
-            if !diff_fields.is_empty() {
-                Err(ParquetError::ArrowError(format!(
-                    "incompatible arrow schema, the following fields could not 
be cast: [{}]",
-                    diff_fields.join(", ")
-                )))
-            } else {
-                Ok(Self {
-                    metadata,
-                    schema: supplied_schema,
-                    fields: field_levels.levels.map(Arc::new),
-                })
+        let mut errors = Vec::new();

Review Comment:
   Improving this message is the point of the PR
   
   I also relaxed the check slightly so this will now allow the fields to 
differ in metadata where previously this would return an error. There is no 
test coverage for mismatched schemas
   
   FYI @paleolimbot  in case you have any wisdom to share here



##########
arrow-array/src/record_batch.rs:
##########
@@ -1073,8 +1073,8 @@ mod tests {
 
         let a = Int64Array::from(vec![1, 2, 3, 4, 5]);
 
-        let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]);
-        assert!(batch.is_err());
+        let err = RecordBatch::try_new(Arc::new(schema), 
vec![Arc::new(a)]).unwrap_err();

Review Comment:
   drive by cleanup



##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -3462,7 +3497,7 @@ mod tests {
                 Field::new("col2_valid", ArrowDataType::Int32, false),
                 Field::new("col3_invalid", ArrowDataType::Int32, false),
             ])),
-            "Arrow: incompatible arrow schema, the following fields could not 
be cast: [col1_invalid, col3_invalid]",
+            "Arrow: Incompatible supplied Arrow schema: data type mismatch for 
field col1_invalid: requested Int32 but found Int64, data type mismatch for 
field col3_invalid: requested Int32 but found Int64",

Review Comment:
   this is a pretty good example of the before / after error messages. I would 
feel much better trying to debug the new message than the old



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to