etseidl commented on issue #6988: URL: https://github.com/apache/arrow-rs/issues/6988#issuecomment-2597180117
> 1. The file metadata in the PyArrow-produced Parquet bytes has a single SchemaElement with num_children: 0 and repetition_type: 0. The file metadata in the Rust-produced bytes does have the single SchemaElement, with the num_children: 0, but repetition_type is unspecified. This discrepancy leads to `schema::types::from_thrift_helper` throwing the error for the Rust-produced bytes.

Actually, the lack of `repetition_type` is per the [spec](https://github.com/apache/parquet-format/blob/a498aa9a377edcdbc5da802cf9f1763a2e409411/src/main/thrift/parquet.thrift#L439):

> repetition of the field. The root of the schema does not have a repetition_type. All other nodes must have one

The issue seems to be that since `num_children` is 0, the element is assumed to be a leaf node rather than the root of the schema. I think we need to check for this case and return early from `from_thrift_helper`. I'll take a stab at this if no one beats me to it.
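A minimal Rust sketch of the early return suggested above. It assumes simplified, hypothetical `SchemaElement` and `Node` types and a hypothetical `from_thrift_helper` signature; these are not the `parquet` crate's real types or code. It also assumes the root is the first element of the flattened schema list. When that root reports zero children, the sketch returns an empty group before the leaf branch can demand a `repetition_type`.

```rust
/// Hypothetical, simplified stand-in for the Thrift `SchemaElement`. It keeps
/// only the fields discussed above and is NOT the `parquet` crate's real type.
#[derive(Debug, Clone)]
pub struct SchemaElement {
    pub name: String,
    pub num_children: Option<i32>,
    pub repetition_type: Option<i32>, // 0 = REQUIRED, 1 = OPTIONAL, 2 = REPEATED
    pub physical_type: Option<i32>,
}

/// Hypothetical schema tree produced by the conversion.
#[derive(Debug, PartialEq)]
pub enum Node {
    Group { name: String, children: Vec<Node> },
    Primitive { name: String, repetition: i32, physical_type: i32 },
}

/// Sketch of a recursive Thrift-to-tree conversion (hypothetical signature).
/// Returns the index of the next unread element together with the node built
/// from the element at `index`.
pub fn from_thrift_helper(elements: &[SchemaElement], index: usize) -> Result<(usize, Node), String> {
    let element = elements
        .get(index)
        .ok_or_else(|| format!("schema element {index} is missing"))?;
    // Assumption: the root is always the first element of the flattened list.
    let is_root = index == 0;
    let num_children = element.num_children.unwrap_or(0);
    if num_children < 0 {
        return Err(format!("element `{}` has negative num_children", element.name));
    }

    // The proposed fix: a root with zero children is an empty schema, not a
    // leaf, so return an empty group before the leaf checks run.
    if is_root && num_children == 0 {
        return Ok((index + 1, Node::Group { name: element.name.clone(), children: Vec::new() }));
    }

    if num_children == 0 {
        // Leaf node: per the spec, every non-root node must have a repetition_type.
        let repetition = element
            .repetition_type
            .ok_or_else(|| format!("leaf `{}` has no repetition_type", element.name))?;
        let physical_type = element
            .physical_type
            .ok_or_else(|| format!("leaf `{}` has no physical type", element.name))?;
        return Ok((index + 1, Node::Primitive { name: element.name.clone(), repetition, physical_type }));
    }

    // Group node: convert each child in order, threading the next index through.
    let mut next = index + 1;
    let mut children = Vec::with_capacity(num_children as usize);
    for _ in 0..num_children {
        let (after_child, child) = from_thrift_helper(elements, next)?;
        next = after_child;
        children.push(child);
    }
    Ok((next, Node::Group { name: element.name.clone(), children }))
}
```

Non-root leaves without a `repetition_type` are still rejected, so the check only relaxes the one case the spec explicitly allows: a root that has no `repetition_type`.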
