etseidl commented on issue #6988:
URL: https://github.com/apache/arrow-rs/issues/6988#issuecomment-2597180117

> 1. The file metadata in the PyArrow-produced Parquet bytes has a single `SchemaElement` with `num_children: 0` and `repetition_type: 0`. The file metadata in the Rust-produced bytes does have the single `SchemaElement`, with the `num_children: 0`, but `repetition_type` is unspecified. This discrepancy leads to `schema::types::from_thrift_helper` throwing the error for the Rust-produced bytes.
   
Actually, the missing `repetition_type` is correct per the [spec](https://github.com/apache/parquet-format/blob/a498aa9a377edcdbc5da802cf9f1763a2e409411/src/main/thrift/parquet.thrift#L439):
> repetition of the field. The root of the schema does not have a repetition_type. All other nodes must have one
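
To make the quoted rule concrete, here is a minimal sketch of a check that accepts either form of root seen in this issue (no `repetition_type`, or one set anyway) while still requiring `repetition_type` on every non-root element. It assumes a hypothetical, simplified `SchemaElement` holding only a name and an optional repetition type; it is not the `parquet` crate's real struct, and `check_repetition_types` is a hypothetical helper, not an existing API.

```rust
// Hypothetical, minimal stand-in for the Thrift-generated `SchemaElement`
// (the real struct has many more fields).
struct SchemaElement {
    name: String,
    repetition_type: Option<i32>,
}

/// Checks a depth-first flattened schema: element 0 is the root and may omit
/// `repetition_type`; every other element must carry one.
fn check_repetition_types(elements: &[SchemaElement]) -> Result<(), String> {
    // Skip the root: the spec says it has no repetition_type, but some
    // writers (PyArrow, per the report above) set one anyway.
    for element in elements.iter().skip(1) {
        if element.repetition_type.is_none() {
            return Err(format!("`{}` is missing a repetition_type", element.name));
        }
    }
    Ok(())
}

fn main() {
    // Root-only schema as written by the Rust writer: no repetition_type.
    let rust_style = vec![SchemaElement {
        name: "schema".to_string(),
        repetition_type: None,
    }];
    println!("{:?}", check_repetition_types(&rust_style));
}
```

Being lenient about the root keeps files from both writers readable, which matches the behaviour the issue is asking for.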
   
The issue seems to be that because `num_children` is 0, the root element is treated as a leaf node rather than as the root of the schema, and a leaf must have a `repetition_type`. I think we need to check for this case and return early from `from_thrift_helper`. I'll take a stab at this if no one beats me to it.
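
A minimal sketch of that early return, assuming hypothetical, heavily simplified `SchemaElement` and `Node` types rather than the real `parquet` crate types; `build` is a hypothetical stand-in for `from_thrift_helper`, not its actual code. Without the `index == 0` check, a root with `num_children: Some(0)` falls into the leaf branch and fails because it has no `repetition_type`.

```rust
// Hypothetical, simplified stand-in for the Thrift `SchemaElement`.
struct SchemaElement {
    name: String,
    num_children: Option<i32>,
    repetition_type: Option<i32>,
}

// Hypothetical, simplified schema tree.
#[derive(Debug, PartialEq)]
enum Node {
    Primitive { name: String, repetition: i32 },
    Group { name: String, children: Vec<Node> },
}

/// Builds the node at `index` from a depth-first flattened schema and returns
/// it together with the index of the next unread element.
fn build(elements: &[SchemaElement], index: usize) -> Result<(usize, Node), String> {
    let element = elements
        .get(index)
        .ok_or_else(|| format!("schema index {index} out of bounds"))?;

    // Proposed fix: a root (index 0) with zero children is an empty schema,
    // not a leaf, so return early instead of falling through.
    if index == 0 && element.num_children == Some(0) {
        let root = Node::Group { name: element.name.clone(), children: Vec::new() };
        return Ok((index + 1, root));
    }

    match element.num_children {
        // No children: treated as a leaf, which must have a repetition_type.
        None | Some(0) => {
            let repetition = element
                .repetition_type
                .ok_or_else(|| format!("leaf `{}` has no repetition_type", element.name))?;
            Ok((index + 1, Node::Primitive { name: element.name.clone(), repetition }))
        }
        // Group: its children follow it in depth-first order.
        Some(n) => {
            let n = usize::try_from(n).map_err(|_| format!("negative num_children {n}"))?;
            let mut next = index + 1;
            let mut children = Vec::new();
            for _ in 0..n {
                let (after, child) = build(elements, next)?;
                children.push(child);
                next = after;
            }
            Ok((next, Node::Group { name: element.name.clone(), children }))
        }
    }
}

fn main() {
    // Root-only schema as produced by the Rust writer in this issue.
    let empty = vec![SchemaElement {
        name: "schema".to_string(),
        num_children: Some(0),
        repetition_type: None,
    }];
    println!("{:?}", build(&empty, 0));
}
```

Scoping the check to `index == 0` keeps the existing behaviour for non-root elements, where `num_children: 0` still means a leaf that must declare its repetition.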

