nathaniel-d-ef commented on PR #17861:
URL: https://github.com/apache/datafusion/pull/17861#issuecomment-3511836639
> Hello @nathaniel-d-ef! I have a question. I tried to read a file with the
> following metadata using arrow-avro and got an error. Is it unable to read
> files whose root record name is not `topLevelRecord`?
>
> ```shell
> ➜ avro-tools getmeta ./testing/data/avro/nested_records.avro
> avro.codec null
> avro.schema {"type":"record","namespace":"ns1","name":"record1","fields":[{"name":"f1","type":{"type":"record","namespace":"ns2","name":"record2","fields":[{"name":"f1_1","type":"string"},{"name":"f1_2","type":"int"},{"name":"f1_3","type":{"type":"record","namespace":"ns3","name":"record3","fields":[{"name":"f1_3_1","type":"double"}]}}]}},{"name":"f2","type":{"type":"array","items":{"type":"record","namespace":"ns4","name":"record4","fields":[{"name":"f2_1","type":"boolean"},{"name":"f2_2","type":"float"}]}}},{"name":"f3","type":["null",{"type":"record","namespace":"ns5","name":"record5","fields":[{"name":"f3_1","type":"string"}]}],"default":null},{"name":"f4","type":{"type":"array","items":["null",{"type":"record","namespace":"ns6","name":"record6","fields":[{"name":"f4_1","type":"long"}]}]}}]}
> ```
Hey @getChan, I ran into this same issue on Friday while working on the
writer implementation. The `arrow-avro` reader can absolutely handle a root
record with a custom name; the crate has thorough tests that demonstrate this.
What I think is going on here is that the name is lost on the way from the
original Avro schema, to a DataFusion `SchemaRef`, and back to a projected Avro
schema: `AvroSchema::try_from()` in `AvroSource` generates the projected schema
without the original record name. We need that projected schema in order to
give the `ReaderBuilder` the correct projection. In other words, reading works
when a named schema is passed directly to the `ReaderBuilder`, but not when the
schema has been funneled through DataFusion's optimizations via an Arrow
schema, where some contextual information (such as the record name) is lost.
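To make the failure mode concrete, here is a toy model of that round trip in plain Rust (std only). It is a sketch under stated assumptions, not the actual code path: `AvroRecord`, `ArrowLikeSchema`, `question_root`, `to_arrow_like`, `to_projected_avro`, and the `name_hint` parameter are all hypothetical and are not arrow-avro or DataFusion APIs. `records_match` is a simplified form of the Avro spec's schema-resolution rule that writer and reader records must share the same unqualified name (aliases ignored). The `topLevelRecord` fallback name is assumed from the question above.

```rust
// Toy model of the Avro -> Arrow -> projected Avro round trip.
// Hypothetical types and functions; NOT arrow-avro or DataFusion APIs.

/// Simplified Avro record schema: namespace, unqualified name, field names.
#[derive(Debug, Clone, PartialEq)]
struct AvroRecord {
    namespace: Option<String>,
    name: String,
    fields: Vec<String>,
}

impl AvroRecord {
    /// Full name as Avro defines it: `namespace.name`, or just `name`.
    fn full_name(&self) -> String {
        match &self.namespace {
            Some(ns) => format!("{ns}.{}", self.name),
            None => self.name.clone(),
        }
    }
}

/// Stand-in for an Arrow schema: only the field list survives.
#[derive(Debug, Clone, PartialEq)]
struct ArrowLikeSchema {
    fields: Vec<String>,
}

/// Assumed default root name for a regenerated schema (from the question).
const DEFAULT_ROOT_NAME: &str = "topLevelRecord";

/// Root record of the file in the question: `ns1.record1` with fields f1..f4.
fn question_root() -> AvroRecord {
    AvroRecord {
        namespace: Some("ns1".to_string()),
        name: "record1".to_string(),
        fields: vec!["f1".into(), "f2".into(), "f3".into(), "f4".into()],
    }
}

/// Avro -> Arrow: the record name and namespace are dropped.
fn to_arrow_like(avro: &AvroRecord) -> ArrowLikeSchema {
    ArrowLikeSchema { fields: avro.fields.clone() }
}

/// Arrow -> projected Avro: keep the projected columns and rebuild a record.
/// `name_hint` (hypothetical) models having the original (name, namespace).
fn to_projected_avro(
    arrow: &ArrowLikeSchema,
    projection: &[&str],
    name_hint: Option<(&str, Option<&str>)>,
) -> AvroRecord {
    let fields = arrow
        .fields
        .iter()
        .filter(|f| projection.contains(&f.as_str()))
        .cloned()
        .collect();
    let (name, namespace) = match name_hint {
        Some((n, ns)) => (n.to_string(), ns.map(str::to_string)),
        None => (DEFAULT_ROOT_NAME.to_string(), None),
    };
    AvroRecord { namespace, name, fields }
}

/// Simplified Avro schema-resolution check (aliases ignored): writer and
/// reader records must have the same unqualified name.
fn records_match(writer: &AvroRecord, reader: &AvroRecord) -> bool {
    writer.name == reader.name
}

fn main() {
    let writer = question_root();
    let arrow = to_arrow_like(&writer);

    // Name lost in the round trip: the reader root becomes `topLevelRecord`.
    let lossy = to_projected_avro(&arrow, &["f1", "f3"], None);
    let ok = records_match(&writer, &lossy);
    println!("{} vs {}: match = {}", writer.full_name(), lossy.full_name(), ok);
    // prints "ns1.record1 vs topLevelRecord: match = false"

    // Name preserved, like passing a named schema straight to the reader.
    let named = to_projected_avro(&arrow, &["f1", "f3"], Some(("record1", Some("ns1"))));
    let ok = records_match(&writer, &named);
    println!("{} vs {}: match = {}", writer.full_name(), named.full_name(), ok);
    // prints "ns1.record1 vs ns1.record1: match = true"
}
```

The second case mirrors the observation that a named schema handed directly to the `ReaderBuilder` works; the first mirrors the projected schema coming back without its name.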
@jecsand838, any thoughts on this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.