[ https://issues.apache.org/jira/browse/BEAM-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993919#comment-16993919 ]
Ryan Berti commented on BEAM-8953:
----------------------------------

Example implementation would be:
 * add an 'abstract Builder setDataModel(GenericData model)' method to the builders
 * use the model value at ParquetIO.java:221 as an argument to AvroParquetReader.Builder [https://javadoc.io/doc/org.apache.parquet/parquet-avro/1.10.1/org/apache/parquet/avro/AvroParquetReader.Builder.html]

> Extend ParquetIO.Read/ReadFiles.Builder to support Avro GenericData model
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8953
>                 URL: https://issues.apache.org/jira/browse/BEAM-8953
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-java
>    Affects Versions: 2.16.0
>            Reporter: Ryan Berti
>            Priority: Minor
>
> When using ParquetIO to deserialize objects into case classes in Scala,
> we'd like to use a downstream converter that takes GenericRecords and
> converts them to instances of our case classes, rather than relying on
> ParquetIO to deserialize into the case class via reflection + implementing
> the IndexedRecord interface.
> The ParquetIO.Read / ParquetIO.ReadFiles Builders currently support a
> filepattern + schema / schema arguments respectively. When using the Read /
> ReadFiles Builders with these arguments, the underlying AvroParquetReader
> object that gets created in the ParquetIO.ReadFiles.ReadFn method defaults to
> an AvroReadSupport instance whose GenericData model is set to
> SpecificData. We'd like the underlying AvroReadSupport to use
> the GenericData model, but there's currently no way to force this
> via the existing ParquetIO Read / ReadFiles builders.
> I'd like to extend the ParquetIO Read / ReadFiles builders to support a new
> method allowing users to define a GenericData model, which will then be
> passed into the AvroParquetReader builder. I've tested and validated that
> this method allows ParquetIO to generate GenericRecord instances without
> requiring that the users' classes can be reflectively instantiated and
> initialized via the IndexedRecord interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)