[ https://issues.apache.org/jira/browse/BEAM-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993919#comment-16993919 ]

Ryan Berti commented on BEAM-8953:
----------------------------------

An example implementation would be:
 * add an `abstract Builder setDataModel(GenericData model)` method to the builders
 * at ParquetIO.java:221, pass the model value as the argument to `withDataModel(...)` on the AvroParquetReader builder:
[https://javadoc.io/doc/org.apache.parquet/parquet-avro/1.10.1/org/apache/parquet/avro/AvroParquetReader.Builder.html]
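A minimal, dependency-free sketch of the plumbing proposed above, assuming an immutable Read transform whose public `withDataModel` method returns a copy carrying the new model (in the style of an AutoValue `toBuilder()` copy). Everything here is hypothetical: `DataModel` stands in for Avro's `GenericData` / `SpecificData` instances, `ReaderBuilder` stands in for `AvroParquetReader.Builder`, and `ReadSketch` is a simplified stand-in for `ParquetIO.Read`, not Beam's actual class. It shows the intended behavior: an unset model leaves the reader on its existing SpecificData default, while a user-supplied model is forwarded to the reader builder.

```java
/** Hypothetical stand-in for Avro's GenericData / SpecificData model instances. */
enum DataModel { SPECIFIC, GENERIC }

/** Hypothetical stand-in for AvroParquetReader.Builder; it only records the model it was given. */
final class ReaderBuilder {
  // Mirrors the default described in the issue: without an explicit model,
  // the reader ends up using SpecificData.
  private DataModel model = DataModel.SPECIFIC;

  ReaderBuilder withDataModel(DataModel model) {
    if (model == null) {
      throw new IllegalArgumentException("model must not be null");
    }
    this.model = model;
    return this;
  }

  DataModel model() {
    return model;
  }
}

/** Hypothetical, simplified stand-in for ParquetIO.Read: immutable, copy-on-write "with" methods. */
final class ReadSketch {
  private final String filepattern;
  private final String schema;
  private final DataModel dataModel; // null means "not set by the user"

  private ReadSketch(String filepattern, String schema, DataModel dataModel) {
    this.filepattern = filepattern;
    this.schema = schema;
    this.dataModel = dataModel;
  }

  static ReadSketch read(String schema) {
    return new ReadSketch(null, schema, null);
  }

  ReadSketch from(String filepattern) {
    return new ReadSketch(filepattern, schema, dataModel);
  }

  // The proposed addition, analogous to setDataModel(GenericData) on the builder.
  ReadSketch withDataModel(DataModel model) {
    return new ReadSketch(filepattern, schema, model);
  }

  // What the read DoFn would do when opening each file: forward the model
  // only if the user set one, otherwise keep the reader's default.
  ReaderBuilder newReaderBuilder() {
    ReaderBuilder builder = new ReaderBuilder();
    if (dataModel != null) {
      builder.withDataModel(dataModel);
    }
    return builder;
  }
}
```

Against the real libraries, the forwarding branch corresponds to calling `withDataModel(...)` on `AvroParquetReader.Builder`, for example with `GenericData.get()`, before `build()`.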

> Extend ParquetIO.Read/ReadFiles.Builder to support Avro GenericData model
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8953
>                 URL: https://issues.apache.org/jira/browse/BEAM-8953
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-java
>    Affects Versions: 2.16.0
>            Reporter: Ryan Berti
>            Priority: Minor
>
> When using ParquetIO to deserialize records into Scala case classes, we'd 
> like to use a downstream converter that takes GenericRecords and converts 
> them into instances of our case classes, rather than relying on ParquetIO to 
> deserialize directly into the case class via reflection, which requires the 
> case class to implement the IndexedRecord interface.
> The ParquetIO.Read / ParquetIO.ReadFiles Builders currently accept 
> filepattern + schema and schema arguments, respectively. When the Read / 
> ReadFiles Builders are used with these arguments, the underlying 
> AvroParquetReader created in ParquetIO.ReadFiles.ReadFn defaults to an 
> AvroReadSupport instance whose GenericData model is set to SpecificData. 
> We'd like the underlying AvroReadSupport to use the GenericData model 
> instead, but there is currently no way to force this through the existing 
> ParquetIO Read / ReadFiles builders.
> I'd like to extend the ParquetIO Read / ReadFiles builders with a new 
> method that lets users define a GenericData model, which is then passed 
> into the AvroParquetReader builder. I've tested and validated that this 
> method allows ParquetIO to generate GenericRecord instances without 
> requiring that the user's classes be reflectively instantiated and 
> initialized via the IndexedRecord interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
