[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190947#comment-16190947
 ] 

Etienne Chauchot commented on BEAM-2993:
----------------------------------------

Your questions are legitimate. In more detail: we use the lazy Avro coder I
mentioned earlier. It determines the schema at runtime and delegates to the
AvroCoder. The catch is that it also stores the resulting schema in a network
registry service. We think it is a bad idea to call the network registry
before writing just to get the schema back, when we could instead avoid
passing the schema to the write transform at all. I realize this entails
calling {{GenericRecord.getSchema()}} once more (in addition to the call in
the lazy Avro coder).
[~ryanskraba] feel free to comment if you have anything to add.
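
To make the idea concrete, here is a minimal sketch of a writer that takes no
schema up front and instead reads it from the first record it sees. It is not
Beam or Avro code: SketchSchema, SketchRecord and LazySchemaWriter are
hypothetical stand-ins, and only the getSchema() accessor is modeled on the
real {{GenericRecord.getSchema()}}. The "header" line loosely imitates the way
an Avro container file stores its schema in the file header.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an Avro schema; only its name matters here. */
final class SketchSchema {
    final String name;

    SketchSchema(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for GenericRecord: each record carries its own schema. */
final class SketchRecord {
    private final SketchSchema schema;
    final String payload;

    SketchRecord(SketchSchema schema, String payload) {
        this.schema = schema;
        this.payload = payload;
    }

    // Modeled on GenericRecord.getSchema().
    SketchSchema getSchema() {
        return schema;
    }
}

/** Hypothetical writer that needs no schema at construction time. */
final class LazySchemaWriter {
    private SketchSchema schema; // stays null until the first record arrives
    private final List<String> output = new ArrayList<>();

    void write(SketchRecord record) {
        if (schema == null) {
            // Resolve the schema from the record itself: no registry call and
            // no schema argument on the write transform.
            schema = record.getSchema();
            output.add("header:" + schema.name);
        }
        output.add("record:" + record.payload);
    }

    List<String> output() {
        return output;
    }
}
```

The extra cost is one getSchema() call per written record (or a null check
after the first one, as above), which is the trade-off mentioned in the
comment. The sketch assumes all records share one schema and does not check it.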

> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be
> able to write to Avro files using {{AvroIO}} without specifying a schema at
> build time. Consider the following use case: a user has a
> {{PCollection<GenericRecord>}}, but the schema is only known while running
> the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> schema is already available in {{GenericRecord}}. We should be able to call
> {{AvroIO.writeGenericRecords()}} with no schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
