[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192568#comment-16192568
 ] 

Etienne Chauchot commented on BEAM-2993:
----------------------------------------

Thanks [~jkff] for your points:
* Yes, it works with the side input example above. What I propose is an 
improvement to AvroIO, even though we can work around the limitation using the 
side input and {{DynamicAvroDestinations}}.
* In the PR that I'm about to send, it does indeed choose the schema of the 
"first" element of the PCollection (although a PCollection is not ordered, so 
this is an arbitrary element). So the schema needs to be the same for all 
elements of the PCollection, which is the case in our use case. But the current 
implementation, {{write(SCHEMA)}}, {{write(class)}} or 
{{writeGenericRecords(SCHEMA)}}, also needs all the elements of the PCollection 
to have {{SCHEMA}} as their schema, because this schema is passed to the 
{{TypedWrite}} and then to the {{ConstantAvroDestination}}. Or am I missing 
something?
* As the PCollection elements have the same schema in our use case, there is no 
point in grouping per schema. Moreover, if we have the ability to do 
{{AvroIO.write()}} without a schema, I guess most of the benefits of having a 
network schema registry disappear, except maybe for the lazy Avro coder, to 
avoid calling {{element.getSchema()}} each time we {{encode}} or {{decode}} an 
element.
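
To make the "first element decides" idea concrete, here is a minimal plain-Java 
sketch. It is not Beam code and not the code of the PR: {{SchemaInference}}, 
{{Element}} and {{inferSchema}} are hypothetical names, {{Element}} stands in for 
a {{GenericRecord}} that carries its own schema, and the check that all elements 
agree is an assumption added for illustration.

```java
import java.util.Objects;
import java.util.Optional;

/**
 * Plain-Java illustration only: no Beam or Avro classes are used.
 * All names here are hypothetical.
 */
class SchemaInference {

    /** Hypothetical stand-in for a GenericRecord: a schema (JSON string) plus a payload. */
    static final class Element {
        final String schema;
        final String payload;

        Element(String schema, String payload) {
            this.schema = Objects.requireNonNull(schema, "schema");
            this.payload = payload;
        }
    }

    /**
     * Takes the schema of the first element encountered and checks that every
     * other element carries the same one. Returns empty for an empty input.
     */
    static Optional<String> inferSchema(Iterable<Element> elements) {
        String schema = null;
        for (Element element : elements) {
            if (schema == null) {
                // The "first" element decides; in an unordered collection
                // this is simply whichever element is seen first.
                schema = element.schema;
            } else if (!schema.equals(element.schema)) {
                throw new IllegalStateException(
                    "Elements do not share a single schema: " + element.schema);
            }
        }
        return Optional.ofNullable(schema);
    }
}
```

In a real pipeline the elements are distributed across workers, so "first" can 
only mean an arbitrary element; the sketch just shows why a single shared 
schema is the precondition in both the current and the proposed approach.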

PS: Please note that I used {{GenericRecord}} rather than its parent 
{{IndexedRecord}} to describe our use case in the previous comments, to stick to 
the generic object chosen in AvroIO :)


> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be 
> able to write to avro files using {{AvroIO}} without specifying a schema at 
> build time. Consider the following use case: a user has a 
> {{PCollection<GenericRecord>}}  but the schema is only known while running 
> the pipeline.  {{AvroIO.writeGenericRecords}} needs the schema, but the 
> schema is already available in {{GenericRecord}}. We should be able to call 
> {{AvroIO.writeGenericRecords()}} with no schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
