[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192129#comment-16192129
 ] 

Eugene Kirpichov commented on BEAM-2993:
----------------------------------------

OK, thanks for the explanations. A couple more questions:

- Does AvroIO.write().to(DynamicDestinations) work for you? It seems like what 
you have is a very specialized use case (I've never seen nor imagined anything 
like it), so if an existing solution does the job, then it might be best to 
just use that rather than develop a new feature guided only by a single very 
exotic use case.
- Suppose a schemaless AvroIO.write() was implemented, and suppose you give it 
a PCollection<GenericRecord> that happens to contain records with many 
different schemas. What should it do? Should it group them by schema? Should it 
simply fail? Should it use the schema of a (non-deterministically chosen) 
"first" record in each generated file and hope that other records have the same 
schema?
- Would it make things easier if, instead of PCollection<IndexedRecord>, you 
operated in terms of PCollection<SchemaRefAndRecord> where SchemaRefAndRecord 
is your custom type { String schemaURI; GenericRecord record; }, with a custom 
coder for it that fetches the schema over the network from a schema registry by 
URI or something? And then when writing to AvroIO, you'd go down the path of 
DynamicDestinations and group by schemaURI before writing (i.e. use it as a 
destination type); and it would be up to your code to ensure that the schema 
URIs are unique.
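
To make the last two questions concrete, here is a rough sketch in plain Java 
of what "group by schemaURI" and "simply fail" could look like. It uses no 
Beam or Avro types at all: SchemaGroupingSketch, groupBySchemaUri and 
requireSingleSchema are hypothetical names, the record payload is a String 
standing in for a GenericRecord, and the registry:// URIs are made up. In a 
real pipeline the grouping would be done by the runner via DynamicDestinations 
rather than by an in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; no Beam or Avro classes are involved. */
final class SchemaGroupingSketch {

  /** Hypothetical pairing of a schema reference with a record payload. */
  static final class SchemaRefAndRecord {
    final String schemaUri;
    final String record; // stand-in for a GenericRecord

    SchemaRefAndRecord(String schemaUri, String record) {
      this.schemaUri = schemaUri;
      this.record = record;
    }
  }

  /** "Group by schema": one bucket per schema URI, in first-seen order. */
  static Map<String, List<SchemaRefAndRecord>> groupBySchemaUri(
      List<SchemaRefAndRecord> input) {
    Map<String, List<SchemaRefAndRecord>> groups = new LinkedHashMap<>();
    for (SchemaRefAndRecord r : input) {
      groups.computeIfAbsent(r.schemaUri, k -> new ArrayList<>()).add(r);
    }
    return groups;
  }

  /** "Simply fail": accept the input only if it carries exactly one schema. */
  static String requireSingleSchema(List<SchemaRefAndRecord> input) {
    Map<String, List<SchemaRefAndRecord>> groups = groupBySchemaUri(input);
    if (groups.size() != 1) {
      throw new IllegalArgumentException(
          "expected exactly one schema, found " + groups.size());
    }
    return groups.keySet().iterator().next();
  }

  public static void main(String[] args) {
    List<SchemaRefAndRecord> mixed = Arrays.asList(
        new SchemaRefAndRecord("registry://users/v1", "u1"),
        new SchemaRefAndRecord("registry://orders/v1", "o1"),
        new SchemaRefAndRecord("registry://users/v1", "u2"));

    // Each bucket would correspond to one destination (one set of files).
    groupBySchemaUri(mixed).forEach(
        (uri, records) -> System.out.println(uri + ": " + records.size()));
  }
}
```

The sketch relies on the same assumption as the question above: two records 
with the same schemaURI really do share a schema, which is something the 
calling code would have to guarantee.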

> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be 
> able to write to avro files using {{AvroIO}} without specifying a schema at 
> build time. Consider the following use case: a user has a 
> {{PCollection<GenericRecord>}} but the schema is only known while running 
> the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the 
> schema is already available in {{GenericRecord}}. We should be able to call 
> {{AvroIO.writeGenericRecords()}} with no schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
