[ https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192129#comment-16192129 ]
Eugene Kirpichov commented on BEAM-2993:
----------------------------------------

OK, thanks for the explanations. A couple more questions:

- Does AvroIO.write().to(DynamicDestinations) work for you? It seems like what you have is a very specialized use case (I've never seen nor imagined anything like it), so if an existing solution does the job, then it might be best to just use that rather than develop a new feature guided only by a single very exotic use case.

- Suppose a schemaless AvroIO.write() was implemented, and suppose you give it a PCollection<GenericRecord> that happens to contain records with many different schemas. What should it do? Should it group them by schema? Should it simply fail? Should it use the schema of a (non-deterministically chosen) "first" record in each generated file and hope that other records have the same schema?

- Would it make things easier, if instead of PCollection<IndexedRecord> you operated in terms of PCollection<SchemaRefAndRecord> where SchemaRefAndRecord is your custom type { String schemaURI; GenericRecord record; }, with a custom coder for it that fetches the schema over the network from a schema registry by URI or something? And then when writing to AvroIO, you'd go down the path of DynamicDestinations and group by schemaURI before writing (i.e. use it as a destination type); and it would be up to your code to ensure that the schema URIs are unique.

> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be
> able to write to avro files using {{AvroIO}} without specifying a schema at
> build time.
> Consider the following use case: a user has a
> {{PCollection<GenericRecord>}} but the schema is only known while running
> the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> schema is already available in {{GenericRecord}}. We should be able to call
> {{AvroIO.writeGenericRecords()}} with no schema.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
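
To make the "group by schemaURI" suggestion in the comment above concrete, here is a minimal plain-Java sketch of the grouping step only. It deliberately does not use Beam or Avro: the class name SchemaGroupingSketch, the method groupBySchemaUri, the example URIs, and the use of a String as a stand-in for GenericRecord are all assumptions made for illustration. In an actual pipeline this role would presumably be played by DynamicDestinations with schemaURI as the destination type, as the comment proposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaGroupingSketch {

    /** Hypothetical pairing of a schema reference with a record payload. */
    public static final class SchemaRefAndRecord {
        public final String schemaURI;
        public final String record; // stand-in for an Avro GenericRecord (assumption)

        public SchemaRefAndRecord(String schemaURI, String record) {
            this.schemaURI = schemaURI;
            this.record = record;
        }
    }

    /**
     * Groups records by schema URI, preserving first-seen order of the URIs.
     * Each resulting group contains records of a single schema, so each group
     * could be written to its own file with that one schema.
     */
    public static Map<String, List<String>> groupBySchemaUri(List<SchemaRefAndRecord> input) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (SchemaRefAndRecord r : input) {
            groups.computeIfAbsent(r.schemaURI, k -> new ArrayList<>()).add(r.record);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up URIs and payloads, for illustration only.
        List<SchemaRefAndRecord> input = Arrays.asList(
                new SchemaRefAndRecord("registry://user/v1", "user-record-1"),
                new SchemaRefAndRecord("registry://order/v1", "order-record-1"),
                new SchemaRefAndRecord("registry://user/v1", "user-record-2"));

        // One group per schema URI; each group maps to one output destination.
        for (Map.Entry<String, List<String>> e : groupBySchemaUri(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The sketch only works if two records share a URI exactly when they share a schema, which is the uniqueness requirement the comment leaves to user code.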