How do you set the coder on your collections of GenericRecord? My claim is that it's impossible to create a PCollection of GenericRecord without knowing its schema, which means you actually have the schema, so I'm not sure why you can't just pass it to write(). What am I missing? Are you perhaps using a different coder for GenericRecord, e.g. one that encodes the full schema with every record? I guess this will become clearer once I look at the PR.

On Tue, Oct 3, 2017, 8:40 AM Etienne Chauchot (JIRA) <j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189872#comment-16189872 ]
>
> Etienne Chauchot commented on BEAM-2993:
> ----------------------------------------
>
> You're right, I simplified a bit the use case.:) The complete use case is
> more complicated. We generate beam code and every collection element is a
> GenericRecord no matter what the initial read or the upstream transforms
> were. We need to write these elements.
>
> But nevermind, the core thing is that: as any Avro record knows its
> schema, passing the schema should not be mandatory for writing as it is now
> (passing it in {{write(schema)}} or {{withSchema}} which will end up in a
> {{DynamicAvroDestinations}} or directly in a custom
> {{DynamicAvroDestinations}} as I did in the code above). We should either
> get the schema from {{DynamicAvroDestinations}} if it is available or lazy
> determine it just before writing the elements out of those elements.
>
> I'm preparing a PR to do this, I'm almost done. I'll give it for reviewing
> if you have a bit of time.
>
>
> > AvroIO.write without specifying a schema
> > ----------------------------------------
> >
> > Key: BEAM-2993
> > URL: https://issues.apache.org/jira/browse/BEAM-2993
> > Project: Beam
> > Issue Type: Improvement
> > Components: sdk-java-extensions
> > Reporter: Etienne Chauchot
> > Assignee: Etienne Chauchot
> >
> > Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should
> > be able to write to avro files using {{AvroIO}} without specifying a schema
> > at build time. Consider the following use case: a user has a
> > {{PCollection<GenericRecord>}} but the schema is only known while running
> > the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> > schema is already available in {{GenericRecord}}. We should be able to call
> > {{AvroIO.writeGenericRecords()}} with no schema.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>
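
To make the "lazily determine it just before writing" idea from the quoted comment concrete, here is a minimal sketch of what I understand the proposal to mean. It is plain Java with no Beam or Avro dependency: Rec and LazyWriter are hypothetical stand-ins I made up for GenericRecord and for a file sink, and nothing here is the actual AvroIO implementation or what the PR does. The rule that a record with a different schema is rejected is also just an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: Rec and LazyWriter are NOT Beam or Avro classes.
class LazySchemaSketch {

  /** Stand-in for a GenericRecord: each record carries its own schema. */
  static final class Rec {
    final String schema;  // stand-in for org.apache.avro.Schema
    final String payload; // stand-in for the encoded field values

    Rec(String schema, String payload) {
      this.schema = schema;
      this.payload = payload;
    }
  }

  /** Stand-in for a file sink whose schema argument is optional. */
  static final class LazyWriter {
    private final String configuredSchema; // null means "not given at build time"
    private String resolvedSchema;          // fixed when the first record arrives
    private final List<String> lines = new ArrayList<>();

    LazyWriter(String configuredSchema) {
      this.configuredSchema = configuredSchema;
    }

    void write(Rec rec) {
      if (resolvedSchema == null) {
        // Prefer the explicit schema; otherwise take it from the first record.
        resolvedSchema = (configuredSchema != null) ? configuredSchema : rec.schema;
        lines.add("HEADER " + resolvedSchema);
      }
      // Assumption of this sketch: one file holds one schema, so mismatches fail.
      if (!resolvedSchema.equals(rec.schema)) {
        throw new IllegalArgumentException(
            "record schema " + rec.schema + " does not match " + resolvedSchema);
      }
      lines.add(rec.payload);
    }

    List<String> lines() {
      return lines;
    }
  }

  public static void main(String[] args) {
    LazyWriter writer = new LazyWriter(null); // no schema supplied up front
    writer.write(new Rec("User", "alice"));
    writer.write(new Rec("User", "bob"));
    System.out.println(writer.lines());
  }
}
```

One thing the sketch makes visible: with no configured schema and no records, the writer never resolves a schema and never emits a header, so an empty input is a case where an explicitly supplied schema would still be needed.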