How do you set the coder on your collections of GenericRecord? My claim is
that it's impossible to create a PCollection of GenericRecord without
knowing its schema => you actually have the schema => I'm not sure why you
can't just pass it to write(). What am I missing? Are you perhaps using a
different coder for GenericRecord, e.g. are you encoding the full schema with
every record? I guess this will become clearer once I look at the PR.
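
To make the question concrete, here is a minimal sketch of what "lazily
determining the schema from the elements" (the proposal in the quoted comment
below) could mean. It is an illustration only: SchemaRecord and
LazySchemaWriter are hypothetical stand-ins written in plain Java, not Avro or
Beam classes, and the "file" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record that carries its own schema.
// This is NOT org.apache.avro.generic.GenericRecord; it only mimics the idea.
final class SchemaRecord {
    final String schema;   // schema as a JSON string (stand-in)
    final String payload;  // record contents (stand-in)

    SchemaRecord(String schema, String payload) {
        this.schema = schema;
        this.payload = payload;
    }
}

// Hypothetical writer that defers the file header until the first record.
final class LazySchemaWriter {
    private final List<String> out = new ArrayList<>();
    private String schema; // null until the first record has been seen

    void write(SchemaRecord record) {
        if (schema == null) {
            // First element: take its schema and emit the header now.
            schema = record.schema;
            out.add("HEADER " + schema);
        } else if (!schema.equals(record.schema)) {
            // One file holds one schema, so a mismatch is an error here.
            throw new IllegalStateException(
                "record schema differs from the schema already written");
        }
        out.add("ROW " + record.payload);
    }

    List<String> lines() {
        return out;
    }
}

class LazySchemaDemo {
    public static void main(String[] args) {
        LazySchemaWriter writer = new LazySchemaWriter();
        writer.write(new SchemaRecord("{\"type\":\"string\"}", "a"));
        writer.write(new SchemaRecord("{\"type\":\"string\"}", "b"));
        for (String line : writer.lines()) {
            System.out.println(line);
        }
    }
}
```

Note that in this sketch a writer that never receives a record never learns a
schema, so it emits no header at all; a real implementation would have to
decide what to do for an empty collection.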

On Tue, Oct 3, 2017, 8:40 AM Etienne Chauchot (JIRA) <j...@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189872#comment-16189872
> ]
>
> Etienne Chauchot commented on BEAM-2993:
> ----------------------------------------
>
> You're right, I simplified the use case a bit. :) The complete use case is
> more complicated. We generate Beam code and every collection element is a
> GenericRecord no matter what the initial read or the upstream transforms
> were. We need to write these elements.
>
> But never mind; the core point is this: since every Avro record knows its
> schema, passing the schema should not be mandatory for writing, as it is
> now (either via {{write(schema)}} or {{withSchema}}, which ends up in a
> {{DynamicAvroDestinations}}, or directly in a custom
> {{DynamicAvroDestinations}} as I did in the code above). We should either
> get the schema from {{DynamicAvroDestinations}} if it is available, or
> lazily determine it from the elements themselves just before writing them.
>
> I'm preparing a PR to do this and I'm almost done. I'll submit it for
> review if you have a bit of time.
>
>
> > AvroIO.write without specifying a schema
> > ----------------------------------------
> >
> >                 Key: BEAM-2993
> >                 URL: https://issues.apache.org/jira/browse/BEAM-2993
> >             Project: Beam
> >          Issue Type: Improvement
> >          Components: sdk-java-extensions
> >            Reporter: Etienne Chauchot
> >            Assignee: Etienne Chauchot
> >
> > Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should
> > be able to write to Avro files using {{AvroIO}} without specifying a
> > schema at build time. Consider the following use case: a user has a
> > {{PCollection<GenericRecord>}} but the schema is only known while running
> > the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> > schema is already available in {{GenericRecord}}. We should be able to
> > call {{AvroIO.writeGenericRecords()}} with no schema.
>
