> are unknown fields propagated through if the user only reads/modifies a row?

I'm not sure I understand this question. Are you asking about handling schema changes? The wire format includes the number of fields in the schema, specifically so that we can detect when the schema changes. This is restricted to fields added or removed at the end of the schema: if we receive an element that says it has N more fields than the schema this coder was created with, we assume the pipeline was updated with a schema that drops the last N fields, and we ignore the extra fields. Similarly, if we receive an element with N fewer fields than we expect, we just fill the last N fields with nulls. This logic is implemented in Python [1] and Java [2], but it's not exercised since no runners actually support pipeline update with schema changes.
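For illustration, the trailing-field reconciliation described above can be sketched roughly like this. This is a simplified model, not the actual Beam implementation (see [1] for the real code); `decode_fields` is a hypothetical helper name:

```python
def decode_fields(encoded_values, expected_num_fields):
    """Reconcile a decoded row's field count with this coder's schema.

    encoded_values: values read from the wire; their count is the field
        count written by the producing coder's schema.
    expected_num_fields: number of fields in this coder's schema.
    """
    n_received = len(encoded_values)
    if n_received > expected_num_fields:
        # Producer had N extra trailing fields: assume the pipeline was
        # updated to a schema that drops them, and ignore the extras.
        return encoded_values[:expected_num_fields]
    if n_received < expected_num_fields:
        # Producer was missing N trailing fields: fill them with nulls.
        return encoded_values + [None] * (expected_num_fields - n_received)
    return encoded_values
```

So a row encoded with an older two-field schema and decoded with a three-field schema would come out with `None` in the third field, while extra trailing fields from a newer producer are silently dropped.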
> how does it work in a pipeline update scenario (downgrade / upgrade)?

It's a standard coder with a defined spec [3] and tests in standard_coders.yaml [4] (although we could certainly use more coverage there), so I think pipeline update should work fine, unless I'm missing something.

Brian

[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L177-L189
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java#L341-L356
[3] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L833-L864
[4] https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L344-L364

On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <[email protected]> wrote:

> +Boyuan Zhang <[email protected]>
>
> On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <[email protected]> wrote:
>
>> What is the update / compat story around schemas?
>> * are unknown fields propagated through if the user only reads/modifies a row?
>> * how does it work in a pipeline update scenario (downgrade / upgrade)?
>>
>> Boyuan has been working on a Kafka via SDF source and has been trying to figure out which interchange format to use for the "source descriptors" that feed into the SDF. Some obvious choices are json, avro, proto, and Beam schemas, all with their caveats.
>>
>> On Fri, Jun 12, 2020 at 1:32 PM Brian Hulette <[email protected]> wrote:
>>
>>> Thanks! I see there are jiras for SpannerIO and JdbcIO as part of that. Are you planning on using row coder for them?
>>> If so I want to make sure you're aware of https://s.apache.org/beam-schema-io (sent to the dev list last week [1]). +Scott Lukas <[email protected]> will be working on building out the ideas there this summer. 
His work could be useful for making these IOs cross-language (and you would get a mapping to SQL out of it without much more effort).
>>>
>>> Brian
>>>
>>> [1] https://lists.apache.org/thread.html/rc1695025d41c5dc38cdf7bc32bea0e7421379b1c543c2d82f69aa179%40%3Cdev.beam.apache.org%3E
>>>
>>> On Tue, Jun 2, 2020 at 9:30 AM Piotr Szuberski <[email protected]> wrote:
>>>
>>>> Sure, I'll do that
>>>>
>>>> On 2020/05/28 17:54:49, Chamikara Jayalath <[email protected]> wrote:
>>>> > Great. Thanks for working on this. Can you please add these tasks and JIRAs to the cross-language transforms roadmap under "Connector/transform support".
>>>> > https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>> >
>>>> > Happy to help if you run into any issues during this task.
>>>> >
>>>> > Thanks,
>>>> > Cham
>>>> >
>>>> > On Thu, May 28, 2020 at 9:59 AM Piotr Szuberski <[email protected]> wrote:
>>>> >
>>>> > > I added a Jira task for creating cross-language wrappers for Java IOs. It will soon be in progress.
>>>> > >
>>>> >
>>>
