The change should be a schema change, mostly adding new fields.

On Mon, Jun 15, 2020 at 11:32 AM Brian Hulette <bhule...@google.com> wrote:

>
>
> On Mon, Jun 15, 2020 at 11:12 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Fri, Jun 12, 2020 at 4:12 PM Brian Hulette <bhule...@google.com>
>> wrote:
>>
>>> > are unknown fields propagated through if the user only reads/modifies
>>> a row?
>>> I'm not sure I understand this question. Are you asking about handling
>>> schema changes?
>>> The wire format includes the number of fields in the schema,
>>> specifically so that we can detect when the schema changes. This is
>>> restricted to fields added or removed at the end of the schema, i.e. if we
>>> receive an element that says it has N more fields than the schema this
>>> coder was created with, we assume the pipeline was updated with a schema
>>> that drops the last N fields, and we ignore the extra fields. Similarly,
>>> if we receive an element with N fewer fields than we expect, we'll just
>>> fill the last N fields with nulls.
>>> This logic is implemented in Python [1] and Java [2], but it's not
>>> exercised since no runners actually support pipeline update with schema
>>> changes.
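
A minimal sketch of the field-count handling described above, in plain Python. It is not the RowCoder code linked in [1] or [2]; `reconcile_fields` is a hypothetical helper that starts from already-decoded values, and the real coders do more than this.

```python
from typing import Any, List, Optional


def reconcile_fields(values: List[Any], expected: int) -> List[Optional[Any]]:
    """Fit decoded field values to the number of fields this coder expects.

    Hypothetical helper for illustration only; not part of the Beam API.
    """
    encoded = len(values)
    if encoded > expected:
        # The element carries extra trailing fields: assume the schema was
        # updated to drop them, and ignore the extras.
        return values[:expected]
    # The element carries fewer fields (or the same number): pad any missing
    # trailing fields with nulls.
    return values + [None] * (expected - encoded)


print(reconcile_fields(["a", 1, 2.5], 2))
print(reconcile_fields(["a"], 3))
```

The first call drops the extra trailing field and the second pads the two missing trailing fields with `None`. Only changes at the end of the schema are covered, matching the restriction stated above.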
>>>
>>> > how does it work in a pipeline update scenario (downgrade / upgrade)?
>>> It's a standard coder with a defined spec [3] and tests in
>>> standard_coders.yaml [4] (although we could certainly use more coverage
>>> there) so I think pipeline update should work fine, unless I'm missing
>>> something.
>>>
>>
>> The big question is whether the pipeline update will be rejected due to
>> the Coder having "changed."
>>
>>
>
> Do you mean changed because the schema has changed, or due to the vagaries
> of Java serialization?
>
>
>> Brian
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L177-L189
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java#L341-L356
>>> [3]
>>> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L833-L864
>>> [4]
>>> https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L344-L364
>>>
>>> On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> +Boyuan Zhang <boyu...@google.com>
>>>>
>>>> On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> What is the update / compat story around schemas?
>>>>> * are unknown fields propagated through if the user only
>>>>> reads/modifies a row?
>>>>> * how does it work in a pipeline update scenario (downgrade / upgrade)?
>>>>>
>>>>> Boyuan has been working on a Kafka via SDF source and has been trying
>>>>> to figure out which interchange format to use for the "source descriptors"
>>>>> that feed into the SDF. Some obvious choices are JSON, Avro, proto, and
>>>>> Beam schemas, each with its own caveats.
>>>>>
>>>>> On Fri, Jun 12, 2020 at 1:32 PM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks! I see there are jiras for SpannerIO and JdbcIO as part of
>>>>>> that. Are you planning on using row coder for them?
>>>>>> If so I want to make sure you're aware of
>>>>>> https://s.apache.org/beam-schema-io (sent to the dev list last week
>>>>>> [1]). +Scott Lukas <slu...@google.com> will be working on building
>>>>>> out the ideas there this summer. His work could be useful for making
>>>>>> these IOs cross-language (and you would get a mapping to SQL out of it
>>>>>> without much more effort).
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> [1]
>>>>>> https://lists.apache.org/thread.html/rc1695025d41c5dc38cdf7bc32bea0e7421379b1c543c2d82f69aa179%40%3Cdev.beam.apache.org%3E
>>>>>>
>>>>>> On Tue, Jun 2, 2020 at 9:30 AM Piotr Szuberski <
>>>>>> piotr.szuber...@polidea.com> wrote:
>>>>>>
>>>>>>> Sure, I'll do that
>>>>>>>
>>>>>>> On 2020/05/28 17:54:49, Chamikara Jayalath <chamik...@google.com>
>>>>>>> wrote:
>>>>>>> > Great. Thanks for working on this. Can you please add these tasks and
>>>>>>> > JIRAs to the cross-language transforms roadmap under
>>>>>>> > "Connector/transform support".
>>>>>>> > https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>>>>> >
>>>>>>> > Happy to help if you run into any issues during this task.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Cham
>>>>>>> >
>>>>>>> > On Thu, May 28, 2020 at 9:59 AM Piotr Szuberski <
>>>>>>> piotr.szuber...@polidea.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > I added a Jira task for creating cross-language wrappers for Java
>>>>>>> > > IOs. It will soon be in progress.
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>
