> are unknown fields propagated through if the user only reads/modifies a row?

I'm not sure I understand this question. Are you asking about handling schema changes? The wire format includes the number of fields in the schema, specifically so that we can detect when the schema changes. This is restricted to fields added or removed at the end of the schema: if we receive an element that says it has N more fields than the schema this coder was created with, we assume the pipeline was updated with a schema that drops the last N fields, and we ignore the extra fields. Similarly, if we receive an element with N fewer fields than we expect, we just fill the last N fields with nulls. This logic is implemented in Python [1] and Java [2], but it's not exercised since no runners actually support pipeline update with schema changes.
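For illustration, the trailing-field reconciliation described above can be sketched roughly like this. This is a simplified model, not the actual Beam implementation (see [1] for the real code); `decode_fields` is a hypothetical helper name:

```python
def decode_fields(encoded_values, expected_num_fields):
    """Reconcile a decoded row's field count with this coder's schema.

    encoded_values: values read from the wire; their count is the field
        count written by the producing coder's schema.
    expected_num_fields: number of fields in this coder's schema.
    """
    n_received = len(encoded_values)
    if n_received > expected_num_fields:
        # Producer had N extra trailing fields: assume the pipeline was
        # updated to a schema that drops them, and ignore the extras.
        return encoded_values[:expected_num_fields]
    if n_received < expected_num_fields:
        # Producer was missing N trailing fields: fill them with nulls.
        return encoded_values + [None] * (expected_num_fields - n_received)
    return encoded_values
```

So a row encoded with an older two-field schema and decoded with a three-field schema would come out with `None` in the third field, while extra trailing fields from a newer producer are silently dropped.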
> how does it work in a pipeline update scenario (downgrade / upgrade)?

It's a standard coder with a defined spec [3] and tests in standard_coders.yaml [4] (although we could certainly use more coverage there), so I think pipeline update should work fine, unless I'm missing something.

Brian

[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py#L177-L189
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java#L341-L356
[3] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L833-L864
[4] https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L344-L364

On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <[email protected]> wrote:

> +Boyuan Zhang <[email protected]>
>
> On Fri, Jun 12, 2020 at 3:32 PM Luke Cwik <[email protected]> wrote:
>
>> What is the update / compat story around schemas?
>> * are unknown fields propagated through if the user only reads/modifies a row?
>> * how does it work in a pipeline update scenario (downgrade / upgrade)?
>>
>> Boyuan has been working on a Kafka via SDF source and has been trying to figure out which interchange format to use for the "source descriptors" that feed into the SDF. Some obvious choices are json, avro, proto, and Beam schemas, all with their caveats.
>>
>> On Fri, Jun 12, 2020 at 1:32 PM Brian Hulette <[email protected]> wrote:
>>
>>> Thanks! I see there are jiras for SpannerIO and JdbcIO as part of that. Are you planning on using row coder for them?
>>> If so I want to make sure you're aware of https://s.apache.org/beam-schema-io (sent to the dev list last week [1]). +Scott Lukas <[email protected]> will be working on building out the ideas there this summer. 
His work could be useful for making these IOs cross-language (and you would get a mapping to SQL out of it without much more effort).
>>>
>>> Brian
>>>
>>> [1] https://lists.apache.org/thread.html/rc1695025d41c5dc38cdf7bc32bea0e7421379b1c543c2d82f69aa179%40%3Cdev.beam.apache.org%3E
>>>
>>> On Tue, Jun 2, 2020 at 9:30 AM Piotr Szuberski <[email protected]> wrote:
>>>
>>>> Sure, I'll do that
>>>>
>>>> On 2020/05/28 17:54:49, Chamikara Jayalath <[email protected]> wrote:
>>>> > Great. Thanks for working on this. Can you please add these tasks and JIRAs to the cross-language transforms roadmap under "Connector/transform support".
>>>> > https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>> >
>>>> > Happy to help if you run into any issues during this task.
>>>> >
>>>> > Thanks,
>>>> > Cham
>>>> >
>>>> > On Thu, May 28, 2020 at 9:59 AM Piotr Szuberski <[email protected]> wrote:
>>>> >
>>>> > > I added a Jira task for creating cross-language wrappers for Java IOs. It will soon be in progress.
>>>> > >
>>>> >
>>>
