I've opened BEAM-10571 [1] for this, and I'm most of the way to an
implementation now. Aiming to have it done before the 2.24.0 cut since it
will be the last release with python 2 support.

[1] https://issues.apache.org/jira/browse/BEAM-10571

On Wed, Jul 15, 2020 at 9:03 AM Chamikara Jayalath <[email protected]>
wrote:

>
>
> On Fri, Jul 10, 2020 at 4:47 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette <[email protected]>
>> wrote:
>> >
>> > Ah yes I'm +1 for that approach too - it would let us leverage all the
>> schema-inference already in the Java SDK for translating configuration
>> objects which would be great.
>> > Things on the Python side would be trickier as schemas don't formally
>> support all the types you can use in the PayloadBuilder implementations [1]
>> yet, just NamedTuple. For now we could just make the PayloadBuilder
>> implementations generate Rows without making that translation available for
>> use in PCollections.
>>
>
> This will be a good opportunity to add some sort of a minimal Python type
> to Beam schema mapping :)
>
>
>>
>> Yes, though eventually it might be nice to support all of these
>> various types as schema'd PCollection elements as well.
>>
>> > Do we need to worry about update compatibility for
>> ExternalConfigurationPayload?
>>
>> Technically, each URN defines their payload, and the fact that we've
>> settled on ExternalConfigurationPayload is a convention. On a
>> practical note, we haven't declared these protos stable yet. (I would
>> like to do so before we drop support for Python 2, as external
>> transforms are a possible escape hatch and the first strong motivation
>> to have external transforms that span Beam versions).
>>
>
> +1
>
>
>>
>> > [1]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py
>> >
>> > On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw <[email protected]>
>> wrote:
>> >>
>> >> I would be in favor of just using a schema to store the entire
>> >> configuration. The reason we went with what we have to day is that we
>> >> didn't have cross language schemas yet.
>> >>
>> >> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette <[email protected]>
>> wrote:
>> >> >
>> >> > Hi everyone,
>> >> > I noticed that currently the ExternalConfigurationPayload uses a
>> list of coder URNs to represent the coder that was used to serialize each
>> configuration field [1]. This seems acceptable at first blush, but there's
>> one notable issue: it has no place to store a payload for the coder. Most
>> standard coders don't use a payload so it's not a problem, but row coder
>> does use a payload to store it's schema, which means it can't be used in an
>> ExternalConfigurationPayload today.
>> >> >
>> >> > Is there a reason not to just use the Coder message [2] in
>> ExternalConfigurationPayload instead of a list of coder URNs? That would
>> work with row coder, and it would also make it easier to re-use logic for
>> translating Pipeline protos.
>> >> >
>> >> > I'd be happy to make this change, but I wanted to ask on dev@ in
>> case there's something I'm missing here.
>> >> >
>> >> > Brian
>> >> >
>> >> > [1]
>> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
>> >> > [2]
>> https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555
>>
>

Reply via email to