+1

The format to store coders is not set in stone, it was a first version to make external configuration work. Using the Coder message would be better.

As for using Schema to store the configuration, could somebody fill me in how that would work?

-Max

On 04.08.20 02:01, Brian Hulette wrote:
I've opened BEAM-10571 [1] for this, and I'm most of the way to an implementation now. Aiming to have it done before the 2.24.0 cut since it will be the last release with python 2 support.

[1] https://issues.apache.org/jira/browse/BEAM-10571

On Wed, Jul 15, 2020 at 9:03 AM Chamikara Jayalath <[email protected] <mailto:[email protected]>> wrote:



    On Fri, Jul 10, 2020 at 4:47 PM Robert Bradshaw <[email protected]
    <mailto:[email protected]>> wrote:

        On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette
        <[email protected] <mailto:[email protected]>> wrote:
         >
         > Ah yes I'm +1 for that approach too - it would let us
        leverage all the schema-inference already in the Java SDK for
        translating configuration objects which would be great.
         > Things on the Python side would be trickier as schemas don't
        formally support all the types you can use in the PayloadBuilder
        implementations [1] yet, just NamedTuple. For now we could just
        make the PayloadBuilder implementations generate Rows without
        making that translation available for use in PCollections.


    This will be a good opportunity to add some sort of a minimal Python
    type to Beam schema mapping :)


        Yes, though eventually it might be nice to support all of these
        various types as schema'd PCollection elements as well.

         > Do we need to worry about update compatibility for
        ExternalConfigurationPayload?

        Technically, each URN defines their payload, and the fact that we've
        settled on ExternalConfigurationPayload is a convention. On a
        practical note, we haven't declared these protos stable yet. (I
        would
        like to do so before we drop support for Python 2, as external
        transforms are a possible escape hatch and the first strong
        motivation
        to have external transforms that span Beam versions).


    +1


         > [1]
        
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py
         >
         > On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw
        <[email protected] <mailto:[email protected]>> wrote:
         >>
         >> I would be in favor of just using a schema to store the entire
         >> configuration. The reason we went with what we have to day
        is that we
         >> didn't have cross language schemas yet.
         >>
         >> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette
        <[email protected] <mailto:[email protected]>> wrote:
         >> >
         >> > Hi everyone,
         >> > I noticed that currently the ExternalConfigurationPayload
        uses a list of coder URNs to represent the coder that was used
        to serialize each configuration field [1]. This seems acceptable
        at first blush, but there's one notable issue: it has no place
        to store a payload for the coder. Most standard coders don't use
        a payload so it's not a problem, but row coder does use a
        payload to store it's schema, which means it can't be used in an
        ExternalConfigurationPayload today.
         >> >
         >> > Is there a reason not to just use the Coder message [2] in
        ExternalConfigurationPayload instead of a list of coder URNs?
        That would work with row coder, and it would also make it easier
        to re-use logic for translating Pipeline protos.
         >> >
         >> > I'd be happy to make this change, but I wanted to ask on
        dev@ in case there's something I'm missing here.
         >> >
         >> > Brian
         >> >
         >> > [1]
        
https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35
         >> > [2]
        
https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555

Reply via email to