On Fri, Jul 10, 2020 at 4:47 PM Robert Bradshaw <[email protected]> wrote:
> On Fri, Jul 10, 2020 at 4:36 PM Brian Hulette <[email protected]> wrote: > > > > Ah yes I'm +1 for that approach too - it would let us leverage all the > schema-inference already in the Java SDK for translating configuration > objects which would be great. > > Things on the Python side would be trickier as schemas don't formally > support all the types you can use in the PayloadBuilder implementations [1] > yet, just NamedTuple. For now we could just make the PayloadBuilder > implementations generate Rows without making that translation available for > use in PCollections. > This will be a good opportunity to add some sort of a minimal Python type to Beam schema mapping :) > > Yes, though eventually it might be nice to support all of these > various types as schema'd PCollection elements as well. > > > Do we need to worry about update compatibility for > ExternalConfigurationPayload? > > Technically, each URN defines their payload, and the fact that we've > settled on ExternalConfigurationPayload is a convention. On a > practical note, we haven't declared these protos stable yet. (I would > like to do so before we drop support for Python 2, as external > transforms are a possible escape hatch and the first strong motivation > to have external transforms that span Beam versions). > +1 > > > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py > > > > On Fri, Jul 10, 2020 at 4:23 PM Robert Bradshaw <[email protected]> > wrote: > >> > >> I would be in favor of just using a schema to store the entire > >> configuration. The reason we went with what we have to day is that we > >> didn't have cross language schemas yet. > >> > >> On Fri, Jul 10, 2020 at 12:24 PM Brian Hulette <[email protected]> > wrote: > >> > > >> > Hi everyone, > >> > I noticed that currently the ExternalConfigurationPayload uses a list > of coder URNs to represent the coder that was used to serialize each > configuration field [1]. This seems acceptable at first blush, but there's > one notable issue: it has no place to store a payload for the coder. Most > standard coders don't use a payload so it's not a problem, but row coder > does use a payload to store it's schema, which means it can't be used in an > ExternalConfigurationPayload today. > >> > > >> > Is there a reason not to just use the Coder message [2] in > ExternalConfigurationPayload instead of a list of coder URNs? That would > work with row coder, and it would also make it easier to re-use logic for > translating Pipeline protos. > >> > > >> > I'd be happy to make this change, but I wanted to ask on dev@ in > case there's something I'm missing here. > >> > > >> > Brian > >> > > >> > [1] > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/external_transforms.proto#L34-L35 > >> > [2] > https://github.com/apache/beam/blob/c54a0b7f49f2eb4a15df115205e2fa455116ccbe/model/pipeline/src/main/proto/beam_runner_api.proto#L542-L555 >
