Yeah. Essentially one needs do (1) name the arguments and (2) implement the transform. Hopefully (1) could be done in a concise way that allows for easy consumption from both Java and cross-language.
On Tue, May 30, 2023 at 12:25 PM Byron Ellis <byronel...@google.com> wrote: > Or perhaps the other way around? If you have a Schema we can auto-generate > the associated builder on the PTransform? Either way, more DRY. > > On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> +1 to this simplification, it's a historical artifact that provides no >> value. >> >> I would love it if we also looked into ways to auto-generate the >> SchemaTransformProvider (e.g. via introspection if a transform takes a >> small number of arguments, or uses the standard builder pattern...), >> ideally with something as simple as adding a decorator to the PTransform >> itself. >> >> >> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev < >> dev@beam.apache.org> wrote: >> >>> Hey everyone, >>> >>> I was looking at how we use SchemaTransforms in our expansion service. >>> From what I see, there may be a redundant step in developing >>> SchemaTransforms. Currently, we have 3 pieces: >>> - SchemaTransformProvider [1] >>> - A configuration object >>> - SchemaTransform [2] >>> >>> The API is generally used like this: >>> 1. The SchemaTransformProvider takes a configuration object and returns >>> a SchemaTransform >>> 2. The SchemaTransform is used to build a PTransform according to the >>> configuration >>> >>> In these steps, the SchemaTransform class seems unnecessary. We can >>> combine the two steps if we have SchemaTransformProvider return the >>> PTransform directly. >>> >>> We can then remove the SchemaTransform class as it will be obsolete. >>> This should be safe to do; the only place it's used in our API is here [3], >>> and that can be simplified if we make this change (we'd just trim ` >>> .buildTransform()` off the end as `provider.from(configRow)` will >>> directly return the PTransform). >>> >>> I'd like to first mention that I was not involved in the design process >>> of this API so I may be missing some information on why it was set up this >>> way. >>> >>> A few developers already raised questions about how there's seemingly >>> unnecessary boilerplate involved in making a Java transform portable. I >>> wasn't involved in the design process of this API so I may be missing some >>> information, but my assumption is this was designed to follow the pattern >>> of the previous iteration of this API (SchemaIO): SchemaIOProvider[4] -> >>> SchemaIO[5] -> PTransform. However, with the newer >>> SchemaTransformProvider API, we dropped a few methods and reduced the >>> SchemaTransform class to have a generic buildTransform() method. See the >>> example of PubsubReadSchemaTransformProvider [6], where the >>> SchemaTransform interface and buildTransform method are implemented >>> just to satisfy the requirement that SchemaTransformProvider::from >>> return a SchemaTransform. >>> >>> I'm bringing this up because if we are looking to encourage contribution >>> to cross-language use cases, we should make it simpler and less convoluted >>> to develop portable transforms. >>> >>> There are a number of SchemaTransforms already developed, but applying >>> these changes to them should be straightforward. If people think this is a >>> good idea, I can open a PR and implement them. >>> >>> Best, >>> Ahmed >>> >>> [1] >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java >>> [2] >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java >>> [3] >>> https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138 >>> [4] >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java >>> [5] >>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java >>> [6] >>> https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137 >>> >>