Yeah. Essentially one needs do (1) name the arguments and (2) implement the
transform. Hopefully (1) could be done in a concise way that allows for
easy consumption from both Java and cross-language.

On Tue, May 30, 2023 at 12:25 PM Byron Ellis <byronel...@google.com> wrote:

> Or perhaps the other way around? If you have a Schema we can auto-generate
> the associated builder on the PTransform? Either way, more DRY.
>
> On Tue, May 30, 2023 at 10:59 AM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> +1 to this simplification, it's a historical artifact that provides no
>> value.
>>
>> I would love it if we also looked into ways to auto-generate the
>> SchemaTransformProvider (e.g. via introspection if a transform takes a
>> small number of arguments, or uses the standard builder pattern...),
>> ideally with something as simple as adding a decorator to the PTransform
>> itself.
>>
>>
>> On Tue, May 30, 2023 at 7:42 AM Ahmed Abualsaud via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> I was looking at how we use SchemaTransforms in our expansion service.
>>> From what I see, there may be a redundant step in developing
>>> SchemaTransforms. Currently, we have 3 pieces:
>>> - SchemaTransformProvider [1]
>>> - A configuration object
>>> - SchemaTransform [2]
>>>
>>> The API is generally used like this:
>>> 1. The SchemaTransformProvider takes a configuration object and returns
>>> a SchemaTransform
>>> 2. The SchemaTransform is used to build a PTransform according to the
>>> configuration
>>>
>>> In these steps, the SchemaTransform class seems unnecessary. We can
>>> combine the two steps if we have SchemaTransformProvider return the
>>> PTransform directly.
>>>
>>> We can then remove the SchemaTransform class as it will be obsolete.
>>> This should be safe to do; the only place it's used in our API is here [3],
>>> and that can be simplified if we make this change (we'd just trim `
>>> .buildTransform()` off the end as `provider.from(configRow)` will
>>> directly return the PTransform).
>>>
>>> I'd like to first mention that I was not involved in the design process
>>> of this API so I may be missing some information on why it was set up this
>>> way.
>>>
>>> A few developers already raised questions about how there's seemingly
>>> unnecessary boilerplate involved in making a Java transform portable. I
>>> wasn't involved in the design process of this API so I may be missing some
>>> information, but my assumption is this was designed to follow the pattern
>>> of the previous iteration of this API (SchemaIO): SchemaIOProvider[4] ->
>>> SchemaIO[5] -> PTransform. However, with the newer
>>> SchemaTransformProvider API, we dropped a few methods and reduced the
>>> SchemaTransform class to have a generic buildTransform() method. See the
>>> example of PubsubReadSchemaTransformProvider [6], where the
>>> SchemaTransform interface and buildTransform method are implemented
>>> just to satisfy the requirement that SchemaTransformProvider::from
>>> return a SchemaTransform.
>>>
>>> I'm bringing this up because if we are looking to encourage contribution
>>> to cross-language use cases, we should make it simpler and less convoluted
>>> to develop portable transforms.
>>>
>>> There are a number of SchemaTransforms already developed, but applying
>>> these changes to them should be straightforward. If people think this is a
>>> good idea, I can open a PR and implement them.
>>>
>>> Best,
>>> Ahmed
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.java
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/SchemaTransform.java
>>> [3]
>>> https://github.com/apache/beam/blob/d7ded3f07064919c202c81a2c786910e20a834f9/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionServiceSchemaTransformProvider.java#L138
>>> [4]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java
>>> [5]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java
>>> [6]
>>> https://github.com/apache/beam/blob/ed1a297904d5f5c743a6aca1a7648e3fb8f02e18/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubReadSchemaTransformProvider.java#L133-L137
>>>
>>

Reply via email to