On Tue, Nov 15, 2022 at 12:52 PM Ahmed Abualsaud <ahmedabuals...@google.com>
wrote:

>> Schema-aware transforms are not restricted to I/Os. An arbitrary transform
>> can be a Schema-Transform.  Also, designation Read/Write does not map to an
>> arbitrary transform. Probably we should try to make this more generic ?
>>
>
> Agreed, I suggest keeping everything on the left side of the name unique
> to the transform, so that the right side is consistently SchemaTransform
> | SchemaTransformProvider | SchemaTransformConfiguration. What do others
> think?
>

Sgtm. I don't think we should enforce class names though but it's good to
have a recommendation.
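
To illustrate the recommendation, here is a minimal sketch of a check that
splits a class name into the transform-specific left side and one of the
three suffixes proposed above. The class SchemaTransformNaming and its
method transformPrefix are hypothetical names used only for illustration;
nothing like this exists in Beam, and it is a recommendation check, not
something to enforce.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper, not part of the Beam SDK. */
final class SchemaTransformNaming {
  // The three right-hand suffixes suggested in this thread.
  private static final List<String> SUFFIXES =
      List.of("SchemaTransformProvider", "SchemaTransformConfiguration", "SchemaTransform");

  private SchemaTransformNaming() {}

  /**
   * Returns the transform-specific left side of the class name, or empty if
   * the name does not end with one of the suggested suffixes (or has no left side).
   */
  static Optional<String> transformPrefix(String simpleClassName) {
    for (String suffix : SUFFIXES) {
      if (simpleClassName.endsWith(suffix) && simpleClassName.length() > suffix.length()) {
        return Optional.of(
            simpleClassName.substring(0, simpleClassName.length() - suffix.length()));
      }
    }
    return Optional.empty();
  }
}
```

With this sketch, BigQueryWriteSchemaTransformProvider yields the left side
BigQueryWrite, while a name such as BigQueryIO yields no match.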


>
>> Also, probably what's more important is the identifier of the
>> SchemaTransformProvider being unique.
>
>> FWIW, we came up with a similar generic URN naming scheme for
>> cross-language transforms:
>> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>
>
> The URN convention in that link looks good, it may be a good idea to
> replace transform with schematransform in the URN in this case to make a
> distinction. ie.
> beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1. I will
> mention this in the other thread when I go over the comments in the
> Supporting SchemaTransforms doc [1].
>

+1 for replacing "transform" with "schematransform" to prevent URN
conflicts (even though these are not exactly in the same category).
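
As a rough sketch of what that could look like in code, assuming the URN
shape beam:schematransform:<namespace>:<name>:<version> taken from the
example quoted above. The class SchemaTransformUrns, its methods, and the
exact character rules in the regular expression are assumptions made for
illustration, not an existing Beam API or an agreed format.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the Beam SDK. */
final class SchemaTransformUrns {
  // Assumed shape: beam:schematransform:<namespace>:<name>:v<version>
  private static final Pattern URN =
      Pattern.compile("beam:schematransform:[a-z0-9.]+:[a-z0-9_]+:v[0-9]+");

  private SchemaTransformUrns() {}

  /** Builds a URN and rejects inputs that do not fit the assumed shape. */
  static String of(String namespace, String name, int version) {
    String urn = "beam:schematransform:" + namespace + ":" + name + ":v" + version;
    if (!isValid(urn)) {
      throw new IllegalArgumentException("Not a valid schematransform URN: " + urn);
    }
    return urn;
  }

  /** True only for URNs in the "schematransform" category. */
  static boolean isValid(String urn) {
    return URN.matcher(urn).matches();
  }
}
```

Calling SchemaTransformUrns.of("org.apache.beam", "kafka_read_with_metadata", 1)
produces the URN from the quoted example, and a URN that starts with
beam:transform: is rejected, which is the distinction being proposed.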

Thanks,
Cham


>
> [1]
>
>  Supporting existing connectors with SchemaTrans...
> <https://docs.google.com/document/d/1qW9O3VxdGxUM887TdwhD1iH9AdNbpu0_wXbCGvFP0OM/edit?usp=drive_web>
>
>
> On Tue, Nov 15, 2022 at 3:41 PM John Casey via dev <dev@beam.apache.org>
> wrote:
>
>> One distinction here is the difference between the URN for a provider /
>> transform, and the class name in Java.
>>
>> We should have a standard for both, but they are distinct
>>
>> On Tue, Nov 15, 2022 at 3:39 PM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>>
>>>
>>>
>>> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> Do we like the following Java class naming convention for
>>>> SchemaTransformProviders [1]?  The proposal is:
>>>>
>>>> <IOName>(Read|Write)SchemaTransformProvider
>>>>
>>>>
>>>> *For those new to Beam, even if this is your first day, consider
>>>> yourselves a welcome contributor to this conversation.  Below are
>>>> definitions/references and a suggested learning guide to understand this
>>>> email.*
>>>>
>>>> Explanation
>>>>
>>>> The <IOName> identifies the Beam I/O [2] and Read or Write identifies a
>>>> read or write PTransform, respectively.
>>>>
>>>
>>> Schema-aware transforms are not restricted to I/Os. An arbitrary
>>> transform can be a Schema-Transform.  Also, designation Read/Write does not
>>> map to an arbitrary transform. Probably we should try to make this more
>>> generic ?
>>>
>>> Also, probably what's more important is the identifier of the
>>> SchemaTransformProvider being unique. Not the class name (the latter is
>>> guaranteed to be unique if we follow the Java package naming guidelines).
>>>
>>> FWIW, we came up with a similar generic URN naming scheme for
>>> cross-language transforms:
>>> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>> For example, a SchemaTransformProvider [1] for BigQueryIO.Write[7]
>>>> would look like:
>>>>
>>>> BigQueryWriteSchemaTransformProvider
>>>>
>>>>
>>>> And a SchemaTransformProvider for PubsubIO.Read[8] would
>>>> look like:
>>>>
>>>> PubsubReadSchemaTransformProvider
>>>>
>>>>
>>>> Definitions/References
>>>>
>>>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
>>>> transforms using a language agnostic configuration.
>>>> SchemaTransformProvider builds a SchemaTransform[3] from a Beam Row[4] that
>>>> functions as the configuration of that SchemaTransform.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>>>
>>>> [2] *Beam I/O*: PTransform for reading from or writing to sources and
>>>> sinks.
>>>> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>>>
>>>> [3] *SchemaTransform*: An interface containing a buildTransform method
>>>> that returns a PCollectionRowTuple[6] to PCollectionRowTuple PTransform.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>>>>
>>>> [4] *Row*: A Beam Row is a generic element of data whose properties
>>>> are defined by a Schema[5].
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>>>>
>>>> [5] *Schema*: A description of expected field names and their data
>>>> types.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>>>>
>>>> [6] *PCollectionRowTuple*: A grouping of Beam Rows[4] into a single
>>>> PInput or POutput tagged by a String name.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>>>>
>>>> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
>>>> BigQuery table.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>>>>
>>>> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and
>>>> emitting message payloads into a PCollection.
>>>>
>>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>>>>
>>>> Suggested Learning/Reading to understand this email
>>>>
>>>> 1. https://beam.apache.org/documentation/programming-guide/#overview
>>>> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>>>> (Up to 4.1)
>>>> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>>> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>>>>
>>>
