On Tue, Nov 15, 2022 at 1:38 PM Reuven Lax via dev <dev@beam.apache.org>
wrote:

> Out of curiosity, several IOs (including PubSub) already do support
> schemas. Are you planning on modifying those?
>

Schema-aware transform is an overloaded term. I think this is about
implementations of what is described in the following document.
https://docs.google.com/document/d/1B-pxOjIA8Znl99nDRFEQMfr7VG91MZGfki2BPanjjZA/edit


>
> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
> dev@beam.apache.org> wrote:
>
>> Hello Everyone,
>>
>> Do we like the following Java class naming convention for
>> SchemaTransformProviders [1]?  The proposal is:
>>
>> <IOName>(Read|Write)SchemaTransformProvider
>>
>>
>> *For those new to Beam, even if this is your first day, consider
>> yourselves welcome contributors to this conversation. Below are
>> definitions/references and a suggested learning guide to help you
>> understand this email.*
>>
>> Explanation
>>
>> The <IOName> identifies the Beam I/O [2], and Read or Write indicates
>> whether the provider wraps a read or a write PTransform.
>>
>> For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7] would
>> be named:
>>
>> BigQueryWriteSchemaTransformProvider
>>
>>
>> And a SchemaTransformProvider for PubsubIO.Read [8] would be named:
>>
>> PubsubReadSchemaTransformProvider
>>
>>
>> Definitions/References
>>
>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam I/O
>> transforms using a language-agnostic configuration.
>> A SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4]
>> that serves as the configuration for that transform.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>
>> [2] *Beam I/O*: PTransform for reading from or writing to sources and
>> sinks.
>> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>
>> [3] *SchemaTransform*: An interface containing a buildTransform method
>> that returns a PTransform from PCollectionRowTuple [6] to
>> PCollectionRowTuple.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>>
>> [4] *Row*: A Beam Row is a generic element of data whose fields are
>> defined by a Schema [5].
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>>
>> [5] *Schema*: A description of expected field names and their data types.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>>
>> [6] *PCollectionRowTuple*: A grouping of PCollections of Beam Rows [4],
>> each tagged by a String name, into a single PInput or POutput.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>>
>> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
>> BigQuery table.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>>
>> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
>> message payloads into a PCollection.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>>
>> Suggested Learning/Reading to understand this email
>>
>> 1. https://beam.apache.org/documentation/programming-guide/#overview
>> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>> (Up to 4.1)
>> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
>> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>>
>
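
The proposed convention is regular enough to generate and check mechanically. Below is a minimal sketch in plain Java with no Beam dependency. The class SchemaTransformProviderNames and its methods are hypothetical and not part of Beam. It assumes that <IOName> is the Beam I/O class name with its trailing "IO" removed, which matches the BigQueryIO.Write and PubsubIO.Read examples in the proposal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Beam) illustrating the proposed naming
 * convention {@code <IOName>(Read|Write)SchemaTransformProvider}.
 */
final class SchemaTransformProviderNames {

  /** Whether the provider wraps a read or a write PTransform. */
  enum Direction { READ, WRITE }

  // Group 1 captures <IOName>, group 2 captures Read or Write.
  private static final Pattern CONVENTION =
      Pattern.compile("([A-Z][A-Za-z0-9]*)(Read|Write)SchemaTransformProvider");

  private SchemaTransformProviderNames() {}

  /**
   * Builds a provider class name from a Beam I/O class name such as
   * "BigQueryIO" or "PubsubIO". Assumption: a trailing "IO" is dropped.
   */
  static String providerName(String ioClassName, Direction direction) {
    String ioName = ioClassName.endsWith("IO")
        ? ioClassName.substring(0, ioClassName.length() - 2)
        : ioClassName;
    if (ioName.isEmpty()) {
      throw new IllegalArgumentException("Missing I/O name in: " + ioClassName);
    }
    String verb = (direction == Direction.READ) ? "Read" : "Write";
    return ioName + verb + "SchemaTransformProvider";
  }

  /** Returns true if the class name follows the proposed convention. */
  static boolean followsConvention(String className) {
    // matches() requires the whole string to match, so no anchors are needed.
    return CONVENTION.matcher(className).matches();
  }

  /** Returns the IOName part, or null if the name does not follow the convention. */
  static String ioNameOf(String className) {
    Matcher m = CONVENTION.matcher(className);
    return m.matches() ? m.group(1) : null;
  }
}
```

For example, providerName("BigQueryIO", Direction.WRITE) yields BigQueryWriteSchemaTransformProvider. followsConvention rejects a name such as BigQuerySchemaTransformProvider, which omits Read or Write.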
