On Tue, Nov 15, 2022 at 1:38 PM Reuven Lax via dev <dev@beam.apache.org> wrote:
> Out of curiosity, several IOs (including PubSub) already do support
> schemas. Are you planning on modifying those?
>

Schema-aware Transform is an overloaded term. I think this is about the
implementations of the following.
https://docs.google.com/document/d/1B-pxOjIA8Znl99nDRFEQMfr7VG91MZGfki2BPanjjZA/edit

>
> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
> dev@beam.apache.org> wrote:
>
>> Hello Everyone,
>>
>> Do we like the following Java class naming convention for
>> SchemaTransformProviders [1]? The proposal is:
>>
>> <IOName>(Read|Write)SchemaTransformProvider
>>
>>
>> *For those new to Beam, even if this is your first day, consider
>> yourselves a welcome contributor to this conversation. Below are
>> definitions/references and a suggested learning guide to understand this
>> email.*
>>
>> Explanation
>>
>> The <IOName> identifies the Beam I/O [2] and Read or Write identifies a
>> read or write PTransform, respectively.
>>
>> For example, a SchemaTransformProvider [1] for
>> BigQueryIO.Write[7] would look like:
>>
>> BigQueryWriteSchemaTransformProvider
>>
>>
>> And a SchemaTransformProvider for PubsubIO.Read[8] would
>> look like:
>>
>> PubsubReadSchemaTransformProvider
>>
>>
>> Definitions/References
>>
>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
>> transforms using a language-agnostic configuration.
>> SchemaTransformProvider builds a SchemaTransform[3] from a Beam Row[4] that
>> functions as the configuration of that SchemaProvider.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>
>> [2] *Beam I/O*: PTransform for reading from or writing to sources and
>> sinks.
>> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>
>> [3] *SchemaTransform*: An interface containing a buildTransform method
>> that returns a PCollectionRowTuple[6] to PCollectionRowTuple PTransform.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>>
>> [4] *Row*: A Beam Row is a generic element of data whose properties are
>> defined by a Schema[5].
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>>
>> [5] *Schema*: A description of expected field names and their data types.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>>
>> [6] *PCollectionRowTuple*: A grouping of Beam Rows[4] into a single
>> PInput or POutput tagged by a String name.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>>
>> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
>> BigQuery table.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>>
>> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
>> message payloads into a PCollection.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>>
>> Suggested Learning/Reading to understand this email
>>
>> 1. https://beam.apache.org/documentation/programming-guide/#overview
>> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>> (Up to 4.1)
>> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
>> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>>
>
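
To make the proposed <IOName>(Read|Write)SchemaTransformProvider pattern
concrete, here is a minimal Java sketch that builds a name from an IO name
and a direction and checks whether a class name fits the pattern. Everything
in it (SchemaTransformProviderNaming, Direction, providerClassName,
followsConvention) is a hypothetical helper for illustration only and not
part of Beam. It also assumes <IOName> is an upper-camel-case identifier,
which the proposal does not state explicitly.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Beam: it only illustrates the
// proposed naming pattern <IOName>(Read|Write)SchemaTransformProvider.
final class SchemaTransformProviderNaming {

  // The two directions named in the proposal.
  enum Direction {
    READ("Read"),
    WRITE("Write");

    private final String label;

    Direction(String label) {
      this.label = label;
    }
  }

  // Assumption: <IOName> is an upper-camel-case identifier such as "BigQuery".
  private static final Pattern CONVENTION =
      Pattern.compile("[A-Z][A-Za-z0-9]*(Read|Write)SchemaTransformProvider");

  private SchemaTransformProviderNaming() {}

  // Builds the class name the convention would give for an IO and a direction.
  static String providerClassName(String ioName, Direction direction) {
    return ioName + direction.label + "SchemaTransformProvider";
  }

  // Returns true when the whole class name matches the proposed pattern.
  static boolean followsConvention(String className) {
    return CONVENTION.matcher(className).matches();
  }

  public static void main(String[] args) {
    // prints "BigQueryWriteSchemaTransformProvider"
    System.out.println(providerClassName("BigQuery", Direction.WRITE));
    // prints "PubsubReadSchemaTransformProvider"
    System.out.println(providerClassName("Pubsub", Direction.READ));
    // prints "false": the name has no Read or Write segment
    System.out.println(followsConvention("BigQuerySchemaTransformProvider"));
  }
}
```

A check like followsConvention could, for instance, sit in a unit test that
walks the provider classes, but that is only one possible use and not
something the proposal asks for.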