Out of curiosity: several IOs (including Pub/Sub) already support schemas. Are you planning to modify those?

On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <dev@beam.apache.org> wrote:

> Hello Everyone,
>
> Do we like the following Java class naming convention for
> SchemaTransformProviders [1]? The proposal is:
>
> <IOName>(Read|Write)SchemaTransformProvider
>
> *For those new to Beam, even if this is your first day, consider
> yourselves a welcome contributor to this conversation. Below are
> definitions/references and a suggested learning guide to understand this
> email.*
>
> Explanation
>
> The <IOName> identifies the Beam I/O [2], and Read or Write identifies a
> read or write PTransform, respectively.
>
> For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7]
> would be named:
>
> BigQueryWriteSchemaTransformProvider
>
> And a SchemaTransformProvider for PubsubIO.Read [8] would be named:
>
> PubsubReadSchemaTransformProvider
>
> Definitions/References
>
> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
> transforms using a language-agnostic configuration.
> SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4]
> that functions as the configuration of that SchemaTransform.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>
> [2] *Beam I/O*: PTransform for reading from or writing to sources and
> sinks.
> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>
> [3] *SchemaTransform*: An interface containing a buildTransform method
> that returns a PCollectionRowTuple [6] to PCollectionRowTuple PTransform.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>
> [4] *Row*: A Beam Row is a generic element of data whose properties are
> defined by a Schema [5].
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>
> [5] *Schema*: A description of expected field names and their data types.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>
> [6] *PCollectionRowTuple*: A grouping of Beam Rows [4] into a single
> PInput or POutput tagged by a String name.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>
> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
> BigQuery table.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>
> [8] *PubsubIO.Read*: A PTransform for reading from Pub/Sub and emitting
> message payloads into a PCollection.
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>
> Suggested Learning/Reading to understand this email
>
> 1. https://beam.apache.org/documentation/programming-guide/#overview
> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>    (Up to 4.1)
> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
> 4. https://beam.apache.org/documentation/programming-guide/#schemas
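
For anyone who wants to see the proposed pattern spelled out, here is a minimal, self-contained sketch of the <IOName>(Read|Write)SchemaTransformProvider convention from the quoted message. The class SchemaTransformProviderNames, its Direction enum, and both helper methods are hypothetical names made up for illustration; none of this is part of Beam, and the check assumes <IOName> is an UpperCamelCase identifier.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the proposed naming convention; not part of Beam. */
final class SchemaTransformProviderNames {

  /** Whether the provider wraps a read or a write PTransform. */
  enum Direction {
    READ("Read"),
    WRITE("Write");

    private final String label;

    Direction(String label) {
      this.label = label;
    }
  }

  // <IOName>(Read|Write)SchemaTransformProvider, assuming <IOName> is UpperCamelCase.
  private static final Pattern CONVENTION =
      Pattern.compile("[A-Z][A-Za-z0-9]*(Read|Write)SchemaTransformProvider");

  /** Builds the class name the convention would give for an IO name and direction. */
  static String providerName(String ioName, Direction direction) {
    return ioName + direction.label + "SchemaTransformProvider";
  }

  /** Returns true if the whole class name matches the proposed pattern. */
  static boolean followsConvention(String className) {
    return CONVENTION.matcher(className).matches();
  }

  public static void main(String[] args) {
    // prints "BigQueryWriteSchemaTransformProvider"
    System.out.println(providerName("BigQuery", Direction.WRITE));
    // prints "PubsubReadSchemaTransformProvider"
    System.out.println(providerName("Pubsub", Direction.READ));
    // prints "false": the name lacks the Read/Write segment
    System.out.println(followsConvention("PubsubSchemaTransformProvider"));
  }
}
```

A check like followsConvention could, for instance, back a unit test that flags provider classes whose names omit the Read or Write segment, though whether that is worth enforcing is a separate question for this thread.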