[ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=295758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295758
 ]

ASF GitHub Bot logged work on BEAM-7802:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Aug/19 20:57
            Start Date: 15/Aug/19 20:57
    Worklog Time Spent: 10m 
      Work Description: iemejia commented on pull request #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r314495461
 
 

 ##########
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
 ##########
 @@ -186,6 +188,49 @@
  * scalability. Note that it may decrease performance if the filepattern matches only a small number
  * of files.
  *
+ * <h3>Inferring Beam schemas from Avro files</h3>
+ *
+ * <p>If you want to use SQL or schema based operations on an Avro-based PCollection. You must
+ * configure the read transform to infer the Beam schema and automatically setup the Beam related
+ * coders by doing:
+ *
+ * <pre>{@code
+ * PCollection<AvroAutoGenClass> records =
+ *     p.apply(AvroIO.read(...).from(...).withBeamSchemas(true);
+ * }</pre>
+ *
+ * <h3>Inferring Beam schemas from Avro PCollections</h3>
+ *
+ * <p>If you created an Avro-based PCollection by other means e.g. reading records from Kafka or as
+ * the output of another PTransform. You may be interested on making your PCollection schema-aware
+ * so you can use the Schema-based APIs or Beam's SqlTransform.
+ *
+ * <p>If you are using Avro specific records (generated classes from an Avro schema), you can
+ * register a schema provider for the specific Avro class to make any PCollection of these objects
+ * schema-aware.
+ *
+ * <pre>{@code
+ * pipeline.getSchemaRegistry().registerSchemaProvider(AvroAutoGenClass.class, new AvroRecordSchema());
 
 Review comment:
   good one, definitely simpler
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 295758)
    Time Spent: 4h 50m  (was: 4h 40m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --------------------------------------------------------------------------
>
>                 Key: BEAM-7802
>                 URL: https://issues.apache.org/jira/browse/BEAM-7802
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Minor
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro based PCollection by using the 
> `withBeamSchemas` method, however if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
