Yes, that's my understanding of where the Schema work is heading. Generic Row+Schema are in the core Java SDK and can potentially be backed by Avro, JSON, or something else as an implementation/configuration detail. At the moment, though, the only implementation we have relies on RowCoder.
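To make the "Row+Schema with a pluggable encoding" idea concrete, here is a minimal plain-Java sketch. MiniSchema, MiniRow, and Demo are made-up names and are deliberately much simpler than Beam's actual Schema/Row classes; the point is only that the schema carries field names and types while the wire encoding stays a separate concern:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical MiniSchema/MiniRow (NOT Beam's actual Schema/Row): the schema
// describes field names and types, while how a row is encoded on the wire
// (RowCoder, Avro, JSON-B, ...) stays a pluggable detail elsewhere.
final class MiniSchema {
    final Map<String, Class<?>> fields = new LinkedHashMap<>();

    MiniSchema addField(String name, Class<?> type) {
        fields.put(name, type);
        return this;
    }
}

final class MiniRow {
    final MiniSchema schema;
    final List<Object> values;

    MiniRow(MiniSchema schema, List<Object> values) {
        if (values.size() != schema.fields.size()) {
            throw new IllegalArgumentException("value count must match schema");
        }
        this.schema = schema;
        this.values = values;
    }

    // Look up a value by field name using the schema's field order.
    Object get(String field) {
        return values.get(new ArrayList<>(schema.fields.keySet()).indexOf(field));
    }
}

public class Demo {
    public static void main(String[] args) {
        MiniSchema s = new MiniSchema()
            .addField("type", String.class)
            .addField("size", Integer.class);
        MiniRow r = new MiniRow(s, List.of("foo", 333));
        System.out.println(r.get("type")); // prints foo
    }
}
```

Under this framing, swapping RowCoder for an Avro- or JSON-B-backed encoding would change how a MiniRow is serialized, not how users build or query it.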
On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> For now we have a generic schema interface. JSON-B could be one
> implementation, Avro could be another one.
>
> Regards
> JB
>
> On Apr 26, 2018, at 12:08, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>
>> Hmm,
>>
>> Avro still has the pitfall of an uncontrolled stack which brings way
>> too many dependencies to be part of any API; this is why I proposed a
>> JSON-P based API (JsonObject) with a custom Beam entry for some
>> metadata (headers "à la Camel").
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>> <https://rmannibucau.metawerx.net/> | Old Blog
>> <http://rmannibucau.wordpress.com> | Github
>> <https://github.com/rmannibucau> | LinkedIn
>> <https://www.linkedin.com/in/rmannibucau> | Book
>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>
>> 2018-04-26 9:59 GMT+02:00 Jean-Baptiste Onofré <j...@nanthrax.net>:
>>
>>> Hi Ismaël,
>>>
>>> You mean directly in Beam SQL?
>>>
>>> That will be part of schema support: a generic record could be one of
>>> the payloads covered by the schema work.
>>>
>>> Regards
>>> JB
>>>
>>> On Apr 26, 2018, at 11:39, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>>>>
>>>> Hello Anton,
>>>>
>>>> Thanks for the descriptive email and the really useful work. Any plans
>>>> to tackle PCollections of GenericRecord/IndexedRecords? It seems Avro
>>>> is a natural fit for this approach too.
>>>>
>>>> Regards,
>>>> Ismaël
>>>>
>>>> On Wed, Apr 25, 2018 at 9:04 PM, Anton Kedin <ke...@google.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to highlight a couple of improvements to Beam SQL we have been
>>>>> working on recently which are targeted at making the Beam SQL API
>>>>> easier to use. Specifically, these features simplify the conversion
>>>>> of Java Beans and JSON strings to Rows.
>>>>>
>>>>> Feel free to try this and send any bugs/comments/PRs my way.
>>>>> **Caveat: this is still work in progress, and has known bugs and
>>>>> incomplete features, see below for details.**
>>>>>
>>>>> Background
>>>>>
>>>>> Beam SQL queries can only be applied to a PCollection<Row>. This
>>>>> means that users need to convert whatever PCollection elements they
>>>>> have to Rows before querying them with SQL. This usually requires
>>>>> manually creating a Schema and implementing a custom conversion
>>>>> PTransform<PCollection<Element>, PCollection<Row>> (see the Beam SQL
>>>>> Guide).
>>>>>
>>>>> The improvements described here are an attempt to reduce this
>>>>> overhead for a few common cases, as a start.
>>>>>
>>>>> Status
>>>>>
>>>>> - Introduced an InferredRowCoder to automatically generate Rows from
>>>>>   beans. Removes the need to manually define a Schema and Row
>>>>>   conversion logic;
>>>>> - Introduced a JsonToRow transform to automatically parse JSON
>>>>>   objects to Rows. Removes the need to manually implement conversion
>>>>>   logic;
>>>>> - This is still experimental work in progress, APIs will likely
>>>>>   change;
>>>>> - There are known bugs/unsolved problems.
>>>>>
>>>>> Java Beans
>>>>>
>>>>> Introduced a coder which facilitates Row generation from Java Beans.
>>>>> Reduces the overhead to:
>>>>>
>>>>>> /** Some user-defined Java Bean */
>>>>>> class JavaBeanObject implements Serializable {
>>>>>>   String getName() { ... }
>>>>>> }
>>>>>>
>>>>>> // Obtain the objects:
>>>>>> PCollection<JavaBeanObject> javaBeans = ...;
>>>>>>
>>>>>> // Convert to Rows and apply a SQL query:
>>>>>> PCollection<Row> queryResult =
>>>>>>     javaBeans
>>>>>>         .setCoder(InferredRowCoder.ofSerializable(JavaBeanObject.class))
>>>>>>         .apply(BeamSql.query("SELECT name FROM PCOLLECTION"));
>>>>>
>>>>> Notice, there is no more manual Schema definition or custom
>>>>> conversion logic.
>>>>>
>>>>> Links
>>>>>
>>>>> - example;
>>>>> - InferredRowCoder;
>>>>> - test;
>>>>>
>>>>> JSON
>>>>>
>>>>> Introduced a JsonToRow transform. It is possible to query a
>>>>> PCollection<String> that contains JSON objects like this:
>>>>>
>>>>>> // Assuming JSON objects look like this:
>>>>>> // { "type" : "foo", "size" : 333 }
>>>>>>
>>>>>> // Define a Schema:
>>>>>> Schema jsonSchema =
>>>>>>     Schema
>>>>>>         .builder()
>>>>>>         .addStringField("type")
>>>>>>         .addInt32Field("size")
>>>>>>         .build();
>>>>>>
>>>>>> // Obtain a PCollection of the objects in JSON format:
>>>>>> PCollection<String> jsonObjects = ...;
>>>>>>
>>>>>> // Convert to Rows and apply a SQL query:
>>>>>> PCollection<Row> queryResults =
>>>>>>     jsonObjects
>>>>>>         .apply(JsonToRow.withSchema(jsonSchema))
>>>>>>         .apply(BeamSql.query(
>>>>>>             "SELECT type, AVG(size) FROM PCOLLECTION GROUP BY type"));
>>>>>
>>>>> Notice, the JSON to Row conversion is done by the JsonToRow
>>>>> transform. It is currently required to supply a Schema.
>>>>>
>>>>> Links
>>>>>
>>>>> - JsonToRow;
>>>>> - test/example;
>>>>>
>>>>> Going Forward
>>>>>
>>>>> - fix bugs (BEAM-4163, BEAM-4161, ...);
>>>>> - implement more features (BEAM-4167, more types of objects);
>>>>> - wire this up with sources/sinks to further simplify the SQL API;
>>>>>
>>>>> Thank you,
>>>>> Anton
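For readers curious what "inferring Rows from beans" means mechanically, here is a hedged, Beam-free sketch of the general reflection idea: walk a bean's getters to derive field names and types so no Schema has to be written by hand. InferSchemaDemo and inferFields are hypothetical names for illustration only; the real InferredRowCoder is certainly more involved (nested types, supported field types, coder registration):

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hedged sketch of the idea behind inferring a schema from a Java Bean
// (NOT Beam's actual implementation).
public class InferSchemaDemo {

    // A sample bean like the one in the announcement above.
    static class JavaBeanObject implements java.io.Serializable {
        public String getName() { return "anton"; }
        public int getSize() { return 3; }
    }

    // Derive field name -> type from zero-argument getXxx() methods.
    static Map<String, Class<?>> inferFields(Class<?> beanClass) {
        Map<String, Class<?>> fields = new TreeMap<>();
        for (Method m : beanClass.getDeclaredMethods()) {
            if (m.getName().startsWith("get") && m.getParameterCount() == 0) {
                String field = m.getName().substring(3);
                field = Character.toLowerCase(field.charAt(0)) + field.substring(1);
                fields.put(field, m.getReturnType());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(inferFields(JavaBeanObject.class));
        // prints {name=class java.lang.String, size=int}
    }
}
```

An inferred map like this is what a hand-written Schema (as in the JSON example above) encodes explicitly; the coder can then read each field's value via the same getters when converting a bean instance to a Row.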