Does DirectRunner do this today?

On Mon, Jun 4, 2018 at 9:10 PM Lukasz Cwik <lc...@google.com> wrote:
> Shouldn't the runner isolate each instance of the pipeline behind an appropriate class loader?
>
> On Sun, Jun 3, 2018 at 12:45 PM Reuven Lax <re...@google.com> wrote:
>
> Just an update: Romain and I chatted on Slack, and I think I understand his concern. The concern wasn't specifically about schemas, but rather about having a generic way to register per-ParDo state that has worker lifetime. As evidence that such a thing is needed: in many cases static variables are used to simulate it. Static variables, however, have downsides - if two pipelines are run on the same JVM (which happens often with unit tests, and there's nothing that prevents a runner from doing so in a production environment), these static variables will interfere with each other.
>
> On Thu, May 24, 2018 at 12:30 AM Reuven Lax <re...@google.com> wrote:
>
> Romain, maybe it would be useful for us to find some time on Slack. I'd like to understand your concerns. Also keep in mind that I'm tagging all these classes as Experimental for now, so we can definitely change these interfaces around if we decide they are not the best ones.
>
> Reuven

On Tue, May 22, 2018 at 11:35 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Why not extend ProcessContext to add the new remapped output? But it looks good (the part I don't like is that creating a new context each time a new feature is added hurts users. What about when Beam adds some reactive support - a ReactiveOutputReceiver?)
>
> Pipeline sounds like the wrong storage, since once distributed you have serialized the instances, so you kind of broke the lifecycle of the original instance and have no real release/close hook on them anymore, right? Not sure we can do better than DoFn/source embedded instances today.

On Wed, May 23, 2018 at 08:02, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> On Wed, May 23, 2018 at 07:55, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> Hi,
>>
>> IMHO, it would be better to have an explicit transform/IO as converter. It would be easier for users.
>>
>> Another option would be to use a "TypeConverter/SchemaConverter" map as we do in Camel: Beam could check the source/destination "type" and check in the map if there's a converter available. This map can be stored as part of the pipeline (as we do for filesystem registration).
>
> It works in Camel because it is not strongly typed, isn't it? So it can require a new Beam pipeline API.
>
> +1 for the explicit transform; if added to the pipeline API as a coder it wouldn't break the fluent API: p.apply(io).setOutputType(Foo.class)
>
> Coders can be a workaround since they own the type, but since the PCollection is the real owner it is surely saner this way, no?
>
> Also it probably needs to ensure all converters are present before running the pipeline; no implicit environment converter support is probably a good starting point, to avoid late surprises.
>
>> My $0.01
>>
>> Regards
>> JB
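For illustration, a minimal sketch of the Camel-style "TypeConverter/SchemaConverter" map floated above, assuming converters are looked up by source/target type before the pipeline runs so that a missing converter fails fast. ConverterRegistry and SchemaConverter are hypothetical names, not an existing Beam API.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical Camel-style converter registry keyed by (source type, target type).
    public class ConverterRegistry {

      // Hypothetical converter contract.
      public interface SchemaConverter<A, B> {
        B convert(A value);
      }

      private final Map<String, SchemaConverter<?, ?>> converters = new ConcurrentHashMap<>();

      public <A, B> void register(Class<A> from, Class<B> to, SchemaConverter<A, B> converter) {
        converters.put(key(from, to), converter);
      }

      @SuppressWarnings("unchecked")
      public <A, B> SchemaConverter<A, B> lookup(Class<A> from, Class<B> to) {
        SchemaConverter<?, ?> c = converters.get(key(from, to));
        if (c == null) {
          // Fail before the pipeline runs, as suggested above, instead of at execution time.
          throw new IllegalStateException("No converter registered for " + key(from, to));
        }
        return (SchemaConverter<A, B>) c;
      }

      private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
      }
    }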
On 23/05/2018 07:51, Romain Manni-Bucau wrote:

> How does it work on the pipeline side? Do you generate these "virtual" IOs at build time to enable the fluent API to work without erasing generics?
>
> Example: SQL(row)->BigQuery(native) will not compile, so we need a SQL(row)->BigQuery(row).
>
> Side note unrelated to Row: if you add another registry, maybe a pre-task is to ensure Beam has a kind of singleton/context so it isn't duplicated or left untracked. These kinds of converters generally need a global close, not only a per-record one: converter.init(); converter.convert(row); ...; converter.destroy(); - otherwise it easily leaks. This is why it can require some way to avoid recreating it. A quick fix, if you are in ByteBuddy already, can be to add it to setup/teardown probably; being more global would be nicer but is more challenging.
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>

On Wed, May 23, 2018 at 07:22, Reuven Lax <re...@google.com> wrote:

> No - the only modules we need to add to core are the ones we choose to add. For example, I will probably add a registration for TableRow/TableSchema (GCP BigQuery) so these can work seamlessly with schemas. However, I will add that to the GCP module, so only someone depending on that module needs to pull in that dependency. The Java ServiceLoader framework can be used by these modules to register schemas for their types (we already do something similar for FileSystem and for coders as well).
>
> BTW, right now the conversion back and forth between Row objects is done in the ByteBuddy-generated bytecode that we generate in order to invoke DoFns.
>
> Reuven

On Tue, May 22, 2018 at 10:04 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Hmm, the pluggability part is close to what I wanted to do with JsonObject as a main API (to avoid redoing a "row" API and a schema API).
>
> Row.as(Class<T>) sounds good, but then does it mean we'll get beam-sdk-java-row-jsonobject-like modules (I'm not against, just trying to understand here)? If so, how can an IO use as() with the type it expects? Doesn't it lead to having tons of these modules in the end?
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
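For illustration, a minimal sketch of the setup/teardown idea Romain mentions above: the converter is created once per DoFn instance (roughly worker lifetime) instead of per record, and released explicitly. MyConverter is a hypothetical stand-in for a resource with the init/convert/destroy lifecycle; this is not an existing Beam helper.

    import org.apache.beam.sdk.transforms.DoFn;

    // Ties a hypothetical init/convert/destroy resource to the DoFn lifecycle.
    class ConvertingFn extends DoFn<String, String> {

      // Hypothetical resource with the lifecycle discussed in the thread.
      static class MyConverter {
        void init() {}
        String convert(String in) { return in; }
        void destroy() {}
      }

      private transient MyConverter converter;

      @Setup
      public void setup() {
        converter = new MyConverter();
        converter.init();            // global init, not per record
      }

      @ProcessElement
      public void process(@Element String element, OutputReceiver<String> out) {
        out.output(converter.convert(element));
      }

      @Teardown
      public void teardown() {
        converter.destroy();         // global close, avoids the leak mentioned above
      }
    }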
On Wed, May 23, 2018 at 04:57, Reuven Lax <re...@google.com> wrote:

> By the way Romain, if you have specific scenarios in mind, I would love to hear them. I can try to guess what exactly you would like to get out of schemas, but it would work better if you gave me concrete scenarios that you would like to work.
>
> Reuven

On Tue, May 22, 2018 at 7:45 PM Reuven Lax <re...@google.com> wrote:

> Yeah, what I'm working on will help with IO. Basically, if you register a function with SchemaRegistry that converts back and forth between a type (say JsonObject) and a Beam Row, then it is applied by the framework behind the scenes as part of DoFn invocation. Concrete example: let's say I have an IO that reads JSON objects:
>
>     class MyJsonIORead extends PTransform<PBegin, PCollection<JsonObject>> {...}
>
> If you register a schema for this type (or you can also just set the schema directly on the output PCollection), then Beam knows how to convert back and forth between JsonObject and Row. So the next ParDo can look like:
>
>     p.apply(new MyJsonIORead())
>      .apply(ParDo.of(new DoFn<JsonObject, T>() {
>          @ProcessElement
>          void process(@Element Row row) { ... }
>      }))
>
> And Beam will automatically convert JsonObject to a Row for processing (you aren't forced to do this, of course - you can always ask for it as a JsonObject).
>
> The same is true for output. If you have a sink that takes in JsonObject but the transform before it produces Row objects (for instance, because the transform before it is Beam SQL), Beam can automatically convert Row back to JsonObject for you.
>
> All of this was detailed in the Schema doc I shared a few months ago. There was a lot of discussion on that document from various parties, and some of this API is a result of that discussion. This is also working in the branch JB and I were working on, though not yet integrated back to master.
>
> I would like to actually go further and make Row an interface, and provide a way to automatically put a Row interface on top of any other object (e.g. JsonObject, POJO, etc.). This won't change the way the user writes code, but instead of Beam having to copy and convert at each stage (e.g. from JsonObject to Row), it will simply create a Row object that uses the JsonObject as its underlying storage.
>
> Reuven

On Tue, May 22, 2018 at 11:37 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Well, Beam can implement a new mapper, but it doesn't help for IO. Most modern backends will take JSON directly, even the javax one, and it must stay generic.
>
> Then, since JSON-to-POJO mapping has already been done a dozen times, I'm not sure it is worth it for now.
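To ground the SchemaRegistry flow Reuven describes above, a rough sketch of what the consuming side might look like. It assumes `p` is a Pipeline, that MyJsonIORead is the transform from the quoted message, and that a JsonObject <-> Row conversion has already been registered (via the experimental SchemaRegistry or by setting the schema on the output PCollection); the "name" field is a made-up example.

    import javax.json.JsonObject;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    PCollection<JsonObject> json = p.apply(new MyJsonIORead());

    PCollection<String> names = json.apply(ParDo.of(
        new DoFn<JsonObject, String>() {
          @ProcessElement
          public void process(@Element Row row, OutputReceiver<String> out) {
            // The framework converts each JsonObject to a Row before invoking this.
            out.output(row.getString("name"));
          }
        }));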
On Tue, May 22, 2018 at 20:27, Reuven Lax <re...@google.com> wrote:

> We can do even better, btw: building a SchemaRegistry where automatic conversions can be registered between schemas and Java data types. With this, the user won't even need a DoFn to do the conversion.

On Tue, May 22, 2018 at 10:13 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Hi guys,
>
> I checked out what has been done on the schema model and think it is acceptable - regarding the JSON debate - if https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>
> At a high level, it is about providing a mainstream and not-too-impacting model OOTB, and JSON seems the most valid option for now, at least for IO and some user transforms.
>
> Wdyt?

On Fri, Apr 27, 2018 at 18:36, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Can give it a try end of May, sure (holidays and work constraints will make it hard before).

On Apr 27, 2018 at 18:26, Anton Kedin <ke...@google.com> wrote:

> Romain,
>
> I don't believe that the JSON approach was investigated very thoroughly. I mentioned a few reasons which make it not the best choice in my opinion, but I may be wrong. Can you put together a design doc or a prototype?
>
> Thank you,
> Anton
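For readers following the JSON debate below, a minimal sketch of the JSON-P (javax.json) JsonObject API being proposed as the record abstraction; the field names are made up for illustration.

    import javax.json.Json;
    import javax.json.JsonObject;

    // Building and reading a JSON-P record.
    JsonObject record = Json.createObjectBuilder()
        .add("name", "alice")
        .add("age", 42)
        .build();

    String name = record.getString("name");
    int age = record.getInt("age");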
On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> On Apr 26, 2018 at 23:13, "Anton Kedin" <ke...@google.com> wrote:
>
>> BeamRecord (Row) has very little in common with JsonObject (I assume you're talking about javax.json), except maybe some similarities of the API. A few reasons why JsonObject doesn't work:
>>
>> * it is a Java EE API:
>>   o Beam SDK is not limited to Java. There are probably similar APIs for other languages, but they might not necessarily carry the same semantics / APIs;
>
> Not a big deal I think. At least not a technical blocker.
>
>>   o it can change between Java versions;
>
> No, this is Java EE ;).
>
>>   o the current Beam Java implementation is an experimental feature to identify what's needed from such an API; in the end we might end up with something similar to the JsonObject API, but likely not;
>
> I don't get that point as a blocker.
>
>> * it represents JSON, which is not an API but an object notation:
>>   o it is defined as a unicode string in a certain format. If you choose to adhere to ECMA-404, then it doesn't sound like JsonObject can represent an Avro object, if I'm reading it right;
>
> That is in the generator impl; you can implement an Avro generator.
>
>> * it doesn't define a type system (JSON does, but it's lacking):
>>   o for example, JSON doesn't define semantics for numbers;
>>   o it doesn't define date/time types;
>>   o it doesn't allow extending the JSON type system at all;
>
> That is why you need a metadata object, or simpler, a schema with that data. JSON or Beam record doesn't help here, and you end up with the same outcome if you think about it.
>
>> * it lacks schemas;
>
> JSON Schemas are standard, widely spread and tooled compared to the alternative.
>
>> You can definitely try to loosen the requirements and define everything in JSON in userland, but the point of Row/Schema is to avoid that and define everything in the Beam model, which can be extended and mapped to JSON, Avro, BigQuery schemas, custom binary formats etc., with the same semantics across Beam SDKs.
>
> This is what JSON-P would allow, with the benefit of natural POJO support through JSON-B.

On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Just to make it clear and let me understand: how is BeamRecord different from a JsonObject, which is an API without an implementation (not even a JSON one OOTB)? The advantages of the JSON *API* are indeed the natural mapping (JSON-B is based on JSON-P, so there is no new binding to reinvent) and simple serialization (JSON+gzip for example, or Avro if you want to be geeky).
>
> I fail to see the point of rebuilding an ecosystem ATM.

On Apr 26, 2018 at 19:12, "Reuven Lax" <re...@google.com> wrote:

> Exactly what JB said. We will write a generic conversion from Avro (or JSON) to Beam schemas, which will make them work transparently with SQL. The plan is also to migrate Anton's work so that POJOs work generically for any schema.
>
> Reuven
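To make the "natural POJO support through JSON-B" point concrete, a minimal sketch using the javax.json.bind API; User is an illustrative POJO, not a Beam type, and a JSON-B implementation (e.g. Yasson) is assumed on the classpath at runtime.

    import javax.json.bind.Jsonb;
    import javax.json.bind.JsonbBuilder;

    // Round-trips a POJO through JSON with no Beam-specific binding layer.
    public class JsonbExample {

      public static class User {
        public String name;
        public int age;
      }

      public static void main(String[] args) {
        Jsonb jsonb = JsonbBuilder.create();
        User user = jsonb.fromJson("{\"name\":\"alice\",\"age\":42}", User.class);
        System.out.println(user.name + " / " + user.age);
        System.out.println(jsonb.toJson(user));
      }
    }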
On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> For now we have a generic schema interface. JSON-B can be an impl, Avro could be another one.
>
> Regards
> JB

On Apr 26, 2018 at 12:08, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Hmm,
>
> Avro still has the pitfall of an uncontrolled stack which brings way too many dependencies to be part of any API. This is why I proposed a JSON-P based API (JsonObject) with a custom Beam entry for some metadata (headers "à la Camel").
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>

2018-04-26 9:59 GMT+02:00 Jean-Baptiste Onofré <j...@nanthrax.net>:

> Hi Ismael,
>
> You mean directly in Beam SQL?
>
> That will be part of the schema support: a generic record could be one of the payloads, with a schema across them.
>
> Regards
> JB

On Apr 26, 2018 at 11:39, "Ismaël Mejía" <ieme...@gmail.com> wrote:

> Hello Anton,
>
> Thanks for the descriptive email and the really useful work. Any plans to tackle PCollections of GenericRecord/IndexedRecords? It seems Avro is a natural fit for this approach too.
>
> Regards,
> Ismaël
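As context for the GenericRecord question, a minimal sketch of the kind of schema-described Avro element being discussed; the record layout is made up for illustration.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    // Parse an Avro schema and build a GenericRecord that conforms to it.
    Schema avroSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

    GenericRecord record = new GenericData.Record(avroSchema);
    record.put("name", "alice");
    record.put("age", 42);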
On Wed, Apr 25, 2018 at 9:04 PM, Anton Kedin <ke...@google.com> wrote:

> Hi,
>
> I want to highlight a couple of improvements to Beam SQL we have been working on recently which are targeted at making the Beam SQL API easier to use. Specifically, these features simplify the conversion of Java Beans and JSON strings to Rows.
>
> Feel free to try this and send any bugs/comments/PRs my way.
>
> **Caveat: this is still work in progress, and has known bugs and incomplete features, see below for details.**
>
> Background
>
> Beam SQL queries can only be applied to PCollection<Row>. This means that users need to convert whatever PCollection elements they have to Rows before querying them with SQL. This usually requires
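For reference, a sketch of the manual element-to-Row conversion step described above, assuming the Schema/Row builder API and SqlTransform (BeamSql in older releases). MyPojo, `input` (an existing PCollection<MyPojo>), the field names, and the query are made-up examples.

    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    // Define the schema the Rows will carry.
    final Schema schema = Schema.builder()
        .addStringField("name")
        .addInt32Field("age")
        .build();

    // Convert each element to a Row by hand, then attach the schema.
    PCollection<Row> rows = input
        .apply(ParDo.of(new DoFn<MyPojo, Row>() {
          @ProcessElement
          public void process(@Element MyPojo pojo, OutputReceiver<Row> out) {
            out.output(Row.withSchema(schema).addValues(pojo.name, pojo.age).build());
          }
        }))
        .setRowSchema(schema);

    // Query the Rows with Beam SQL.
    PCollection<Row> adults =
        rows.apply(SqlTransform.query("SELECT name FROM PCOLLECTION WHERE age > 21"));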