Shouldn't the runner isolate each instance of the pipeline behind an appropriate class loader?
On Sun, Jun 3, 2018 at 12:45 PM Reuven Lax <re...@google.com> wrote:
> Just an update: Romain and I chatted on Slack, and I think I understand his concern. The concern wasn't specifically about schemas, but rather about having a generic way to register per-ParDo state that has worker lifetime. As evidence that this is needed: in many cases static variables are used to simulate it. Static variables, however, have downsides - if two pipelines are run on the same JVM (which happens often with unit tests, and nothing prevents a runner from doing so in a production environment), these static variables will interfere with each other.
>
> On Thu, May 24, 2018 at 12:30 AM Reuven Lax <re...@google.com> wrote:
>> Romain, maybe it would be useful for us to find some time on Slack. I'd like to understand your concerns. Also keep in mind that I'm tagging all these classes as Experimental for now, so we can definitely change these interfaces around if we decide they are not the best ones.
>>
>> Reuven
>>
>> On Tue, May 22, 2018 at 11:35 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> Why not extend ProcessContext to add the new remapped output? But this looks good (the part I don't like is that creating a new context each time a new feature is added hurts users. What about when Beam adds some reactive support? ReactiveOutputReceiver?)
>>>
>>> Pipeline sounds like the wrong storage: once distributed you have serialized the instances, so you have kind of broken the lifecycle of the original instance and have no real release/close hook on them anymore, right? Not sure we can do better than DoFn/source embedded instances today.
>>>
>>> On Wed, May 23, 2018 at 08:02, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>> On Wed, May 23, 2018 at 07:55, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>> Hi,
>>>>>
>>>>> IMHO, it would be better to have an explicit transform/IO as converter.
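Reuven's point about static variables clashing across pipelines can be sketched outside Beam entirely. The class and pipeline names below are hypothetical, and the keyed map only stands in for what a real per-ParDo, worker-lifetime registry would manage for you:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration (not a Beam API): a DoFn-like class that caches
// worker-lifetime state in a plain static field shares that state with
// every other pipeline running in the same JVM.
public class StaticStateDemo {

    // Anti-pattern: one shared slot for the whole JVM.
    public static String sharedConfig;

    // Safer simulation: key the worker-lifetime state by a pipeline/ParDo id,
    // roughly what a registry with worker lifetime would do behind the scenes.
    public static final Map<String, String> perPipelineConfig = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        // Two "pipelines" in one JVM (e.g. two unit tests running together).
        sharedConfig = "pipeline-A-settings";
        sharedConfig = "pipeline-B-settings";            // clobbers pipeline A's state
        System.out.println(sharedConfig);                // pipeline-B-settings

        perPipelineConfig.put("pipeline-A", "pipeline-A-settings");
        perPipelineConfig.put("pipeline-B", "pipeline-B-settings");
        System.out.println(perPipelineConfig.get("pipeline-A")); // still intact
    }
}
```

The keyed variant is only a sketch; it still leaks unless something with worker lifetime removes the entry, which is exactly the release/close hook the thread is asking for.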
>>>>> It would be easier for users.
>>>>>
>>>>> Another option would be to use a "TypeConverter/SchemaConverter" map as we do in Camel: Beam could check the source/destination "type" and look in the map for an available converter. This map could be stored as part of the pipeline (as we do for filesystem registration).
>>>>
>>>> It works in Camel because it is not strongly typed, isn't it? So it could require a new Beam pipeline API.
>>>>
>>>> +1 for the explicit transform; if added to the pipeline API the way coders are, it wouldn't break the fluent API:
>>>>
>>>> p.apply(io).setOutputType(Foo.class)
>>>>
>>>> Coders could be a workaround since they own the type, but since the PCollection is the real owner it is surely saner this way, no?
>>>>
>>>> It also probably needs to ensure all converters are present before running the pipeline; no implicit environment converter support is probably a good way to start, to avoid late surprises.
>>>>
>>>>> My $0.01
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 23/05/2018 07:51, Romain Manni-Bucau wrote:
>>>>>> How does it work on the pipeline side? Do you generate these "virtual" IOs at build time so the fluent API works without erasing generics?
>>>>>>
>>>>>> ex: SQL(row)->BigQuery(native) will not compile, so we need a SQL(row)->BigQuery(row)
>>>>>>
>>>>>> Side note unrelated to Row: if you add another registry, maybe a pre-task is to ensure Beam has a kind of singleton/context so the registry isn't duplicated or tracked improperly. These kinds of converters will in general need a global close, not only a per-record one: converter.init(); converter.convert(row); ...; converter.destroy();, otherwise it easily leaks. This is why some way to avoid recreating it can be required. A quick fix, if you are in ByteBuddy already, can be to add it to setup/teardown probably; being more global would be nicer but is more challenging.
>>>>>>
>>>>>> Romain Manni-Bucau
>>>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>>>
>>>>>> On Wed, May 23, 2018 at 07:22, Reuven Lax <re...@google.com> wrote:
>>>>>>> No - the only modules we need to add to core are the ones we choose to add. For example, I will probably add a registration for TableRow/TableSchema (GCP BigQuery) so these can work seamlessly with schemas. However, I will add that to the GCP module, so only someone depending on that module needs to pull in that dependency. The Java ServiceLoader framework can be used by these modules to register schemas for their types (we already do something similar for FileSystem and for coders as well).
>>>>>>>
>>>>>>> BTW, right now I'm doing the conversion back and forth between Row objects in the ByteBuddy-generated bytecode that we generate in order to invoke DoFns.
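JB's Camel-style "TypeConverter/SchemaConverter map" and Romain's wish to verify converters before the pipeline runs can be sketched as a plain lookup table. This is a hypothetical illustration, not Beam's or Camel's actual registry API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "converter map" idea discussed above:
// converters are registered per (source, target) type pair and looked up
// when two transforms with different element types need to be bridged.
public class ConverterRegistry {

    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    @SuppressWarnings("unchecked")
    public <A, B> void register(Class<A> from, Class<B> to, Function<A, B> fn) {
        converters.put(key(from, to), (Function<Object, Object>) (Function<?, ?>) fn);
    }

    @SuppressWarnings("unchecked")
    public <A, B> B convert(A value, Class<B> to) {
        Function<Object, Object> fn = converters.get(key(value.getClass(), to));
        if (fn == null) {
            // "Ensure all converters are present before running the pipeline":
            // failing fast here is what makes a missing converter visible early
            // instead of surfacing mid-run on a worker.
            throw new IllegalStateException("No converter " + value.getClass() + " -> " + to);
        }
        return (B) fn.apply(value);
    }
}
```

Usage would look like `registry.register(Integer.class, String.class, i -> "n=" + i)` followed by `registry.convert(42, String.class)`; a pipeline-level variant could walk the graph at construction time and call the lookup for every edge whose types differ.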
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Tue, May 22, 2018 at 10:04 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>> Hmm, the pluggability part is close to what I wanted to do with JsonObject as a main API (to avoid redoing a "row" API and a schema API). Row.as(Class<T>) sounds good, but then does it mean we'll get modules like beam-sdk-java-row-jsonobject (I'm not against it, just trying to understand here)? If so, how can an IO use as() with the type it expects? Doesn't it lead to having tons of these modules in the end?
>>>>>>>>
>>>>>>>> On Wed, May 23, 2018 at 04:57, Reuven Lax <re...@google.com> wrote:
>>>>>>>>> By the way Romain, if you have specific scenarios in mind I would love to hear them. I can try to guess what exactly you would like to get out of schemas, but it would work better if you gave me concrete scenarios that you would like to work.
>>>>>>>>>
>>>>>>>>> Reuven
>>>>>>>>>
>>>>>>>>> On Tue, May 22, 2018 at 7:45 PM Reuven Lax <re...@google.com> wrote:
>>>>>>>>>> Yeah, what I'm working on will help with IO. Basically, if you register a function with SchemaRegistry that converts back and forth between a type (say JsonObject) and a Beam Row, then it is applied by the framework behind the scenes as part of DoFn invocation. Concrete example: let's say I have an IO that reads JSON objects:
>>>>>>>>>>
>>>>>>>>>> class MyJsonIORead extends PTransform<PBegin, PCollection<JsonObject>> { ... }
>>>>>>>>>>
>>>>>>>>>> If you register a schema for this type (or you can also just set the schema directly on the output PCollection), then Beam knows how to convert back and forth between JsonObject and Row. So the next ParDo can look like
>>>>>>>>>>
>>>>>>>>>> p.apply(new MyJsonIORead())
>>>>>>>>>>  .apply(ParDo.of(new DoFn<JsonObject, T>() {
>>>>>>>>>>    @ProcessElement void process(@Element Row row) {
>>>>>>>>>>    }
>>>>>>>>>>  }));
>>>>>>>>>>
>>>>>>>>>> and Beam will automatically convert JsonObject to a Row for processing (you aren't forced to do this, of course - you can always ask for it as a JsonObject).
>>>>>>>>>>
>>>>>>>>>> The same is true for output. If you have a sink that takes in JsonObject but the transform before it produces Row objects (for instance, because the transform before it is Beam SQL), Beam can automatically convert Row back to JsonObject for you.
>>>>>>>>>>
>>>>>>>>>> All of this was detailed in the schema doc I shared a few months ago. There was a lot of discussion on that document from various parties, and some of this API is a result of that discussion. This is also working in the branch JB and I were working on, though not yet integrated back into master.
>>>>>>>>>>
>>>>>>>>>> I would actually like to go further, make Row an interface, and provide a way to automatically put a Row interface on top of any other object (e.g. JsonObject, POJO, etc.). This won't change the way the user writes code, but instead of Beam having to copy and convert at each stage (e.g. from JsonObject to Row), it will simply create a Row object that uses the JsonObject as its underlying storage.
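Reuven's "Row as an interface over the underlying object" idea can be sketched with a toy view. `Row` here is only a stand-in for Beam's real Row class, and a plain map stands in for JsonObject; the point is just that the view reads through to the original object instead of copying it field by field:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch (not Beam's Row API) of putting a Row-like interface on top
// of an existing object without copying it.
public class RowViewDemo {

    public interface Row {
        Object getValue(String field);
    }

    public static void main(String[] args) {
        Map<String, Object> jsonLikeObject = new LinkedHashMap<>();
        jsonLikeObject.put("user", "alice");
        jsonLikeObject.put("clicks", 7);

        // No copy: the Row is just a view over the backing object.
        Row row = jsonLikeObject::get;

        System.out.println(row.getValue("user"));   // alice

        // Mutating the backing object is visible through the view,
        // which is what "uses the JsonObject as its underlying storage" implies.
        jsonLikeObject.put("clicks", 8);
        System.out.println(row.getValue("clicks")); // 8
    }
}
```

The trade-off this surfaces is the one the thread circles around: a zero-copy view avoids per-stage conversion cost, but it ties the Row's lifetime and mutability semantics to the wrapped object.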
>>>>>>>>>> Reuven
>>>>>>>>>>
>>>>>>>>>> On Tue, May 22, 2018 at 11:37 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>> Well, Beam can implement a new mapper but it doesn't help for IO. Most modern backends will take JSON directly, even the javax one, and it must stay generic.
>>>>>>>>>>>
>>>>>>>>>>> Then, since JSON-to-POJO mapping has already been done a dozen times, I'm not sure it is worth it for now.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 22, 2018 at 20:27, Reuven Lax <re...@google.com> wrote:
>>>>>>>>>>>> We can do even better, btw: building a SchemaRegistry where automatic conversions can be registered between schemas and Java data types. With this the user won't even need a DoFn to do the conversion.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 22, 2018, 10:13 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I checked out what has been done on the schema model and think it is acceptable - regarding the JSON debate - if https://issues.apache.org/jira/browse/BEAM-4381 can be fixed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a high level, it is about providing a mainstream, not-too-impacting model OOTB, and JSON seems the most valid option for now, at least for IO and some user transforms.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wdyt?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Apr 27, 2018 at 18:36, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>>> I can give it a try at the end of May, sure (holidays and work constraints will make it hard before).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Apr 27, 2018 at 18:26, Anton Kedin <ke...@google.com> wrote:
>>>>>>>>>>>>>>> Romain,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't believe that the JSON approach was investigated very thoroughly. I mentioned a few reasons which make it not the best choice in my opinion, but I may be wrong. Can you put together a design doc or a prototype?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>> Anton
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Apr 26, 2018 at 10:17 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>>>>> On Thu, Apr 26, 2018 at 23:13, Anton Kedin <ke...@google.com> wrote:
>>>>>>>>>>>>>>>>> BeamRecord (Row) has very little in common with JsonObject (I assume you're talking about javax.json), except maybe some similarities of the API. A few reasons why JsonObject doesn't work:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * it is a Java EE API:
>>>>>>>>>>>>>>>>>   o the Beam SDK is not limited to Java. There are probably similar APIs for other languages, but they might not necessarily carry the same semantics / APIs;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Not a big deal I think. At least not a technical blocker.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   o it can change between Java versions;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, this is Java EE ;).
>>>>>>>>>>>>>>>>>   o the current Beam Java implementation is an experimental feature to identify what's needed from such an API; in the end we might end up with something similar to the JsonObject API, but likely not;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't get that point as a blocker.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * it represents JSON, which is not an API but an object notation:
>>>>>>>>>>>>>>>>>   o it is defined as a unicode string in a certain format. If you choose to adhere to ECMA-404, then it doesn't sound like JsonObject can represent an Avro object, if I'm reading it right;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That is in the generator impl; you can implement an Avro generator.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * it doesn't define a type system (JSON does, but it's lacking):
>>>>>>>>>>>>>>>>>   o for example, JSON doesn't define semantics for numbers;
>>>>>>>>>>>>>>>>>   o it doesn't define date/time types;
>>>>>>>>>>>>>>>>>   o it doesn't allow extending the JSON type system at all;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That is why you need a metadata object or, more simply, a schema with that data. JSON or a Beam record doesn't help here, and you end up with the same outcome if you think about it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * it lacks schemas;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> JSON Schemas are standard, widespread and well tooled compared to the alternatives.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can definitely try to loosen the requirements and define everything in JSON in userland, but the point of Row/Schema is to avoid that and define everything in the Beam model, which can be extended and mapped to JSON, Avro, BigQuery schemas, custom binary formats etc., with the same semantics across Beam SDKs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is what JSON-P would allow, with the benefit of natural POJO support through JSON-B.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Apr 26, 2018 at 12:28 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Just to make it clear and let me understand: how is BeamRecord different from a JsonObject, which is an API without implementation (not even a JSON one OOTB)? The advantages of the JSON *API* are indeed the natural mapping (JSON-B is based on JSON-P, so there is no new binding to reinvent) and simple serialization (json+gzip for example, or Avro if you want to be geeky).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I fail to see the point of rebuilding an ecosystem ATM.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Apr 26, 2018 at 19:12, Reuven Lax <re...@google.com> wrote:
>>>>>>>>>>>>>>>>>>> Exactly what JB said. We will write a generic conversion from Avro (or JSON) to Beam schemas, which will make them work transparently with SQL. The plan is also to migrate Anton's work so that POJOs work generically for any schema.
>>>>>>>>>>>>>>>>>>> Reuven
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Apr 26, 2018 at 1:17 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>>>>> For now we have a generic schema interface. JSON-B can be one impl; Avro could be another one.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Apr 26, 2018, at 12:08, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hmm, Avro still has the pitfall of an uncontrolled stack which brings way too many dependencies to be part of any API; this is why I proposed a JSON-P based API (JsonObject) with a custom Beam entry for some metadata (headers "à la Camel").
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Romain Manni-Bucau
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2018-04-26 9:59 GMT+02:00 Jean-Baptiste Onofré <j...@nanthrax.net>:
>>>>>>>>>>>>>>>>>>>>>> Hi Ismaël,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You mean directly in Beam SQL? That will be part of schema support: a generic record could be one of the payloads carrying a schema.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Apr 26, 2018, at 11:39, Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hello Anton,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the descriptive email and the really useful work. Any plans to tackle PCollections of GenericRecords/IndexedRecords? It seems Avro is a natural fit for this approach too.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>> Ismaël
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 25, 2018 at 9:04 PM, Anton Kedin <ke...@google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I want to highlight a couple of improvements to Beam SQL we have been working on recently which are aimed at making the Beam SQL API easier to use. Specifically, these features simplify the conversion of Java Beans and JSON strings to Rows.
>>>>>>>>>>>>>>>>>>>>>>>> Feel free to try this out and send any bugs/comments/PRs my way.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> **Caveat: this is still work in progress, and has known bugs and incomplete features; see below for details.**
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Background
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Beam SQL queries can only be applied to PCollection<Row>. This means that users need to convert whatever PCollection elements they have to Rows before querying them with SQL. This usually requires
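The conversion burden Anton describes (turning bean fields into named Row fields by hand) can be sketched with a toy reflection-based converter. The `ClickEvent` bean and this approach are illustrative only; Beam's actual Schema inference for Java Beans is richer than this:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch of the conversion step described above: flattening a Java bean
// into a "row" of named fields. This only illustrates the boilerplate that
// automatic bean-to-Row registration removes; it is not Beam's Schema API.
public class BeanToRowDemo {

    public static class ClickEvent {             // hypothetical bean
        public String getUser() { return "alice"; }
        public int getClicks() { return 7; }
    }

    // Reflect over getters to build a field-name -> value map.
    public static Map<String, Object> toRow(Object bean) {
        Map<String, Object> row = new LinkedHashMap<>();
        try {
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().startsWith("get")
                        && m.getParameterCount() == 0
                        && !m.getName().equals("getClass")) {
                    // Naive name derivation: "getClicks" -> "clicks".
                    String field = m.getName().substring(3).toLowerCase();
                    row.put(field, m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow(new ClickEvent())); // field order may vary
    }
}
```

Writing (and maintaining) this per element type for every pipeline is exactly the friction the Beam SQL improvements above aim to eliminate.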