Re: Schema-Aware PCollections revisited

Reuven Lax Mon, 29 Jan 2018 16:10:31 -0800

On Mon, Jan 29, 2018 at 12:17 PM, Romain Manni-Bucau <[email protected]>
wrote:


> Hi
>
> I have some questions on this: how would hierarchical schemas work? It seems
> they are not really supported by the ecosystem (apart from custom stuff) :(.
> How would it integrate smoothly with other generic record types - N bridges?
>

Do you mean nested schemas? What do you mean here?

>
> Concretely I wonder if using the JSON API couldn't be beneficial: JSON-P is a
> nice generic abstraction with a built-in querying mechanism (jsonpointer)
> but no actual serialization (even if JSON and binary JSON are very
> natural). The big advantage is to have a well-known ecosystem - who doesn't
> know JSON today? - that Beam can reuse for free: JsonObject (guess we don't
> want the JsonValue abstraction) for the record type, the jsonschema standard
> for the schema, jsonpointer for the selection/projection etc... It doesn't
> enforce the actual serialization (JSON, Smile, Avro, ...) but provides an
> expressive and already known API, so I see it as a big win-win for users (no
> need to learn a new API and use N bridges in all ways) and Beam (impls are
> here and the API design is already thought out).
>

I assume you're talking about the API for setting schemas, not using them.
JSON has many downsides, and I'm not sure it's true that everyone knows it;
there are also competing schema APIs, such as Avro. However, I think we
should give JSON a fair evaluation before dismissing it.
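
For anyone following along who hasn't used jsonpointer: the selection/projection
idea mentioned above can be sketched roughly as below. This is only an
illustration in plain Python of how an RFC 6901 pointer picks fields out of a
nested record; it is not a Beam API and not the JSON-P library, and the names
resolve_pointer, project and the sample record are made up for the example.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against already-parsed JSON data."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array elements are addressed by index
        else:
            current = current[token]  # object members are addressed by name
    return current


def project(record, pointers):
    """Hypothetical helper: keep only the values the given pointers select."""
    return {p: resolve_pointer(record, p) for p in pointers}


# Made-up nested record, standing in for one element of a collection.
record = json.loads('{"user": {"name": "ada", "tags": ["a", "b"]}, "amount": 12.5}')
projected = project(record, ["/user/name", "/user/tags/1", "/amount"])
print(projected)
```

Note that nothing here says how the record is serialized, which matches the
point made above: the pointer only addresses fields in the in-memory structure.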

>
> Wdyt?
>
> On Jan 29, 2018 06:24, "Jean-Baptiste Onofré" <[email protected]> wrote:
>
>> Hi Reuven,
>>
>> Thanks for the update! As I'm working with you on this, I fully agree, and
>> it's a great doc gathering the ideas.
>>
>> It's clearly something we have to add asap in Beam, because it would allow
>> new use cases for our users (in a simple way) and open new areas for the
>> runners (for instance dataframe support in the Spark runner).
>>
>> By the way, a while ago, I created BEAM-3437 to track the PoC/PR around
>> this.
>>
>> Thanks !
>>
>> Regards
>> JB
>>
>> On 01/29/2018 02:08 AM, Reuven Lax wrote:
>> > Previously I submitted a proposal for adding schemas as a first-class
>> > concept on Beam PCollections. The proposal engendered quite a bit of
>> > discussion from the community - more discussion than I've seen from
>> > almost any of our proposals to date!
>> >
>> > Based on the feedback and comments, I reworked the proposal document
>> > quite a bit. It now talks more explicitly about the difference between
>> > dynamic schemas (where the schema is not fully known at graph-creation
>> > time) and static schemas (which are fully known at graph-creation time).
>> > Proposed APIs are more fleshed out now (again thanks to feedback from
>> > community members), and the document talks in more detail about evolving
>> > schemas in long-running streaming pipelines.
>> >
>> > Please take a look. I think this will be very valuable to Beam, and
>> > welcome any feedback.
>> >
>> > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#
>> >
>> > Reuven
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
