On Mon, Jan 29, 2018 at 12:17 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
> Hi
>
> I have some questions on this: how would hierarchical schemas work? It
> seems this is not really supported by the ecosystem (outside of custom
> stuff) :(. How would it integrate smoothly with other generic record
> types - N bridges?
>

Do you mean nested schemas? What do you mean here?

>
> Concretely, I wonder if using a JSON API couldn't be beneficial: json-p
> is a nice generic abstraction with a built-in querying mechanism
> (jsonpointer) but no actual serialization (even if JSON and binary JSON
> are very natural). The big advantage is having a well-known ecosystem -
> who doesn't know JSON today? - that Beam can reuse for free: JsonObject
> (I guess we don't want the JsonValue abstraction) for the record type,
> the jsonschema standard for the schema, jsonpointer for the
> selection/projection, etc. It doesn't enforce the actual serialization
> (json, smile, avro, ...) but provides an expressive and already known
> API, so I see it as a big win-win for users (no need to learn a new API
> and use N bridges in all ways) and for Beam (the implementations are
> here and the API design is already thought out).
>

I assume you're talking about the API for setting schemas, not using them.
JSON has many downsides, and I'm not sure it's true that everyone knows it;
there are also competing schema APIs, such as Avro. However, I think we
should give JSON a fair evaluation before dismissing it.

>
> Wdyt?
>
> On Jan 29, 2018 06:24, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>
>> Hi Reuven,
>>
>> Thanks for the update! As I'm working with you on this, I fully agree,
>> and it's a great doc gathering the ideas.
>>
>> It's clearly something we have to add asap in Beam, because it would
>> allow new use cases for our users (in a simple way) and open new areas
>> for the runners (for instance dataframe support in the Spark runner).
>>
>> By the way, a while ago I created BEAM-3437 to track the PoC/PR around
>> this.
>>
>> Thanks!
>>
>> Regards
>> JB
>>
>> On 01/29/2018 02:08 AM, Reuven Lax wrote:
>> > Previously I submitted a proposal for adding schemas as a first-class
>> > concept on Beam PCollections. The proposal engendered quite a bit of
>> > discussion from the community - more discussion than I've seen from
>> > almost any of our proposals to date!
>> >
>> > Based on the feedback and comments, I reworked the proposal document
>> > quite a bit. It now talks more explicitly about the difference
>> > between dynamic schemas (where the schema is not fully known at
>> > graph-creation time) and static schemas (which are fully known at
>> > graph-creation time). Proposed APIs are more fleshed out now (again
>> > thanks to feedback from community members), and the document talks in
>> > more detail about evolving schemas in long-running streaming
>> > pipelines.
>> >
>> > Please take a look. I think this will be very valuable to Beam, and
>> > welcome any feedback.
>> >
>> > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#
>> >
>> > Reuven
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>