Romain, since you're interested, maybe the two of us should put together a proposal for how to set these things (hints, schema) on PCollections? I don't think it'll be hard - the previous list thread on hints already agreed on a general approach, and we would just need to flesh it out.
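For concreteness, the two shapes discussed further down in the thread could look roughly like the sketch below. This is a hypothetical sketch only: setSchema() and SetSchema did not exist in Beam at this point, and the names (including the SchemaAdapters helper) are taken from or invented around the proposals in the quoted messages, not from an actual API.

    // Hypothetical sketch -- setSchema()/SetSchema are proposal names from this
    // thread, not an existing Beam API; SchemaAdapters is made up for illustration.
    Schema schema = SchemaAdapters.fromAvro(avroSchema);

    PCollection<MyRecord> records = input.apply(ParDo.of(new ParseFn()));

    // Option discussed by JB: schema as optional metadata on the PCollection itself.
    records.setSchema(schema);

    // Option discussed by Reuven: a SetSchema transform that attaches a schema
    // to any PCollection, i.e. pc.apply(SetSchema.of(schema)).
    PCollection<MyRecord> withSchema = records.apply(SetSchema.of(schema));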
BTW, in the past when I looked, Json schemas seemed to have some odd limitations inherited from JavaScript (e.g. no distinction between integer and floating-point types). Is that still true?

Reuven

On Sun, Feb 4, 2018 at 9:12 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> 2018-02-04 17:53 GMT+01:00 Reuven Lax <re...@google.com>:
>
>> On Sun, Feb 4, 2018 at 8:42 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>
>>> 2018-02-04 17:37 GMT+01:00 Reuven Lax <re...@google.com>:
>>>
>>>> I'm not sure where proto comes from here. Proto is one example of a type that has a schema, but only one example.
>>>>
>>>> 1. In the initial prototype I want to avoid modifying the PCollection API. So I think it's best to create a special SchemaCoder, and pass the schema into this coder. Later we might add targeted APIs for this instead of going through a coder.
>>>> 1.a. I don't see what hints have to do with this?
>>>
>>> Hints are a way to replace the new API and unify the way to pass metadata in Beam instead of adding a new custom way each time.
>>
>> I don't think schema is a hint. But I hear what you're saying - a hint is a type of PCollection metadata, as is a schema, and we should have a unified API for setting such metadata.
>
> :) Ismael pointed out to me earlier this week that "hint" had an old meaning in Beam. My usage is purely the one found in most EE specs (your "metadata" in the previous answer). But I guess we are aligned on the meaning now, I just wanted to be sure.
>
>>>> 2. BeamSQL already has a generic record type which fits this use case very well (though we might modify it). However, as mentioned in the doc, the user is never forced to use this generic record type.
>>>
>>> Well, yes and no. A type already exists, but 1. it is very strictly limited (flat/columns only, which is very little of what big data SQL can do) and 2. it must be aligned with the convergence of generic data that the schema will bring (really read "aligned" as "dropped in favor of" - deprecation being a smooth way to do it).
>>
>> As I said, the existing class needs to be modified and extended, and not just for this schema use case. It was meant to represent Calcite SQL rows, but doesn't quite even do that yet (Calcite supports nested rows). However, I think it's the right basis to start from.
>
> Agree on the state. The current impl issues I hit (in addition to the nested support, which would by itself require a kind of visitor solution) are that the record owns its schema and handles serialization field by field instead of as a whole, which is how it would be handled with a schema IMHO.
>
> Concretely, what I don't want is to do a PoC which works - they all work, right? - and integrate it into Beam without thinking of a global solution for this generic record issue and its schema standardization. This is where Json(-P) has a lot of value IMHO, but it requires a bit more love than just adding a schema to the model.
>
>>> So, long story short, the main work of this schema track is not only about using schemas in runners and other ways, but also about starting to make Beam consistent with itself, which is probably the most important outcome since it is the user-facing side of this work.
>>>
>>>> On Sun, Feb 4, 2018 at 12:22 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>
>>>>> @Reuven: is the proto only about passing schema or also the generic type?
>>>>>
>>>>> There are 2.5 topics to solve for this issue:
>>>>>
>>>>> 1. How to pass the schema
>>>>> 1.a. hints?
>>>>> 2. What is the generic record type associated with a schema, and how do we express a schema relative to it?
>>>>>
>>>>> I would be happy to help on 1.a and 2 somehow if you need.
>>>>>
>>>>> On 4 Feb 2018 at 03:30, "Reuven Lax" <re...@google.com> wrote:
>>>>>
>>>>>> One more thing. If anyone here has experience with various OSS metadata stores (e.g. Kafka Schema Registry is one example), would you like to collaborate on implementation? I want to make sure that source schemas can be stored in a variety of OSS metadata stores, and be easily pulled into a Beam pipeline.
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Sat, Feb 3, 2018 at 6:28 PM, Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> If there are no concerns, I would like to start working on a prototype. It's just a prototype, so I don't think it will have the final API (e.g. for the prototype I'm going to avoid changing the API of PCollection, and use a "special" Coder instead). Also, even once we go beyond the prototype, it will be @Experimental for some time, so the API will not be set in stone.
>>>>>>>
>>>>>>> Any more comments on this approach before we start implementing a prototype?
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Wed, Jan 31, 2018 at 1:12 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> If you need help on the JSON part, I'm happy to help. To give a few hints on what is very doable: we can add an Avro module to Johnzon (the ASF JSON-P/JSON-B impl) to back JSON-P with Avro, for instance (I guess it will be one of the first to be requested).
>>>>>>>>
>>>>>>>> Romain Manni-Bucau
>>>>>>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>>>>>>>
>>>>>>>> 2018-01-31 22:06 GMT+01:00 Reuven Lax <re...@google.com>:
>>>>>>>>
>>>>>>>>> Agree. The initial implementation will be a prototype.
>>>>>>>>>
>>>>>>>>> On Wed, Jan 31, 2018 at 12:21 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Reuven,
>>>>>>>>>>
>>>>>>>>>> Agreed on being able to describe the schema with different formats. The good point about Json schemas is that they are described by a spec. My point is also to avoid reinventing the wheel. Just an abstraction able to use Avro, Json, Calcite, or custom schema descriptors would be great.
>>>>>>>>>>
>>>>>>>>>> Using a coder to describe a schema sounds like a smart move to implement quickly. However, it has to be clear in terms of documentation to avoid "side effects". I still think PCollection.setSchema() is better: it should be metadata (or a hint ;)) on the PCollection.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On 31/01/2018 20:16, Reuven Lax wrote:
>>>>>>>>>>
>>>>>>>>>>> As to the question of how a schema should be specified, I want to support several common schema formats. So if a user has a Json schema, or an Avro schema, or a Calcite schema, etc.,
>>>>>>>>>>> there should be adapters that allow setting a schema from any of them. I don't think we should prefer one over the other. While Romain is right that many people know Json, I think far fewer people know Json schemas.
>>>>>>>>>>>
>>>>>>>>>>> Agreed, schemas should not be enforced (for one thing, that wouldn't be backwards compatible!). I think for the initial prototype I will probably use a special coder to represent the schema (with setSchema an option on the coder), largely because it doesn't require modifying PCollection. However, I think longer term a schema should be an optional piece of metadata on the PCollection object. Similar to the previous discussion about "hints," I think this can be set on the producing PTransform, and a SetSchema PTransform will allow attaching a schema to any PCollection (i.e. pc.apply(SetSchema.of(schema))). This part isn't designed yet, but I think a schema should be similar to hints: it's just another piece of metadata on the PCollection (though something interpreted by the model, whereas hints are interpreted by the runner).
>>>>>>>>>>>
>>>>>>>>>>> Reuven
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 30, 2018 at 1:37 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I think we should avoid mixing two things in the discussion (and so the document):
>>>>>>>>>>>
>>>>>>>>>>> 1. The element of the collection and the schema itself are two different things. By essence, Beam should not enforce any schema. That's why I think it's a good idea to set the schema optionally on the PCollection (pcollection.setSchema()).
>>>>>>>>>>>
>>>>>>>>>>> 2. From point 1 come two questions: how do we represent a schema? How can we leverage the schema to simplify the serialization of the elements in the PCollection and querying? These two questions are not directly related.
>>>>>>>>>>>
>>>>>>>>>>> 2.1. How do we represent the schema?
>>>>>>>>>>> Json Schema is a very interesting idea. It could be an abstraction, and other providers, like Avro, could be bound to it. It's part of the JSON Processing spec (javax).
>>>>>>>>>>>
>>>>>>>>>>> 2.2. How do we leverage the schema for querying and serialization?
>>>>>>>>>>> Also in the spec, Json Pointer is interesting for the querying. Regarding the serialization, Jackson or other data binders can be used.
>>>>>>>>>>>
>>>>>>>>>>> These are still rough ideas in my mind, but I like Romain's idea about json-p usage.
>>>>>>>>>>>
>>>>>>>>>>> Once the 2.3.0 release is out, I will start to update the document with those ideas, and a PoC.
>>>>>>>>>>>
>>>>>>>>>>> Thanks !
>>>>>>>>>>> Regards
>>>>>>>>>>> JB
>>>>>>>>>>>
>>>>>>>>>>> On 01/30/2018 08:42 AM, Romain Manni-Bucau wrote:
2018 01:09, "Reuven Lax" <re...@google.com >>>>>>>>>>> <mailto:re...@google.com> >>>>>>>>>>> > <mailto:re...@google.com <mailto:re...@google.com>>> a >>>>>>>>>>> écrit : >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > On Mon, Jan 29, 2018 at 12:17 PM, Romain Manni-Bucau < >>>>>>>>>>> rmannibu...@gmail.com <mailto:rmannibu...@gmail.com> >>>>>>>>>>> > <mailto:rmannibu...@gmail.com >>>>>>>>>>> >>>>>>>>>>> <mailto:rmannibu...@gmail.com>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > Hi >>>>>>>>>>> > >>>>>>>>>>> > I have some questions on this: how hierarchic >>>>>>>>>>> schemas >>>>>>>>>>> would work? Seems >>>>>>>>>>> > it is not really supported by the ecosystem (out >>>>>>>>>>> of >>>>>>>>>>> custom stuff) :(. >>>>>>>>>>> > How would it integrate smoothly with other >>>>>>>>>>> generic record >>>>>>>>>>> types - N bridges? >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > Do you mean nested schemas? What do you mean here? >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > Yes, sorry - wrote the mail too late ;). Was hierarchic >>>>>>>>>>> data and >>>>>>>>>>> nested schemas. >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > Concretely I wonder if using json API couldnt be >>>>>>>>>>> beneficial: json-p is a >>>>>>>>>>> > nice generic abstraction with a built in querying >>>>>>>>>>> mecanism (jsonpointer) >>>>>>>>>>> > but no actual serialization (even if json and >>>>>>>>>>> binary json >>>>>>>>>>> are very >>>>>>>>>>> > natural). The big advantage is to have a well >>>>>>>>>>> known >>>>>>>>>>> ecosystem - who >>>>>>>>>>> > doesnt know json today? - that beam can reuse for >>>>>>>>>>> free: >>>>>>>>>>> JsonObject >>>>>>>>>>> > (guess we dont want JsonValue abstraction) for >>>>>>>>>>> the record >>>>>>>>>>> type, >>>>>>>>>>> > jsonschema standard for the schema, jsonpointer >>>>>>>>>>> for the >>>>>>>>>>> > delection/projection etc... It doesnt enforce the >>>>>>>>>>> actual >>>>>>>>>>> serialization >>>>>>>>>>> > (json, smile, avro, ...) but provide an >>>>>>>>>>> expressive and >>>>>>>>>>> alread known API >>>>>>>>>>> > so i see it as a big win-win for users (no need >>>>>>>>>>> to learn >>>>>>>>>>> a new API and >>>>>>>>>>> > use N bridges in all ways) and beam (impls are >>>>>>>>>>> here and >>>>>>>>>>> API design >>>>>>>>>>> > already thought). >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > I assume you're talking about the API for setting >>>>>>>>>>> schemas, >>>>>>>>>>> not using them. >>>>>>>>>>> > Json has many downsides and I'm not sure it's true >>>>>>>>>>> that >>>>>>>>>>> everyone knows it; >>>>>>>>>>> > there are also competing schema APIs, such as Avro >>>>>>>>>>> etc.. >>>>>>>>>>> However I think we >>>>>>>>>>> > should give Json a fair evaluation before dismissing >>>>>>>>>>> it. >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > It is a wider topic than schema. Actually schema are not >>>>>>>>>>> the >>>>>>>>>>> first citizen but a >>>>>>>>>>> > generic data representation is. That is where json hits >>>>>>>>>>> almost >>>>>>>>>>> any other API. >>>>>>>>>>> > Then, when it comes to schema, json has a standard for >>>>>>>>>>> that so we >>>>>>>>>>> are all good. >>>>>>>>>>> > >>>>>>>>>>> > Also json has a good indexing API compared to >>>>>>>>>>> alternatives which >>>>>>>>>>> are sometimes a >>>>>>>>>>> > bit faster - for noop transforms - but are hardly usable >>>>>>>>>>> or make >>>>>>>>>>> the code not >>>>>>>>>>> > that readable. 
>>>>>>>>>>> >
>>>>>>>>>>> > Avro is a nice competitor, and it is compatible - actually Avro is JSON-driven by design - but its API is far from being that easy due to its schema enforcement, which is heavy, and worse, you can't work with Avro without a schema. JSON would allow reconciling the dynamic and static cases, since the job wouldn't change except for the setSchema.
>>>>>>>>>>> >
>>>>>>>>>>> > That is why I think JSON is a good compromise: having a standard API for it allows fully customizing the impl at will if needed - even using Avro or protobuf.
>>>>>>>>>>> >
>>>>>>>>>>> > Side note on the Beam API: I don't think it is good to use a main API for runner optimization. It enforces something to be shared by all runners but not widely usable. It is also misleading for users. Would you set a Flink pipeline option with Dataflow? My proposal here is to use hints - properties - instead of something hard-wired in the API, then standardize it if all runners support it.
>>>>>>>>>>> >
>>>>>>>>>>> > Wdyt?
>>>>>>>>>>> >
>>>>>>>>>>> > On 29 Jan 2018 at 06:24, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi Reuven,
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks for the update! As I'm working with you on this, I fully agree - great doc gathering the ideas.
>>>>>>>>>>> >
>>>>>>>>>>> > It's clearly something we have to add asap in Beam, because it would allow new use cases for our users (in a simple way) and open new areas for the runners (for instance dataframe support in the Spark runner).
>>>>>>>>>>> >
>>>>>>>>>>> > By the way, a while ago I created BEAM-3437 to track the PoC/PR around this.
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks !
>>>>>>>>>>> >
>>>>>>>>>>> > Regards
>>>>>>>>>>> > JB
>>>>>>>>>>> >
>>>>>>>>>>> > On 01/29/2018 02:08 AM, Reuven Lax wrote:
>>>>>>>>>>> > > Previously I submitted a proposal for adding schemas as a first-class concept on Beam PCollections. The proposal engendered quite a bit of discussion from the community - more discussion than I've seen from almost any of our proposals to date!
>>>>>>>>>>> > >
>>>>>>>>>>> > > Based on the feedback and comments, I reworked the proposal document quite a bit. It now talks more explicitly about the difference between dynamic schemas (where the schema is not fully known at graph-creation time) and static schemas (which are fully known at graph-creation time).
>>>>>>>>>>> > > Proposed APIs are more fleshed out now (again thanks to feedback from community members), and the document talks in more detail about evolving schemas in long-running streaming pipelines.
>>>>>>>>>>> > >
>>>>>>>>>>> > > Please take a look. I think this will be very valuable to Beam, and welcome any feedback.
>>>>>>>>>>> > >
>>>>>>>>>>> > > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#
>>>>>>>>>>> > >
>>>>>>>>>>> > > Reuven
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Jean-Baptiste Onofré
>>>>>>>>>>> > jbono...@apache.org
>>>>>>>>>>> > http://blog.nanthrax.net
>>>>>>>>>>> > Talend - http://www.talend.com
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>>> jbono...@apache.org
>>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>>> Talend - http://www.talend.com
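To make the JSON-P idea in the quoted thread concrete, here is a minimal sketch of what Romain and JB describe - JsonObject as the schema-less generic record and JsonPointer for querying/projection - using only the standard javax.json API (JSON-P 1.1, as implemented by Johnzon among others). How this would plug into Beam (coder, setSchema, and so on) is deliberately left out, since that part was still an open proposal.

    import javax.json.Json;
    import javax.json.JsonObject;
    import javax.json.JsonPointer;
    import javax.json.JsonValue;

    public class JsonpSketch {
        public static void main(String[] args) {
            // A generic record built with the standard JSON-P builder API,
            // including nested structure (the "hierarchical data" case).
            JsonObject record = Json.createObjectBuilder()
                .add("user", Json.createObjectBuilder()
                    .add("name", "alice")
                    .add("age", 33))
                .build();

            // JsonPointer (RFC 6901) provides the built-in querying/projection
            // mechanism mentioned in the thread; requires the JSON-P 1.1 API.
            JsonPointer agePointer = Json.createPointer("/user/age");
            JsonValue age = agePointer.getValue(record);

            System.out.println(age); // prints 33
        }
    }

The actual wire format (JSON, Smile, Avro, ...) stays a separate concern, which is the point Romain makes above: the API describes and queries the generic record without dictating its serialization.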