Which JSON library are you thinking of? In Java at least, the lack of a good standard JSON library has always been a problem.
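
To make the 53-bit concern quoted below concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes a JSON parser that, like JavaScript, keeps every number as an IEEE 754 double. It shows a long just above 2^53 coming back changed after such a round trip:

```java
// Hypothetical illustration only: not part of Beam or of any JSON library.
class Precision53 {
    // Largest power of two up to which every integer fits exactly in a double.
    static final long MAX_EXACT = 1L << 53;

    // Simulates a parser that stores every JSON number as a double.
    static long roundTripThroughDouble(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long id = MAX_EXACT + 1; // 9007199254740993
        // Prints "9007199254740993 -> 9007199254740992"
        System.out.println(id + " -> " + roundTripThroughDouble(id));
    }
}
```

Any 64-bit id or counter above 2^53 can be silently altered this way, which is why a schema representation needs a real integer type rather than a single "number" type.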

On Mon, Feb 5, 2018 at 12:03 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> On Feb 5, 2018 at 19:54, "Reuven Lax" <re...@google.com> wrote:
>
> multiplying by 1.0 doesn't really solve the right problems. The number type used by Javascript (and by extension, the standard for json) only has 53 bits of precision. I've seen many, many bugs caused by this - the input data may easily contain numbers too large for 53 bits.
>
> You have alternative than string at the end whatever schema you use so not sure it is an issue. At least if runtime is in java or mainstream languages.
>
> In addition, Beam's schema representation must be no less general than other common representations. For the case of an ETL pipeline, if input fields are integers the output fields should also be numbers. We shouldn't turn them into floats because the schema class we used couldn't distinguish between ints and floats. If anything, Avro schemas are a better fit here as they are more general.
>
> This is what the previous definition does. Avro is not better for 2 reasons:
>
> 1. Their dep stack is a clear blocker and please don't even speak of yet another uncontrolled shade in the API. Until avro becomes an api only and not an impl this is a bad fit for beam.
> 2. They must be json friendly so you are back on json + metadata so jsonschema+extension entry is strictly equivalent and as typed
>
> Reuven
>
> On Sun, Feb 4, 2018 at 9:31 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> You can handle integers using multipleOf: 1.0 IIRC.
>> Yes limitations are still here but it is a good starting model and to be honest it is good enough - not a single model will work good enough even if you can go a little bit further with other models a bit more complex.
>> That said the idea is to enrich the model with a beam object which would allow to complete the metadata as required when needed (never?).
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>
>> 2018-02-04 18:21 GMT+01:00 Jean-Baptiste Onofré <j...@nanthrax.net>:
>>
>>> Sorry guys, I was off today. Happy to be part of the party too ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 02/04/2018 06:19 PM, Reuven Lax wrote:
>>> > Romain, since you're interested maybe the two of us should put together a proposal for how to set these things (hints, schema) on PCollections? I don't think it'll be hard - the previous list thread on hints already agreed on a general approach, and we would just need to flesh it out.
>>> >
>>> > BTW in the past when I looked, Json schemas seemed to have some odd limitations inherited from Javascript (e.g. no distinction between integer and floating-point types). Is that still true?
>>> >
>>> > Reuven
>>> >
>>> > On Sun, Feb 4, 2018 at 9:12 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> >
>>> > 2018-02-04 17:53 GMT+01:00 Reuven Lax <re...@google.com>:
>>> >
>>> > On Sun, Feb 4, 2018 at 8:42 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> >
>>> > 2018-02-04 17:37 GMT+01:00 Reuven Lax <re...@google.com>:
>>> >
>>> > I'm not sure where proto comes from here. Proto is one example of a type that has a schema, but only one example.
>>> >
>>> > 1. In the initial prototype I want to avoid modifying the PCollection API. So I think it's best to create a special SchemaCoder, and pass the schema into this coder.
>>> > Later we might add targeted APIs for this instead of going through a coder.
>>> > 1.a I don't see what hints have to do with this?
>>> >
>>> > Hints are a way to replace the new API and unify the way to pass metadata in beam instead of adding a new custom way each time.
>>> >
>>> > I don't think schema is a hint. But I hear what you're saying - hint is a type of PCollection metadata as is schema, and we should have a unified API for setting such metadata.
>>> >
>>> > :), Ismael pointed out to me earlier this week that "hint" had an old meaning in beam. My usage is purely the one done in most EE specs (your "metadata" in previous answer). But guess we are aligned on the meaning now, just wanted to be sure.
>>> >
>>> > 2. BeamSQL already has a generic record type which fits this use case very well (though we might modify it). However as mentioned in the doc, the user is never forced to use this generic record type.
>>> >
>>> > Well, yes and no. A type already exists but 1. it is very strictly limited (flat/columns only which is very few of what big data SQL can do) and 2. it must be aligned on the converge of generic data the schema will bring (really read "aligned" as "dropped in favor of" - deprecated being a smooth way to do it).
>>> >
>>> > As I said the existing class needs to be modified and extended, and not just for this schema use case. It was meant to represent Calcite SQL rows, but doesn't quite even do that yet (Calcite supports nested rows). However I think it's the right basis to start from.
>>> >
>>> > Agree on the state.
>>> > Current impl issues I hit (in addition to the nested support, which would require by itself a kind of visitor solution) are the fact that the record owns the schema and that serialization is handled field by field instead of as a whole, which is how it would be handled with a schema IMHO.
>>> >
>>> > Concretely what I don't want is to do a PoC which works - they all work right? - and integrate it into beam without thinking about a global solution for this generic record issue and its schema standardization. This is where Json(-P) has a lot of value IMHO but requires a bit more love than just adding schema in the model.
>>> >
>>> > So long story short the main work of this schema track is not only on using schema in runners and other ways but also starting to make beam consistent with itself, which is probably the most important outcome since it is the user facing side of this work.
>>> >
>>> > On Sun, Feb 4, 2018 at 12:22 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> >
>>> > @Reuven: is the proto only about passing schema or also the generic type?
>>> >
>>> > There are 2.5 topics to solve this issue:
>>> >
>>> > 1. How to pass schema
>>> > 1.a. hints?
>>> > 2. What is the generic record type associated to a schema and how to express a schema relatively to it
>>> >
>>> > I would be happy to help on 1.a and 2 somehow if you need.
>>> >
>>> > On Feb 4, 2018 at 03:30, "Reuven Lax" <re...@google.com> wrote:
>>> >
>>> > One more thing. If anyone here has experience with various OSS metadata stores (e.g. Kafka Schema Registry is one example), would you like to collaborate on implementation?
>>> > I want to make sure that source schemas can be stored in a variety of OSS metadata stores, and be easily pulled into a Beam pipeline.
>>> >
>>> > Reuven
>>> >
>>> > On Sat, Feb 3, 2018 at 6:28 PM, Reuven Lax <re...@google.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > If there are no concerns, I would like to start working on a prototype. It's just a prototype, so I don't think it will have the final API (e.g. for the prototype I'm going to avoid changing the API of PCollection, and use a "special" Coder instead). Also even once we go beyond prototype, it will be @Experimental for some time, so the API will not be fixed in stone.
>>> >
>>> > Any more comments on this approach before we start implementing a prototype?
>>> >
>>> > Reuven
>>> >
>>> > On Wed, Jan 31, 2018 at 1:12 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> >
>>> > If you need help on the json part I'm happy to help. To give a few hints on what is very doable: we can add an avro module to johnzon (asf json{p,b} impl) to back jsonp by avro (guess it will be one of the first to be asked) for instance.
>>> >
>>> > Romain Manni-Bucau
>>> > @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>> >
>>> > 2018-01-31 22:06 GMT+01:00 Reuven Lax <re...@google.com>:
>>> >
>>> > Agree. The initial implementation will be a prototype.
>>> >
>>> > On Wed, Jan 31, 2018 at 12:21 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> >
>>> > Hi Reuven,
>>> >
>>> > Agree to be able to describe the schema with different formats. The good point about json schemas is that they are described by a spec. My point is also to avoid reinventing the wheel. Just an abstraction to be able to use Avro, Json, Calcite, custom schema descriptors would be great.
>>> >
>>> > Using a coder to describe a schema sounds like a smart move to implement quickly. However, it has to be clear in terms of documentation to avoid "side effects". I still think PCollection.setSchema() is better: it should be metadata (or hint ;))) on the PCollection.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 31/01/2018 20:16, Reuven Lax wrote:
>>> >
>>> > As to the question of how a schema should be specified, I want to support several common schema formats. So if a user has a Json schema, or an Avro schema, or a Calcite schema, etc. there should be adapters that allow setting a schema from any of them. I don't think we should prefer one over the other. While Romain is right that many people know Json, I think far fewer people know Json schemas.
>>> >
>>> > Agree, schemas should not be enforced (for one thing, that wouldn't be backwards compatible!). I think for the initial prototype I will probably use a special coder to represent the schema (with setSchema an option on the coder), largely because it doesn't require modifying PCollection. However I think longer term a schema should be an optional piece of metadata on the PCollection object.
>>> > Similar to the previous discussion about "hints," I think this can be set on the producing PTransform, and a SetSchema PTransform will allow attaching a schema to any PCollection (i.e. pc.apply(SetSchema.of(schema))). This part isn't designed yet, but I think schema should be similar to hints, it's just another piece of metadata on the PCollection (though something interpreted by the model, where hints are interpreted by the runner)
>>> >
>>> > Reuven
>>> >
>>> > On Tue, Jan 30, 2018 at 1:37 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I think we should avoid mixing two things in the discussion (and so the document):
>>> >
>>> > 1. The element of the collection and the schema itself are two different things. By essence, Beam should not enforce any schema. That's why I think it's a good idea to set the schema optionally on the PCollection (pcollection.setSchema()).
>>> >
>>> > 2. From point 1 come two questions: how do we represent a schema? How can we leverage the schema to simplify the serialization of the element in the PCollection and query? These two questions are not directly related.
>>> >
>>> > 2.1 How do we represent the schema
>>> > Json Schema is a very interesting idea. It could be an abstraction and other providers, like Avro, can be bound to it. It's part of the json processing spec (javax).
>>> >
>>> > 2.2. How do we leverage the schema for query and serialization
>>> > Also in the spec, json pointer is interesting for the querying.
>>> > Regarding the serialization, jackson or another data binder can be used.
>>> >
>>> > These are still rough ideas in my mind, but I like Romain's idea about json-p usage.
>>> >
>>> > Once the 2.3.0 release is out, I will start to update the document with those ideas, and PoC.
>>> >
>>> > Thanks !
>>> > Regards
>>> > JB
>>> >
>>> > On 01/30/2018 08:42 AM, Romain Manni-Bucau wrote:
>>> > >
>>> > > On Jan 30, 2018 at 01:09, "Reuven Lax" <re...@google.com> wrote:
>>> > >
>>> > > On Mon, Jan 29, 2018 at 12:17 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>> > >
>>> > > Hi
>>> > >
>>> > > I have some questions on this: how would hierarchic schemas work? Seems it is not really supported by the ecosystem (out of custom stuff) :(. How would it integrate smoothly with other generic record types - N bridges?
>>> > >
>>> > > Do you mean nested schemas? What do you mean here?
>>> > >
>>> > > Yes, sorry - wrote the mail too late ;). Was hierarchic data and nested schemas.
>>> > >
>>> > > Concretely I wonder if using the json API couldn't be beneficial: json-p is a nice generic abstraction with a built in querying mechanism (jsonpointer) but no actual serialization (even if json and binary json are very natural). The big advantage is to have a well known ecosystem - who doesn't know json today? - that beam can reuse for free: JsonObject (guess we don't want JsonValue abstraction) for the record type, jsonschema standard for the schema, jsonpointer for the selection/projection etc... It doesn't enforce the actual serialization (json, smile, avro, ...) but provides an expressive and already known API so I see it as a big win-win for users (no need to learn a new API and use N bridges in all ways) and beam (impls are here and API design already thought).
>>> > >
>>> > > I assume you're talking about the API for setting schemas, not using them. Json has many downsides and I'm not sure it's true that everyone knows it; there are also competing schema APIs, such as Avro etc. However I think we should give Json a fair evaluation before dismissing it.
>>> > >
>>> > > It is a wider topic than schema. Actually schemas are not the first citizen but a generic data representation is. That is where json hits almost any other API. Then, when it comes to schema, json has a standard for that so we are all good.
>>> > >
>>> > > Also json has a good indexing API compared to alternatives which are sometimes a bit faster - for noop transforms - but are hardly usable or make the code not that readable.
>>> > >
>>> > > Avro is a nice competitor but it is compatible - actually avro is json driven by design - but its API is far from being that easy due to its schema enforcement which is heavvvyyy and worse is you can't work with avro without a schema. Json would allow to reconcile the dynamic and static cases since the job wouldn't change except the setschema.
>>> > >
>>> > > That is why I think json is a good compromise and having a standard API for it allows to fully customize the impl at will if needed - even using avro or protobuf.
>>> > >
>>> > > Side note on beam api: I don't think it is good to use a main API for runner optimization. It enforces something to be shared on all runners but not widely usable. It is also misleading for users. Would you set a flink pipeline option with dataflow? My proposal here is to use hints - properties - instead of something hardly defined in the API then standardize it if all runners support it.
>>> > >
>>> > > Wdyt?
>>> > >
>>> > > On Jan 29, 2018 at 06:24, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>>> > >
>>> > > Hi Reuven,
>>> > >
>>> > > Thanks for the update !
>>> > > As I'm working with you on this, I fully agree and great doc gathering the ideas.
>>> > >
>>> > > It's clearly something we have to add asap in Beam, because it would allow new use cases for our users (in a simple way) and open new areas for the runners (for instance dataframe support in the Spark runner).
>>> > >
>>> > > By the way, a while ago, I created BEAM-3437 to track the PoC/PR around this.
>>> > >
>>> > > Thanks !
>>> > >
>>> > > Regards
>>> > > JB
>>> > >
>>> > > On 01/29/2018 02:08 AM, Reuven Lax wrote:
>>> > > > Previously I submitted a proposal for adding schemas as a first-class concept on Beam PCollections. The proposal engendered quite a bit of discussion from the community - more discussion than I've seen from almost any of our proposals to date!
>>> > > >
>>> > > > Based on the feedback and comments, I reworked the proposal document quite a bit. It now talks more explicitly about the difference between dynamic schemas (where the schema is not fully known at graph-creation time), and static schemas (which are fully known at graph-creation time). Proposed APIs are more fleshed out now (again thanks to feedback from community members), and the document talks in more detail about evolving schemas in long-running streaming pipelines.
>>> > > >
>>> > > > Please take a look. I think this will be very valuable to Beam, and welcome any feedback.
>>> > > >
>>> > > > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#
>>> > > >
>>> > > > Reuven
>>> > >
>>> > > --
>>> > > Jean-Baptiste Onofré
>>> > > jbono...@apache.org
>>> > > http://blog.nanthrax.net
>>> > > Talend - http://www.talend.com
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com