I mean: if these runners have some limitation that forces them to support only tiny bundles, there's a good chance that this limitation will also apply to whatever Beam model API you propose as a fix, and they won't be able to implement it.
On Thu, Nov 30, 2017, 11:19 AM Eugene Kirpichov <kirpic...@google.com> wrote: > So is your main concern potential poor performance on runners that choose > to use a very small bundle size? (Currently an IO can trivially handle too > large bundles, simply by flushing when enough data accumulates, which is > what all IOs do - but indeed working around having unreasonably small > bundles is much harder.) > > If so, I think, rather than making a model change, we should understand > why those runners are choosing such a small bundle size, and potentially > fix them. > > On Thu, Nov 30, 2017, 11:01 AM Romain Manni-Bucau <rmannibu...@gmail.com> > wrote: > >> >> >> On Nov 30, 2017 at 19:23, "Kenneth Knowles" <k...@google.com> wrote: >> >> On Thu, Nov 30, 2017 at 10:03 AM, Romain Manni-Bucau < >> rmannibu...@gmail.com> wrote: >> >>> Hmm, >>> >>> ESIO: >>> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L847 >>> JDBCIO: >>> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L592 >>> MongoIO: >>> https://github.com/apache/beam/blob/master/sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/MongoDbIO.java#L657 >>> etc... >>> >>> They all use the same pattern. >>> >> >> This is actually correct - if you have some triggers set up to yield >> low latency, then things need to actually be flushed here. If the runner >> provides a one-element bundle it could be because data volume has dipped. >> In this case, you pay per element instead of getting good amortization, but >> since data volume is low this is not so bad, and it is anyhow the only way to >> yield the desired latency. >> >> Romain - just to echo some others, did you have a particular combination >> of runner + IO that you wanted to target for improvement? 
That would focus >> the discussion and we could think about what to change in the runner or IO, >> or discover an issue that they cannot solve. >> >> >> I want to ensure EsIO will never do a flush of 1 element on any runner >> without a timer trigger - assuming data volume is continuous and with a size >> > 1. >> >> To rephrase: my concern is that the bundle, which is a great piece of >> infra/environment feedback, is today owned by Beam code, which defeats that >> great purpose since Beam doesn't use the implementation for that. This notion should >> be unified (size + timeout are often the defaults to trigger an "end" and >> would work) across runners, or it should be hidden from the transform >> developers IMHO. >> >> Note the low-latency point is not linked to bundle size but, as you >> mentioned, to triggers (timeout- or event-based), which means both worlds can >> work together in harmony and lead to a valid bundle API (without any >> change, yeah) and its exposure to the user. >> >> >> >> Kenn >> >> >> >>> >>> From what you wrote - and technically I agree, but in the current state my >>> point is valid I think - you should drop bundle from the whole user >>> API and make it all @Internal, no? >>> >>> >>> Romain Manni-Bucau >>> @rmannibucau | Blog | Old Blog | Github | LinkedIn >>> >>> >>> 2017-11-30 18:58 GMT+01:00 Jean-Baptiste Onofré <j...@nanthrax.net>: >>> > Agree, but maybe we can inform the runner if wanted, no? >>> > >>> > Honestly, from my side, I'm fine with the current situation as it's >>> runner >>> > specific. >>> > >>> > Regards >>> > JB >>> > >>> > On 11/30/2017 06:12 PM, Reuven Lax wrote: >>> >> >>> >> I don't think it belongs in PipelineOptions, as bundle size is always >>> a >>> >> runner thing. >>> >> >>> >> We could consider adding a new generic RunnerOptions, however I'm not >>> >> convinced all runners can actually support this. 
>>> >> >>> >> On Thu, Nov 30, 2017 at 6:09 AM, Romain Manni-Bucau < >>> rmannibu...@gmail.com >>> >> <mailto:rmannibu...@gmail.com>> wrote: >>> >> >>> >> Guys, >>> >> >>> >> what about moving getMaxBundleSize from the Flink options to pipeline >>> >> options? I think all runners can support it, right? Spark code >>> >> probably needs the merge of the v2 before it can be implemented, >>> >> but I don't see any blocker. >>> >> >>> >> wdyt? >>> >> >>> >> Romain Manni-Bucau >>> >> @rmannibucau | Blog | Old Blog | Github | LinkedIn >>> >> >>> >> >>> >> 2017-11-19 8:19 GMT+01:00 Romain Manni-Bucau < >>> rmannibu...@gmail.com >>> >> <mailto:rmannibu...@gmail.com>>: >>> >> >>> >> > @Eugene: "workaround" as in specific to the IO each time, and >>> >> > therefore it still highlights a lack in the core. >>> >> > >>> >> > Other comments inline >>> >> > >>> >> > >>> >> > 2017-11-19 7:40 GMT+01:00 Robert Bradshaw >>> >> <rober...@google.com.invalid>: >>> >> >> There is a possible fourth issue that we don't handle well: >>> >> efficiency. For >>> >> >> very large bundles, it may be advantageous to avoid replaying a >>> >> bunch of >>> >> >> idempotent operations if there were a way to record which ones >>> >> we're sure >>> >> >> went through. Not sure if that's the issue here (though one could >>> >> possibly >>> >> >> do this with SDFs, by preemptively returning periodically >>> >> before an >>> >> >> element (or portion thereof) is done). >>> >> > >>> >> > +1, it also leads to the IO handling its own chunking/bundles and >>> >> > therefore solves all issues at once. 
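The "flushing when enough data accumulates" pattern that the linked IOs (ElasticsearchIO, JdbcIO, MongoDbIO) follow can be sketched outside of any runner. This is a minimal illustration, not Beam API: the class and constant names (`BatchingWriter`, `MAX_BATCH`) are invented here, and `processElement`/`finishBundle` only stand in for the `@ProcessElement`/`@FinishBundle` methods of a real `DoFn`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the batching pattern used inside many IO sinks: buffer elements
// as they arrive and flush when the buffer is full or when the bundle ends.
class BatchingWriter {
    static final int MAX_BATCH = 3; // illustrative threshold; real IOs make this configurable

    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> bulkWrite; // stands in for the real bulk RPC

    BatchingWriter(Consumer<List<String>> bulkWrite) {
        this.bulkWrite = bulkWrite;
    }

    // Analogue of @ProcessElement: buffer, and flush once the threshold is reached.
    void processElement(String element) {
        buffer.add(element);
        if (buffer.size() >= MAX_BATCH) {
            flush();
        }
    }

    // Analogue of @FinishBundle: whatever remains must be flushed here, because
    // the runner may commit the bundle right after this call.
    void finishBundle() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        bulkWrite.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With 4 elements and `MAX_BATCH = 3` this issues two bulk writes (3 + 1); a 1-element bundle degenerates to one RPC per element, which is exactly the small-bundle complaint raised in this thread.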
>>> >> > >>> >> >> >>> >> >> On Sat, Nov 18, 2017 at 6:58 PM, Eugene Kirpichov < >>> >> >> kirpic...@google.com.invalid> wrote: >>> >> >> >>> >> >>> I disagree that the usage of document id in ES is a >>> "workaround" >>> >> - it does >>> >> >>> not address any *accidental *complexity >>> >> >>> <https://en.wikipedia.org/wiki/No_Silver_Bullet >>> >> <https://en.wikipedia.org/wiki/No_Silver_Bullet>> coming from >>> >> shortcomings >>> >> >>> of Beam, it addresses the *essential* complexity that a >>> >> distributed system >>> >> >>> forces one to take it as a fact of nature that the same write >>> >> >>> (mutation) will happen multiple times, so if you want a >>> mutation >>> >> to happen >>> >> >>> "as-if" it happened exactly once, the mutation itself must be >>> >> idempotent >>> >> >>> <https://en.wikipedia.org/wiki/Idempotence >>> >> <https://en.wikipedia.org/wiki/Idempotence>>. Insert-with-id >>> (upsert >>> >> >>> <https://en.wikipedia.org/wiki/Merge_(SQL) >>> >> <https://en.wikipedia.org/wiki/Merge_(SQL)>>) is a classic >>> example of >>> >> an >>> >> >>> idempotent mutation, and it's very good that Elasticsearch >>> >> provides it - if >>> >> >>> it didn't, no matter how good of an API Beam had, achieving >>> >> exactly-once >>> >> >>> writes would be theoretically impossible. Are we in >>> agreement on >>> >> this so >>> >> >>> far? >>> >> >>> >>> >> >>> Next: you seem to be discussing 3 issues together, all of >>> which >>> >> are valid >>> >> >>> issues, but they seem unrelated to me: >>> >> >>> 1. Exactly-once mutation >>> >> >>> 2. Batching multiple mutations into one RPC. >>> >> >>> 3. Backpressure >>> >> >>> >>> >> >>> #1: was considered above. The system the IO is talking to >>> has to >>> >> support >>> >> >>> idempotent mutations, in an IO-specific way, and the IO has >>> to >>> >> take >>> >> >>> advantage of them, in the IO-specific way - end of story. 
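The idempotence property described above can be shown with a toy model; this is an illustration only, and the `MutationSketch` class is invented here, not Elasticsearch's or Beam's API. Applying an upsert keyed by an explicit document id twice leaves the store in the same state, while a blind append does not:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy contrast between an idempotent mutation (upsert by explicit id) and a
// non-idempotent one (blind append). On at-least-once runners the same write
// may be replayed; only the idempotent form ends in the same state either way.
class MutationSketch {
    final Map<String, String> byId = new HashMap<>();   // upsert target
    final List<String> appendOnly = new ArrayList<>();  // blind-append target

    // Idempotent: same (id, body) applied twice == applied once.
    void upsert(String id, String body) {
        byId.put(id, body);
    }

    // Not idempotent: each replay adds a duplicate.
    void append(String body) {
        appendOnly.add(body);
    }
}
```

This is why no Beam API change can make a sink exactly-once on its own: the target system has to offer an idempotent mutation (such as insert-with-id) for the IO to exploit.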
>>> >> > >>> >> > Agree, but don't forget the original point was about "chunks" and >>> >> not >>> >> > individual records. >>> >> > >>> >> >>> >>> >> >>> #2: a batch of idempotent operations is also idempotent, so this >>> >> doesn't >>> >> >>> add anything new semantically. Syntactically - Beam already >>> >> allows you to >>> >> >>> write your own batching by notifying you of permitted batch >>> >> boundaries >>> >> >>> (Start/FinishBundle). Sure, it could do more, but from my >>> >> experience the >>> >> >>> batching in IOs I've seen is one of the easiest and least >>> >> error-prone >>> >> >>> parts, so I don't see something worth an extended discussion >>> >> here. >>> >> > >>> >> > "Beam already allows you to >>> >> > write your own batching by notifying you of permitted batch >>> >> boundaries >>> >> > (Start/FinishBundle)" >>> >> > >>> >> > This is wrong since the bundle is potentially the whole PCollection >>> >> (Spark), >>> >> > so this is not even an option until you use the SDF (back to the >>> >> same >>> >> > point). >>> >> > Once again the API looks fine but no implementation makes it true. >>> >> It >>> >> > would be easy to change in Spark; Flink can be OK since it >>> >> targets >>> >> > more the streaming case; not sure about others, any idea? >>> >> > >>> >> > >>> >> >>> >>> >> >>> #3: handling backpressure is a complex problem with multiple >>> >> facets: 1) how >>> >> >>> do you know you're being throttled, and by how much are you >>> >> exceeding the >>> >> >>> external system's capacity? >>> >> > >>> >> > This is the whole point of backpressure, the system sends it back >>> >> to >>> >> > you (a header-like or status technique in general) >>> >> > >>> >> >>> 2) how do you communicate this signal to the >>> >> >>> runner? >>> >> > >>> >> > You are a client so you get the meta in the response - whatever >>> >> the technology. >>> >> > >>> >> >>> 3) what does the runner do in response? 
>>> >> > >>> >> > The runner does nothing, but the IO adapts its handling as mentioned >>> >> > before (wait and retry, skip, ... depending on the config) >>> >> > >>> >> >>> 4) how do you wait until >>> >> >>> it's ok to try again? >>> >> > >>> >> > This is one point to probably enhance in Beam, but waiting in the >>> >> > processing is an option if the source has some buffering; otherwise it >>> >> > requires a buffer fallback with a max size if the wait mode is >>> >> > activated. >>> >> > >>> >> >>> You seem to be advocating for solving one facet of this problem, >>> >> which is: >>> >> >>> you want it to be possible to signal to the runner "I'm being >>> >> throttled, >>> >> >>> please end the bundle", right? If so - I think this (ending the >>> >> bundle) is >>> >> >>> unnecessary: the DoFn can simply do an exponential back-off sleep >>> >> loop. >>> >> > >>> >> > Agree, I never said the runner should know, but GBK+output doesn't work >>> >> > because you don't own the GBK. >>> >> > >>> >> >>> This is e.g. what DatastoreIO does >>> >> >>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java#L1318> >>> >> >>> and >>> >> >>> this is in general how most systems I've seen handle >>> >> backpressure. Is there >>> >> >>> something I'm missing? In particular, is there any compelling >>> >> reason why >>> >> >>> you think it'd be beneficial e.g. for DatastoreIO to commit the >>> >> results of >>> >> >>> the bundle so far before processing other elements? >>> >> > >>> >> > It was more about ensuring you validate early a subset of the whole >>> >> > bundle and avoid reprocessing it if it fails later. >>> >> > >>> >> > >>> >> > So to summarize I see 2 outcomes: >>> >> > >>> >> > 1. impl SDF in all runners >>> >> > 2. 
make the bundle size upper-bounded - through a pipeline >>> option - >>> >> in >>> >> > all runners; not sure this one is doable everywhere since I mainly >>> >> > checked the Spark case >>> >> > >>> >> >>> >>> >> >>> Again, it might be that I'm still misunderstanding what you're >>> >> trying to >>> >> >>> say. One of the things it would help to clarify would be - >>> >> exactly what do >>> >> >>> you mean by "how batch frameworks solved that for years": can you >>> >> point at >>> >> >>> an existing API in some other framework that achieves what you >>> >> want? >>> >> >>> >>> >> >>> On Sat, Nov 18, 2017 at 1:02 PM Romain Manni-Bucau >>> >> <rmannibu...@gmail.com <mailto:rmannibu...@gmail.com>> >>> >> >>> wrote: >>> >> >>> >>> >> >>> > Eugene, the point - and the issue with a single sample - is you can >>> >> always find >>> >> >>> > *workarounds* on a case-by-case basis, like the id one with ES, but >>> >> Beam >>> >> >>> doesn't >>> >> >>> > solve the problem as a framework. >>> >> >>> > >>> >> >>> > From my past, I clearly don't see how batch frameworks could have solved >>> >> that for years >>> >> >>> while Beam is not able to do it - keep in mind it is the same >>> >> kind of >>> >> >>> techno, >>> >> >>> > it just uses different sources and bigger clusters, so no real >>> >> reason to >>> >> >>> not >>> >> >>> > have the same feature quality. The only potential reason I see >>> >> is there is >>> >> >>> > no tracking of the state into the cluster - e2e. But I don't see >>> >> why there >>> >> >>> > wouldn't be. Do I miss something here? >>> >> >>> > >>> >> >>> > An example could be: take a GitHub crawler, based on a REST client, >>> >> computing stats on the whole set of GitHub repos. 
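The exponential back-off sleep loop mentioned above (the approach DatastoreV1 is cited for) can be sketched as follows. This is an illustration only: the `Rpc` interface, `ThrottledException`, and the constants are invented here, standing in for the real RPC call and whatever "throttled" signal the backend returns (an HTTP 429, a header, a status code).

```java
// Sketch of handling backpressure inside the DoFn itself: on a throttled
// response, sleep with exponential back-off and retry, instead of asking the
// runner to end the bundle.
class BackoffSketch {
    static final int MAX_ATTEMPTS = 5;
    static final long INITIAL_BACKOFF_MS = 10; // kept tiny for illustration

    interface Rpc {
        void call() throws ThrottledException;
    }

    // Stand-in for the backend's "slow down" signal.
    static class ThrottledException extends Exception {}

    // Returns the number of attempts it took; rethrows once the budget is exhausted.
    static int callWithBackoff(Rpc rpc) throws Exception {
        long backoffMs = INITIAL_BACKOFF_MS;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                rpc.call();
                return attempt;
            } catch (ThrottledException e) {
                if (attempt == MAX_ATTEMPTS) {
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2; // exponential growth; real code adds jitter and a cap
            }
        }
        throw new IllegalStateException("unreachable");
    }
}
```

Because the loop lives entirely inside element processing, it needs no cooperation from the runner, which is the point being argued above.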
>>> You >> will need to >>> >> >>> > handle the rate limit and likely want to "commit" each time you >>> >> reach a >>> >> >>> > rate limit, with some buffering strategy with a max size >>> >> before >>> >> >>> > really waiting. How do you do it with a GBK independent of your >>> >> DoFn? You >>> >> >>> > are not able to compose the fns correctly between them :(. >>> >> >>> > >>> >> >>> > >>> >> >>> > On Nov 18, 2017 at 20:48, "Eugene Kirpichov" >>> >> <kirpic...@google.com.invalid> >>> >> >>> wrote: >>> >> >>> > >>> >> >>> > After giving this thread my best attempt at understanding >>> >> exactly what is >>> >> >>> > the problem and the proposed solution, I'm afraid I still fail >>> >> to >>> >> >>> > understand both. To reiterate, I think the only way to make >>> >> progress here >>> >> >>> > is to be more concrete: (quote) take some IO that you think >>> >> could be >>> >> >>> easier >>> >> >>> > to write with your proposed API, give the contents of a >>> >> hypothetical >>> >> >>> > PCollection being written to this IO, give the code of a >>> >> hypothetical >>> >> >>> DoFn >>> >> >>> > implementing the write using your API, and explain what you'd >>> >> expect to >>> >> >>> > happen at runtime. I'm going to re-engage in this thread after >>> >> such an >>> >> >>> > example is given. >>> >> >>> > >>> >> >>> > On Sat, Nov 18, 2017, 5:00 AM Romain Manni-Bucau >>> >> <rmannibu...@gmail.com <mailto:rmannibu...@gmail.com>> >>> >> >>> > wrote: >>> >> >>> > >>> >> >>> > > First, bundle retry is unusable with some runners like Spark, >>> >> where the >>> >> >>> > > bundle size is the collection size / number of workers. This >>> >> means a user >>> >> >>> > can't >>> >> >>> > > use the bundle API or feature reliably and portably - which is >>> >> Beam's >>> >> >>> promise. 
>>> >> >>> > > Aligning chunking and bundles would guarantee that, but it can be >>> >> >>> > undesired; >>> >> >>> > > that is why I thought it could be another feature. >>> >> >>> > > >>> >> >>> > > GBK works only if the IO knows about it, and both concepts are not >>> >> >>> always >>> >> >>> > > orthogonal - backpressure-like systems are a trivial common >>> >> example. >>> >> >>> This >>> >> >>> > > means the IO (dofn) must be able to do it itself at some >>> >> point. >>> >> >>> > > >>> >> >>> > > Also note the GBK works only if the IO can take a list, which >>> >> is never >>> >> >>> the >>> >> >>> > > case today. >>> >> >>> > > >>> >> >>> > > Big questions for me are: is SDF the way to go, since it >>> >> provides the >>> >> >>> > needed >>> >> >>> > > API but is not yet supported? What about existing IO? Should >>> >> Beam >>> >> >>> provide >>> >> >>> > > an auto-wrapping of DoFn for that pre-aggregated support and >>> >> simulate >>> >> >>> > > bundles to the actual IO impl to keep the existing API? >>> >> >>> > > >>> >> >>> > > >>> >> >>> > > On Nov 17, 2017 at 19:20, "Raghu Angadi" >>> >> <rang...@google.com.invalid> >>> >> >>> > > wrote: >>> >> >>> > > >>> >> >>> > > On Fri, Nov 17, 2017 at 1:02 AM, Romain Manni-Bucau < >>> >> >>> > rmannibu...@gmail.com <mailto:rmannibu...@gmail.com> >>> >> >>> > > > >>> >> >>> > > wrote: >>> >> >>> > > >>> >> >>> > > > Yep, just take ES IO: if a part of a bundle fails you are >>> >> in an >>> >> >>> > > > unmanaged state. This is the case for all O (of IO ;)). >>> >> The issue is not >>> >> >>> > > > so much about "1" (the code it takes) but more the fact it >>> >> doesn't >>> >> >>> > > > integrate with runner features and retries potentially: >>> >> what happens >>> >> >>> > > > if a bundle has a failure? => undefined today. 2. I'm fine >>> >> with it >>> >> >>> > > > as long as we know exactly what happens when we restart after a >>> >> bundle >>> >> >>> > > > failure. 
With ES the timestamp can be used for >>> instance. >>> >> >>> > > > >>> >> >>> > > >>> >> >>> > > This deterministic batching can be achieved even now with an >>> >> extra >>> >> >>> > > GroupByKey (and if you want ordering on top of that, you will >>> >> need another >>> >> >>> > > GBK). Don't know if that is too costly in your case. I would >>> >> need a bit >>> >> >>> > more >>> >> >>> > > details on handling ES IO write retries to see if it could be >>> >> simplified. >>> >> >>> > Note >>> >> >>> > > that retries occur with or without any failures in your DoFn. >>> >> >>> > > >>> >> >>> > > The biggest negative with the GBK approach is that it doesn't >>> >> provide the same >>> >> >>> > > guarantees on Flink. >>> >> >>> > > >>> >> >>> > > I don't see how GroupIntoBatches in Beam provides specific >>> >> guarantees >>> >> >>> on >>> >> >>> > > deterministic batches. >>> >> >>> > > >>> >> >>> > > Thinking about it, the SDF is really a way to do it, since the >>> >> SDF will >>> >> >>> > > > manage the bulking, and associated with the runner "retry" >>> >> it seems it >>> >> >>> > > > covers the needs. >>> >> >>> > > > >>> >> >>> > > > Romain Manni-Bucau >>> >> >>> > > > @rmannibucau | Blog | Old Blog | Github | LinkedIn >>> >> >>> > > > >>> >> >>> > > > >>> >> >>> > > > 2017-11-17 9:23 GMT+01:00 Eugene Kirpichov >>> >> >>> > <kirpic...@google.com.invalid >>> >> >>> > > >: >>> >> >>> > > > > I must admit I'm still failing to understand the problem, >>> >> so let's >>> >> >>> > step >>> >> >>> > > > > back even further. >>> >> >>> > > > > >>> >> >>> > > > > Could you give an example of an IO that is currently >>> >> difficult to >>> >> >>> > > > implement >>> >> >>> > > > > specifically because of lack of the feature you're >>> >> talking about? >>> >> >>> > > > > >>> >> >>> > > > > I'm asking because I've reviewed almost all Beam IOs and >>> >> don't >>> >> >>> recall >>> >> >>> > > > > seeing a similar problem. 
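The "extra GroupByKey" idea above can be sketched without Beam: derive a shard key deterministically from each element, so the group (the "batch") each element lands in is the same on every retry, unlike runner-chosen bundles, which are arbitrary. This is a minimal illustration; `groupIntoShards` and its parameters are invented here and do not model the full semantics of Beam's GBK (windowing, triggers):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of deterministic batching via a derived shard key. Because the key
// is a pure function of the element, rerunning the grouping after a failure
// reproduces exactly the same batches.
class DeterministicBatching {
    static Map<Integer, List<String>> groupIntoShards(List<String> elements, int numShards) {
        Map<Integer, List<String>> shards = new TreeMap<>();
        for (String e : elements) {
            int key = Math.floorMod(e.hashCode(), numShards); // deterministic per element
            shards.computeIfAbsent(key, k -> new ArrayList<>()).add(e);
        }
        return shards;
    }
}
```

A deterministic batch assignment is what lets a downstream sink deduplicate on replay; a runner-chosen bundle boundary gives no such handle.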
Sure, a lot of IOs do >>> batching >>> >> within a >>> >> >>> > > bundle, >>> >> >>> > > > > but 1) it doesn't take up much code (granted, it >>> would be >>> >> even >>> >> >>> easier >>> >> >>> > > if >>> >> >>> > > > > Beam did it for us) and 2) I don't remember any of >>> them >>> >> requiring >>> >> >>> the >>> >> >>> > > > > batches to be deterministic, and I'm having a hard >>> time >>> >> imagining >>> >> >>> > what >>> >> >>> > > > kind >>> >> >>> > > > > of storage system would be able to deduplicate if >>> batches >>> >> were >>> >> >>> > > > > deterministic but wouldn't be able to deduplicate if >>> they >>> >> weren't. >>> >> >>> > > > > >>> >> >>> > > > > On Thu, Nov 16, 2017 at 11:50 PM Romain Manni-Bucau < >>> >> >>> > > > rmannibu...@gmail.com <mailto:rmannibu...@gmail.com>> >>> >> >>> >> >>> > > > > wrote: >>> >> >>> > > > > >>> >> >>> > > > >> Ok, let me try to step back and summarize what we >>> have >>> >> today and >>> >> >>> > what >>> >> >>> > > I >>> >> >>> > > > >> miss: >>> >> >>> > > > >> >>> >> >>> > > > >> 1. we can handle chunking in beam through group in >>> batch >>> >> (or >>> >> >>> > > equivalent) >>> >> >>> > > > >> but: >>> >> >>> > > > >> > it is not built-in into the transforms (IO) >>> and it >>> >> is >>> >> >>> > controlled >>> >> >>> > > > >> from outside the transforms so no way for a >>> transform to >>> >> do it >>> >> >>> > > > >> properly without handling itself a composition and >>> links >>> >> between >>> >> >>> > > > >> multiple dofns to have notifications and potentially >>> >> react >>> >> >>> properly >>> >> >>> > or >>> >> >>> > > > >> handle backpressure from its backend >>> >> >>> > > > >> 2. there is no restart feature because there is no >>> real >>> >> state >>> >> >>> > handling >>> >> >>> > > > >> at the moment. 
This sounds fully delegated to the >>> runner, >> but I was >>> > > > >> hoping to have more guarantees from the API used, to be able to >>> > restart >>> > > > >> a pipeline (mainly batch, since it can be irrelevant or delegated >>> to >>> > > > >> the backend for streams) and handle only non-committed records, so it >>> > > > >> requires some persistence outside the main IO storage to do it >>> > > > >> properly >>> > > > >> > note this is somehow similar to the monitoring >> topic, which >>> misses >>> > > > >> persistence ATM, so it may end up with Beam having a >> pluggable >>> storage >>> > > > >> for a few concerns >>> > > > >> >>> > > > >> >>> > > > >> Short term I would be happy with 1 solved properly; long >> term I >>> hope >>> > 2 >>> > > > >> will be tackled without workarounds requiring custom >> wrapping of >>> IO >>> to >>> > > > >> use a custom state persistence. >>> > > > >> >>> > > > >> >>> > > > >> >>> > > > >> Romain Manni-Bucau >>> > > > >> @rmannibucau | Blog | Old Blog | Github | LinkedIn >>> > > > >> >>> > > > >> >>> > > > >> 2017-11-17 7:44 GMT+01:00 Jean-Baptiste Onofré >>> >> <j...@nanthrax.net <mailto:j...@nanthrax.net>>: >>> >> >>> > > > >> > Thanks for the explanation. Agree, we might talk about >>> >> different >>> > > > things >>> > > > >> > using the same wording. >>> > > > >> > >>> > > > >> > I'm also struggling to understand the use case (for a >>> >> generic >>> > DoFn). 
>>> >> >>> > > > >> > >>> >> >>> > > > >> > Regards >>> >> >>> > > > >> > JB >>> >> >>> > > > >> > >>> >> >>> > > > >> > >>> >> >>> > > > >> > On 11/17/2017 07:40 AM, Eugene Kirpichov wrote: >>> >> >>> > > > >> >> >>> >> >>> > > > >> >> To avoid spending a lot of time pursuing a false >>> >> path, I'd like >>> >> >>> > to >>> >> >>> > > > say >>> >> >>> > > > >> >> straight up that SDF is definitely not going to >>> help >>> >> here, >>> >> >>> > despite >>> >> >>> > > > the >>> >> >>> > > > >> >> fact >>> >> >>> > > > >> >> that its API includes the term "checkpoint". In >>> SDF, >>> >> the >>> >> >>> > > "checkpoint" >>> >> >>> > > > >> >> captures the state of processing within a single >>> >> element. If >>> >> >>> > you're >>> >> >>> > > > >> >> applying an SDF to 1000 elements, it will, like >>> any >>> >> other DoFn, >>> >> >>> > be >>> >> >>> > > > >> applied >>> >> >>> > > > >> >> to each of them independently and in parallel, >>> and >>> >> you'll have >>> >> >>> > 1000 >>> >> >>> > > > >> >> checkpoints capturing the state of processing >>> each of >>> >> these >>> >> >>> > > elements, >>> >> >>> > > > >> >> which >>> >> >>> > > > >> >> is probably not what you want. >>> >> >>> > > > >> >> >>> >> >>> > > > >> >> I'm afraid I still don't understand what kind of >>> >> checkpoint you >>> >> >>> > > > need, if >>> >> >>> > > > >> >> it >>> >> >>> > > > >> >> is not just deterministic grouping into batches. >>> >> "Checkpoint" >>> >> >>> is >>> >> >>> > a >>> >> >>> > > > very >>> >> >>> > > > >> >> broad term and it's very possible that everybody >>> in >>> >> this thread >>> >> >>> > is >>> >> >>> > > > >> talking >>> >> >>> > > > >> >> about different things when saying it. 
So it >>> would >>> >> help if you >>> >> >>> > > could >>> >> >>> > > > >> give >>> >> >>> > > > >> >> a >>> >> >>> > > > >> >> more concrete example: for example, take some IO >>> that >>> >> you think >>> >> >>> > > > could be >>> >> >>> > > > >> >> easier to write with your proposed API, give the >>> >> contents of a >>> >> >>> > > > >> >> hypothetical >>> >> >>> > > > >> >> PCollection being written to this IO, give the >>> code >>> >> of a >>> >> >>> > > hypothetical >>> >> >>> > > > >> DoFn >>> >> >>> > > > >> >> implementing the write using your API, and >>> explain >>> >> what you'd >>> >> >>> > > expect >>> >> >>> > > > to >>> >> >>> > > > >> >> happen at runtime. >>> >> >>> > > > >> >> >>> >> >>> > > > >> >> On Thu, Nov 16, 2017 at 10:33 PM Romain >>> Manni-Bucau >>> >> >>> > > > >> >> <rmannibu...@gmail.com >>> >> <mailto:rmannibu...@gmail.com>> >>> >> >>> >> >>> > > > >> >> wrote: >>> >> >>> > > > >> >> >>> >> >>> > > > >> >>> @Eugene: yes and the other alternative of >>> Reuven too >>> >> but it is >>> >> >>> > > still >>> >> >>> > > > >> >>> 1. relying on timers, 2. not really checkpointed >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> In other words it seems all solutions are to >>> create >>> >> a chunk of >>> >> >>> > > size >>> >> >>> > > > 1 >>> >> >>> > > > >> >>> and replayable to fake the lack of chunking in >>> the >>> >> framework. >>> >> >>> > This >>> >> >>> > > > >> >>> always implies a chunk handling outside the >>> >> component >>> >> >>> (typically >>> >> >>> > > > >> >>> before for an output). My point is I think IO >>> need >>> >> it in their >>> >> >>> > own >>> >> >>> > > > >> >>> "internal" or at least control it themselves >>> since >>> >> the chunk >>> >> >>> > size >>> >> >>> > > is >>> >> >>> > > > >> >>> part of the IO handling most of the time. 
>>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> I think JB spoke of the same "group before" >>> trick >>> >> using >>> >> >>> > > restrictions >>> >> >>> > > > >> >>> which can work I have to admit if SDF are >>> >> implemented by >>> >> >>> > runners. >>> >> >>> > > Is >>> >> >>> > > > >> >>> there a roadmap/status on that? Last time I >>> checked >>> >> SDF was a >>> >> >>> > > great >>> >> >>> > > > >> >>> API without support :(. >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> Romain Manni-Bucau >>> >> >>> > > > >> >>> @rmannibucau | Blog | Old Blog | Github | >>> LinkedIn >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> 2017-11-17 7:25 GMT+01:00 Eugene Kirpichov >>> >> >>> > > > >> >>> <kirpic...@google.com.invalid>: >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> JB, not sure what you mean? SDFs and triggers >>> are >>> >> unrelated, >>> >> >>> > and >>> >> >>> > > > the >>> >> >>> > > > >> >>>> post >>> >> >>> > > > >> >>>> doesn't mention the word. Did you mean >>> something >>> >> else, e.g. >>> >> >>> > > > >> restriction >>> >> >>> > > > >> >>>> perhaps? Either way I don't think SDFs are the >>> >> solution here; >>> >> >>> > > SDFs >>> >> >>> > > > >> have >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> to >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> do with the ability to split the processing of >>> *a >>> >> single >>> >> >>> > element* >>> >> >>> > > > over >>> >> >>> > > > >> >>>> multiple calls, whereas Romain I think is >>> asking >>> >> for >>> >> >>> repeatable >>> >> >>> > > > >> grouping >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> of >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> *multiple* elements. 
>>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> Romain - does >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> https://github.com/apache/beam/blob/master/sdks/java/ >>> >> <https://github.com/apache/beam/blob/master/sdks/java/> >>> >> >>> > > > core/src/main/java/org/apache/beam/sdk/transforms/ >>> >> >>> GroupIntoBatches.java >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> do what >>> >> >>> > > > >> >>>> you want? >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> On Thu, Nov 16, 2017 at 10:19 PM Jean-Baptiste >>> >> Onofré < >>> >> >>> > > > >> j...@nanthrax.net <mailto:j...@nanthrax.net>> >>> >> >>> > > > >> >>>> wrote: >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>>> It sounds like the "Trigger" in the Splittable >>> >> DoFn, no ? >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> >>> >> https://beam.apache.org/blog/2017/08/16/splittable-do-fn >>> >> <https://beam.apache.org/blog/2017/08/16/splittable-do-fn>. >>> >> >>> >> >>> html >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> Regards >>> >> >>> > > > >> >>>>> JB >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> On 11/17/2017 06:56 AM, Romain Manni-Bucau >>> wrote: >>> >> >>> > > > >> >>>>>> >>> >> >>> > > > >> >>>>>> it gives the fn/transform the ability to >>> save a >>> >> state - it >>> >> >>> > can >>> >> >>> > > > get >>> >> >>> > > > >> >>>>>> back on "restart" / whatever unit we can use, >>> >> probably >>> >> >>> runner >>> >> >>> > > > >> >>>>>> dependent? Without that you need to rewrite >>> all >>> >> IO usage >>> >> >>> with >>> >> >>> > > > >> >>>>>> something like the previous pattern which >>> makes >>> >> the IO not >>> >> >>> > self >>> >> >>> > > > >> >>>>>> sufficient and kind of makes the entry cost >>> and >>> >> usage of >>> >> >>> beam >>> >> >>> > > way >>> >> >>> > > > >> >>>>>> further. 
>>> >> >>> > > > >> >>>>>> >>> >> >>> > > > >> >>>>>> In my mind it is exactly what >>> jbatch/spring-batch >>> >> uses but >>> >> >>> > > > adapted >>> >> >>> > > > >> to >>> >> >>> > > > >> >>>>>> beam (stream in particular) case. >>> >> >>> > > > >> >>>>>> >>> >> >>> > > > >> >>>>>> Romain Manni-Bucau >>> >> >>> > > > >> >>>>>> @rmannibucau | Blog | Old Blog | Github | >>> >> LinkedIn >>> >> >>> > > > >> >>>>>> >>> >> >>> > > > >> >>>>>> >>> >> >>> > > > >> >>>>>> 2017-11-17 6:49 GMT+01:00 Reuven Lax >>> >> >>> > <re...@google.com.invalid >>> >> >>> > > >: >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> Romain, >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> Can you define what you mean by checkpoint? >>> What >>> >> are the >>> >> >>> > > > semantics, >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> what >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> does it accomplish? >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> Reuven >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> On Fri, Nov 17, 2017 at 1:40 PM, Romain >>> >> Manni-Bucau < >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> rmannibu...@gmail.com >>> >> <mailto:rmannibu...@gmail.com>> >>> >> >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>> wrote: >>> >> >>> > > > >> >>>>>>> >>> >> >>> > > > >> >>>>>>>> Yes, what I propose earlier was: >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> I. checkpoint marker: >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> @AnyBeamAnnotation >>> >> >>> > > > >> >>>>>>>> @CheckpointAfter >>> >> >>> > > > >> >>>>>>>> public void someHook(SomeContext ctx); >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> II. pipeline.apply(ParDo.of(new >>> >> >>> > > > >> MyFn()).withCheckpointAlgorithm(new >>> >> >>> > > > >> >>>>>>>> CountingAlgo())) >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> III. 
(I like this one less) >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> // in the dofn >>> >> >>> > > > >> >>>>>>>> @CheckpointTester >>> >> >>> > > > >> >>>>>>>> public boolean shouldCheckpoint(); >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> IV. @Checkpointer Serializable >>> getCheckpoint(); >>> >> in the >>> >> >>> dofn >>> >> >>> > > per >>> >> >>> > > > >> >>> >>> >> >>> > > > >> >>> element >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> Romain Manni-Bucau >>> >> >>> > > > >> >>>>>>>> @rmannibucau | Blog | Old Blog | Github | >>> >> LinkedIn >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> >>> >> >>> > > > >> >>>>>>>> 2017-11-17 6:06 GMT+01:00 Raghu Angadi >>> >> >>> > > > <rang...@google.com.invalid >>> >> >>> > > > >> >>>> >>> >> >>> > > > >> >>>> : >>> >> >>> > > > >> >>>>>>>>> >>> >> >>> > > > >> >>>>>>>>> How would you define it (rough API is >>> fine)?. >>> >> Without >>> >> >>> more >>> >> >>> > > > >> details, >>> >> >>> > > > >> >>>>> >>> >> >>> > > > >> >>>>> it is >>> >> >>> > > > >> >>>>>>>>> >>> >> >>> > > > >> >>>>>>>>> not easy to see wider applicability and >>> >> feasibility in >>> >> >>> > > > runners. 
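Answering "rough API is fine": the proposal's option II could be modeled as below. This is entirely hypothetical — none of these names (`CheckpointAlgorithm`, `CountingAlgo`) exist in Beam; the sketch only shows the contract a runner would need to honor: a pluggable policy asked, after each element, whether to cut a checkpoint/bundle boundary.

```java
// Hypothetical sketch of the proposed withCheckpointAlgorithm(...) contract.
// The runner (not shown) would track elements since the last checkpoint and
// consult the policy after each one.
class CheckpointSketch {
    interface CheckpointAlgorithm {
        boolean shouldCheckpoint(long elementsSinceLastCheckpoint);
    }

    // The "CountingAlgo" from the proposal: checkpoint every N elements.
    static class CountingAlgo implements CheckpointAlgorithm {
        private final long every;

        CountingAlgo(long every) {
            this.every = every;
        }

        @Override
        public boolean shouldCheckpoint(long elementsSinceLastCheckpoint) {
            return elementsSinceLastCheckpoint >= every;
        }
    }
}
```

Framing it as a policy object keeps the DoFn itself unaware of checkpointing, which matches the thread's concern that bundle/chunk boundaries are infrastructure feedback rather than transform logic.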
>>>>>> On Thu, Nov 16, 2017 at 1:13 PM, Romain Manni-Bucau
>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>> This is a fair summary of the current state, but it is also where
>>>>>>> beam can add very strong value and make big data great and smooth.
>>>>>>>
>>>>>>> Instead of this replay feature, isn't checkpointing achievable? In
>>>>>>> particular with SDF, no?
>>>>>>>
>>>>>>> On 16 Nov 2017 at 19:50, "Raghu Angadi" <rang...@google.com.invalid>
>>>>>>> wrote:
>>>>>>>> The core issue here is that there is no explicit concept of
>>>>>>>> 'checkpoint' in Beam (UnboundedSource has a method
>>>>>>>> 'getCheckpointMark', but that refers to the checkpoint on the
>>>>>>>> external source).
>>>>>>>> Runners do checkpoint internally as an implementation detail.
>>>>>>>> Flink's checkpoint model is entirely different from Dataflow's and
>>>>>>>> Spark's.
>>>>>>>>
>>>>>>>> @StableReplay helps, but it does not explicitly talk about a
>>>>>>>> checkpoint, by design.
>>>>>>>>
>>>>>>>> If you are looking to achieve some guarantees with a sink/DoFn, I
>>>>>>>> think it is better to start with the requirements. I worked on an
>>>>>>>> exactly-once sink for Kafka (see KafkaIO.write().withEOS()), where
>>>>>>>> we essentially reshard the elements and assign sequence numbers to
>>>>>>>> elements within each shard.
>>>>>>>> Duplicates in replays are avoided based on these sequence numbers.
>>>>>>>> The DoFn state API is used to buffer out-of-order replays. The
>>>>>>>> implementation strategy works in Dataflow but not in Flink, which
>>>>>>>> has a horizontal checkpoint. KafkaIO checks for compatibility.
>>>>>>>>
>>>>>>>> On Wed, Nov 15, 2017 at 12:38 AM, Romain Manni-Bucau
>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> The subject is a bit provocative, but the topic is real and comes
>>>>>>>>> up again and again with beam usage: how can a dofn handle some
>>>>>>>>> "chunking"?
>>>>>>>>>
>>>>>>>>> The need is to be able to commit every N records, but with N not
>>>>>>>>> too big.
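[Editor's note: Raghu's sequence-number scheme quoted above can be sketched in a few lines. This is an illustrative toy, not the actual KafkaIO.write().withEOS() code — `SequenceDeduper` and `admit` are invented names, and the real sink buffers out-of-order replays via the DoFn state API rather than simply dropping late sequence numbers.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of sequence-number deduplication: each shard assigns a
// monotonically increasing sequence number, and on replay any element whose
// number is at or below the last accepted one is dropped as a duplicate.
class SequenceDeduper {
    private final Map<Integer, Long> lastSeqPerShard = new HashMap<>();

    /** Returns true if the element is new and should be written to the sink. */
    public boolean admit(int shard, long seq) {
        long last = lastSeqPerShard.getOrDefault(shard, -1L);
        if (seq <= last) {
            return false; // replayed duplicate, skip it
        }
        lastSeqPerShard.put(shard, seq);
        return true;
    }
}
```

The per-shard map is what would live in persistent runner state; this is why the approach depends on how each runner checkpoints, as the message notes for Flink.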
>>>>>>>>> The natural API for that in beam is the bundle one, but bundles
>>>>>>>>> are not reliable since they can be very small (flink) - we can say
>>>>>>>>> it is "ok" even if it has some perf impact - or too big (spark
>>>>>>>>> does full size / #workers).
>>>>>>>>>
>>>>>>>>> The workaround is what we see in the ES I/O: a maxSize which does
>>>>>>>>> an eager flush. The issue is that then the checkpoint is not
>>>>>>>>> respected and you can process the same records multiple times.
>>>>>>>>>
>>>>>>>>> Any plan to make this API reliable and controllable from a beam
>>>>>>>>> point of view (at least in a max manner)?
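[Editor's note: the maxSize eager-flush workaround Romain refers to (the pattern in ElasticsearchIO and JdbcIO) boils down to the following, sketched here without the Beam API: buffer elements, flush when the batch is full, and flush the remainder when the bundle finishes. `BatchingWriter` and its method names are illustrative stand-ins for a DoFn's `@ProcessElement`/`@FinishBundle` methods.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of the maxSize eager-flush pattern: buffer elements and
// flush either when the batch is full or when the runner finishes the
// bundle. The eager flush is what ignores bundle (checkpoint) boundaries.
class BatchingWriter<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink; // stands in for the external commit
    private final List<T> batch = new ArrayList<>();

    BatchingWriter(int maxBatchSize, Consumer<List<T>> sink) {
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Would be @ProcessElement in a real DoFn.
    public void process(T element) {
        batch.add(element);
        if (batch.size() >= maxBatchSize) {
            flush(); // eager flush, possibly mid-bundle
        }
    }

    // Would be @FinishBundle: whatever is left must go out with the bundle.
    public void finishBundle() {
        if (!batch.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        sink.accept(new ArrayList<>(batch));
        batch.clear();
    }
}
```

Romain's complaint is visible in the sketch: a mid-bundle flush commits data to the external system before the runner checkpoints the bundle, so a retry of that bundle rewrites the already-flushed records.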
>>>>>>>>> Thanks,
>>>>>>>>> Romain Manni-Bucau
>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com