+1, thanks Eugene

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>

2018-02-20 0:24 GMT+01:00 Eugene Kirpichov <kirpic...@google.com>:

> I've sent out a PR editing the Javadoc: https://github.com/apache/beam/pull/4711 .
> Hopefully, that should be sufficient.
>
> On Mon, Feb 19, 2018 at 3:20 PM Reuven Lax <re...@google.com> wrote:
>
>> Ismael, your understanding is appropriate for FinishBundle.
>>
>> One basic issue with this understanding is that the lifecycle of a DoFn
>> is much longer than a single bundle (which I think you expressed by adding
>> the *s). How long the DoFn lives is not defined. In fact, a runner is
>> completely free to decide that it will _never_ destroy the DoFn, in which
>> case TearDown is never called simply because the DoFn was never torn down.
>>
>> Also, as mentioned before, the runner can only call TearDown in cases
>> where the shutdown is in its control. If the JVM is shut down externally,
>> the runner has no chance to call TearDown. This means that while TearDown
>> is appropriate for cleaning up in-process resources (open connections,
>> etc.), it's not the right answer for cleaning up persistent resources. If
>> you rely on TearDown to delete VMs or delete files, there will be cases in
>> which those files or VMs are not deleted.
>>
>> What we are _not_ saying is that the runner is free to just ignore
>> TearDown. If the runner is explicitly destroying a DoFn object, it should
>> call TearDown.
>>
>> Reuven
>>
>>
>> On Mon, Feb 19, 2018 at 2:35 PM, Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> I also had a different understanding of the lifecycle of a DoFn.
>>>
>>> My understanding of the use case for every method in the DoFn was clear and
>>> perfectly aligned with Thomas's explanation, but what I understood was that in
>>> general terms ‘@Setup is where I get resources/prepare connections and
>>> @Teardown is where I free them’, so calling Teardown seemed essential to have a
>>> complete lifecycle:
>>> Setup → StartBundle* → ProcessElement* → FinishBundle* → Teardown
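>>>
>>> (A minimal sketch of that sequence, just to make the hooks concrete -
>>> assuming a hypothetical MyClient class, not an authoritative example:)
>>>
>>> class ConnectedFn extends DoFn<String, String> {
>>>   private transient MyClient client;                     // hypothetical client
>>>
>>>   @Setup
>>>   public void setup() { client = MyClient.connect(); }   // once per instance
>>>
>>>   @StartBundle
>>>   public void startBundle() { /* per-bundle init, may run many times */ }
>>>
>>>   @ProcessElement
>>>   public void process(ProcessContext c) { c.output(client.call(c.element())); }
>>>
>>>   @FinishBundle
>>>   public void finishBundle() { /* flush any per-bundle state */ }
>>>
>>>   @Teardown
>>>   public void teardown() { if (client != null) client.close(); } // may never run
>>> }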
>>>
>>> The fact that @Teardown might not be called is a new detail for me too, and I
>>> also find it weird to have a method that may or may not be called as part of an
>>> API: why would users implement teardown if it will not be called? In that case
>>> a cleaner approach would probably be to get rid of that method
>>> altogether, no?
>>>
>>> But well, maybe that's not so easy either; there was another point: a user
>>> reported an issue with leaking resources using KafkaIO in the Spark runner, for
>>> ref.
>>> https://apachebeam.slack.com/archives/C1AAFJYMP/p1510596938000622
>>>
>>> At that moment my understanding was that there was something fishy, because we
>>> should be calling Teardown to correctly close the connections and free the
>>> resources in case of exceptions on start/process/finish, so I filed a JIRA and
>>> fixed this by enforcing the call of teardown for the Spark runner and the Flink
>>> runner:
>>> https://issues.apache.org/jira/browse/BEAM-3187
>>> https://issues.apache.org/jira/browse/BEAM-3244
>>>
>>> As you can see, not calling this method does have consequences, at least for
>>> non-containerized runners. Of course a runner that uses containers could choose
>>> not to care about cleaning up resources this way, but a long-living JVM in a
>>> Hadoop environment probably won't have the same luck. So I am not sure that
>>> having loose semantics there is the right option. I mean, runners could simply
>>> guarantee that they call teardown, and if teardown takes too long they can
>>> decide to send a signal or kill the process/container/etc and go ahead. That
>>> way at least users would have a motivation to implement the teardown method;
>>> otherwise it doesn't make any sense to have it (API-wise).
>>>
>>> On Mon, Feb 19, 2018 at 11:30 PM, Eugene Kirpichov <kirpic...@google.com>
>>> wrote:
>>> > Romain, would it be fair to say that currently the goal of your
>>> > participation in this discussion is to identify situations where
>>> @Teardown
>>> > in principle could have been called, but some of the current runners
>>> don't
>>> > make a good enough effort to call it? If yes - as I said before,
>>> please, by
>>> > all means, file bugs of the form "Runner X doesn't call @Teardown in
>>> > situation Y" if you're aware of any, and feel free to send PRs fixing
>>> runner
>>> > X to reliably call @Teardown in situation Y. I think we all agree that
>>> this
>>> > would be a good improvement.
>>> >
>>> > On Mon, Feb 19, 2018 at 2:03 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> > wrote:
>>> >>
>>> >>
>>> >>
>>> >> On 19 Feb 2018 at 22:56, "Reuven Lax" <re...@google.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Feb 19, 2018 at 1:51 PM, Romain Manni-Bucau
>>> >> <rmannibu...@gmail.com> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 19 Feb 2018 at 21:28, "Reuven Lax" <re...@google.com> wrote:
>>> >>>
>>> >>> How do you call teardown? There are cases in which the Java code
>>> gets no
>>> >>> indication that the restart is happening (e.g. cases where the
>>> machine
>>> >>> itself is taken down)
>>> >>>
>>> >>>
>>> >>> This is a bug; zero-downtime maintenance is very doable in 2018 ;). Crashes
>>> >>> are bugs, and kill -9 to shut down is a bug too. Other cases can call
>>> >>> shutdown through a hook in the worst case.
>>> >>
>>> >>
>>> >> What you say here is simply not true.
>>> >>
>>> >> There are many scenarios in which workers shut down with no opportunity for
>>> >> any sort of shutdown hook. Sometimes the entire machine gets shut down, and
>>> >> not even the OS will have much of a chance to do anything. At scale
>>> this
>>> >> will happen with some regularity, and a distributed system that
>>> assumes this
>>> >> will not happen is a poor distributed system.
>>> >>
>>> >>
>>> >> This is part of the infra, and there is no reason a machine is shut down
>>> >> without first shutting down what runs on it, except if it is a bug in the
>>> >> software or the setup. I hear that maybe you don't do it everywhere, but
>>> >> there is no blocker to doing it. It means you can shut down the machines
>>> >> and guarantee teardown is called.
>>> >>
>>> >> Where I'm going is simply that it is doable, and the Beam SDK core can
>>> >> assume setup is done well. If there is a best-effort downside due to that -
>>> >> with the meaning you defined - it is an implementation bug or a user
>>> >> installation issue.
>>> >>
>>> >> Technically all of this is true.
>>> >>
>>> >> What can prevent teardown is a hardware failure or similar. That is fine
>>> >> and doesn't need to be in the doc since it is life in IT and obvious, or it
>>> >> must be made very explicit to avoid the current ambiguity.
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Feb 19, 2018, 12:24 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Restarting doesn't mean you don't call teardown. Except for a bug, there is
>>> >>>> no reason - technically - that it happens; no reason.
>>> >>>>
>>> >>>> On 19 Feb 2018 at 21:14, "Reuven Lax" <re...@google.com> wrote:
>>> >>>>>
>>> >>>>> Workers restarting is not a bug; it's standard and often expected.
>>> >>>>>
>>> >>>>> On Mon, Feb 19, 2018, 12:03 PM Romain Manni-Bucau
>>> >>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> Nothing; as mentioned, it is a bug, so the recovery is a bug recovery
>>> >>>>>> (procedure).
>>> >>>>>>
>>> >>>>>> On 19 Feb 2018 at 19:42, "Eugene Kirpichov" <kirpic...@google.com>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> So what would you like to happen if there is a crash? The DoFn
>>> >>>>>>> instance no longer exists because the JVM it ran on no longer
>>> exists. What
>>> >>>>>>> should Teardown be called on?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Mon, Feb 19, 2018, 10:20 AM Romain Manni-Bucau
>>> >>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>> This is what I want, and not 999,999 teardowns for 1,000,000 setups
>>> >>>>>>>> unless there is an unexpected crash (= a bug).
>>> >>>>>>>>
>>> >>>>>>>> On 19 Feb 2018 at 18:57, "Reuven Lax" <re...@google.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Mon, Feb 19, 2018 at 7:11 AM, Romain Manni-Bucau
>>> >>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> 2018-02-19 15:57 GMT+01:00 Reuven Lax <re...@google.com>:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Mon, Feb 19, 2018 at 12:35 AM, Romain Manni-Bucau
>>> >>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @Reuven: in practice it is created by a pool of 256, but it leads to
>>> >>>>>>>>>>>> the same pattern; the teardown is just an "if (iCreatedThem)
>>> >>>>>>>>>>>> releaseThem();"
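>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> (Roughly, with a hypothetical SharedPool class, that guard looks like
>>> >>>>>>>>>>>> this - only the instance that created the pool releases it:)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> private static final Object LOCK = new Object();
>>> >>>>>>>>>>>> private static SharedPool pool;            // shared across DoFn instances
>>> >>>>>>>>>>>> private transient boolean iCreatedThem;
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @Setup
>>> >>>>>>>>>>>> public void setup() {
>>> >>>>>>>>>>>>   synchronized (LOCK) {
>>> >>>>>>>>>>>>     if (pool == null) { pool = SharedPool.create(256); iCreatedThem = true; }
>>> >>>>>>>>>>>>   }
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @Teardown
>>> >>>>>>>>>>>> public void teardown() {
>>> >>>>>>>>>>>>   synchronized (LOCK) {
>>> >>>>>>>>>>>>     if (iCreatedThem && pool != null) { pool.release(); pool = null; }
>>> >>>>>>>>>>>>   }
>>> >>>>>>>>>>>> }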
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> How do you control "256?" Even if you have a pool of 256
>>> workers,
>>> >>>>>>>>>>> nothing in Beam guarantees how many threads and DoFns are
>>> created per
>>> >>>>>>>>>>> worker. In theory the runner might decide to create 1000
>>> threads on each
>>> >>>>>>>>>>> worker.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Nope, it was the other way around: in this case on AWS you can get 256
>>> >>>>>>>>>> instances at once but not 512 (which would be 2x256). So when you compute
>>> >>>>>>>>>> the distribution, you allocate to some fn the role of owning the instance
>>> >>>>>>>>>> lookup and release.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> I still don't understand. Let's be more precise. If you write
>>> the
>>> >>>>>>>>> following code:
>>> >>>>>>>>>
>>> >>>>>>>>>    pCollection.apply(ParDo.of(new MyDoFn()));
>>> >>>>>>>>>
>>> >>>>>>>>> There is no way to control how many instances of MyDoFn are
>>> >>>>>>>>> created. The runner might decide to create a million instances of this
>>> >>>>>>>>> class across your worker pool, which means that you will get a
>>> million Setup
>>> >>>>>>>>> and Teardown calls.
>>> >>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Anyway, this was just an example of an external resource you must
>>> >>>>>>>>>> release. The real topic is that Beam should define, ASAP, a guaranteed
>>> >>>>>>>>>> generic lifecycle to let users embrace its programming model.
>>> >>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @Eugene:
>>> >>>>>>>>>>>> 1. the Wait logic is about passing the value, which is not always
>>> >>>>>>>>>>>> possible (roughly 15% of cases from my raw estimate)
>>> >>>>>>>>>>>> 2. SDF: I'll try to detail why I mention SDF more here
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Concretely, Beam exposes a portable API (included in the SDK
>>> >>>>>>>>>>>> core). This API defines a *container* API and therefore implies bean
>>> >>>>>>>>>>>> lifecycles. I won't detail them all but will just use sources and a DoFn
>>> >>>>>>>>>>>> (not SDF) to illustrate the idea I'm trying to develop.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> A. Source
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> A source computes a partition plan with 2 primitives:
>>> >>>>>>>>>>>> estimateSize and split. As a user you can expect both to be called on the
>>> >>>>>>>>>>>> same bean instance to avoid paying the same connection cost(s) twice.
>>> >>>>>>>>>>>> Concretely:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> connect()
>>> >>>>>>>>>>>> try {
>>> >>>>>>>>>>>>   estimateSize()
>>> >>>>>>>>>>>>   split()
>>> >>>>>>>>>>>> } finally {
>>> >>>>>>>>>>>>   disconnect()
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> this is not guaranteed by the API so you must do:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> connect()
>>> >>>>>>>>>>>> try {
>>> >>>>>>>>>>>>   estimateSize()
>>> >>>>>>>>>>>> } finally {
>>> >>>>>>>>>>>>   disconnect()
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>> connect()
>>> >>>>>>>>>>>> try {
>>> >>>>>>>>>>>>   split()
>>> >>>>>>>>>>>> } finally {
>>> >>>>>>>>>>>>   disconnect()
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> + a workaround with an internal estimated size, since this
>>> >>>>>>>>>>>> primitive is often called in split but you don't want to connect twice in
>>> >>>>>>>>>>>> the second phase.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Why do you need that? Simply because you want to define an API to
>>> >>>>>>>>>>>> implement sources which initializes the source bean and destroys it.
>>> >>>>>>>>>>>> I insist this is a very, very basic concern for such an API. However,
>>> >>>>>>>>>>>> Beam doesn't embrace it and doesn't assume it, so building any API on top
>>> >>>>>>>>>>>> of Beam is very painful today, and direct Beam users hit the exact same
>>> >>>>>>>>>>>> issues - check how IOs are implemented: the static utilities create
>>> >>>>>>>>>>>> short-lived connections and prevent reusing an existing connection within
>>> >>>>>>>>>>>> a single method
>>> >>>>>>>>>>>> (https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L862).
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Same logic applies to the reader which is then created.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> B. DoFn & SDF
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> As a fn dev you expect the same from the Beam runtime:
>>> >>>>>>>>>>>> init(); try { while (...) process(); } finally { destroy(); } - and that
>>> >>>>>>>>>>>> it is executed on the exact same instance, to be able to be stateful at
>>> >>>>>>>>>>>> that level for expensive connections/operations/flow-state handling.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> As you mentioned with the million example, this sequence should happen
>>> >>>>>>>>>>>> for each single instance, so 1M times in your example.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Now why did I mention SDF several times? Because SDF is a
>>> >>>>>>>>>>>> generalisation of both cases (source and DoFn). Therefore it creates way
>>> >>>>>>>>>>>> more instances and requires a much stricter/more explicit definition of
>>> >>>>>>>>>>>> the exact lifecycle and of which instance does what. Since Beam handles
>>> >>>>>>>>>>>> the full lifecycle of the bean instances, it must provide init/destroy
>>> >>>>>>>>>>>> hooks (setup/teardown) which can be stateful.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Take the JDBC example which was mentioned earlier. Today,
>>> >>>>>>>>>>>> because of the teardown issue, it uses bundles. Since bundle sizes are
>>> >>>>>>>>>>>> not defined - and will not be with SDF - it must use a pool to be able to
>>> >>>>>>>>>>>> reuse a connection instance and not hurt performance. Now, with SDF and
>>> >>>>>>>>>>>> the increase in splits, how do you handle the pool size? Generally in
>>> >>>>>>>>>>>> batch you use a single connection per thread to avoid consuming all the
>>> >>>>>>>>>>>> database connections. With a pool you have 2 choices: 1. use a pool of 1,
>>> >>>>>>>>>>>> or 2. use a slightly bigger pool, but, multiplied by the number of beans,
>>> >>>>>>>>>>>> you will likely x2 or x3 the connection count and make the execution fail
>>> >>>>>>>>>>>> with "no more connections available". If you picked 1 (a pool of #1), then
>>> >>>>>>>>>>>> you still have to have a reliable teardown per pool instance (close()
>>> >>>>>>>>>>>> generally) to ensure you release the pool and don't leak the connection
>>> >>>>>>>>>>>> information in the JVM. In all cases you come back to the init()/destroy()
>>> >>>>>>>>>>>> lifecycle, even if you pretend to get connections through bundles.
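>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> (A sketch of that bundle-based workaround - a plain JDBC connection opened
>>> >>>>>>>>>>>> per bundle; jdbcUrl is a hypothetical field, and the cost depends entirely
>>> >>>>>>>>>>>> on the bundle sizes the runner picks:)
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> private transient java.sql.Connection connection;
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @StartBundle
>>> >>>>>>>>>>>> public void startBundle() throws Exception {
>>> >>>>>>>>>>>>   connection = java.sql.DriverManager.getConnection(jdbcUrl); // per bundle
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> @FinishBundle
>>> >>>>>>>>>>>> public void finishBundle() throws Exception {
>>> >>>>>>>>>>>>   connection.close();   // the only hook reliably called today
>>> >>>>>>>>>>>> }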
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Just to make it obvious: the SDF mentions are only because SDF
>>> >>>>>>>>>>>> amplifies all the current issues with the loose definition of the bean
>>> >>>>>>>>>>>> lifecycles to an exponential level, nothing else.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Romain Manni-Bucau
>>> >>>>>>>>>>>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2018-02-18 22:32 GMT+01:00 Eugene Kirpichov
>>> >>>>>>>>>>>> <kirpic...@google.com>:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> The kind of whole-transform lifecycle you're mentioning
>>> can be
>>> >>>>>>>>>>>>> accomplished using the Wait transform as I suggested in
>>> the thread above,
>>> >>>>>>>>>>>>> and I believe it should become the canonical way to do
>>> that.
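>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> (For reference, a rough sketch of the shape I mean, with hypothetical
>>> >>>>>>>>>>>>> WriteFn/CleanupFn DoFns - Wait.on holds the cleanup step until the signal
>>> >>>>>>>>>>>>> collection is complete for the corresponding window:)
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> PCollection<Void> signal = records.apply("Write", ParDo.of(new WriteFn()));
>>> >>>>>>>>>>>>> records
>>> >>>>>>>>>>>>>     .apply(Wait.on(signal))                       // wait for the write
>>> >>>>>>>>>>>>>     .apply("Cleanup", ParDo.of(new CleanupFn()));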
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> (Would like to reiterate one more time, as the main author
>>> of
>>> >>>>>>>>>>>>> most design documents related to SDF and of its
>>> implementation in the Java
>>> >>>>>>>>>>>>> direct and dataflow runner that SDF is fully unrelated to
>>> the topic of
>>> >>>>>>>>>>>>> cleanup - I'm very confused as to why it keeps coming up)
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Sun, Feb 18, 2018, 1:15 PM Romain Manni-Bucau
>>> >>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I kind of agree, except transforms lack a lifecycle too. My
>>> >>>>>>>>>>>>>> understanding is that SDF could be a way to unify it and clean up the API.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Otherwise, how do we normalize - with a single API - the lifecycle of
>>> >>>>>>>>>>>>>> transforms?
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On 18 Feb 2018 at 21:32, "Ben Chambers" <bchamb...@apache.org>
>>> >>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Are you sure that focusing on the cleanup of specific DoFns
>>> >>>>>>>>>>>>>>> is appropriate? In many cases where cleanup is necessary, it is around an
>>> >>>>>>>>>>>>>>> entire composite PTransform. I think there have been
>>> discussions/proposals
>>> >>>>>>>>>>>>>>> around a more methodical "cleanup" option, but those
>>> haven't been
>>> >>>>>>>>>>>>>>> implemented, to the best of my knowledge.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> For instance, consider the steps of a FileIO:
>>> >>>>>>>>>>>>>>> 1. Write to a bunch (N shards) of temporary files
>>> >>>>>>>>>>>>>>> 2. When all temporary files are complete, attempt to do a
>>> >>>>>>>>>>>>>>> bulk copy to put them in the final destination.
>>> >>>>>>>>>>>>>>> 3. Cleanup all the temporary files.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> (This is often desirable because it minimizes the chance
>>> of
>>> >>>>>>>>>>>>>>> seeing partial/incomplete results in the final
>>> destination).
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> In the above, you'd want step 1 to execute on many
>>> workers,
>>> >>>>>>>>>>>>>>> likely using a ParDo (say N different workers).
>>> >>>>>>>>>>>>>>> The move step should only happen once, so on one worker.
>>> This
>>> >>>>>>>>>>>>>>> means it will be a different DoFn, likely with some
>>> stuff done to ensure it
>>> >>>>>>>>>>>>>>> runs on one worker.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> In such a case, cleanup / @TearDown of the DoFn is not
>>> >>>>>>>>>>>>>>> enough. We need an API for a PTransform to schedule some
>>> cleanup work for
>>> >>>>>>>>>>>>>>> when the transform is "done". In batch this is
>>> relatively straightforward,
>>> >>>>>>>>>>>>>>> but doesn't exist. This is the source of some problems,
>>> such as BigQuery
>>> >>>>>>>>>>>>>>> sink leaving files around that have failed to import
>>> into BigQuery.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> In streaming this is less straightforward -- do you want
>>> to
>>> >>>>>>>>>>>>>>> wait until the end of the pipeline? Or do you want to
>>> wait until the end of
>>> >>>>>>>>>>>>>>> the window? In practice, you just want to wait until you
>>> know nobody will
>>> >>>>>>>>>>>>>>> need the resource anymore.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> This led to some discussions around a "cleanup" API,
>>> where
>>> >>>>>>>>>>>>>>> you could have a transform that output resource objects.
>>> Each resource
>>> >>>>>>>>>>>>>>> object would have logic for cleaning it up. And there
>>> would be something
>>> >>>>>>>>>>>>>>> that indicated what parts of the pipeline needed that
>>> resource, and what
>>> >>>>>>>>>>>>>>> kind of temporal lifetime those objects had. As soon as
>>> that part of the
>>> >>>>>>>>>>>>>>> pipeline had advanced far enough that it would no longer
>>> need the resources,
>>> >>>>>>>>>>>>>>> they would get cleaned up. This can be done at pipeline
>>> shutdown, or
>>> >>>>>>>>>>>>>>> incrementally during a streaming pipeline, etc.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Would something like this be a better fit for your use
>>> case?
>>> >>>>>>>>>>>>>>> If not, why is handling teardown within a single DoFn
>>> sufficient?
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Sun, Feb 18, 2018 at 11:53 AM Romain Manni-Bucau
>>> >>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Yes, 1M. Let me try to explain, simplifying the overall
>>> >>>>>>>>>>>>>>>> execution. Each instance - one fn, so likely in a thread of a worker - has
>>> >>>>>>>>>>>>>>>> its lifecycle. Caricaturally: "new" and garbage collection.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> In practice, "new" is often an unsafe allocate
>>> >>>>>>>>>>>>>>>> (deserialization), but it doesn't matter here.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> What I want is for any "new" to be followed by a setup before
>>> >>>>>>>>>>>>>>>> any process or startBundle, and, the last time Beam has the instance
>>> >>>>>>>>>>>>>>>> before it is GC-ed and after the last finishBundle, for it to call teardown.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> It is as simple as that.
>>> >>>>>>>>>>>>>>>> This way there is no need to combine fns in a way that makes a fn not
>>> >>>>>>>>>>>>>>>> self-contained in order to implement basic transforms.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On 18 Feb 2018 at 20:07, "Reuven Lax" <re...@google.com>
>>> >>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Sun, Feb 18, 2018 at 10:50 AM, Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On 18 Feb 2018 at 19:28, "Ben Chambers"
>>> >>>>>>>>>>>>>>>>>> <bchamb...@apache.org> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> It feels like this thread may be a bit off-track. Rather
>>> >>>>>>>>>>>>>>>>>> than focusing on the semantics of the existing methods -- which have been
>>> >>>>>>>>>>>>>>>>>> noted to meet many existing use cases -- it would be helpful to focus
>>> >>>>>>>>>>>>>>>>>> more on the reason you are looking for something with different semantics.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Some possibilities (I'm not sure which one you are
>>> trying
>>> >>>>>>>>>>>>>>>>>> to do):
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> 1. Clean-up some external, global resource, that was
>>> >>>>>>>>>>>>>>>>>> initialized once during the startup of the pipeline.
>>> If this is the case,
>>> >>>>>>>>>>>>>>>>>> how are you ensuring it was really only initialized
>>> once (and not once per
>>> >>>>>>>>>>>>>>>>>> worker, per thread, per instance, etc.)? How do you
>>> know when the pipeline
>>> >>>>>>>>>>>>>>>>>> should release it? If the answer is "when it reaches
>>> step X", then what
>>> >>>>>>>>>>>>>>>>>> about a streaming pipeline?
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> When the DoFn is logically no longer needed, i.e. when the
>>> >>>>>>>>>>>>>>>>>> batch is done or the stream is stopped (manually or by a JVM shutdown).
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I'm really not following what this means.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Let's say that a pipeline is running 1000 workers, and
>>> each
>>> >>>>>>>>>>>>>>>>> worker is running 1000 threads (each running a copy of
>>> the same DoFn). How
>>> >>>>>>>>>>>>>>>>> many cleanups do you want (do you want 1000 * 1000 =
>>> 1M cleanups) and when
>>> >>>>>>>>>>>>>>>>> do you want it called? When the entire pipeline is
>>> shut down? When an
>>> >>>>>>>>>>>>>>>>> individual worker is about to shut down (which may be
>>> temporary - may be
>>> >>>>>>>>>>>>>>>>> about to start back up)? Something else?
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> 2. Finalize some resources that are used within some
>>> >>>>>>>>>>>>>>>>>> region of the pipeline. While the DoFn lifecycle
>>> methods are not a good fit
>>> >>>>>>>>>>>>>>>>>> for this (they are focused on managing resources
>>> within the DoFn), you could
>>> >>>>>>>>>>>>>>>>>> model this on how FileIO finalizes the files that it
>>> produced. For instance:
>>> >>>>>>>>>>>>>>>>>>    a) ParDo generates "resource IDs" (or some token
>>> that
>>> >>>>>>>>>>>>>>>>>> stores information about resources)
>>> >>>>>>>>>>>>>>>>>>    b) "Require Deterministic Input" (to prevent
>>> retries
>>> >>>>>>>>>>>>>>>>>> from changing resource IDs)
>>> >>>>>>>>>>>>>>>>>>    c) ParDo that initializes the resources
>>> >>>>>>>>>>>>>>>>>>    d) Pipeline segments that use the resources, and
>>> >>>>>>>>>>>>>>>>>> eventually output the fact they're done
>>> >>>>>>>>>>>>>>>>>>    e) "Require Deterministic Input"
>>> >>>>>>>>>>>>>>>>>>    f) ParDo that frees the resources
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> By making the use of the resource part of the data it
>>> is
>>> >>>>>>>>>>>>>>>>>> possible to "checkpoint" which resources may be in
>>> use or have been finished
>>> >>>>>>>>>>>>>>>>>> by using the require-deterministic-input steps. This is
>>> >>>>>>>>>>>>>>>>>> important for ensuring everything is actually cleaned up.
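>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> (Very roughly, with made-up DoFns and Reshuffle standing in for
>>> >>>>>>>>>>>>>>>>>> "Require Deterministic Input", that shape would look something like:)
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> PCollection<String> ids = input
>>> >>>>>>>>>>>>>>>>>>     .apply("CreateResourceIds", ParDo.of(new GenerateIdsFn()))
>>> >>>>>>>>>>>>>>>>>>     .apply(Reshuffle.viaRandomKey())    // stabilize ids across retries
>>> >>>>>>>>>>>>>>>>>>     .apply("InitResources", ParDo.of(new InitResourcesFn()));
>>> >>>>>>>>>>>>>>>>>> ids
>>> >>>>>>>>>>>>>>>>>>     .apply("UseResources", ParDo.of(new UseResourcesFn()))
>>> >>>>>>>>>>>>>>>>>>     .apply(Reshuffle.viaRandomKey())    // stabilize "done" signals
>>> >>>>>>>>>>>>>>>>>>     .apply("FreeResources", ParDo.of(new FreeResourcesFn()));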
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> I need that, but generic and not case by case, to
>>> >>>>>>>>>>>>>>>>>> industrialize some API on top of Beam.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> 3. Some other use case that I may be missing? If it is
>>> >>>>>>>>>>>>>>>>>> this case, could you elaborate on what you are trying
>>> to accomplish? That
>>> >>>>>>>>>>>>>>>>>> would help me understand both the problems with
>>> existing options and
>>> >>>>>>>>>>>>>>>>>> possibly what could be done to help.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> I understand there are workarounds for almost all cases,
>>> >>>>>>>>>>>>>>>>>> but that means each transform is different in its lifecycle handling. I
>>> >>>>>>>>>>>>>>>>>> dislike that a lot at scale and as a user, since you can't put any unified
>>> >>>>>>>>>>>>>>>>>> practice on top of Beam; it also makes Beam very hard to integrate or to
>>> >>>>>>>>>>>>>>>>>> use to build higher-level libraries or software.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> This is why I tried not to start the workaround discussions and just stay
>>> >>>>>>>>>>>>>>>>>> at the API level.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> -- Ben
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On Sun, Feb 18, 2018 at 9:56 AM Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> 2018-02-18 18:36 GMT+01:00 Eugene Kirpichov
>>> >>>>>>>>>>>>>>>>>>> <kirpic...@google.com>:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> "Machine state" is overly low-level because many of
>>> the
>>> >>>>>>>>>>>>>>>>>>>> possible reasons can happen on a perfectly fine
>>> machine.
>>> >>>>>>>>>>>>>>>>>>>> If you'd like to rephrase it to "it will be called
>>> >>>>>>>>>>>>>>>>>>>> except in various situations where it's logically
>>> impossible or impractical
>>> >>>>>>>>>>>>>>>>>>>> to guarantee that it's called", that's fine. Or you
>>> can list some of the
>>> >>>>>>>>>>>>>>>>>>>> examples above.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Sounds ok to me
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> The main point for the user is, you *will* see
>>> >>>>>>>>>>>>>>>>>>>> non-preventable situations where it couldn't be
>>> called - it's not just
>>> >>>>>>>>>>>>>>>>>>>> intergalactic crashes - so if the logic is very
>>> important (e.g. cleaning up
>>> >>>>>>>>>>>>>>>>>>>> a large amount of temporary files, shutting down a
>>> large number of VMs you
>>> >>>>>>>>>>>>>>>>>>>> started etc), you have to express it using one of
>>> the other methods that
>>> >>>>>>>>>>>>>>>>>>>> have stricter guarantees (which obviously come at a
>>> cost, e.g. no
>>> >>>>>>>>>>>>>>>>>>>> pass-by-reference).
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> FinishBundle has the exact same guarantee, sadly, so I'm not
>>> >>>>>>>>>>>>>>>>>>> sure which other method you're speaking about. Concretely, if you make it
>>> >>>>>>>>>>>>>>>>>>> really unreliable - this is what "best effort" sounds like to me - then
>>> >>>>>>>>>>>>>>>>>>> users can't use it to clean anything; but if you make it "can happen, but
>>> >>>>>>>>>>>>>>>>>>> it is unexpected and means something happened", then it is fine to have a
>>> >>>>>>>>>>>>>>>>>>> manual - or automatic, if fancy - recovery procedure. This is where it
>>> >>>>>>>>>>>>>>>>>>> makes all the difference and impacts the developers and ops (all users,
>>> >>>>>>>>>>>>>>>>>>> basically).
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> On Sun, Feb 18, 2018 at 9:16 AM Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Agreed, Eugene, except that "best effort" means that. It
>>> >>>>>>>>>>>>>>>>>>>>> is also often used to mean "at will", and this is what triggered this
>>> >>>>>>>>>>>>>>>>>>>>> thread.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I'm fine using "except if the machine state prevents it", but "best
>>> >>>>>>>>>>>>>>>>>>>>> effort" is too open and can be very badly and wrongly perceived by
>>> >>>>>>>>>>>>>>>>>>>>> users (as I did).
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>>>>>> @rmannibucau |  Blog | Old Blog | Github |
>>> LinkedIn |
>>> >>>>>>>>>>>>>>>>>>>>> Book
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> 2018-02-18 18:13 GMT+01:00 Eugene Kirpichov
>>> >>>>>>>>>>>>>>>>>>>>> <kirpic...@google.com>:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> It will not be called if it's impossible to call
>>> it:
>>> >>>>>>>>>>>>>>>>>>>>>> in the example situation you have (intergalactic
>>> crash), and in a number of
>>> >>>>>>>>>>>>>>>>>>>>>> more common cases: eg in case the worker
>>> container has crashed (eg user code
>>> >>>>>>>>>>>>>>>>>>>>>> in a different thread called a C library over JNI
>>> and it segfaulted), JVM
>>> >>>>>>>>>>>>>>>>>>>>>> bug, crash due to user code OOM, in case the
>>> worker has lost network
>>> >>>>>>>>>>>>>>>>>>>>>> connectivity (then it may be called but it won't
>>> be able to do anything
>>> >>>>>>>>>>>>>>>>>>>>>> useful), in case this is running on a preemptible
>>> VM and it was preempted by
>>> >>>>>>>>>>>>>>>>>>>>>> the underlying cluster manager without notice or
>>> if the worker was too busy
>>> >>>>>>>>>>>>>>>>>>>>>> with other stuff (eg calling other Teardown
>>> functions) until the preemption
>>> >>>>>>>>>>>>>>>>>>>>>> timeout elapsed, in case the underlying hardware
>>> simply failed (which
>>> >>>>>>>>>>>>>>>>>>>>>> happens quite often at scale), and in many other
>>> conditions.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> "Best effort" is the commonly used term to
>>> describe
>>> >>>>>>>>>>>>>>>>>>>>>> such behavior. Please feel free to file bugs for
>>> cases where you observed a
>>> >>>>>>>>>>>>>>>>>>>>>> runner not call Teardown in a situation where it
>>> was possible to call it but
>>> >>>>>>>>>>>>>>>>>>>>>> the runner made insufficient effort.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> On Sun, Feb 18, 2018, 9:02 AM Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> 2018-02-18 18:00 GMT+01:00 Eugene Kirpichov
>>> >>>>>>>>>>>>>>>>>>>>>>> <kirpic...@google.com>:
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Feb 18, 2018, 2:06 AM Romain Manni-Bucau
>>> >>>>>>>>>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 18 Feb 2018 at 00:23, "Kenneth Knowles"
>>> >>>>>>>>>>>>>>>>>>>>>>>>> <k...@google.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Feb 17, 2018 at 3:09 PM, Romain
>>> Manni-Bucau
>>> >>>>>>>>>>>>>>>>>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> If you give an example of a high-level need
>>> (e.g.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> "I'm trying to write an IO for system $x and
>>> it requires the following
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> initialization and the following cleanup
>>> logic and the following processing
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> in between") I'll be better able to help you.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Take a simple example of a transform requiring a
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> connection. Using bundles is a perf killer since their size is not
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> controlled. Using teardown doesn't allow you to release the connection
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> since it is a best-effort thing. Not releasing the connection makes you
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> pay a lot - AWS ;) - or prevents you from launching other processing -
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> concurrency limits.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> For this example @Teardown is an exact fit. If
>>> >>>>>>>>>>>>>>>>>>>>>>>>> things die so badly that @Teardown is not
>>> called then nothing else can be
>>> >>>>>>>>>>>>>>>>>>>>>>>>> called to close the connection either. What
>>> AWS service are you thinking of
>>> >>>>>>>>>>>>>>>>>>>>>>>>> that stays open for a long time when
>>> everything at the other end has died?
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> You assume connections are kind of stateless, but
>>> >>>>>>>>>>>>>>>>>>>>>>>>> some (proprietary) protocols require closing exchanges which are not
>>> >>>>>>>>>>>>>>>>>>>>>>>>> only "I'm leaving".
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> For AWS, I was thinking about starting some services
>>> >>>>>>>>>>>>>>>>>>>>>>>>> - machines - on the fly at pipeline startup and closing them at the end.
>>> >>>>>>>>>>>>>>>>>>>>>>>>> If teardown is not called you leak machines and money. You can say it can
>>> >>>>>>>>>>>>>>>>>>>>>>>>> be done another way... as can the full pipeline ;).
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> I don't want to be picky, but if Beam can't handle its
>>> >>>>>>>>>>>>>>>>>>>>>>>>> components' lifecycle it can't be used at scale for generic pipelines and
>>> >>>>>>>>>>>>>>>>>>>>>>>>> ends up bound to some particular IOs.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> What prevents enforcing teardown - ignoring
>>> >>>>>>>>>>>>>>>>>>>>>>>>> the interstellar crash case, which can't be handled by any human system?
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Nothing, technically. Why do you push to not handle it? Is it due to some
>>> >>>>>>>>>>>>>>>>>>>>>>>>> legacy code in Dataflow or something else?
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> Teardown *is* already documented and implemented
>>> >>>>>>>>>>>>>>>>>>>>>>>> this way (best-effort). So I'm not sure what
>>> kind of change you're asking
>>> >>>>>>>>>>>>>>>>>>>>>>>> for.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Remove "best effort" from the javadoc. If it is not
>>> >>>>>>>>>>>>>>>>>>>>>>> called then it is a bug and we are done :).
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Also, what does it mean for users? The direct runner
>>> >>>>>>>>>>>>>>>>>>>>>>>>> does it, so if a user uses the RI in tests, will he get a different
>>> >>>>>>>>>>>>>>>>>>>>>>>>> behavior in prod? Also, don't forget the user doesn't know what the IOs
>>> >>>>>>>>>>>>>>>>>>>>>>>>> he composes use, so this is so impactful for the whole product that it
>>> >>>>>>>>>>>>>>>>>>>>>>>>> must be handled IMHO.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand the portability culture is new in the big
>>> >>>>>>>>>>>>>>>>>>>>>>>>> data world, but it is not a reason to ignore what people have done for
>>> >>>>>>>>>>>>>>>>>>>>>>>>> years and to do it wrong before doing it right ;).
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> My proposal is to list what can prevent guaranteeing -
>>> >>>>>>>>>>>>>>>>>>>>>>>>> under normal IT conditions - the execution of teardown. Then we see if we
>>> >>>>>>>>>>>>>>>>>>>>>>>>> can handle it, and only if there is a technical reason we can't do we make
>>> >>>>>>>>>>>>>>>>>>>>>>>>> it experimental/unsupported in the API. I know Spark and Flink can; any
>>> >>>>>>>>>>>>>>>>>>>>>>>>> unknown blockers for other runners?
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Technical note: even a kill should go through Java
>>> >>>>>>>>>>>>>>>>>>>>>>>>> shutdown hooks; otherwise your environment (the software enclosing Beam)
>>> >>>>>>>>>>>>>>>>>>>>>>>>> is fully unhandled and your overall system is uncontrolled. The only case
>>> >>>>>>>>>>>>>>>>>>>>>>>>> where this is not true is when the software is always owned by a vendor
>>> >>>>>>>>>>>>>>>>>>>>>>>>> and never installed in a customer environment. In that case it belongs to
>>> >>>>>>>>>>>>>>>>>>>>>>>>> the vendor to handle the Beam API, and not to Beam to adjust its API for
>>> >>>>>>>>>>>>>>>>>>>>>>>>> a vendor - otherwise all features unsupported by one runner should be
>>> >>>>>>>>>>>>>>>>>>>>>>>>> made optional, right?
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Not all state is about the network, even in distributed
>>> >>>>>>>>>>>>>>>>>>>>>>>>> systems, so it is key to have an explicit and defined lifecycle.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Kenn
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>
>>
