Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

Robert Bradshaw Thu, 21 Jul 2016 16:33:07 -0700

On Thu, Jul 21, 2016 at 4:18 PM, Ben Chambers <bchamb...@apache.org> wrote:
> This health check seems redundant with just waiting a while and then
> checking on the status, other than returning earlier in the case of
> reaching a terminal state. What about adding:
>
> /**
>  * Returns the state after waiting the specified duration. Will return
> earlier if the pipeline
>  * reaches a terminal state.
>  */
> State getStateAfter(Duration duration);
>
> This seems to be a useful building block, both for the user's pipeline (in
> case they wanted to build something like wait and then check health) and
> also for the SDK (to implement waitUntilFinished, etc.)


A generic waitFor(Duration) which may return early if a terminal state
is entered seems useful. I don't know that we need a return value
here, given that we an then query the PipelineResult however we want
once this returns. waitUntilFinished is simply
waitFor(InfiniteDuration).

> On Thu, Jul 21, 2016 at 4:11 PM Pei He <pe...@google.com.invalid> wrote:
>
>> I am not in favor of supporting wait for every states or
>> waitUntilState(...).
>> One reason is PipelineResult.State is not well defined and is not
>> agreed upon runners.
>> Another reason is users might not want to wait for a particular state.
>> For example,
>> waitUntilFinish() is to wait for a terminal state.
>> So, even runners have different states, we still can define shared
>> properties, such as finished/terminal.

+1. Running is an intermediate state that doesn't have an obvious
mapping onto all runners, which is another reason it's odd to wait
until then. All runners have terminal states.

>> I think when users call waitUntilRunning(), they want to make sure the
>> pipeline is up running and is healthy.
> > Maybe we want to wait for at
>> least one element went through the pipeline.

-1, That might be a while... Also, you may not start generating data
until you pipline is up.

>> What about changing the waitUntilRunning() to the following?
>>
>> /**
>> * Check if the pipeline is health for the duration.
>> *
>> * Return true if the pipeline is healthy at the end of duration.
>> * Return false if the pipeline is not healthy at the end of duration.
>> * <p>It may return early if the pipeline is in an unrecoverable failure
>> state.
>> */
>> boolean PipelineResult.healthCheck(Duration duration)
>>
>> (I think this also addressed Robert's comment about waitToRunning())
>>
>> On Thu, Jul 21, 2016 at 1:08 PM, Kenneth Knowles <k...@google.com.invalid>
>> wrote:
>> > Some more comments:
>> >
>> >  - What are the allowed/expected state transitions prior to RUNNING?
>> Today,
>> > I presume it is any nonterminal state, so it can be UNKNOWN or STOPPED
>> > (which really means "not yet started") prior to RUNNING. Is this what we
>> > want?
>> >
>> >  - If a job can be paused, a transition from RUNNING to STOPPED, then
>> > waitUntilPaused(Duration) makes sense.
>> >
>> >  - Assuming there is some polling under the hood, are runners required to
>> > send back a full history of transitions? Or can transitions be missed,
>> with
>> > only the latest state retrieved?
>> >
>> >  - If the latter, then does waitUntilRunning() only wait until RUNNING or
>> > does it also return when it sees STOPPED, which could certainly indicate
>> > that the job transitioned to RUNNING then STOPPED in between polls. In
>> that
>> > case it is, today, the same as waitUntilStateIsKnown().
>> >
>> >  - The obvious limit of this discussion is waitUntilState(Duration,
>> > Set<State>), which is the same amount of work to implement. Am I correct
>> > that everyone in this thread thinks this generality is just not the right
>> > thing for a user API?
>> >
>> >  - This enum could probably use revision. I'd chose some combination of
>> > tightening the enum, making it extensible, and make some aspect of it
>> > free-form. Not sure where the best balance lies.
>> >
>> >
>> >
>> > On Thu, Jul 21, 2016 at 12:47 PM, Ben Chambers
>> <bchamb...@google.com.invalid
>> >> wrote:
>> >
>> >> (Minor Issue: I'd propose waitUntilDone and waitUntilRunning rather than
>> >> waitToRunning which reads oddly)
>> >>
>> >> The only reason to separate submission from waitUntilRunning would be if
>> >> you wanted to kick off several pipelines in quick succession, then wait
>> for
>> >> them all to be running. For instance:
>> >>
>> >> PipelineResult p1Future = p1.run();
>> >> PipelineResult p2Future = p2.run();
>> >> ...
>> >>
>> >> p1Future.waitUntilRunning();
>> >> p2Future.waitUntilRunning();
>> >> ...
>> >>
>> >> In this setup, you can more quickly start several pipelines, but your
>> main
>> >> program would wait and report any errors before exiting.
>> >>
>> >> On Thu, Jul 21, 2016 at 12:41 PM Robert Bradshaw
>> >> <rober...@google.com.invalid> wrote:
>> >>
>> >> > I'm in favor of the proposal. My only question is whether we need
>> >> > PipelineResult.waitToRunning(), instead I'd propose that run() block
>> >> > until the pipeline's running/successfully submitted (or failed). This
>> >> > would simplify the API--we'd only have one kind of wait that makes
>> >> > sense in all cases.
>> >> >
>> >> > What kinds of interactions would one want to have with the
>> >> > PipelineResults before it's running?
>> >> >
>> >> > On Thu, Jul 21, 2016 at 12:24 PM, Thomas Groh
>> <tg...@google.com.invalid>
>> >> > wrote:
>> >> > > TestPipeline is probably the one runner that can be expected to
>> block,
>> >> as
>> >> > > certainly JUnit tests and likely other tests will run the Pipeline,
>> and
>> >> > > succeed, even if the PipelineRunner throws an exception. Luckily,
>> this
>> >> > can
>> >> > > be added to TestPipeline.run(), which already has additional
>> behavior
>> >> > > associated with it (currently regarding the unwrapping of
>> >> > AssertionErrors)
>> >> > >
>> >> > > On Thu, Jul 21, 2016 at 11:40 AM, Kenneth Knowles
>> >> <k...@google.com.invalid
>> >> > >
>> >> > > wrote:
>> >> > >
>> >> > >> I like this proposal. It makes pipeline.run() seem like a pretty
>> >> normal
>> >> > >> async request, and easy to program with. It removes the implicit
>> >> > assumption
>> >> > >> in the prior design that main() is pretty much just "build and run
>> a
>> >> > >> pipeline".
>> >> > >>
>> >> > >> The part of this that I care about most is being able to write a
>> >> program
>> >> > >> (not the pipeline, but the program that launches one or more
>> >> pipelines)
>> >> > >> that has reasonable cross-runner behavior.
>> >> > >>
>> >> > >> One comment:
>> >> > >>
>> >> > >> On Wed, Jul 20, 2016 at 3:39 PM, Pei He <pe...@google.com.invalid>
>> >> > wrote:
>> >> > >> >
>> >> > >> > 4. PipelineRunner.run() should (but not required) do non-blocking
>> >> runs
>> >> > >> >
>> >> > >>
>> >> > >> I think we can elaborate on this a little bit. Obviously there
>> might
>> >> be
>> >> > >> "blocking" in terms of, say, an HTTP round-trip to submit the job,
>> but
>> >> > >> run() should never be non-terminating.
>> >> > >>
>> >> > >> For a test runner that finishes the pipeline quickly, I would be
>> fine
>> >> > with
>> >> > >> run() just executing the pipeline, but the PipelineResult should
>> still
>> >> > >> emulate the usual - just always returning a terminal status. It
>> would
>> >> be
>> >> > >> annoying to add waitToFinish() to the end of all our tests, but
>> >> leaving
>> >> > a
>> >> > >> run() makes the tests only work with special blocking runner
>> wrappers
>> >> > (and
>> >> > >> make them poor examples). A JUnit @Rule for test pipeline would
>> hide
>> >> all
>> >> > >> that, perhaps.
>> >> > >>
>> >> > >>
>> >> > >> Kenn
>> >> > >>
>> >> >
>> >>
>>

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

Reply via email to