[kepler-users] Stop and resume execution

Josep Maria Campanera Alsina Wed, 10 Sep 2008 17:45:51 +0200

Hi all,
I'm back again. I'm also extremely interested in these extension
actors that can manage the execution of a Kepler workflow.


But where can I find the COMAD and "Smart rerun" actors? I haven't
been able to locate them in the Kepler repository.

All the best,

Josep Maria


> Message: 1
> Date: Tue, 26 Aug 2008 11:22:04 +0200
> From: Quentin BEY <quentin.bey at onera.fr>
> Subject: [kepler-users]  Stop and resume execution...
> To: kepler-users at ecoinformatics.org
> Message-ID: <1219742524.20366.16.camel at talence>
> Content-Type: text/plain
>
> Hi all,
>
> Once again I need help with Kepler's capabilities.
>
> I wonder if we can stop a workflow, quit Kepler, then reopen Kepler and
> resume the workflow. For instance, suppose a workflow that takes a long
> time to execute stops because the computer shuts down (for some reason
> unknown to us). If we know which actor was executing, is there a simple
> way to resume execution from this actor?
>
>
> Thanks in advance,
>
>
> Quentin BEY -ONERA- France
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 26 Aug 2008 05:59:27 -0700
> From: "Bertram Ludaescher" <ludaesch at ucdavis.edu>
> Subject: Re: [kepler-users] Stop and resume execution...
> To: "Quentin BEY" <quentin.bey at onera.fr>
> Cc: kepler-users at ecoinformatics.org
> Message-ID:
>        <657a810a0808260559x305d349ahc64dcdf60e95fd88 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Quentin:
>
> Interesting question! There are several answers to this.
>
> First, "knowing which actor was executing" is generally not enough to resume
> execution:
> Consider a workflow executing with a PN (process network) director. Then all
> actors execute as independent processes (Java threads really), so all are
> executing simultaneously.
> In contrast, a (sub-)workflow executing under SDF or DDF will be executed
> within a single thread, so at most one actor is executing at a given time in
> such a workflow.
> (SDF creates a schedule "statically", i.e., prior to workflow execution,
> while DDF figures out which actors are ready to fire at runtime, then
> selects one and repeats)
>
> But what you really need is to maintain the "workflow state" (or some part
> of it) persistently, so that you can resume a stopped or failed workflow.
> One general way to do this is checkpointing, i.e., writing relevant
> information out to disk at certain times. While checkpointing can be very
> costly in general applications, in scientific workflows it can often be
> easier to do so, since usually components are loosely coupled, all
> information flow is visible via the channels (unless you do some
> side-effects outside the model), and actors are often (but not always)
> stateless.
>
> I'm aware of several extensions that allow one to resume Kepler workflows (I
> think Ptolemy might have further ways):
>
> -- One system has been called "smart rerun" (e.g. Ilkay Altintas or Dan
> Crawl can point you to it) and allows you to rerun a workflow with modified
> inputs and/or parameter settings without re-executing the parts that are
> "unchanged". I don't recall whether it handles only successful workflow runs
> (and optimizes their re-execution under change) or also partial (aborted)
> runs.
>
> -- Norbert Podhorszki has developed workflows where the actors themselves
> write a small amount of information out to disk (in his case: remote
> commands that terminated successfully), which is used upon re-running the
> workflow to execute only the commands that have not yet completed
> successfully. Call this the "custom checkpointing" solution (instead of a
> general system extension, individual actors or workflows decide what to
> checkpoint; more work, but it can be more efficient to know what is needed
> to rerun).
>
> -- One new director and workflow programming model called COMAD makes
> most if not all of the execution state visible "on the wire" by
> streaming nested data collections between actors. As in the other
> approaches, the information on the wire can be written to disk and the
> workflow resumed based on this info.
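
The rerun idea in the first item above (skip the parts whose inputs and parameters are unchanged) can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Kepler "smart rerun" implementation: the names `step_key`, `run_step`, `scale`, `total` and `workflow` are hypothetical, and the cache is a plain in-memory dict that a real system would have to persist to disk.

```python
import hashlib
import json


def step_key(name, params, inputs):
    """Fingerprint one step by its name, parameters, and input values."""
    payload = json.dumps([name, params, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(name, func, params, inputs, cache):
    """Run func only if this exact (name, params, inputs) was not seen before."""
    key = step_key(name, params, inputs)
    if key not in cache:
        cache[key] = func(inputs, **params)
    return cache[key]


calls = []  # records which steps actually executed


def scale(inputs, factor):
    calls.append("scale")
    return [x * factor for x in inputs]


def total(inputs):
    calls.append("total")
    return sum(inputs)


def workflow(data, factor, cache):
    """Hypothetical two-step workflow: scale the data, then sum it."""
    scaled = run_step("scale", scale, {"factor": factor}, data, cache)
    return run_step("total", total, {}, scaled, cache)
```

Calling `workflow` twice with the same data, factor and cache executes each step only once. Changing the factor re-executes `scale`, and `total` runs again only because its input changed. The sketch assumes stateless steps with JSON-serializable inputs, which ties in with the stateful/stateless point made further down.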
>
> All these approaches are based on recording information on disk at runtime
> (sometimes called 'provenance information'), which is then used when
> resuming the workflow.
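
A minimal sketch of the "custom checkpointing" variant, in Python rather than as a Kepler actor. Everything here is an assumption made for illustration: the function names `load_done` and `run_with_checkpoint`, the one-JSON-record-per-line log format, and the idea that each command carries a stable id. It is not Norbert's actual code.

```python
import json
import os


def load_done(path):
    """Return the ids of commands recorded as completed in earlier runs."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    done.add(json.loads(line)["id"])
                except (ValueError, KeyError, TypeError):
                    pass  # ignore a record truncated by a crash
    return done


def run_with_checkpoint(commands, execute, path):
    """Execute each (id, command) pair at most once across runs."""
    done = load_done(path)
    executed = []
    with open(path, "a", encoding="utf-8") as log:
        for cmd_id, command in commands:
            if cmd_id in done:
                continue  # completed in an earlier run, skip it
            execute(command)  # an exception here leaves no record for cmd_id
            log.write(json.dumps({"id": cmd_id}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the record to disk before moving on
            executed.append(cmd_id)
    return executed
```

If the run dies partway through, calling `run_with_checkpoint` again with the same command list and log path executes only the commands that have no success record. A command is logged only after `execute` returns, so a command interrupted mid-flight is run again. That is safe only if the commands can be repeated without harm.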
>
> The above options are not the only ones (e.g. Ptolemy probably has
> additional ways to restart a failed model). Which variant to choose (or
> which new variant to develop) may depend on, among other things:
> -- the size of data flowing through channels (or the availability of
> persistent ids to large chunks of data)
> -- whether actors are stateful or stateless
> -- the director(s) programming/execution model being used
>
> Bertram
>
>
> End of Kepler-users Digest, Vol 39, Issue 3
> *******************************************
>
