Hi all,

I'm back again. I'm also very interested in these extensions that can
manage the execution of a Kepler workflow. But where can I find the COMAD
and "smart rerun" extensions? I haven't been able to locate them in the
Kepler repository.

All the best,
Josep Maria

> Today's Topics:
>
>    1. Stop and resume execution... (Quentin BEY)
>    2. Re: Stop and resume execution... (Bertram Ludaescher)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 26 Aug 2008 11:22:04 +0200
> From: Quentin BEY <quentin.bey at onera.fr>
> Subject: [kepler-users] Stop and resume execution...
> To: kepler-users at ecoinformatics.org
> Message-ID: <1219742524.20366.16.camel at talence>
> Content-Type: text/plain
>
> Hi all,
>
> Once again I need help with what Kepler can do.
>
> I wonder if we can stop a workflow, quit Kepler, then reopen Kepler and
> resume the workflow. For instance, suppose a workflow that takes a long
> time to execute stops because the computer shuts down (for whatever
> reason). If we know which actor was executing, is there a simple way to
> resume execution from this actor?
>
> Thanks in advance,
>
> Quentin BEY -ONERA- France
>
> ------------------------------
>
> Message: 2
> Date: Tue, 26 Aug 2008 05:59:27 -0700
> From: "Bertram Ludaescher" <ludaesch at ucdavis.edu>
> Subject: Re: [kepler-users] Stop and resume execution...
> To: "Quentin BEY" <quentin.bey at onera.fr>
> Cc: kepler-users at ecoinformatics.org
> Message-ID:
>     <657a810a0808260559x305d349ahc64dcdf60e95fd88 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Quentin:
>
> Interesting question! There are several answers to this.
>
> First, "knowing which actor was executing" is generally not enough to
> resume execution. Consider a workflow executing with a PN (process
> network) director. Then all actors execute as independent processes (Java
> threads really), so all are executing simultaneously.
> In contrast, a (sub-)workflow executing under SDF or DDF will be executed
> within a single thread, so at most one actor is executing at a given time
> in such a workflow. (SDF creates a schedule "statically", i.e., prior to
> workflow execution, while DDF figures out which actors are ready to fire
> at runtime, then selects one and repeats.)
>
> But what you really need is to maintain the "workflow state" (or some part
> of it) persistently, so that you can resume a stopped or failed workflow.
> One general way to do this is checkpointing, i.e., writing relevant
> information out to disk at certain times. While checkpointing can be very
> costly in general applications, in scientific workflows it can often be
> easier to do, since usually components are loosely coupled, all
> information flow is visible via the channels (unless you have side
> effects outside the model), and actors are often (but not always)
> stateless.
>
> I'm aware of several extensions that allow one to resume Kepler workflows
> (I think Ptolemy might have further ways):
>
> -- One system has been called "smart rerun" (e.g. Ilkay Altintas or Dan
> Crawl can point you to it) and allows you to rerun a workflow with
> modified inputs and/or parameter settings, avoiding re-execution of the
> parts that are "unchanged". I don't recall whether it handles only
> successful workflow runs (and optimizes their re-execution under change)
> or also partial (aborted) runs.
>
> -- Norbert Podhorszki has developed workflows where the actors themselves
> write out to disk some small information (in his case: remote commands
> that successfully terminated), which is used upon re-running the workflow
> to execute only the commands not yet successfully completed. Call this
> the "custom checkpointing" solution (instead of a general system
> extension, individual actors or workflows decide what to checkpoint; more
> work, but it can be more efficient because they know what is needed to
> rerun).
>
> -- One new director and workflow programming model called COMAD makes
> most if not all of the execution state visible "on the wire" by streaming
> nested data collections between actors. As in the other approaches, the
> information on the wire can be written to disk and the workflow resumed
> based on this info.
>
> All these approaches are based on recording information on disk during
> runtime (sometimes called 'provenance information'), which is then used
> when resuming the workflow.
>
> The above options are not the only ones (e.g. Ptolemy probably has
> additional ways to restart a failed model). Which variant to choose (or
> which new variant to develop) may depend on, among other things:
> -- the size of data flowing through channels (or the availability of
>    persistent ids to large chunks of data)
> -- whether actors are stateful or stateless
> -- the director(s) programming/execution model being used
>
> Bertram
>
> ------------------------------
>
> _______________________________________________
> Kepler-users mailing list
> Kepler-users at ecoinformatics.org
> http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/kepler-users
>
> End of Kepler-users Digest, Vol 39, Issue 3
> *******************************************
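
For anyone following the thread, the "custom checkpointing" idea quoted
above (record on disk which commands finished, and skip them on the next
run) can be sketched in a few lines. This is only an illustration in plain
Python under my own assumptions, not Kepler or Ptolemy code and not Norbert
Podhorszki's actual implementation: the names load_done, save_done and
run_with_checkpoint, the JSON checkpoint file, and the idea of identifying
each step by a unique name are all hypothetical.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of step names recorded as completed (empty if no file)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_done(path, done):
    """Rewrite the checkpoint via a temp file and rename, so an interruption
    during the write does not leave a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_with_checkpoint(steps, path):
    """Run (name, callable) pairs in order, skipping names already recorded.

    Returns the list of step names actually executed during this call.
    Assumes step names are unique and that steps are safe to skip once done.
    """
    done = load_done(path)
    executed = []
    for name, action in steps:
        if name in done:
            continue  # completed in an earlier run
        action()  # if this raises, only earlier successes stay recorded
        done.add(name)
        save_done(path, done)  # checkpoint after every successful step
        executed.append(name)
    return executed
```

Calling run_with_checkpoint again with the same steps and the same file
after a crash re-executes only the steps that had not completed. Note that
this only works when steps are stateless or leave their results somewhere
persistent, which is the same caveat Bertram raises about stateful actors.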

