Re: Persisting GFac job data

Saminda Wijeratne Tue, 21 May 2013 08:39:58 -0700

That is an excellent idea. We can make the job data field the designated
serialized GFac job data. Whichever GFacProvider is used should adhere to
it.
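
A rough sketch of that idea, in Python for brevity: a hypothetical
`JobData` base class supplies serialization and deserialization, and each
provider defines its own structure on top of it. The class names, the JSON
encoding, and the example RSL string are assumptions for illustration only;
none of them come from GFac or the Registry API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class JobData:
    """Hypothetical base class; provider-specific structures inherit from it."""

    def serialize(self) -> str:
        # Record the concrete type name next to the fields so a reader of the
        # stored string can tell which structure it holds.
        return json.dumps({"type": type(self).__name__, "fields": asdict(self)})

    @classmethod
    def deserialize(cls, text: str) -> "JobData":
        payload = json.loads(text)
        # Refuse to rebuild data that was written by a different structure.
        if payload["type"] != cls.__name__:
            raise ValueError(
                f"expected {cls.__name__} data, found {payload['type']}"
            )
        return cls(**payload["fields"])


@dataclass
class GramJobData(JobData):
    """Hypothetical GRAM-specific job data: just the RSL string."""

    rsl: str


# Round trip: this string is what would go into the job data field.
stored = GramJobData(rsl="&(executable=/bin/date)").serialize()
restored = GramJobData.deserialize(stored)
```

With this shape the registry only ever sees an opaque string, and each
provider is responsible for its own structure.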


I'm still inclined to keep the rest of the fields as separate columns to
make querying for the required data easier. For example, if we wanted all
attempts at executing a particular node of a workflow, or if we wanted to
know which application descriptions are faster in execution or more
reliable, we can let the query language deal with it. wdyt?
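
To illustrate the kind of queries meant here, the following is a minimal
sketch using Python's built-in sqlite3 module with an in-memory database.
The table name `gfac_job`, the column names, and the sample rows are all
assumptions made for the example; the real registry schema is still to be
decided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gfac_job (
        local_job_id         TEXT PRIMARY KEY,
        experiment_id        TEXT,
        workflow_instance_id TEXT,
        node_id              TEXT,
        service_id           TEXT,
        host_id              TEXT,
        application_id       TEXT,
        job_data             TEXT,     -- opaque serialized provider data
        submitted_time       INTEGER,  -- epoch seconds (assumed unit)
        completed_time       INTEGER,
        completed_status     TEXT,
        metadata             TEXT
    )
    """
)

# Made-up sample rows: two attempts on node-A, one on node-B.
rows = [
    ("job-1", "exp-1", "wf-1", "node-A", "svc-1", "host-1", "app-fast",
     "<rsl 1>", 100, 130, "SUCCESS", None),
    ("job-2", "exp-1", "wf-1", "node-A", "svc-1", "host-1", "app-slow",
     "<rsl 2>", 200, 290, "FAILED", None),
    ("job-3", "exp-1", "wf-1", "node-B", "svc-1", "host-1", "app-fast",
     "<rsl 3>", 300, 350, "SUCCESS", None),
]
conn.executemany("INSERT INTO gfac_job VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", rows)


def attempts_for_node(workflow_instance_id, node_id):
    """Return the local job ids of every attempt at one workflow node."""
    cur = conn.execute(
        "SELECT local_job_id FROM gfac_job "
        "WHERE workflow_instance_id = ? AND node_id = ? "
        "ORDER BY submitted_time",
        (workflow_instance_id, node_id),
    )
    return [row[0] for row in cur]


def average_runtime_by_application():
    """Return the mean run time in seconds per application description."""
    cur = conn.execute(
        "SELECT application_id, AVG(completed_time - submitted_time) "
        "FROM gfac_job GROUP BY application_id ORDER BY application_id"
    )
    return dict(cur.fetchall())
```

Neither query needs to look inside the job data column, which is the point
of keeping the other fields as columns of their own.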


On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
danushka.menikkumb...@gmail.com> wrote:

> Saminda,
>
> I think the data container does not need to have a generic format. We can
> have a base class that facilitates object serialization/deserialization and
> let specific metadata structures implement it as required. We get the
> Registry API to serialize objects and save them in a metadata table (with
> just two columns?) and to deserialize them as they are loaded from the
> registry.
>
> Danushka
>
>
> On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <samin...@gmail.com
> >wrote:
>
> > It has become more and more apparent that saving the data related to
> > executing jobs from GFac can be useful for many reasons, such as:
> >
> > debugging
> > retrying
> > to make smart decisions on reliability/cost etc.
> > statistical analysis
> >
> > Thus we thought of saving the data related to GFac jobs in the registry
> > in order to facilitate features such as the above in the future.
> >
> > However a GFac job is potentially any sort of computing resource access
> > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> > data structure that can hold the data of any type of resource. Following
> > are the suggested data to save for a single GFac job execution,
> >
> > *experiment id, workflow instance id, node id* - pinpoint the node
> > execution
> > *service, host, application description ids* - pinpoint the descriptors
> > responsible
> > *local job id* - the unique job id retrieved/generated per execution
> > [PRIMARY KEY]
> > *job data* - data related to executing the job (e.g. the RSL in GRAM)
> > *submitted, completed time*
> > *completed status* - whether the job was successful or ran into errors
> > etc.
> > *metadata* - custom field to add anything user wants
> >
> > Your feedback is most welcome. The API-related changes will also be
> > discussed once we have a proper data structure. We are hoping to
> > implement this within the next few days.
> >
> > Thanks,
> > Saminda
> >
>
