Okay, I totally got behind on this thread. The purpose of this is that, regardless of whether we are talking about CAS-PGE running in the resource manager or some other generic resource manager job, a job typically requires some set of files to exist before it runs, a temp directory to work in, and cleanup of that temp directory afterwards. Currently CAS-PGE (or any other job) has to implement this logic itself. This really should be controlled at a higher level, which would also avoid directory collisions across jobs. Now, if CAS-PGE needs a file from the filemgr, that is something CAS-PGE should be responsible for. So in relation to the emails below, pge-config.xml is the file that needs to exist on, or be visible to, the machine before CAS-PGE is run. CAS-PGE really shouldn't have to stage that file; it makes for a hacky implementation in CAS-PGE anyway.

I envision such a change to the resource manager would include being able to specify an XML file with a list of the files needed for the job to run. At runtime the resource manager would stage those files to the temp working directory it created for the job and then clean them up after job execution. Something like:

<reqInput class="file.staging.class">
  <file src="" dest="path/relative/to/temp/working/dir/pge-config.xml"/>
</reqInput>
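
To make the lifecycle concrete, here is a rough sketch of what the resource manager side might do with a file like that. It is in Python only for brevity; the function name run_with_staged_inputs and the job callback are made up for illustration and are not an existing resource manager API. It assumes each src is a readable local path, and it ignores the class attribute, which a real implementation would use to pick the staging logic.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def run_with_staged_inputs(req_input_xml, job):
    """Hypothetical helper: stage each <file>, run the job, then clean up."""
    root = ET.fromstring(req_input_xml)
    # One private directory per job, so concurrent jobs cannot collide.
    work_dir = Path(tempfile.mkdtemp(prefix="resmgr-job-")).resolve()
    try:
        for entry in root.findall("file"):
            dest = (work_dir / entry.get("dest")).resolve()
            # dest is meant to be relative to the working dir; refuse escapes.
            if work_dir not in dest.parents:
                raise ValueError("dest escapes working dir: %s" % dest)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(entry.get("src"), dest)
        # The job only ever sees a directory that is already populated.
        return job(work_dir)
    finally:
        # Cleanup happens whether the job succeeded or raised.
        shutil.rmtree(work_dir, ignore_errors=True)
```

The point of the try/finally is that neither CAS-PGE nor any other job has to know how its working directory was created or removed.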

You could imagine later extending it to support zip packages, which it would stage and unzip:
<reqInput class="file.staging.class">
  <file src="" dest="path/relative/to/temp/working/dir/package" postCopyHandler="unzip.logic.class"/>
</reqInput>
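
As a sketch of how a postCopyHandler could be wired in (again Python for brevity, with hypothetical names): the registry POST_COPY_HANDLERS, unzip_handler, and apply_post_copy_handler are assumptions for illustration. A real implementation would presumably load the handler class named in the attribute rather than look it up in a dictionary.

```python
import zipfile
from pathlib import Path


def unzip_handler(staged_path):
    """Replace a staged zip file with a directory of its contents."""
    staged_path = Path(staged_path)
    # Move the archive aside so the dest path can become the directory.
    archive = staged_path.with_name(staged_path.name + ".staged-zip")
    staged_path.rename(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staged_path)
    archive.unlink()
    return staged_path


# Hypothetical registry keyed by the postCopyHandler attribute value.
POST_COPY_HANDLERS = {"unzip.logic.class": unzip_handler}


def apply_post_copy_handler(file_attrs, staged_path):
    """Run the handler named in the <file> attributes, if there is one."""
    name = file_attrs.get("postCopyHandler")
    if name is None:
        return Path(staged_path)
    return POST_COPY_HANDLERS[name](staged_path)
```

After the handler runs, dest refers to a directory holding the unpacked package, so the job can use paths under it directly.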

This would be ideal for cloud computing, since you could then package up your JDK, binaries, etc., and the resource manager would make sure they were installed on the machine before executing the job.

-brian

On May 01, 2012, at 11:11 PM, "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov> wrote:

Hey Brian,

Thanks, comments below:

On May 1, 2012, at 5:20 PM, Brian Foster wrote:

> hey guys,
>
> In the wengine-branched CAS-PGE, it supported staging CAS-PGE's XML config file to a tmp directory so it could be parsed and processed, and then the staged config file was copied to CAS-PGE's working directory (it had to be copied later since the working directory information is in the config file). I think this is something the resource manager should support instead. Staging the job binaries and config that are needed to run the jobs would be a cleaner implementation than what wengine CAS-PGE does. CAS-PGE would still stage Products and ingest them itself (that is a CAS-PGE-specific task), but CAS-PGE's configuration file, which configures it, should already be there when it runs. Otherwise you kinda need configuration for CAS-PGE configuration (chicken and egg problem). What do you guys think?

Were you seeing this as resource manager functionality in terms of copying CAS-PGE's XML config file? Which one, the pge-config.xml
style file, or the cas-metadata file (dyn-met)? Also, does CAS-PGE have to solve that problem? I mean I think I agree with you in the sense
that this functionality should be provided, but perhaps provided by WorkflowTaskJob (the Workflow implementation of the Resource Manager
job). The issue here is that that's the standard interface between Workflow and Resource manager, and to have a different one for CAS-PGE
would defeat the purpose of having CAS-PGE as a specialized WorkflowTask (which I think it is).

So, this one is a weird one. My gut feeling is to say -- does CAS-PGE even need to be that meta? Isn't solving the file staging
for input products enough, and then saying that the system deployment has to be accessible via NFS, or HDFS, or some global
mount point? We are still using that paradigm fairly commonly e.g., in the Snow project at JPL, in the Square Kilometre Array efforts,
and on EDRN.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
