There is a new Globus Incubator Project called Falkon whose goal is exactly this: handling many small jobs efficiently. The net result is performance a few orders of magnitude better than existing methods. It should handle jobs as short as 1 second with high efficiency (95%+) while managing hundreds of processors.
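A back-of-the-envelope check shows why 1-second jobs are demanding. The sketch below assumes a simple model in which each task pays a fixed per-task dispatch overhead o, so efficiency is E = t / (t + o). That model and the function names `max_overhead` and `required_throughput` are illustrative assumptions, not Falkon's own definition or API.

```python
def max_overhead(task_seconds: float, target_efficiency: float) -> float:
    """Largest per-task overhead (seconds) that still meets the target
    efficiency, under the assumed model E = t / (t + o)."""
    if not 0 < target_efficiency <= 1:
        raise ValueError("target_efficiency must be in (0, 1]")
    return task_seconds * (1.0 / target_efficiency - 1.0)


def required_throughput(processors: int, task_seconds: float,
                        overhead_seconds: float) -> float:
    """Tasks per second the dispatcher must sustain so that every
    processor starts a new task as soon as the previous one ends."""
    return processors / (task_seconds + overhead_seconds)


if __name__ == "__main__":
    o = max_overhead(1.0, 0.95)
    print(f"max per-task overhead: {o * 1000:.1f} ms")
    print(f"dispatch rate for 100 CPUs: {required_throughput(100, 1.0, o):.1f} tasks/s")
```

Under this model, 95% efficiency on 1-second tasks leaves only about 53 ms of overhead per task, and keeping 100 processors busy requires dispatching roughly 95 tasks per second.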
Here are a few links:
Web: http://dev.globus.org/wiki/Incubator/Falkon
Paper: http://people.cs.uchicago.edu/~iraicu/research/docs/Falkon/Falkon_SC07_v42.pdf
Code: svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon

On the web page above, you will find mailing lists you can join, slides, other relevant papers, instructions on how to set up and run Falkon, and our other ongoing work related to Falkon.

Cheers,
Ioan

Jan Ploski wrote:
> "李辉" <[EMAIL PROTECTED]> wrote on 11/12/2007 10:54:08 AM:
>
>> Indeed, we are going to do the work you mentioned, packaging
>> small jobs into “big” jobs. But it is only designed for a specific
>> application. Is it possible to implement a more general component
>> that packages small jobs into “big” jobs and then submits the “big”
>> jobs to the target site with GRAM? When the LRM (local resource
>> manager, such as OpenPBS or Torque) receives the packaged “big” jobs,
>> a C application or script on the target site unpackages them into
>> small jobs again. These small jobs are then handled by OpenPBS
>> (the jobs may be held in its job queues) or by a multi-threaded
>> program on the target site.
>> I do not know whether this is a good idea. Has anybody done
>> research on this problem, or are there any published papers about it?
>>
>
> I'm preparing a paper which describes a convenient solution which can
> be implemented by job submitters. Some slides are available:
> https://bi.offis.de/wisent/tiki-download_file.php?fileId=656
>
> The "general component" mentioned within the presentation is also
> available:
> https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-BigJobs
>
> The page is geared towards Condor users, but my MultiJob.pm module is
> not in any way Condor-specific. You still have to "package small jobs
> into big jobs" in an application-specific manner, but the module takes
> care of all the synchronization required to run the "big job" at the
> target site.
>
> In the longer term, it would be nice to have this functionality in Globus.
> AFAICS, the current implementation of Globus multijobs doesn't cut it (or
> is just not documented well enough): I found no way to describe an atomic
> multijob consisting of a single-processor job, followed by a
> multi-processor job, followed by a single-processor job at the same site.
>
> Regards,
> Jan Ploski
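The packaging scheme discussed in the thread (bundle many small jobs into one "big" job, then unpack and run them at the target site) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the JSON manifest format and the names `pack_jobs`, `run_one` and `unpack_and_run` are hypothetical, the GRAM submission step is not shown, and a local thread pool stands in for the "multi-threaded program" option. It is not Falkon's mechanism or the MultiJob.pm API.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def pack_jobs(commands: list[list[str]]) -> str:
    """Serialize many small jobs (argv lists) into one manifest string.
    In the scheme above, this manifest would be the payload of a single
    "big" job submitted to the target site (hypothetical format)."""
    return json.dumps({"version": 1, "jobs": commands})


def run_one(argv: list[str]) -> int:
    """Run a single small job and return its exit status."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def unpack_and_run(manifest: str, workers: int = 4) -> list[int]:
    """At the target site: unpack the manifest and run the small jobs
    concurrently on a local thread pool. Exit codes are returned in the
    same order as the jobs appear in the manifest."""
    jobs = json.loads(manifest)["jobs"]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, jobs))


if __name__ == "__main__":
    # Eight tiny Python one-liners stand in for real small jobs.
    small_jobs = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(8)]
    manifest = pack_jobs(small_jobs)
    codes = unpack_and_run(manifest, workers=4)
    print("exit codes:", codes)
    print("all succeeded:", all(c == 0 for c in codes))
```

Returning per-job exit codes lets the wrapper report partial failures back to the submitter. The harder parts Jan mentions, such as synchronizing steps and expressing an ordered multijob, are outside this sketch.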
