On 10/10/2012 03:40 PM, Kapil Arya wrote:
> Hi Orion,
>
> On Wed, Oct 10, 2012 at 5:22 PM, Orion Poplawski <[email protected]> wrote:
>
> > I've been working on integrating dmtcp into gridengine. I've gotten fairly
> > far, but I get the following when trying to restart a job saved on a
> > different host:
> >
> > [16690] WARNING at connection.cpp:1237 in openFile; REASON='JWARNING(false) failed'
> >      _path = /var/spool/gridengine/pollux/job_scripts/27709
> > Message: Still waiting for the file to be created/restored by some other process
> >
> > This is shell script being executed by the job, e.g.:
> >
> > bash 17202 orion 255r REG 8,2 64 529009 /var/spool/gridengine/castor/job_scripts/27710
> >
> > I can think of a couple ways to handle this:
> >
> > - copy the job script to a different location on a shared network
> >   filesystem before starting the original job that will be the same from
> >   every machine.
>
> This is the easiest way to handle this. Another way would be to put this file
> in the path relative to the application. That way, DMTCP will try to use the
> relative path if the abspath is not found.

Yeah, that's what I've opted for. I've put it in the same place where I'm
saving the restart file. Thanks!

--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                    [email protected]
Boulder, CO 80301                     http://www.nwra.com

_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
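
For anyone who hits the same warning, here is a minimal sketch of the workaround discussed above: copy the host-local job script to a directory on a shared filesystem before the job starts, so the path is identical on every machine at restart. The function name `stage_job_script` and the example paths are hypothetical; this is not part of DMTCP or gridengine, only an illustration of the copy step.

```python
import shutil
from pathlib import Path


def stage_job_script(script_path, shared_dir):
    """Copy a host-local job script into a shared directory.

    Returns the path of the copy, which the job should execute instead of
    the original so that the open file resolves on any host at restart.
    """
    src = Path(script_path)
    dest_dir = Path(shared_dir)
    # Create the shared directory (e.g. where checkpoint files are kept).
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # copy2 keeps permission bits and timestamps, so an executable
    # script stays executable.
    shutil.copy2(src, dest)
    return dest
```

A job wrapper might then call something like `stage_job_script("/var/spool/gridengine/castor/job_scripts/27710", "/shared/ckpt/27710")` and run the returned path; `/shared/ckpt` is an assumed mount point, not one taken from this thread.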
