Hi Orion,
On Wed, Oct 10, 2012 at 5:22 PM, Orion Poplawski <[email protected]> wrote:
> I've been working on integrating dmtcp into gridengine. I've gotten fairly
> far, but I get the following when trying to restart a job saved on a
> different host:
>
> [16690] WARNING at connection.cpp:1237 in openFile; REASON='JWARNING(false) failed'
> _path = /var/spool/gridengine/pollux/job_scripts/27709
> Message: Still waiting for the file to be created/restored by some other process
>
> This is the shell script being executed by the job, e.g.:
>
> bash 17202 orion 255r REG 8,2 64 529009 /var/spool/gridengine/castor/job_scripts/27710
>
> I can think of a couple ways to handle this:
>
> - before starting the original job, copy the job script to a location on a
> shared network filesystem that is the same from every machine.
>
This is the easiest way to handle it. Another option is to put this file at
the same path relative to the application. That way, DMTCP will try the
relative path if the absolute path is not found.
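Here is a rough Python illustration of that fallback order. This is not DMTCP code. It assumes the relative path is computed against the working directory recorded at checkpoint time and then re-rooted at the working directory on restart. `resolve_restart_path` is a hypothetical name.

```python
import os


def resolve_restart_path(ckpt_abspath, ckpt_cwd, restart_cwd):
    """Pick the path to reopen on restart (illustrative sketch, not DMTCP code).

    Assumption: the fallback is the file's path relative to the working
    directory recorded at checkpoint time, re-rooted at the restart cwd.
    Returns None if neither candidate exists.
    """
    # 1. Prefer the absolute path saved in the checkpoint.
    if os.path.exists(ckpt_abspath):
        return ckpt_abspath
    # 2. Fall back to the same relative location under the new cwd.
    rel = os.path.relpath(ckpt_abspath, ckpt_cwd)
    candidate = os.path.normpath(os.path.join(restart_cwd, rel))
    if os.path.exists(candidate):
        return candidate
    return None
```

Under that assumption, if the script sits at the same location relative to the job's working directory on both hosts, the restart can find it even though the absolute spool path differs.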
> - Perhaps a dmtcp plugin that would transform the name? Is that possible?
>
That's a good idea, but with a catch. At least in the current svn, I am not
sure how easy it would be to integrate such a plugin. The difficulty comes
mainly from shared file descriptors. For non-shared file descriptors, we
could add some hooks that would let us write the plugin.
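In the non-shared case, the core of such a plugin would be a name-rewriting rule applied when a file is reopened on restart. Below is a minimal Python sketch of just that rule. `remap_path` and the prefix table are hypothetical and are not part of DMTCP's plugin API. The sketch also does nothing about the shared-descriptor problem.

```python
def remap_path(path, prefix_map):
    """Rewrite `path` with the first matching (old_prefix, new_prefix) rule.

    Hypothetical helper illustrating the "transform the name" idea; it is
    not part of DMTCP's plugin API. Paths that match no rule are returned
    unchanged.
    """
    for old, new in prefix_map:
        old = old.rstrip("/")
        # Match whole path components only, so ".../pollux" does not
        # also match ".../polluxx".
        if path == old or path.startswith(old + "/"):
            return new.rstrip("/") + path[len(old):]
    return path


# Hypothetical rule: map the checkpointing host's spool dir to this host's.
SPOOL_MAP = [
    ("/var/spool/gridengine/pollux", "/var/spool/gridengine/castor"),
]
```

With this table, `/var/spool/gridengine/pollux/job_scripts/27709` would be rewritten to `/var/spool/gridengine/castor/job_scripts/27709`. Matching on component boundaries avoids rewriting unrelated directories that merely share a name prefix.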
Kapil
>
> Any other ideas?
>
> --
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA, Boulder Office FAX: 303-415-9702
> 3380 Mitchell Lane [email protected]
> Boulder, CO 80301 http://www.nwra.com
>
>
> ------------------------------------------------------------------------------
> Don't let slow site performance ruin your business. Deploy New Relic APM
> Deploy New Relic app performance management and know exactly
> what is happening inside your Ruby, Python, PHP, Java, and .NET app
> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
> http://p.sf.net/sfu/newrelic-dev2dev
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>