Hi, guys -- A few questions:

1) I originally posted about my restart issues on 2/26/15.  I am wondering how
    things are coming w.r.t. an updated version :-) ...

2) Am I right in guessing that a file on a remote drive is identified in part
    by the uuid of the local mount point, and in part by information about the
    remote drive?  I ask because if a Grid Engine compute node crashes and gets
    rebuilt, restart of a job previously running there seems always to fail.
    Note that a rebuild ends up giving the local disk of the compute node a new
    uuid, since the disk image is wiped and built from scratch.  However, the
    *remote* files remain the same.  It would seem that in such a case the
    remote identity is what should matter, not the local path name ...

    In fact, maybe the situation is that the file is identified by the uuid of
    the disk for '/' ... or something like that?  I claim that's broken for
    other mounts, such as the NFS mounts typical for my files ...  Anyway, I am
    wondering how this works, and how it is intended to work.
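
    To make the question concrete, here is a minimal Python sketch of the
    identifiers that are visible for a file from the local side.  It is only
    an illustration of the distinction I am asking about, not a description
    of what DMTCP actually does; the helper names mount_point() and
    describe() are made up for this example.  Note that the device-number
    walk below will not notice bind mounts that share a device.

```python
import os


def mount_point(path):
    """Walk up from path until the device number changes.

    The last directory that still has the same st_dev as the file is
    treated as the mount point (hypothetical helper, illustration only).
    """
    path = os.path.realpath(path)
    dev = os.stat(path).st_dev
    while True:
        parent = os.path.dirname(path)
        # Stop at the filesystem root or where the device changes.
        if parent == path or os.stat(parent).st_dev != dev:
            return path
        path = parent


def describe(path):
    """Collect the locally visible identity of a file."""
    st = os.stat(path)
    return {
        "path": os.path.realpath(path),      # stable across a node rebuild
        "mount_point": mount_point(path),    # e.g. the NFS mount directory
        "st_dev": st.st_dev,                 # device of the file's filesystem
        "st_ino": st.st_ino,                 # inode on that filesystem
        "root_dev": os.stat("/").st_dev,     # device of '/', the local disk
    }


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(describe(p))
```

    Running this on an NFS-mounted file before and after a node rebuild would
    show which of these values change.  If an identifier tied to the local
    disk (such as the one for '/') is part of the file's recorded identity,
    that would be consistent with the restart failures I see.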

3) For Java jar files, it appears that every checkpoint makes another copy of
    an open jar file -- even when (as far as I know) such files are read-only.
    Now the files are not big and I have plenty of storage, but it makes me
    wonder about the logic of the code in DMTCP ...
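
    By "read-only" I mean the access mode of the open descriptor.  As a
    rough sketch of that check (again an assumption for illustration, not
    DMTCP's logic; is_read_only() is a made-up helper), on Linux one can
    query the mode like this:

```python
import fcntl
import os


def is_read_only(fd):
    """Return True if the open descriptor fd was opened with O_RDONLY."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    # O_ACCMODE masks out everything except the access-mode bits.
    return (flags & os.O_ACCMODE) == os.O_RDONLY


if __name__ == "__main__":
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        print(is_read_only(fd))
    finally:
        os.close(fd)
```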

Regards -- Eliot Moss

_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
