Hi, guys -- A few questions:
1) I originally posted about my restart issues on 2/26/15. I am wondering
how things are coming w.r.t. an updated version :-) ...
2) Am I right in guessing that a file on a remote drive is identified in
part by the uuid of the local mount point, and in part by information
about the remote drive? I ask because it would appear that if a Grid
Engine compute node crashes and gets rebuilt, restart of a job previously
running there always fails. Note that a rebuild ends up giving the local
disk of the compute node a new uuid, since the disk image is wiped and
built from scratch. However, the *remote* files remain the same. It would
seem that in such a case the remote identity is what should matter, not
the local path name ...
In fact, maybe the situation is that the file is identified by the uuid of
the disk for '/' ... or something like that? I claim that's broken for
other mounts, such as the NFS mounts typical for my files ... Anyway, I am
wondering how this works, and how it is intended to work.
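
To make the question concrete, here is a minimal Python sketch of the
identity the OS itself reports for a file: the mount point it lives under,
plus the device and inode numbers from stat. The helper names
(mount_point, describe) are mine and purely illustrative; I do not know
whether DMTCP uses anything like this internally, which is really what I
am asking.

```python
import os

def mount_point(path):
    """Walk up from path until we reach the root of its filesystem."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def describe(path):
    """Collect the identity the OS reports for path (hypothetical helper)."""
    st = os.stat(path)
    return {
        "path": os.path.realpath(path),
        "mount_point": mount_point(path),  # e.g. '/' or an NFS mount directory
        "st_dev": st.st_dev,               # device number of the containing filesystem
        "st_ino": st.st_ino,               # inode number within that filesystem
    }
```

Running describe() on one of my NFS-resident files before the crash and
again after the rebuild would show which of these values actually change.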
3) For Java jar files, it appears that every checkpoint makes another copy
of an open jar file -- even when (as far as I know) such files are
read-only. Now the files are not big and I have plenty of storage, but it
makes me wonder about the logic of the code in DMTCP ...
Regards -- Eliot Moss
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum