Hi Kit,

I am so sorry for the late response. Somehow I missed this email
earlier. I am CC'ing Artem Polyakov who wrote the TORQUE plugin for
DMTCP.

Thanks,
Kapil

On Wed, Jan 30, 2013 at 2:52 PM, Kit Menlove <[email protected]> wrote:
> Hi all,
>
>
>
> I’m using a cluster that uses Torque as the batch system.  About half of the
> time, checkpointing fails while copying the temporary output buffer/file
> with the following error:
>
>
>
> [27763] ERROR at connection.cpp:1214 in CopyFile;
> REASON='JASSERT(_real_system(command.c_str()) != -1) failed'
>
>
>
> The generic system command is “cp -f
> /var/spool/torque/spool/jobid.myserver.OU
> /checkpoint_dir/ckpt_myprog_52b886013bb1c112-27763-51060104_files/jobid.myserver.OU_99001”
>
>
>
> I’m using dmtcp_checkpoint (v1.2.6) with the --checkpoint-open-files option.
> Is anyone familiar with Torque enough to suggest why the file might not
> exist at the time of checkpointing, or what else might be the cause of the
> CopyFile failure?
>
>
>
> Thanks,
>
> Kit
>
>
>
>
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
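
A note on the assertion in the quoted error, offered as a sketch rather than a diagnosis: if `_real_system` simply forwards to the C library's `system()` (an assumption; the DMTCP source is not quoted here), then a return value of -1 means the child process could not be created or its exit status could not be collected. A `cp` that fails because the source file is missing would normally show up as a non-zero exit status instead, not as -1. The hypothetical helpers below (`classify_system_status` and `run_and_classify` are names made up for this example, not DMTCP functions) show how the two cases can be told apart:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical labels for this sketch; these are not DMTCP names. */
enum cmd_result {
    CMD_OK,            /* command ran and exited with status 0 */
    CMD_SPAWN_FAILED,  /* system() returned -1: no child, or no status */
    CMD_EXIT_NONZERO,  /* command ran but exited with a non-zero status */
    CMD_SIGNALED       /* command was terminated by a signal */
};

/* Interpret the value returned by system(). */
enum cmd_result classify_system_status(int status)
{
    if (status == -1)
        return CMD_SPAWN_FAILED;
    if (WIFSIGNALED(status))
        return CMD_SIGNALED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return CMD_OK;
    return CMD_EXIT_NONZERO;
}

/* Run a shell command and report which kind of outcome it had. */
enum cmd_result run_and_classify(const char *command)
{
    int status = system(command);
    if (status == -1)
        perror("system");  /* errno describes the fork/wait failure */
    return classify_system_status(status);
}
```

Calling `run_and_classify` with the `cp -f ...` command from the error message, in the same environment as the job, would indicate whether the copy itself is failing (`CMD_EXIT_NONZERO`) or whether the command never ran to completion under `system()` (`CMD_SPAWN_FAILED`).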
