Re: [gc3pie] Jobs get stuck in terminating stage and do not disappear from VM after being killed.

Riccardo Murri Thu, 06 Dec 2018 01:36:02 -0800

Dear Hanna,

sorry for my late reply -- I was busy moving to a new apartment...


Coming to your GC3Pie issue:

> It indeed seems to be a memory issue: one of the jobs runs out of memory
> and the rest of the jobs on that instance terminate as well. I can cancel
> those jobs with 'gkill'. However, the jobs are not removed from the
> instance. And although GC3Pie sends new jobs to such an instance, it
> doesn't make full use of it (e.g. only 3 instead of 4 jobs are running).
> Apart from increasing the memory per core, is there something I can do to
> at least properly remove the terminated jobs?
>

If a job is marked as TERMINATED, it will be automatically removed on the
next scheduling cycle.  The reason a VM is not fully utilized probably has
more to do with memory requirements: if you're running on UZH Science
Cloud, then each CPU theoretically has at most 4GB of memory, but the
practical limit is a bit lower, as some memory is used by Linux and the OS'
background processes.  So if you start your jobs with, say, a requirement
of 4000MB of memory each, that might work well when the VMs are freshly
booted, but later on the last slot may stay empty because, e.g., only
3750MB are free for the last job...  If this is the case, you should see
an explicit message about it in the DEBUG-level log.
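To make the arithmetic concrete, here is a small Python sketch (plain Python, not GC3Pie code) of how many single-core jobs can start on a 4-CPU VM, limited both by CPUs and by free memory. The 15750MB of free memory is an assumed example figure (3 x 4000MB already taken, plus the 3750MB left for the last job):

```python
# Sketch only: the memory figures below are assumed examples, not
# measurements from a real UZH Science Cloud VM.

def jobs_that_fit(free_mb: int, per_job_mb: int, cpus: int) -> int:
    """Return how many single-core jobs can start on a VM, limited
    both by the number of CPUs and by the memory still free."""
    if per_job_mb <= 0:
        raise ValueError("per_job_mb must be positive")
    return min(cpus, free_mb // per_job_mb)


if __name__ == "__main__":
    free = 15750  # MB actually available after the OS takes its share (assumed)
    for request in (4000, 3750):
        used = jobs_that_fit(free, request, cpus=4)
        print(f"{request}MB per job -> {used} of 4 slots used")
```

With 4000MB per job only 3 of the 4 slots get used; lowering the requirement to 3750MB lets all 4 jobs start, which is why a per-job requirement somewhat below the nominal 4GB/core leaves room for the OS.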

It is also possible that your jobs create many processes and some of them
are still running after the main application has been killed; you can check
the output of the command `ps fauxww` to verify whether this is the case
(post it here if you do not know how to interpret it).  If so, you
would see the VMs' available memory diminish over time, as more and more
jobs are executed.  Rebooting a VM (when no jobs are running) would solve
it.
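If you prefer a quick summary to reading the `ps fauxww` listing by eye, a sketch like the following could help. It assumes the standard `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), which `ps fauxww` shares, with the forest view only adding tree art inside COMMAND. It sums the resident memory (RSS, reported by `ps` in KiB) of every process whose command line contains a given string; `my_simulation` is a hypothetical program name, to be replaced by your application's executable:

```python
import subprocess


def parse_ps_aux(text):
    """Parse `ps aux` / `ps fauxww` output into (pid, rss_kib, command) tuples."""
    procs = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 10)  # COMMAND is the 11th, space-containing field
        if len(fields) < 11:
            continue
        procs.append((int(fields[1]), int(fields[5]), fields[10]))
    return procs


def leftover_processes(text, pattern):
    """Return matching processes and their total resident memory in MiB."""
    matches = [p for p in parse_ps_aux(text) if pattern in p[2]]
    total_mib = sum(rss for _, rss, _ in matches) / 1024
    return matches, total_mib


def check_vm(pattern):
    """Run `ps fauxww` on this machine and print the matching processes.

    Note: a process may also match if `pattern` appears in this script's
    own command line.
    """
    out = subprocess.run(["ps", "fauxww"], capture_output=True,
                         text=True, check=True).stdout
    matches, total_mib = leftover_processes(out, pattern)
    for pid, rss_kib, command in matches:
        print(f"{pid:>7} {rss_kib // 1024:>6} MiB  {command}")
    print(f"total: {total_mib:.0f} MiB in {len(matches)} process(es)")
    return matches
```

Running `check_vm("my_simulation")` on a VM where GC3Pie believes no jobs are running should print nothing but a zero total; any matches there are leftover processes eating memory.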

Ciao,
R
