The MCF agents process should not hang under normal operation.  If it
encounters a problem that calls its continued operation into question, it
shuts itself down.

There are two situations where the process could theoretically hang.

The first is when you are using file-based synch and you forcibly kill
another ManifoldCF process, so that it never cleans up its locks.  If you
are using ZooKeeper, however, locks should always be cleaned up after a
process is killed.
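
As a simplified illustration, the hypothetical class below shows why a killed process can leave the others stuck. It is not ManifoldCF's actual file-based synch code. It treats a marker file as a lock. Creating the file is atomic, so only one holder can exist at a time. Releasing the lock means deleting the file, and a process killed with SIGKILL never runs that step. ZooKeeper-based locks are typically built on ephemeral nodes instead, which the server deletes when the owning client's session ends. That is why a kill does not leave them behind.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical marker-file lock for illustration only; NOT ManifoldCF's implementation.
public class FileLockSketch {
    private final Path lockFile;

    public FileLockSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Files.createFile fails atomically if the file already exists,
    // so at most one caller "holds" the lock at a time.
    public boolean tryLock() throws IOException {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Cleanup on normal shutdown. A forcibly killed process never gets here,
    // so the file, and therefore the lock, outlives the process.
    public void unlock() throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("synch");
        Path lock = dir.resolve("job.lock");
        FileLockSketch holder = new FileLockSketch(lock);
        FileLockSketch waiter = new FileLockSketch(lock);

        System.out.println("holder acquires: " + holder.tryLock());       // true
        // Until the holder's file is removed (e.g. it was killed and never
        // cleaned up), every other process is refused.
        System.out.println("waiter acquires: " + waiter.tryLock());       // false
        holder.unlock(); // only once the stale file is gone can anyone proceed
        System.out.println("waiter after cleanup: " + waiter.tryLock());  // true
        waiter.unlock();
        Files.deleteIfExists(dir);
    }
}
```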

The second situation is when certain database conditions arise and MCF
decides it needs to reset all its worker threads.  When it does this, it
blocks all worker threads from proceeding until they have all reached a
quiescent point, and then it resets all of them at the same time.  If, while
MCF is waiting in this way, some thread never reaches that quiescent point,
MCF will remain paused forever.
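
The waiting step described above can be modeled with a countdown latch. This is a minimal sketch with hypothetical names, not ManifoldCF's code. Each worker checks in once it is quiescent, and the coordinator may reset only after every worker has checked in. The sketch adds a timeout purely so the demo terminates. An untimed wait in the same position would block forever if one worker never checks in, which is the "paused forever" case. It models only the wait, not the subsequent release and reset of the workers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "wait until every worker is quiescent, then reset all".
public class ResetBarrierSketch {
    private final CountDownLatch quiescent;

    public ResetBarrierSketch(int workers) {
        quiescent = new CountDownLatch(workers);
    }

    // Called by a worker once it has stopped work and reached a safe point.
    public void reportQuiescent() {
        quiescent.countDown();
    }

    // Coordinator side: true only if all workers checked in within the timeout.
    // An untimed quiescent.await() here would never return if one worker is stuck.
    public boolean waitForAll(long timeout, TimeUnit unit) throws InterruptedException {
        return quiescent.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ResetBarrierSketch barrier = new ResetBarrierSketch(3);
        // Only two of the three workers ever reach the quiescent point.
        for (int i = 0; i < 2; i++) {
            new Thread(barrier::reportQuiescent).start();
        }
        boolean ok = barrier.waitForAll(500, TimeUnit.MILLISECONDS);
        // Prints "still waiting: one worker never quiesced"
        System.out.println(ok ? "all quiescent: resetting" : "still waiting: one worker never quiesced");
    }
}
```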

If that happens, what I'd like is a thread dump of the agents process, taken
while it is stuck.  That will tell us what the problem is.
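
For a separate, already-running agents process, the standard JDK tools produce such a dump from outside. Use `jstack <pid>` or `jcmd <pid> Thread.print`, redirecting the output to a file. On Unix, `kill -3 <pid>` also works and writes the dump to the process's standard output. To show what a dump contains, the sketch below (a hypothetical helper) builds a similar dump of its own JVM. It uses the standard `ThreadMXBean` API and lists each thread's name, state, the lock it is waiting on, and its stack. It also calls `findDeadlockedThreads()`. Note that a stuck reset wait like the one described above is not necessarily a classic deadlock, so the stacks and states are what matter most.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: a text thread dump of THIS JVM, similar in spirit to jstack.
public class ThreadDumpSketch {
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The monitor or synchronizer this thread is blocked or waiting on
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            // Build the stack ourselves; ThreadInfo.toString() truncates long stacks
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        long[] deadlocked = mx.findDeadlockedThreads(); // null when there is no deadlock
        sb.append(deadlocked == null ? "no deadlock detected" : deadlocked.length + " deadlocked threads")
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```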

Karl


On Tue, Mar 2, 2021 at 12:53 PM <julien.massi...@francelabs.com> wrote:

> Hi Karl,
>
> I recently faced a strange case where a job in the "running" state did
> nothing for several hours. The MCF agent process was up, but neither the
> Simple History nor the logs showed any activity. Since we could not wait
> more than 12 hours, we decided to restart the agent, and the job got back
> on track and continued its work normally.
> To avoid the need for such manual intervention as much as possible, I have
> two questions:
> - Is there a way to "test" the agent process, like a "process ping" that
> can detect whether the process is doing, or is ready to do, something? If
> not, is there an easy way to implement such a thing? The idea is to make
> detection and restart automatic rather than manual.
> - Given that we have enabled the debug log level, do you have any
> recommendations on what to look at to find a potential cause of such an
> issue?
>
> Regards,
> Julien Massiera
>
>
