[jira] [Resolved] (UIMA-5567) UIMA-DUCC: Agent should recover its state after restart

Richard Eckart de Castilho (Jira) Thu, 12 Jan 2023 07:18:28 -0800


     [ 
https://issues.apache.org/jira/browse/UIMA-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Richard Eckart de Castilho resolved UIMA-5567.
----------------------------------------------
    Resolution: Abandoned

DUCC has been retired.

> UIMA-DUCC: Agent should recover its state after restart
> -------------------------------------------------------
>
>                 Key: UIMA-5567
>                 URL: https://issues.apache.org/jira/browse/UIMA-5567
>             Project: UIMA
>          Issue Type: Improvement
>          Components: DUCC
>            Reporter: Jaroslaw Cwiklik
>            Assignee: Jaroslaw Cwiklik
>            Priority: Major
>
> Currently bouncing an agent is not possible. After launching a child process, 
> an agent adds an entry in its Process Inventory and uses a Process handle to 
> call waitFor() to detect child termination. When an agent restarts, it loses 
> all its children and has no means to recover its inventory.
> The proposal is to change this behavior to allow agents to bounce and 
> subsequently recover their child processes. A bounce may be required, for 
> example, to update agent code.
> An agent has two options to recover its child processes based on cgroup 
> availability.
> If cgroups are enabled, an agent on startup will read all PIDs from the 
> cgroup.procs file. These PIDs reflect running child processes on a node. An 
> agent will create a skeleton inventory entry for each PID and fill in the 
> details when the OR state is received. The agent will use a PID to find a 
> matching process in the OR state. After the new inventory is recovered, the 
> timer-based inventory update will fetch PIDs from the cgroup.procs file again 
> and reconcile them with its inventory. To detect child process termination, 
> an agent will compare PIDs in its inventory against PIDs from cgroup.procs. 
> If a PID is in the inventory but not present in cgroup.procs, the agent will 
> mark that process as Stopped if its deallocate flag is true, or as Failed if 
> the flag is false. Any AP process that is no longer running will be marked 
> as Stopped.
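
The cgroup recovery step above can be sketched roughly as follows. This is a minimal Python illustration, not DUCC agent code: `InventoryEntry`, `parse_pids`, `skeleton_inventory`, `fill_from_or_state`, and the shape of the OR-state records (dicts with `pid`, `ducc_id`, `deallocate`) are all assumptions made for the example. The only fact relied on is that a cgroup PID list (`cgroup.procs` in the Linux cgroup interface) holds one PID per line.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class InventoryEntry:
    """Hypothetical inventory record; only the PID is known at first."""
    pid: int
    ducc_id: Optional[str] = None   # unknown until the OR state arrives
    deallocate: bool = False        # unknown until the OR state arrives
    state: str = "Running"


def parse_pids(procs_text: str) -> List[int]:
    """Parse the text of a cgroup PID list: one decimal PID per line."""
    return [int(token) for token in procs_text.split()]


def skeleton_inventory(pids: Iterable[int]) -> Dict[int, InventoryEntry]:
    """Create one bare entry per PID found in the cgroup."""
    return {pid: InventoryEntry(pid=pid) for pid in pids}


def fill_from_or_state(inventory: Dict[int, InventoryEntry],
                       or_state: Iterable[dict]) -> None:
    """Match OR-state records to skeleton entries by PID and copy details."""
    for record in or_state:
        entry = inventory.get(record["pid"])
        if entry is not None:
            entry.ducc_id = record["ducc_id"]
            entry.deallocate = record["deallocate"]


# Example: two PIDs found in the cgroup, one of them known to the OR.
inventory = skeleton_inventory(parse_pids("4021\n4077\n"))
fill_from_or_state(
    inventory,
    [{"pid": 4077, "ducc_id": "job-12.3", "deallocate": True}],
)
```

Entries with no matching OR-state record stay as skeletons (here PID 4021); how the agent should treat them is not specified in the description.
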
> If cgroups are not enabled, an agent will recover its inventory from the OR 
> state. While in this mode, an agent will disable its Rogue Process Detector 
> and not attempt to detect alien processes. The timer-based inventory update 
> will fetch PIDs from the OS (using the ps command) and reconcile them with 
> its inventory. To detect child process termination, an agent will compare 
> PIDs in its inventory against PIDs obtained from the OS. If a PID is in the 
> inventory but not reported by the OS, the agent will mark that process as 
> Stopped if its deallocate flag is true, or as Failed if the flag is false. 
> Any AP process that is no longer running will be marked as Stopped.
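
In both modes the termination check comes down to comparing inventory PIDs with the set of live PIDs, whatever the source of that set (cgroup or ps). A minimal Python sketch of that rule follows. The names `Entry`, `is_ap`, and `reconcile` are hypothetical, and the state strings are taken from the description rather than from DUCC code.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Entry:
    """Hypothetical inventory record used only for this illustration."""
    pid: int
    deallocate: bool
    is_ap: bool = False
    state: str = "Running"


def reconcile(inventory: Dict[int, Entry], live_pids: Set[int]) -> None:
    """Mark entries whose PID has disappeared from the live PID set."""
    for pid, entry in inventory.items():
        if pid in live_pids or entry.state != "Running":
            continue  # still alive, or already marked on an earlier pass
        if entry.is_ap or entry.deallocate:
            entry.state = "Stopped"   # expected termination
        else:
            entry.state = "Failed"    # process vanished unexpectedly


inventory = {
    100: Entry(100, deallocate=False),
    101: Entry(101, deallocate=True),
    102: Entry(102, deallocate=False),
    103: Entry(103, deallocate=False, is_ap=True),
}
reconcile(inventory, live_pids={100})
states = {pid: entry.state for pid, entry in inventory.items()}
```

With only PID 100 alive, 101 and 103 end up Stopped and 102 ends up Failed.
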
> - An agent will no longer call waitFor() on a Process object returned from a 
> ProcessBuilder when a child process is launched
> - An agent will continue to drain stdout and stderr of a child process to 
> prevent the child (duccling) from hanging and to receive OS errors which may 
> occur when exec'ing a process (bad command line, etc.). After duccling calls 
> execve(), child process stdout and stderr are redirected to /dev/null and 
> the agent expects nothing further from these streams.
> - A child process will communicate state changes and initialization status to 
> an agent via a provided port. The open question is how the port is provided 
> to a child. Currently an agent uses -D (or env) to communicate its listener 
> port to a child. The port is determined when an agent starts up and can 
> potentially be different after the agent is bounced. So we either use a 
> Registry to store the agent's port for a child to look up, or insist that an 
> agent has a fixed port. If an agent is bounced and that port is not 
> available, what should happen?
> - An agent should support a new flag "-Dclean=[true|false]" which on startup 
> will force an agent to clean up (terminate) all child processes found in 
> cgroups. The code for doing this is already in place and it is the default 
> agent procedure on startup. It is still an open question whether this should 
> be the default behavior. The same flag should also control what happens on 
> agent shutdown: if clean=true, the agent will terminate its children; 
> otherwise child processes will remain running.
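
The first two bullets (no blocking wait on the child, but continued draining of its output) can be illustrated outside Java. The following is a minimal Python sketch in which `subprocess.Popen` stands in for Java's ProcessBuilder. `drain` and `launch_and_drain` are hypothetical helper names, and the child command is a placeholder rather than a real duccling invocation.

```python
import subprocess
import sys
import threading
from typing import List, Tuple


def drain(stream, sink: List[str]) -> None:
    """Read a pipe until EOF so the child never blocks on a full buffer."""
    for raw in iter(stream.readline, b""):
        sink.append(raw.decode(errors="replace").rstrip())
    stream.close()


def launch_and_drain(cmd: List[str]) -> Tuple[subprocess.Popen, List[str],
                                              List[str], List[threading.Thread]]:
    """Start a child and drain both pipes in background threads.

    The launcher does not block waiting for the child to exit; termination
    would be detected separately, for example by the periodic PID check.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out: List[str] = []
    err: List[str] = []
    threads = [
        threading.Thread(target=drain, args=(proc.stdout, out), daemon=True),
        threading.Thread(target=drain, args=(proc.stderr, err), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return proc, out, err, threads


# Placeholder child: writes one line to each stream and exits.
child_cmd = [
    sys.executable,
    "-c",
    "import sys; print('started'); print('exec warning', file=sys.stderr)",
]
proc, out, err, threads = launch_and_drain(child_cmd)
for thread in threads:
    thread.join(timeout=30)  # drain threads finish once the pipes hit EOF
```

Each drain thread ends when its pipe reaches EOF, which is also what happens once the streams are redirected elsewhere in the child.
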



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
