[ https://issues.apache.org/jira/browse/UIMA-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry Cwiklik updated UIMA-5567:
--------------------------------
    Summary: UIMA-DUCC: Agent should recover its state after restart  (was: UIMA-DUCC: Agent should be able to recover its state after restart)

> UIMA-DUCC: Agent should recover its state after restart
> -------------------------------------------------------
>
>                 Key: UIMA-5567
>                 URL: https://issues.apache.org/jira/browse/UIMA-5567
>             Project: UIMA
>          Issue Type: Improvement
>          Components: DUCC
>            Reporter: Jerry Cwiklik
>            Assignee: Jerry Cwiklik
>             Fix For: future-DUCC
>
>
> Currently bouncing an agent is not possible. After launching a child process,
> an agent adds an entry to its Process Inventory and uses a Process handle to
> call waitFor() to detect child termination. When an agent restarts, it loses
> all its children and has no means to recover its inventory.
>
> The proposal is to change this behavior so that agents can bounce and
> subsequently recover their child processes. A bounce may be required, for
> example, to update agent code.
>
> An agent has two options to recover its child processes, depending on cgroup
> availability.
>
> If cgroups are enabled, an agent on startup will read all PIDs from the
> cgroup.proc file. These PIDs reflect the child processes running on a node.
> The agent will create a skeleton inventory entry for each PID and fill in the
> details when the OR state is received. The agent will use the PID to find the
> matching process in the OR state. After the inventory is recovered, the
> timer-based inventory update will fetch PIDs from the cgroup.proc file again
> and reconcile them with its inventory. To detect child process termination,
> an agent will compare PIDs in inventory against PIDs from cgroup.proc. If a
> PID is in inventory but not present in cgroup.proc, the agent will mark that
> process as Stopped if its deallocate flag is true, or as Failed if the
> deallocate flag is false. Any AP process that is no longer running will be
> marked as Stopped.
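The reconciliation step described above could look roughly like the following. This is only a sketch under stated assumptions, not DUCC code: the class InventoryReconciler, the Entry type, its fields and the Running state are hypothetical names made up for illustration; only the Stopped/Failed rule and the AP rule come from the description. The PID list is passed in as lines of text so the logic does not depend on where the cgroup file lives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InventoryReconciler {

    /** Running is an assumed placeholder state; Stopped and Failed come from the proposal. */
    public enum State { Running, Stopped, Failed }

    /** Hypothetical skeleton inventory entry; a real entry would carry far more detail. */
    public static final class Entry {
        public final long pid;
        public final boolean deallocate; // deallocate flag for this process
        public final boolean ap;         // true if this is an AP process
        public State state = State.Running;

        public Entry(long pid, boolean deallocate, boolean ap) {
            this.pid = pid;
            this.deallocate = deallocate;
            this.ap = ap;
        }
    }

    /** Parses the content of a cgroup PID file: one decimal PID per line, blanks ignored. */
    public static Set<Long> parsePids(List<String> lines) {
        Set<Long> pids = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pids.add(Long.parseLong(trimmed));
            }
        }
        return pids;
    }

    /**
     * Marks each Running entry whose PID is missing from livePids:
     * Stopped for AP processes or when deallocate is true, Failed otherwise.
     * Returns the entries whose state changed in this pass.
     */
    public static List<Entry> reconcile(Map<Long, Entry> inventory, Set<Long> livePids) {
        List<Entry> changed = new ArrayList<>();
        for (Entry e : inventory.values()) {
            if (e.state == State.Running && !livePids.contains(e.pid)) {
                e.state = (e.ap || e.deallocate) ? State.Stopped : State.Failed;
                changed.add(e);
            }
        }
        return changed;
    }
}
```

In an agent, the timer-based update would presumably feed parsePids() with the lines of the PID file (for example via java.nio.file.Files.readAllLines) and then call reconcile(). Note that on Linux the kernel names this file cgroup.procs; the path to it depends on how the cgroup hierarchy is mounted and is left out here.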
>
> If cgroups are not enabled, an agent will recover its inventory from the OR
> state. While in this mode, an agent will disable its Rogue Process Detector
> and will not attempt to detect alien processes. The timer-based inventory
> update will fetch PIDs from the OS (using the ps command) and reconcile them
> with its inventory. To detect child process termination, an agent will
> compare PIDs in inventory against PIDs obtained from the OS. If a PID is in
> inventory but not present in the OS, the agent will mark that process as
> Stopped if its deallocate flag is true, or as Failed if the deallocate flag
> is false. Any AP process that is no longer running will be marked as Stopped.
>
> - An agent will no longer call waitFor() on the Process object returned from
> a ProcessBuilder when a child process is launched.
> - An agent will continue to drain stdout and stderr of a child process to
> prevent the child (duccling) from hanging and to receive OS errors which may
> occur when exec'ing a process (bad cmd line, etc). After duccling calls
> execve(), child process stdout and stderr are redirected to /dev/null and
> the agent expects nothing further from these streams.
> - A child process will communicate state changes and initialization status to
> an agent via a provided port. The open question is how the port is provided
> to a child. Currently an agent uses -D (or env) to communicate its listener
> port to a child. The port is determined when an agent starts up and can
> potentially be different after the agent is bounced. So we either use a
> Registry to store the agent's port for a child to look up, or insist that an
> agent has a fixed port. If an agent is bounced and that port is not
> available, what should happen?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
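On the stream-draining item in the description: one way an agent could keep draining a child's stdout and stderr without ever calling waitFor() is a small background reader per stream. The sketch below is an illustration only, not DUCC code; StreamDrainer and its method names are hypothetical. It reads until EOF, so the thread ends on its own once the child closes or redirects the stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class StreamDrainer {

    /** Reads the stream line by line until EOF, handing each line to the sink. */
    public static void drain(InputStream in, Consumer<String> sink) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a daemon thread that drains the stream so the writer never blocks on a full pipe. */
    public static Thread drainInBackground(String name, InputStream in, Consumer<String> sink) {
        Thread t = new Thread(() -> {
            try {
                drain(in, sink);
            } catch (UncheckedIOException e) {
                // Report the failure through the same sink instead of killing the agent.
                sink.accept("drain of " + name + " ended: " + e.getMessage());
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With a Process p obtained from ProcessBuilder.start(), the agent would call drainInBackground("stdout", p.getInputStream(), sink) and drainInBackground("stderr", p.getErrorStream(), sink), where sink is whatever logging callback the agent uses. Any exec error text written before execve() reaches the sink, and no waitFor() call is involved.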