[
https://issues.apache.org/jira/browse/HADOOP-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707242#action_12707242
]
Vinod K V commented on HADOOP-5420:
-----------------------------------
Looked at the patch. Some comments:
- The patch needs sync-up with the latest patch on HADOOP-5771
- Overall, I think the terms terminate and kill are better than the current
usage of kill and destroy.
TaskController.java
- killTask, destroyTask can better be terminateTask, killTask. And accordingly
javadoc should be more explicit.
- killTaskJVM: Null check of context.task belongs to LinuxTaskController, need
it to be moved there.
DefaultTaskController.java
- For WINDOWS process.destroy() is again not needed in destroyTask()
LinuxTaskController.java
- We completely bring down the TT if we fail to write the pidFile to any one
of the disks. This should change. We should fail only if we cannot write to any
of the disks.
- javadoc of setup() is incomplete. It should also quote the additional steps
of writing TT pid-file.
- Log message at +384 can be better: LOG.info("Written :: " + pid + " to file
" + ttPidFile);
- TT's pidfile should not be made world writable (changeDirectoryPermissions
does this). In fact, it can be set 000 permissions right-away. This would still
work as we run as root to read it.
- killHelper can better be something like finishTask()
- What should we do when the commands for SIGTERM/SIGKILL fail?
- buildTaskControllerExecutor() : For taskControllerCmd, you can instead use
an Array
Why the usage of
command.ordinal() in this method?
- I think it would be clear to have a separate buildKillTaskArgs(). It would
definitely help to also 'javadoc' the task-controller's command-line syntax for
buildLaunchTaskArgs and this new buildKillTaskArgs(). With this you can
completely remove the killHelper method()
ProcessTree.java
- killProcessGroup() can be renamed to terminateProcessGroup(), killProcess()
to terminateProcess(), sigKillProcessGroup to killProcessGroup() and
sigKillProcess to killProcess().
- Also, I think, with these semantics, we need a
ProcessTree.isProcessGroupAlive(pgrpid) along with the current
ProcessTree.isAlive(pid)
main.c
- The argc length check should be command specific.
- job_pid var can better be task_pid
- You have a printf statement in the case DESTROY_TASK_JVM. This can be
removed or redirected to log if needed.
- Return values of verify_parent like UNABLE_TO_READ_TT_PID_FILE,
OUT_OF_MEMORY are lost, as we are only returning
INVALID_PROCESS_LAUNCHING_TASKCONTROLLER in error scenarios. Can't we do
something here?
taskcontroller.c
- Documentation of kill_user_task() should be fixed. Currently, it only refers
to SIGTERM.
- The get_pid_path() function can be removed. Also references to it from code
documentation of run_task_as_user, kill_user_task can be removed too along with
the TT_SYS_DIR var from taskcontroller.h
- +343 "//Don't continue if the process is not alive anymore." should instead
be " // Don't continue if the process-group is not alive anymore."
TestKillSubProcessesWithTaskController
- Rename to TestKillSubProcessesWithLinuxTaskController
- Remove unused imports
- Add Apache license header and add javadoc similar to other tests with
LinuxTaskController
- Just like in HADOOP-5771, we may want to verify job-ownership in this test
too.
- The test ran successfully, but it is running the jobs as the same user as
the user running the cluster. Need to fix this.
- Also, I see a lot of messages like this : "WARN mapred.LinuxTaskController
(LinuxTaskController.java:destroyTask(454)) - Exception thrown while sending
destroy to the Task VM org.apache.hadoop.util.Shell$ExitCodeException:". Need
to debug the cause for these.
We also need a test to verify cleanup of tasks that ignore SIGTERM.
> Support killing of process groups in LinuxTaskController binary
> ---------------------------------------------------------------
>
> Key: HADOOP-5420
> URL: https://issues.apache.org/jira/browse/HADOOP-5420
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Sreekanth Ramakrishnan
> Assignee: Sreekanth Ramakrishnan
> Attachments: hadoop-5420-1.patch, hadoop-5420-2.patch,
> hadoop-5420-3.patch, hadoop-5420.patch
>
>
> Support setsid based kill in LinuxTaskController.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.