Hi @groden,
I am running Hadoop 2.7.3 in pseudo-distributed mode on Ubuntu 16.04
in a virtual machine, and I am facing the same issue: my Ubuntu session
logs off whenever I submit a new Hadoop job. I would like to try your
workaround. Could you provide a link, or explain how to download and
override the procps-3.3.10 source code?
I am a beginner with Ubuntu. Please help!
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to procps in Ubuntu.
https://bugs.launchpad.net/bugs/1610499
Title:
hadoop crash: /bin/kill in ubuntu16.04 has bug in killing process
group
Status in procps package in Ubuntu:
Confirmed
Bug description:
When I run Hadoop on Ubuntu 16.04, the ssh session exits and all
processes belonging to the hadoop user are killed. Through debugging, I
found that /bin/kill in Ubuntu 16.04 has a bug in killing a process
group.
Ubuntu version is:
Description:Ubuntu 16.04.1 LTS
Release:16.04
(1) How to reproduce the bug
The bug is easy to reproduce: on Ubuntu 16.04, running "/bin/kill -15 -12345",
or anything similar such as "/bin/kill -15 -1", kills all processes.
(2) Cause analysis
The code of /bin/kill in Ubuntu 16.04 comes from procps-3.3.10. When I run
"/bin/kill -15 -1", it actually sends signal 15 to PID -1, and a PID of
-1 means the signal goes to every process the caller is allowed to signal.
(3) The bug is in procps-3.3.10/skill.c. I think the code "pid =
(long)('0' - optopt)" is not right.
static void __attribute__ ((__noreturn__)) kill_main(int argc, char **argv)
{
    [...]
        case '?':
            if (!isdigit(optopt)) {
                xwarnx(_("invalid argument %c"), optopt);
                kill_usage(stderr);
            } else {
                /* Special case for signal digit negative PIDs */
                pid = (long)('0' - optopt);
                if (kill((pid_t)pid, signo) != 0)
                    exitvalue = EXIT_FAILURE;
                exit(exitvalue);
            }
            loop=0;
    [...]
}
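To make the arithmetic in the quoted expression concrete, here is a minimal sketch. It assumes that for an argument such as "-12345" getopt reports the first character after the dash ('1') as the unknown option in optopt. The function names pid_from_optopt and parse_pid_arg are hypothetical and do not appear in procps; the second one only illustrates parsing the whole argument and is not the upstream fix.

```c
#include <errno.h>
#include <stdlib.h>

/* Mirrors the quoted expression: only ONE character of the argument
 * (the one getopt stored in optopt) contributes to the PID.
 * For "-12345" that character is '1', so the result is -1. */
static long pid_from_optopt(int optopt_char)
{
    return (long)('0' - optopt_char);
}

/* Hypothetical alternative: parse the complete argument, e.g. "-12345".
 * Returns 0 and stores the value in *out on success, -1 on bad input. */
static int parse_pid_arg(const char *arg, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;   /* overflow, empty string, or trailing garbage */
    *out = value;
    return 0;
}
```

With these definitions, pid_from_optopt('1') yields -1 regardless of the digits that follow, which matches the observation that "/bin/kill -15 -12345" behaves like "/bin/kill -15 -1", whereas parse_pid_arg("-12345", &v) stores -12345.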
(4) The cause
Sometimes, when resources are tight or a Hadoop container loses its
connection, the NodeManager kills that container by sending a signal to
its JVM process. It is normal behavior for Hadoop to kill a task and
then re-execute it, but with this kill bug, all processes belonging to
the hadoop user are killed.
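For illustration only, signalling a process group can also be done through the kill(2) system call directly, without going through /bin/kill. The sketch below is not how Hadoop does it; signal_process_group is a hypothetical helper, and the guard against group IDs of 1 or less is an assumption added so that a bad value can never turn into kill(-1, sig) or kill(0, sig).

```c
#define _POSIX_C_SOURCE 200809L  /* expose kill() under strict -std= modes */

#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: send `sig` to every process in group `pgid`.
 * kill() with a PID below -1 targets the process group |PID|.
 * Group IDs <= 1 are rejected so the call can never reach PID -1
 * (every process the caller may signal) or PID 0 (the caller's own group). */
static int signal_process_group(pid_t pgid, int sig)
{
    if (pgid <= 1) {
        errno = EINVAL;
        return -1;
    }
    return kill(-pgid, sig);
}
```

A caller would pass the positive process group ID, for example signal_process_group(12345, SIGTERM), and check the return value and errno as with kill(2) itself.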
(5) Workaround
I copied /bin/kill from Ubuntu 14.04 over /bin/kill in Ubuntu 16.04, and it
works this way. I also think it would be better to ask the procps-3.3.10
maintainers to fix the bug, but I don't know how to contact them.