[ https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063895#comment-16063895 ]
James Peach edited comment on MESOS-7160 at 6/26/17 10:20 PM:
--------------------------------------------------------------

This morning my VM doesn't reproduce this, but it definitely happened :)

The normal code path is that the {{exec}} failure causes an abort. The supervisor then gets SIGTERM (need to read more code to see why), and the signal handler it has installed issues SIGKILL. If the SIGTERM delivery is delayed, then the second abort in the supervisor could trigger.

{noformat}
[pid 2738] execve("/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */ <unfinished ...>
[pid 2737] wait4(2738, <unfinished ...>
[pid 2738] <... execve resumed> ) = -1 ENOENT (No such file or directory)
[pid 2738] execve("/usr/sbin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid 2738] execve("/usr/bin/perf", ["perf", "--version"], 0x4bb6fc0 /* 21 vars */) = -1 ENOENT (No such file or directory)
[pid 2738] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2738, si_uid=0} ---
...
[pid 2737] <... wait4 resumed> 0x7f27e8901f44, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 2737] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2708, si_uid=0} ---
[pid 2738] +++ killed by SIGKILL +++
[pid 2737] +++ killed by SIGKILL +++
{noformat}

> Parsing of perf version segfaults
> ---------------------------------
>
>                 Key: MESOS-7160
>                 URL: https://issues.apache.org/jira/browse/MESOS-7160
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Bannier
>            Assignee: Andrei Budnik
>
> Parsing the perf version [fails with a segfault in ASF CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464 805 perf.cpp:237] Failed to get perf version: Failed to execute perf: terminated with signal Aborted (core dumped)
> {noformat}
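
For anyone trying to reproduce the "terminated with signal Aborted" status outside of Mesos, the first half of the sequence in the comment (exec fails with ENOENT, the child aborts, the parent's wait reports SIGABRT) can be sketched roughly as below. This is an illustration in Python, not the Mesos C++ code path; the function name {{run_missing_binary}} and the binary name in the usage note are hypothetical, and the supervisor's SIGTERM/SIGKILL handling is not modelled.

```python
import os
import signal


def run_missing_binary(argv):
    """Fork, try to exec argv in the child, and abort the child if exec fails.

    Returns the number of the signal that terminated the child, or None if
    the child exited normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGABRT has its default (terminate) disposition.
        signal.signal(signal.SIGABRT, signal.SIG_DFL)
        try:
            # execvp searches PATH; it only returns (by raising) on failure.
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed: abort instead of exiting cleanly.
            os.abort()

    # Parent: wait for the child and decode how it terminated.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None
```

Calling {{run_missing_binary(["definitely-not-a-real-perf-binary", "--version"])}} should return {{signal.SIGABRT}}, assuming no such binary is on PATH. That matches what the parent in the strace output would see from pid 2738 if the SIGKILL did not arrive first.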