For the Assimilation code I use the full pathname of the binary from
/proc to tell if it's "one of mine". That's not perfect if you're using
an interpreted language, since the binary is then the interpreter rather
than the script. It works quite well for compiled languages.
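
A minimal sketch of that kind of check, assuming Linux's /proc/<pid>/exe
symlink. The function name pid_runs_binary is made up for illustration;
this is not the Assimilation code itself:

```shell
# Hypothetical helper: succeed only if pid $1 is alive and its
# executable is exactly the path given as $2.
pid_runs_binary() {
    # /proc/<pid>/exe is a symlink to the running binary (Linux-specific).
    # readlink fails if the process is gone or we may not inspect it.
    actual=$(readlink "/proc/$1/exe" 2>/dev/null) || return 1
    [ "$actual" = "$2" ]
}
```

For a script, the link names the interpreter (for example the shell), so
every script run by that interpreter looks the same to this check.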
On 10/20/2014 01:17 PM, Lars Ellenberg wrote:
> Recent discussions with Dejan made me again more prominently aware of a
> few issues we probably all know about, but usually dismiss as having not
> much relevance in the real world.
>
> The facts:
>
> * a pidfile typically only stores a pid
> * a pidfile may be "stale", not properly cleaned up
> when the process it references died.
> * pids are recycled
>
> This is more of an issue if kernel.pid_max is small
> relative to the number of processes created per unit time,
> for example on some embedded systems,
> or on some very busy systems.
>
> But it may be an issue on any system,
> even a mostly idle one, given "bad luck^W timing",
> see below.
>
> A common idiom in resource agents is to
>
> kill_that_pid_and_wait_until_dead()
> {
>     local pid=$1
>     is_alive $pid || return 0
>     kill -TERM $pid
>     while is_alive $pid ; do sleep 1; done
>     return 0
> }
>
> The naïve implementation of is_alive() is
> is_alive() { kill -0 $1 ; }
>
> This is the main issue:
> -----------------------
>
> If the last-used-pid is just a bit smaller than $pid,
> during the sleep 1, $pid may die,
> and the OS may already have created a new process with that exact pid.
>
> Using the above is_alive(), kill_that_pid_and_wait_until_dead() will not
> notice that the to-be-killed pid has actually terminated while that new
> process runs, which may be a very long time if it is some other
> long-running daemon.
>
> This may result in stop failure and resulting node level fencing.
>
> The question is: what better way do we have to detect whether some pid
> died after we killed it? Or, related, and even better: how do we detect
> whether the process currently running with some pid is in fact still the
> process referenced by the pidfile?
>
> I have two suggestions.
>
> (I am trying to avoid bashisms in here.
> But maybe I overlooked some.
> Also, the code is typed, not sourced from some working script,
> so there may be logic bugs and typos.
> My intent should be obvious enough, though.)
>
> using "cd /proc/$pid; stat ."
> -----------------------------
>
> # this is most likely linux specific
> kill_that_pid_and_wait_until_dead()
> {
>     local pid=$1
>     (
>         cd /proc/$pid || return 0
>         kill -TERM $pid
>         while stat . ; do sleep 1; done
>     )
>     return 0
> }
>
> Once pid dies, /proc/$pid will become stale (but not completely go away,
> because it is our cwd), and stat . will return "No such process".
>
> Variants:
>
> using test -ef
> --------------
>
> exec 7</proc/$pid || return 0
> kill -TERM $pid
> while :; do
>     exec 8</proc/$pid || break
>     test /proc/self/fd/7 -ef /proc/self/fd/8 || break
>     sleep 1
> done
> exec 7<&- 8<&-
>
> using stat -c %Y /proc/$pid
> ---------------------------
>
> # note: %Y is the modification time (%Z would be the change time)
> mtime0=$(stat -c %Y /proc/$pid)
> kill -TERM $pid
> while mtime=$(stat -c %Y /proc/$pid) && [ "$mtime" = "$mtime0" ] ; do
>     sleep 1
> done
>
>
> Why not use the inode number, I hear you say?
> Because it is not stable. Sorry.
> Don't believe me? Don't want to read kernel source?
> Try it yourself:
>
> sleep 120 & k=$!
> stat /proc/$k
> echo 3 > /proc/sys/vm/drop_caches
> stat /proc/$k
>
> But that leads me to another proposal:
> store the starttime together with the pid in a pidfile.
>
> For linux that would be:
>
> (see proc(5) for /proc/pid/stat field meanings.
> note that (comm) may contain both whitespace and ")",
> which is the reason for my sed | cut below)
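
To illustrate why everything up to the last ") " has to be stripped
before counting fields, here is a small sketch that does the same job with
plain parameter expansion instead of sed | cut. The function name and the
sample stat line are invented for the example; it takes the stat line as
an argument so it can be tried without a live process:

```shell
# Hypothetical helper: print the starttime field from one line of
# /proc/<pid>/stat, passed as $1.
starttime_from_stat_line() {
    # Remove the longest prefix ending in ") ", so a comm such as
    # "(evil) name)" cannot shift the field positions.
    rest=${1##*") "}
    # $rest now starts at field 3 (state). starttime is field 22 in
    # proc(5), which makes it the 20th of the remaining fields.
    # Word splitting is intended here; the fields are plain tokens.
    set -- $rest
    [ $# -ge 20 ] || return 1
    shift 19
    printf '%s\n' "$1"
}

# Invented sample: comm contains both a space and a ")".
line='4242 (evil) name) S 1 4242 4242 0 -1 4194304 100 0 0 0 5 3 0 0 20 0 1 0 987654 1000000 200'
starttime_from_stat_line "$line"
```

Counting fields on the raw line with cut alone would be thrown off by the
space inside comm.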
>
> spawn_create_exclusive_pid_starttime()
> {
>     local pidfile=$1
>     shift
>     local reset
>     case $- in *C*) reset=":";; *) set -C; reset="set +C";; esac
>     if ! exec 3>$pidfile ; then
>         $reset
>         return 1
>     fi
>
>     $reset
>     setsid sh -c '
>         read pid _ < /proc/self/stat
>         starttime=$(sed -e "s/^.*) //" /proc/$pid/stat | cut -d" " -f 20)
>         >&3 echo $pid $starttime
>         3>&- exec "$@"
>     ' -- "$@" &
>     return 0
> }
>
> It does not seem possible to cycle through all available pids
> within fractions of time smaller than the granularity of starttime,
> so "pid starttime" should be a unique tuple (until the next reboot --
> at least on linux, starttime is measured as strictly monotonic "uptime").
>
>
> If we have "pid starttime" in the pidfile,
> we can:
>
> get_proc_pid_starttime()
> {
>     proc_pid_starttime=$(sed -e 's/^.*) //' /proc/$pid/stat) || return 1
>     proc_pid_starttime=$(echo "$proc_pid_starttime" | cut -d' ' -f 20)
> }
>
> kill_using_pidfile()
> {
>     local pidfile=$1
>     local pid starttime proc_pid_starttime
>
>     test -e "$pidfile" || return 0 # already dead
>     read pid starttime <"$pidfile" || return # unreadable
>
>     # check pid and starttime are both present, numeric only, ...
>     # I have a version that distinguishes 16 distinct error
>     # conditions; this is the short version only...
>
>     local i=0
>     while
>         get_proc_pid_starttime &&
>         [ "$starttime" = "$proc_pid_starttime" ]
>     do
>         : $(( i+=1 ))
>         [ $i = 1 ] && kill -TERM $pid
>         # MAYBE # [ $i = 30 ] && kill -KILL $pid
>         sleep 1
>     done
>
>     # it's not (anymore) the process we were looking for
>     # remove that pidfile.
>
>     rm -f "$pidfile"
> }
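
The "present, numeric only" check mentioned in the comment above could be
sketched roughly like this. It is only the simplest possible variant, not
the 16-condition version Lars refers to, and the name is_uint is made up
for the example:

```shell
# Hypothetical helper: succeed only if $1 is a non-empty string of
# decimal digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}
```

Inside kill_using_pidfile() it might be used right after the read, along
the lines of: is_uint "$pid" && is_uint "$starttime" || return 1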
>
> In other OSes, ps may be able to give a good enough equivalent?
>
> Any comments?
>
> Thanks,
> Lars
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/