Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

Alan Robertson Tue, 21 Oct 2014 05:18:08 -0700

On 10/21/2014 2:29 AM, Lars Ellenberg wrote:

On Mon, Oct 20, 2014 at 11:21:36PM +0200, Lars Ellenberg wrote:

On Mon, Oct 20, 2014 at 03:04:31PM -0600, Alan Robertson wrote:

On 10/20/2014 02:52 PM, Alan Robertson wrote:

For the Assimilation code I use the full pathname of the binary from
/proc to tell if it's "one of mine".  That's not perfect if you're using
an interpreted language.  It works quite well for compiled languages.

It works just as well (or as badly) from interpreted languages:
readlink /proc/$pid/exe
(very old Linux has a fsid:inode encoding there, but I digress)


But that does solve a different subset of problems,
has race conditions of its own, and breaks if you have updated the binary
since the start of that service (which does happen).
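As a rough illustration of the `readlink /proc/$pid/exe` check discussed above, here is a minimal Python sketch. It assumes Linux with /proc mounted. The helper names `exe_of` and `is_one_of_mine` are hypothetical, not taken from the Assimilation code or any resource agent. Reading another user's `/proc/<pid>/exe` is subject to a ptrace access check, so the lookup can fail with a permission error even when the process exists. The sketch does nothing about the races or the updated-binary case mentioned above.

```python
import os
from typing import Optional


def exe_of(pid: int) -> Optional[str]:
    """Return the target of /proc/<pid>/exe, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/exe")
    except OSError:
        # ENOENT: no such process (or no /proc); EACCES: not allowed to look.
        return None


def is_one_of_mine(pid: int, expected_exe: str) -> bool:
    """True if pid is currently running the binary at expected_exe.

    For a scripted service this compares against the interpreter
    (python, bash, java, ...), not against the script itself.
    """
    return exe_of(pid) == expected_exe
```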

Sorry, I lost the original.
Alan then wrote:

It only breaks if you change the *name* of the binary.  Updating the
binary contents has no effect.  Changing the name of the binary is
pretty unusual - or so it seems to me.  Did I miss something?

And if you do, you should stop the service with the old binary and
start it with the new one.  Very few methods are going to deal well with
radical changes in the service without stopping it with the old script,
updating, and starting with the new script.

Well, the "pid starttime" method does...

I don't believe I see the race condition.

Does not matter.

It won't loop, and it's not fooled by pid wraparound.  What else are you
looking for? [Guess I missed something else here]

pid + exe is certainly better than the pid alone.
It may even be "good enough".

But it still has shortcomings.

/proc/pid/exe is not stable
(the link target gets a " (deleted)" suffix if the binary is deleted);
that could be accounted for.

/proc/pid/exe links to the interpreter (python, bash, java, whatever)

Even if it is a "real" binary, (pid, /proc/pid/exe) is
still NOT unique for pid re-use after wrap around:
think different instances of mysql or whatever.
(yes, it gets increasingly unlikely...)
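One way to account for the " (deleted)" case mentioned above is to strip the suffix Linux appends to the link target before comparing paths. A minimal sketch, assuming the suffix is exactly `" (deleted)"`; `normalize_exe` and `same_binary_name` are hypothetical helpers. It cannot tell a deleted binary apart from a file whose real name happens to end in " (deleted)", and it does not help with the interpreter or multiple-mysql-instances problems listed above.

```python
from typing import Tuple

# Linux appends this to the /proc/<pid>/exe target once the file has been
# unlinked, e.g. after a package upgrade replaced the binary.
DELETED_SUFFIX = " (deleted)"


def normalize_exe(link_target: str) -> Tuple[str, bool]:
    """Split a /proc/<pid>/exe target into (path, was_deleted)."""
    if link_target.endswith(DELETED_SUFFIX):
        return link_target[: -len(DELETED_SUFFIX)], True
    return link_target, False


def same_binary_name(link_target: str, expected_exe: str) -> bool:
    """Compare by path only, ignoring whether the file was since replaced."""
    path, _was_deleted = normalize_exe(link_target)
    return path == expected_exe


# normalize_exe("/usr/sbin/mysqld (deleted)") -> ("/usr/sbin/mysqld", True)
```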

In most cases, a persistent daemon is written in a compiled language. Of course not all, but all the ones I personally care about are ;-)


However, (pid, starttime) *is* unique (for the lifetime of the pidfile,
as long as that is stored on tmpfs or is otherwise cleared after reboot).
(unless you tell me you can eat through pid_max, or at least the
currently unused pids, within the granularity of starttime...)

So that's why I propose to use the (pid, starttime) tuple.

If you see problems with (pid, starttime), please speak up.
If you have something *better*, please speak up.
If you just have something "different",
feel free to tell us anyways :-)
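For the (pid, starttime) proposal, the start time can be read from field 22 (`starttime`) of `/proc/<pid>/stat`, which proc(5) documents as the time the process started after system boot, in clock ticks. That value only means something within one boot, hence the tmpfs requirement. A minimal Python sketch, assuming Linux; the function names are hypothetical. The one subtle point is parsing: field 2 (`comm`) is wrapped in parentheses and may itself contain spaces or `)`, so the line is split at the last `)`.

```python
from typing import Optional

# Index of field 22 (starttime) among the fields that follow "comm)":
# the first of those is field 3 (state), so 22 - 3 = 19.
_STARTTIME_INDEX = 22 - 3


def parse_starttime(stat_text: str) -> int:
    """Extract starttime (clock ticks since boot) from /proc/<pid>/stat text."""
    # comm may contain spaces and ')': the LAST ')' is the one that closes it.
    _, _, rest = stat_text.rpartition(")")
    return int(rest.split()[_STARTTIME_INDEX])


def read_starttime(pid: int) -> Optional[int]:
    """Return the starttime of pid, or None if it cannot be determined."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime(f.read())
    except (OSError, IndexError, ValueError):
        return None


def is_same_process(pid: int, recorded_starttime: int) -> bool:
    """True only if pid exists and started at the recorded time,
    i.e. it is not a different process that reused the pid."""
    current = read_starttime(pid)
    return current is not None and current == recorded_starttime
```

When the pidfile is written, one would record `read_starttime(pid)` alongside the pid; on stop or status, `is_same_process` decides whether the pid still belongs to the service.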

The contents of the pidfile are specified by the LSB (or at least they were at some time in the past). That's why I use just the pid. The current version specifies that the first line of a pidfile consists of one or more numbers, and any subsequent lines should be ignored. If you go the way you propose, I'd suggest putting the other data on separate lines.
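Following that suggestion, the pid can stay alone on the first line, where LSB-style readers look, with the start time on a second line that such readers ignore. A minimal sketch; the two-line layout and the helper names are assumptions for illustration, since the LSB only defines the first line. The write goes through a temporary file and `os.replace`, so a reader never sees a half-written pidfile.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def write_pidfile(path: str, pid: int, starttime: int) -> None:
    """Write the pid on line 1 and the starttime on line 2, atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".pidfile.")
    try:
        os.fchmod(fd, 0o644)  # mkstemp creates mode 0600; make it readable
        with os.fdopen(fd, "w") as f:
            f.write(f"{pid}\n{starttime}\n")
        os.replace(tmp, path)  # atomic rename within the same filesystem
    except BaseException:
        os.unlink(tmp)  # don't leave the temporary file behind
        raise


def read_pidfile(path: str) -> Tuple[List[int], Optional[int]]:
    """Return (pids from line 1, starttime from line 2 or None).

    Line 1 is parsed as one or more whitespace-separated numbers.
    A missing or non-numeric second line yields None, so plain
    one-line pidfiles are still accepted.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    pids = [int(tok) for tok in lines[0].split()] if lines else []
    starttime = None
    if len(lines) > 1 and lines[1].strip().isdecimal():
        starttime = int(lines[1])
    return pids, starttime
```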

You might compare what you're doing to http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html

Instead of storing the start time explicitly, you could touch the pidfile's creation time to match that of the process ;-) That's harder to do in the shell, unfortunately...
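A sketch of that idea in Python, where it is easier than in the shell. Linux gives ordinary programs no way to set a file's change or birth ("creation") time, so this assumes the modification time is used instead, set with `os.utime` (the timestamps `touch` sets). The process's wall-clock start is computed as `btime` from `/proc/stat` plus `starttime / CLK_TCK`; because `btime` is reported in whole seconds, the check uses a tolerance, and one second is an assumed value. All helper names are hypothetical.

```python
import os

DEFAULT_TOLERANCE_S = 1.0  # assumption: btime has one-second granularity


def read_btime(proc_stat_text: str) -> int:
    """Boot time (seconds since the epoch) from the 'btime' line of /proc/stat."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("no btime line in /proc/stat")


def process_start_epoch(btime: int, starttime_ticks: int, clk_tck: int) -> float:
    """Wall-clock start of a process from its /proc/<pid>/stat starttime."""
    return btime + starttime_ticks / clk_tck


def stamp_pidfile(path: str, start_epoch: float) -> None:
    """Set the pidfile's atime and mtime to the process start time."""
    os.utime(path, (start_epoch, start_epoch))


def pidfile_matches(path: str, start_epoch: float,
                    tolerance: float = DEFAULT_TOLERANCE_S) -> bool:
    """True if the pidfile's mtime is within tolerance of start_epoch."""
    return abs(os.stat(path).st_mtime - start_epoch) <= tolerance


# On a live Linux system the inputs would come from:
#   with open("/proc/stat") as f:
#       btime = read_btime(f.read())
#   clk_tck = os.sysconf("SC_CLK_TCK")
```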


    -- Alan
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
