Re: Stale pid file problem, and a proposed solution

2020-01-21 Thread Marti Farrelly via rsync
On Mon 20 Jan 2020, 16:40 Joseph C. Sible via rsync wrote:

> Today, rsyncd manages its pid file by open()ing it with O_CREAT|O_EXCL
> at startup, and then unlink()ing it at shutdown. If the open() fails
> at startup because the file already exists, then rsyncd will assume
> another instance of itself is already running and not start.
>
> However, there's a problem with this approach: if rsyncd is terminated
> without being able to clean up (e.g., kill -9, or the server losing
> power), then the stale pid file will prevent rsyncd from ever
> restarting until an administrator manually intervenes.
>
> I propose a solution to this problem: open the file without O_EXCL,
> then try to take an exclusive lock on the whole file (we already use
> file locks to limit max connections, so this change wouldn't add any
> new requirements to rsyncd). If we can't get the lock, then abort, and
> if we can, then truncate the file and write our PID into it. Since
> locks never outlive the process that took them, this fixes the stale
> pid file problem.
>
> Does this seem like a reasonable idea? If so, I'll write and submit a
> patch that implements it.
>
> Joseph C. Sible
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Stale pid file problem, and a proposed solution

2020-01-20 Thread raf via rsync
Joseph C. Sible via rsync wrote:

> Today, rsyncd manages its pid file by open()ing it with O_CREAT|O_EXCL
> at startup, and then unlink()ing it at shutdown. If the open() fails
> at startup because the file already exists, then rsyncd will assume
> another instance of itself is already running and not start.
> 
> However, there's a problem with this approach: if rsyncd is terminated
> without being able to clean up (e.g., kill -9, or the server losing
> power), then the stale pid file will prevent rsyncd from ever
> restarting until an administrator manually intervenes.
> 
> I propose a solution to this problem: open the file without O_EXCL,
> then try to take an exclusive lock on the whole file (we already use
> file locks to limit max connections, so this change wouldn't add any
> new requirements to rsyncd). If we can't get the lock, then abort, and
> if we can, then truncate the file and write our PID into it. Since
> locks never outlive the process that took them, this fixes the stale
> pid file problem.
> 
> Does this seem like a reasonable idea? If so, I'll write and submit a
> patch that implements it.
> 
> Joseph C. Sible

I think that's very sensible. It's what my daemon program does
(libslack.org/daemon) to ensure a single instance of a daemon.
It probably means that the pidfile shouldn't be on an NFS-mounted
file system but hopefully that won't be a problem for anyone.
Or could that be a problem?

cheers,
raf




Stale pid file problem, and a proposed solution

2020-01-20 Thread Joseph C. Sible via rsync
Today, rsyncd manages its pid file by open()ing it with O_CREAT|O_EXCL
at startup, and then unlink()ing it at shutdown. If the open() fails
at startup because the file already exists, then rsyncd will assume
another instance of itself is already running and not start.

However, there's a problem with this approach: if rsyncd is terminated
without being able to clean up (e.g., kill -9, or the server losing
power), then the stale pid file will prevent rsyncd from ever
restarting until an administrator manually intervenes.
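
To make the failure mode concrete, here is a minimal Python sketch of the
O_CREAT|O_EXCL scheme described above. It is not rsyncd's actual C code; the
function names and the temporary path are made up for illustration, and the
"crash" is simulated simply by never unlinking the file.

```python
import os
import tempfile


def create_pidfile_excl(path):
    """Create the pid file exclusively; raises FileExistsError if it exists."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")


def demo_stale_pidfile():
    """Show that a leftover pid file blocks the next start."""
    path = os.path.join(tempfile.mkdtemp(), "rsyncd.pid")  # hypothetical path
    create_pidfile_excl(path)  # first start succeeds
    # Simulated kill -9 or power loss: unlink() is never reached.
    try:
        create_pidfile_excl(path)  # restart attempt
    except FileExistsError:
        return "refused to start: stale pid file"
    return "started"
```

The second call fails even though no process owns the file any more, which is
the situation that currently needs manual cleanup.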

I propose a solution to this problem: open the file without O_EXCL,
then try to take an exclusive lock on the whole file (we already use
file locks to limit max connections, so this change wouldn't add any
new requirements to rsyncd). If we can't get the lock, then abort, and
if we can, then truncate the file and write our PID into it. Since
locks never outlive the process that took them, this fixes the stale
pid file problem.
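
A rough sketch of that sequence, in Python rather than rsyncd's C, might look
like the following. The name acquire_pidfile is hypothetical, and the choice
of fcntl.lockf (which wraps POSIX fcntl() record locks) is an assumption; an
actual patch could use a different locking call.

```python
import fcntl
import os


def acquire_pidfile(path):
    """Return an fd that holds the lock, or None if another process holds it."""
    # Open without O_EXCL: an existing (possibly stale) file is fine.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock; length 0 covers the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock is held by a live process: treat as "already running".
        os.close(fd)
        return None
    # We own the lock, so any old contents are stale and can be replaced.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd  # keep this open for the lifetime of the daemon
```

The caller has to keep the returned descriptor open until exit. With POSIX
record locks, closing any descriptor for the file in the same process drops
the lock, so the daemon should avoid reopening its own pid file.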

Does this seem like a reasonable idea? If so, I'll write and submit a
patch that implements it.

Joseph C. Sible
