Re: bug in pw, -STABLE [patch]
On Sun, Jun 23, 2002 at 05:14:58PM -0700 I heard the voice of Paul Herman, and lo! it spake thus: Clearly, at the root of our disagreement is what we both perceive the problem to be. Oh, certainly; that's what makes it fun :) I don't see problems in the current implementation, aside from bugs that lead to unexpected behavior, i.e. passwd file corruption. You see the problem as a deficiency in the implementation itself, and wish to protect the user from shooting themselves in the foot. Well, we're in violent agreement on the first one. I'm just using that as an opportunity to smack down the second and kill two birds with one stone. Not only do I think that's impossible[*], I choose to fight for my right to shoot myself in the foot as quickly and efficiently as possible, but that's where we'll disagree, and I'll just leave it at that and wish you a good night's sleep. Wouldn't work; the determined foot-screwer would simply cd /etc mv aliases somethingelse ln master.passwd aliases vi aliases. Since we don't have mandatory file locking, it's neither possible nor my intention to prevent people from doing things intentionally; I'm just trying to remove the ways they can do it accidentally using the tools we provide. I'm all for leaving options for people to intentionally de-toe, or convince the system that they know what they're doing while they shoot caterpillars between their toes. Your approach will (I think; I haven't tested, so it's tough to be sure) solve the problem that sparked this, which is that pw(8) has a race condition allowing multiple invocations to step up each other's toes. However, it doesn't do anything about the larger problem of maintaining consistency in the passwd subsystem as a whole, which is where I'm aiming. I also think my approach (once documented, at any rate) would jump out a bit more at people writing programs that adjust the auth information. And, additionally, we took the opportunity to take one MORE step back from the problem, and implement the pid_*() functions which abstract the implementation of this sort of locking, making is easy to apply in other places. Besides, this is a codeocracy; I changed more lines of code than you did, to say nothing of a MANPAGE! My solution MUST be better! 8-} [*] Patch to vi refusing to edit filenames containing master.passwd withheld by request. ;-) rm -f /usr/bin/vi ln -s /usr/local/bin/pico /usr/bin/vi That'll shake people up enough that they won't edit anything. I know it would have ME waking up screaming... -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, Jun 23, 2002 at 05:07:40PM -0700 I heard the voice of Terry Lambert, and lo! it spake thus: The problem with your proposed patch is that it breaks the ability to allow authentication against the database while it is undergoing modification, which may be a prolonged period. Would it? For starters, this locks master.passwd, not spwd.db, so any well-behaved program that used getpw*() like a good little monkey wouldn't even notice it. In fact, with Paul's change and the rename() alteration, it'd be BETTER for programs that tried to parse master.passwd, because as it currently stands, there are times when it's incomplete (because of the line-by-line copy, which caused us both a few moments of illness when we noticed it). -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Mon, 24 Jun 2002, Geoffrey C. Speicher wrote: So we either need to have a compelling solution or get a committer to step in and make up our minds for us. I think the best thing to do is file a PR for this. -Paul. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Mon, 24 Jun 2002, Paul Herman wrote: On Mon, 24 Jun 2002, Geoffrey C. Speicher wrote: So we either need to have a compelling solution or get a committer to step in and make up our minds for us. I think the best thing to do is file a PR for this. With or without a patch? Geoff To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
Matthew D. Fuller wrote: Terry Lambert, and lo! it spake thus: The problem with your proposed patch is that it breaks the ability to allow authentication against the database while it is undergoing modification, which may be a prolonged period. Would it? For starters, this locks master.passwd, not spwd.db, so any well-behaved program that used getpw*() like a good little monkey wouldn't even notice it. In fact, with Paul's change and the rename() alteration, it'd be BETTER for programs that tried to parse master.passwd, because as it currently stands, there are times when it's incomplete (because of the line-by-line copy, which caused us both a few moments of illness when we noticed it). Whatever locking protocol you come up with needs to be partcipated in by everyone. Historically, passwd file manipulations have used link count locking. The real issue here is that there are a number of incremental file update utilities for managmenet of very large installations, and a large number of historical programs from places like comp.unix.sources that use file locks, rather than advisory range locks. Any locking protocol using advisory locking must lock out readers while there are writers. What you are really locking (or should be) is directory entries here, which means using the same exclusion space for the locking as you use for the update operation, which is a lock on the parent directory (e.g. as is used internally to the FS rename operation). --- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Mon, 24 Jun 2002, Paul Herman wrote: On Mon, 24 Jun 2002, Geoffrey C. Speicher wrote: So we either need to have a compelling solution or get a committer to step in and make up our minds for us. I think the best thing to do is file a PR for this. More than one person has already beaten us to the punch: bin/23501, closed on Fri Jun 22 08:43:09 PDT 2001, describes the exact same problem with pw(8), and contains a patch that was verified to fix the problem, but it was apparently never committed. bin/38676, submitted on Tue May 28 19:40:08 PDT 2002, requests that the original patch be committed, but has gotten no feedback. Next? Geoff To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Mon, Jun 24, 2002 at 09:56:31AM -0700 I heard the voice of Paul Herman, and lo! it spake thus: On Mon, 24 Jun 2002, Geoffrey C. Speicher wrote: So we either need to have a compelling solution or get a committer to step in and make up our minds for us. I think the best thing to do is file a PR for this. I'll probably do so (referencing also the previous ones) once I finish testing this; the downside is that I may not get a chance to plunk a few straight solid hours into it until this weekend (darn that paying work stuff!), so we may have to find new realms of the subject to flam^Wdebate until then, just to keep the ol' adrenaline flowing. Regardless of the outcome of the passwd issue, I think the pid_*() functions in libutil will be useful for other things that do those kind of functions. Just glancing at my /var/run, I can see cron, inetd, mountd, moused, and syslogd, just among the stuff in our base system that isn't contrib/-ish (like named, sshd, ntpd, etc). -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Locking the passwd subsystem (was Re: bug in pw, -STABLE [patch])
[ Moved to -hackers since this isn't really a STABLE-specific issue, but rather a general thing. Bcc's to -stable for continuity's sake. ] This discussion has gone on, on and off, for months now, so I'm going to recap a bit. For the people in it, we've mostly paged out the details by now, and for the people not in it, this will be necessary catchup. pw(8) can sometimes corrupt the passwd file when it's running. This is due to the fact that, while it munges things, it locks /etc/master.passwd. This fails to really work right between invocations, because as part of its dirty work, pw(8) moves master.passwd away and creates a new one from scratch. So, an external lockfile is needed. There's no real common locking among the other consumers (chpass(1), passwd(1), vipw(8), pwd_mkdb(8), etc) either, so we might as well make a big common locking mechanism out of it. And while we're doing that, it would be nifty to centralize some functions for doing locking like that, too, instead of coding it from scratch in every application that needs it. The patch on this mail accomplishes this part of the objective; once we can all settle on that, we can easily handle locking all the consumers neatly using these functions. And now, for the carrying on. On Sun, Jun 23, 2002 at 09:07:15AM -0400 I heard the voice of Geoffrey C. Speicher, and lo! it spake thus: Actually, now that I think about it, even a daemon wouldn't guarantee better results in every such scenario. At first it would seem that the daemon would get a chance to clean up the pid file long before another process wanted it (during step 4). However, if the timing is just right then p1 can die the moment that its pid would be reused, and chances are that the daemon wouldn't catch it in time. Well, a daemon would have a BETTER chance, though nowhere near a guarantee. The theory being that the daemon would check every so often (5 minutes sounds good to me), whereas with the library, you potentially could not check for days, depending on how often you run pw(8). Even on a heavily used system, 5 minutes wouldn't likely be enough to reuse a PID, unless you had random PID assignment or some such. Now, this is a problem. There's a race condition here. It's a very small window, to be sure, but I'm not quite sure how to close it. After I think you must've figured it out judging by your post that just came through as I was writing that previous paragraph. :) I think so. See comments in the code. (Re: Oliver's statements about the general evil of PID files) The beauty of encapsulating the pid file operations into a library is that the implementation can be changed to create a socket instead of, or (more likely) in addition to, a pid file. In fact, this may be just the thing we need to close that pid gets reused hole. Since Matt already has the functions mostly implemented, I'll defer to him. Well, there's nothing necessarily stopping us from adding more stuff later, of course. Personally, I've set my sights a bit lower; for the moment, I'm more concerned with Let's not corrupt the passwd database than with This must never block unless totally absolutely completely necessary and should always know where its cheese is, which is what the socket-theory is pointing at. One giant code upheaval at a time :) One possibility would be to make procfs(5) show the process start time (rather than the current time) for the ctime of the /proc/pid directory. Then, one could look at the PID file, and if its mtime (not ctime; see in below code where we could use a pre-existing file) is older than the ctime of the /proc/pid directory, we'd know the PID had changed. Getting the actual process start time would require libkvm (and thus setgid kmem), or spawning ps(1) and parsing output, otherwise. ANYWAY: Attached is a patch (relative to src/) for my first run through creating the functions to handle PID-file creation, deletion, and checking-on-creation. It consists of two functions: - pid_begin(), which creates a PID file if one doesn't exist, and checks to see if the listed process is still around (taking over the PID file if it's not) if it does. - pid_end(), which deletes the PID file. Code is reasonably well commented. Manpage included. Shaken, not stirred. Slippery when wet. This has gone through a compile test, but hasn't been live-tested. This will provide the basis then to move forward and lock all the programs that access the passwd database (for writing, of course; no reason to lock readers) in a common way. -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet Index: lib/libutil/Makefile === RCS file: /usr/cvs/src/lib/libutil/Makefile,v retrieving
Re: Locking the passwd subsystem (was Re: bug in pw, -STABLE [patch])
On Sun, Jun 23, 2002 at 11:17:35AM -0400 I heard the voice of Geoffrey C. Speicher, and lo! it spake thus: 5-minute window. Still unliklely, but not quite as unlikely as the non-daemon scenario. And it all goes out when window when kern.randompid=1, so it's kinda moot. Looks good from here. I've attached some minor patches (also relative to src/) to pid_util.{3,c}. The first corrects some minor typos in the man page and the second is to make pid_begin() work as advertised when read() fails (set errno and return -1 rather than return errno). Yes, I have a brain. No, really I do. I just don't have enough coffee :) (I haven't tested any of this yet either.) I just slapped together a few quick test programs. Here's an updated patch, incorporating your fixes, as well as a bugfix and an almost-bugfix my testing showed up (hint: just because you ftruncate() to 0-length, doesn't mean your offset becomes 0 too, dangit), and eliminated an extra variable. -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet Index: lib/libutil/Makefile === RCS file: /usr/cvs/src/lib/libutil/Makefile,v retrieving revision 1.46 diff -u -r1.46 Makefile --- lib/libutil/Makefile8 May 2002 00:50:07 - 1.46 +++ lib/libutil/Makefile23 Jun 2002 13:31:01 - @@ -8,7 +8,7 @@ CFLAGS+=-DINET6 SRCS= _secure_path.c auth.c fparseln.c login.c login_auth.c \ login_cap.c login_class.c login_crypt.c login_ok.c login_times.c \ - login_tty.c logout.c logwtmp.c property.c pty.c \ + login_tty.c logout.c logwtmp.c pid_util.c property.c pty.c \ pw_util.c realhostname.c stub.c \ trimdomain.c uucplock.c INCS= libutil.h login_cap.h @@ -16,7 +16,7 @@ MAN+= login.3 login_auth.3 login_tty.3 logout.3 logwtmp.3 pty.3 \ login_cap.3 login_class.3 login_times.3 login_ok.3 \ _secure_path.3 uucplock.3 property.3 auth.3 realhostname.3 \ - realhostname_sa.3 trimdomain.3 fparseln.3 + realhostname_sa.3 trimdomain.3 fparseln.3 pid_util.3 MAN+= login.conf.5 auth.conf.5 MLINKS+= property.3 properties_read.3 property.3 properties_free.3 MLINKS+= property.3 property_find.3 @@ -39,5 +39,6 @@ MLINKS+=login_auth.3 auth_checknologin.3 login_auth.3 auth_cat.3 MLINKS+=uucplock.3 uu_lock.3 uucplock.3 uu_lock_txfr.3 \ uucplock.3 uu_unlock.3 uucplock.3 uu_lockerr.3 +MLINKS+=pid_util.3 pid_begin.3 pid_end.3 .include bsd.lib.mk Index: lib/libutil/libutil.h === RCS file: /usr/cvs/src/lib/libutil/libutil.h,v retrieving revision 1.37 diff -u -r1.37 libutil.h --- lib/libutil/libutil.h 8 May 2002 00:50:07 - 1.37 +++ lib/libutil/libutil.h 23 Jun 2002 12:35:30 - @@ -82,6 +82,8 @@ struct sockaddr; intrealhostname_sa(char *host, size_t hsize, struct sockaddr *addr, int addrlen); +int pid_begin(const char *pidfile, mode_t mode, int flags); +int pid_end(const char *pidfile); #ifdef _STDIO_H_ /* avoid adding new includes */ char *fparseln(FILE *, size_t *, size_t *, const char[3], int); #endif @@ -128,5 +130,8 @@ /* pw_scan() */ #define PWSCAN_MASTER 0x01 #define PWSCAN_WARN0x02 + +/* pid_begin() */ +#define PID_NOBLOCK0x01 #endif /* !_LIBUTIL_H_ */ Index: lib/libutil/pid_util.c === --- /dev/null Sun Jun 23 11:44:00 2002 +++ lib/libutil/pid_util.c Sun Jun 23 11:50:59 2002 @@ -0,0 +1,178 @@ +/*- + * Copyright (c) 2002 Matthew D. Fuller [EMAIL PROTECTED] + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF
Re: bug in pw, -STABLE [patch]
[ pulled over to -hackers as well ] On Sun, 23 Jun 2002, Matthew D. Fuller wrote: On Sun, Jun 23, 2002 at 09:00:52AM -0700 I heard the voice of Paul Herman, and lo! it spake thus: On Sun, 23 Jun 2002, Geoffrey C. Speicher wrote: How so? I'm not suggesting unlink(2)ing /etc/master.passwd or /etc/spwd.db at all. No, but pw(8) does; making it not do so would require reasonably extensive rewriting. Indeed it does do this. However, that seems to be the crux of the problem. The current pw(8) behavior was (probably) not only a convenient way to update files in /etc, but an attempt to narrow (not eliminate) the window for race conditions. If this is fundamentaly flawed, *this* should be fixed, and not have a larger band aid put over it. I can't imagine it would be too extensive of a rewrite. The temp file code could be kept, and in fileupd.c:fileupdate() instead of rename(/etc/master.passwd.new, /etc/master.passwd), it just copies the contents of the .new file into the original file (that has the O_EXLOCK.) -Paul. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, 23 Jun 2002, Matthew D. Fuller wrote: On Sun, Jun 23, 2002 at 09:59:10AM -0700 I heard the voice of Paul Herman, and lo! it spake thus: I can't imagine it would be too extensive of a rewrite. The temp file code could be kept, and in fileupd.c:fileupdate() instead of rename(/etc/master.passwd.new, /etc/master.passwd), it just copies the contents of the .new file into the original file (that has the O_EXLOCK.) Yes, but that would open up the problem that cause the rename(2) to be used in the first place; i.e., rename() is atomic; read(2)/write(2)'ing a bunch of data isn't, so a system crash at the wrong time would leave you with a partial master.passwd. Fine, then lock them both, and use rename(). Patch attached. In fact, if you look at fileupdate(), you see that it already gains an exclusive lock on the temp file, but not the original /etc/master.passwd (if you will.) I think this is a bug, because the original is getting modified (at least in name space), so that should be locked while pw(8) is operating. What do you think? -Paul. Index: fileupd.c === RCS file: /u02/ncvs/src/usr.sbin/pw/fileupd.c,v retrieving revision 1.9 diff -u -r1.9 fileupd.c --- fileupd.c 26 Oct 1999 04:27:13 - 1.9 +++ fileupd.c 23 Jun 2002 18:30:16 - @@ -76,7 +76,7 @@ if (pfxlen = 1) rc = EINVAL; else { - intinfd = open(filename, O_RDWR | O_CREAT, fmode); + intinfd = open(filename, O_RDWR | O_CREAT | O_EXLOCK | O_NONBLOCK, +fmode); if (infd == -1) rc = errno; To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Locking the passwd subsystem (was Re: bug in pw, -STABLE [patch])
And here's a run at Stage 2. This adapts a subset of programs to use the pid_*() locking supplied. The attached patch updates: pwd.h, libutil (libutil.h and pw_util.c), chpass (chpass.c), pw (pw.c), pwd_mkdb (Makefile, pwd_mkdb.8, pwd_mkdb.c), vipw (vipw.c). This does NOT include passwd(1), since that does its dirty work in PAM, which I definately don't feel like delving into. Note that this patch MAY be a little off when applied over the previous patch; I'm working on them in seperate trees, to try and keep the stages discrete. But a little manual fiddling should get them reconciled in short order. -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet Index: include/pwd.h === RCS file: /usr/cvs/src/include/pwd.h,v retrieving revision 1.12 diff -u -r1.12 pwd.h --- include/pwd.h 9 Jun 2002 19:39:18 - 1.12 +++ include/pwd.h 23 Jun 2002 18:16:21 - @@ -66,6 +66,9 @@ #define_PATH_MASTERPASSWD /etc/master.passwd #define_MASTERPASSWD master.passwd +#define_PATH_PWDLOCK /var/run/pwd.lock +#define_MODE_PWDLOCK (S_IRUSR | S_IWUSR) + #define_PATH_MP_DB /etc/pwd.db #define_MP_DB pwd.db #define_PATH_SMP_DB/etc/spwd.db Index: lib/libutil/libutil.h === RCS file: /usr/cvs/src/lib/libutil/libutil.h,v retrieving revision 1.37 diff -u -r1.37 libutil.h --- lib/libutil/libutil.h 8 May 2002 00:50:07 - 1.37 +++ lib/libutil/libutil.h 23 Jun 2002 18:38:18 - @@ -95,6 +95,8 @@ intpw_init(const char *_dir, const char *_master); char *pw_make(struct passwd *_pw); intpw_mkdb(const char *_user); +intpw_globlock(int); +intpw_globunlock(void); intpw_lock(void); struct passwd *pw_scan(const char *_line, int _flags); const char *pw_tempname(void); @@ -124,6 +126,9 @@ #defineFPARSELN_UNESCCOMM 0x04 #defineFPARSELN_UNESCREST 0x08 #defineFPARSELN_UNESCALL 0x0f + +/* pw_globlock() */ +#definePW_NOBLOCK PID_NOBLOCK /* pw_scan() */ #define PWSCAN_MASTER 0x01 Index: lib/libutil/pw_util.c === RCS file: /usr/cvs/src/lib/libutil/pw_util.c,v retrieving revision 1.25 diff -u -r1.25 pw_util.c --- lib/libutil/pw_util.c 8 May 2002 14:52:32 - 1.25 +++ lib/libutil/pw_util.c 23 Jun 2002 18:46:19 - @@ -79,6 +79,7 @@ static char passwd_dir[PATH_MAX]; static char tempname[PATH_MAX]; static int initialized; +static int globlocked = 0; void pw_cont(int sig) @@ -163,6 +164,28 @@ } /* + * Lock the password subsystem globally. + */ +int +pw_globlock(int flags) +{ + int retval; + + if( (retval=pid_begin(_PATH_PWDLOCK, _MODE_PWDLOCK, flags)) == 0 ); + globlocked=1; + return(retval); +} + +/* + * Unlock the global lock + */ +int pw_globunlock(void) +{ + + return(pid_end(_PATH_PWDLOCK)); +} + +/* * Lock the master password file. */ int @@ -258,10 +281,10 @@ case 0: /* child */ if (user == NULL) - execl(_PATH_PWD_MKDB, pwd_mkdb, -p, + execl(_PATH_PWD_MKDB, pwd_mkdb, -p, -n, -d, passwd_dir, tempname, NULL); else - execl(_PATH_PWD_MKDB, pwd_mkdb, -p, + execl(_PATH_PWD_MKDB, pwd_mkdb, -p, -n, -d, passwd_dir, -u, user, tempname, NULL); _exit(1); default: @@ -353,6 +376,11 @@ } if (lockfd != -1) close(lockfd); + if (globlocked != 0) + { + globlocked=0; + pid_end(_PATH_PWDLOCK); + } errno = serrno; } Index: usr.bin/chpass/chpass.c === RCS file: /usr/cvs/src/usr.bin/chpass/chpass.c,v retrieving revision 1.23 diff -u -r1.23 chpass.c --- usr.bin/chpass/chpass.c 8 May 2002 00:54:28 - 1.23 +++ usr.bin/chpass/chpass.c 23 Jun 2002 18:42:25 - @@ -258,6 +258,8 @@ case _PWF_FILES: if (pw_init(NULL, NULL)) err(1, pw_init()); + if (pw_globlock(PW_NOBLOCK) 0) + err(1, pw_globlock()); if ((pfd = pw_lock()) == -1) { pw_fini(); err(1, pw_lock()); Index: usr.sbin/pw/pw.c === RCS file: /usr/cvs/src/usr.sbin/pw/pw.c,v retrieving
Re: bug in pw, -STABLE [patch]
On Sun, Jun 23, 2002 at 11:32:54AM -0700 I heard the voice of Paul Herman, and lo! it spake thus: In fact, if you look at fileupdate(), you see that it already gains an exclusive lock on the temp file, but not the original /etc/master.passwd (if you will.) I think this is a bug, because the original is getting modified (at least in name space), so that should be locked while pw(8) is operating. I'm not sure. Hm. I think that MAY prevent some of the corruption cases. On the other hand, we're really addressing a broader spectrum of issues here. The was pw(8) manipulates the passwd database is rather different from how chpass(1) or vipw(8) do it; this unifies it all (well, starts the process). Also, looking down, it seems that pw(8) will ALWAYS try to do a line-by-line copy of the temp file back into the main file, which would be Bad Bad Bad (tm); it does this because using rename() all the time loses the flock() lock. However, if we use this global locking, we can dispense with that, and remove some code complexity, to say nothing of a giant waste of time copying line-by-line. I'm trying to kill several birds with as small a stone as I can; I've had my eye on this whole subsystem for a while, and I'd really like to re-center it all around pw(8). vipw(8) kinda has to be its own beast, but chpass(1) and adduser(8)/rmuser(8) are prime candidates to be reworked to use pw(8) for their dirty work. Let's try this: updated patch for pw(8) including my global locking, the addition of flock()'ing both old and new files (it's not quite necessary, since it's all under a global lock, but it never hurts to double-check) as in your patch, and removing the line-by-line copy and using rename() all the time. (I suggest tabstop=4 or even =2; that file is *DEEP*) -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet Index: fileupd.c === RCS file: /usr/cvs/src/usr.sbin/pw/fileupd.c,v retrieving revision 1.9 diff -u -r1.9 fileupd.c --- fileupd.c 26 Oct 1999 04:27:13 - 1.9 +++ fileupd.c 23 Jun 2002 19:22:16 - @@ -76,7 +76,8 @@ if (pfxlen = 1) rc = EINVAL; else { - intinfd = open(filename, O_RDWR | O_CREAT, fmode); + intinfd = open(filename, + O_RDWR | O_CREAT | O_EXLOCK | O_NONBLOCK, fmode); if (infd == -1) rc = errno; @@ -92,7 +93,8 @@ strcpy(file, filename); strcat(file, .new); - outfd = open(file, O_RDWR | O_CREAT | O_TRUNC | O_EXLOCK, fmode); + outfd = open(file, + O_RDWR | O_CREAT | O_TRUNC | O_EXLOCK | +O_NONBLOCK, fmode); if (outfd == -1) rc = errno; else { @@ -169,27 +171,24 @@ rc = errno; /* Failed to update */ else { /* -* Copy data back into the +* Move data back into +the * original file and truncate */ rewind(infp); rewind(outfp); - while (fgets(line, linesize, outfp) != NULL) - fputs(line, infp); /* -* If there was a problem with copying -* we will just rename 'file.new' -* to 'file'. -* This is a gross hack, but we may have -* corrupted the original file -* Unfortunately, it will lose the inode -* and hence the lock. +* Use rename() to +move the new file
Re: bug in pw, -STABLE [patch]
On Sun, 23 Jun 2002, Matthew D. Fuller wrote: On Sun, Jun 23, 2002 at 11:32:54AM -0700 I heard the voice of Paul Herman, and lo! it spake thus: In fact, if you look at fileupdate(), you see that it already gains an exclusive lock on the temp file, but not the original /etc/master.passwd (if you will.) I think this is a bug, because the original is getting modified (at least in name space), so that should be locked while pw(8) is operating. I think that MAY prevent some of the corruption cases. On the other hand, we're really addressing a broader spectrum of issues here. The was pw(8) manipulates the passwd database is rather different from how chpass(1) or vipw(8) do it; this unifies it all (well, starts the process). Also, looking down, it seems that pw(8) will ALWAYS try to do a line-by-line copy of the temp file back into the main file, which would be Bad Bad Bad (tm); I think you might have your infp confused with your outfp. It's not writing to the original live file, it's just writing the new temp file. That part of the code is OK. As for the giant lock, it would be like closing down the entire airport after someone finds a knife in the snack bar, and in my opinion an this is an unwise and brutal thing to do. Not to mention stale /var/run/ files that might be lying around... I do appreciate your work, but please don't commit this until the real issue is solved, and the dust settles. There still may be some use for it... plenty of time until the next release cycle. Geoff: Does the patch for fileupd.c in the last mail hinder the file corruption you are seeing? (I suggest tabstop=4 or even =2; that file is *DEEP*) Yss! :-) -Paul. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, Jun 23, 2002 at 01:19:02PM -0700 I heard the voice of Paul Herman, and lo! it spake thus: I think you might have your infp confused with your outfp. It's not writing to the original live file, it's just writing the new temp file. That part of the code is OK. I'm talking about down around line 177 thru (unpatched), where it's copying back to the primary file. As it stands now (I hadn't realized in my prior reading) it's line-by-line'ing it: /* * Copy data back into the * original file and truncate */ rewind(infp); rewind(outfp); while (fgets(line, linesize, outfp) != NULL) fputs(line, infp); /* * If there was a problem with copying * we will just rename 'file.new' * to 'file'. * This is a gross hack, but we may have * corrupted the original file * Unfortunately, it will lose the inode * and hence the lock. */ if (fflush(infp) == EOF || ferror(infp)) rename(file, filename); else ftruncate(infd, ftell(infp)); which is Very Bad (tm) in that it's not very resilient against system crashes. The rename() solution is much safer. As for the giant lock, it would be like closing down the entire airport after someone finds a knife in the snack bar, and in my opinion an this is an unwise and brutal thing to do. No, it's more like closing the snack bar while you clean it, rather than closing the first steam tray while cleaning that, then the second steam tray, then the third booth on the left side, then... That way, when you open the snack bar, you know that the whole thing is clean (or, depending on your skill and work ethic, at least equally dirty ;). In many passwd-manipulations, the file can't be safely modified in-place, so it has to be done out-of-place, then atomically shoved in, which requires a rename() or the ilk; the downside is that you can't flock() the file in question and handle that easily. One also has to consider the impact with other programs, potentially not written by us, that modify things; a wrapper lock on changes to the auth information is really the only way to ensure consistency; otherwise, you can't REALLY be sure that the passwd and group files are in sync, to say nothing of preventing crashes from mangling things. Not to mention stale /var/run/ files that might be lying around... See the code I put in pid_begin() to handle just that case, that being the reason the function was written in the first place (otherwise, I could dispose of a lot of mess, and just use an empty file as a mutex for it). I do appreciate your work, but please don't commit this until the real issue is solved, and the dust settles. There still may be some use for it... plenty of time until the next release cycle. Commit? How would I do that? :P -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, 23 Jun 2002, Matthew D. Fuller wrote: I'm talking about down around line 177 thru (unpatched), where it's copying back to the primary file. [...] which is Very Bad (tm) in that it's not very resilient against system crashes. The rename() solution is much safer. Ah, now I see, thanks. No contention there, we are in total agreement that this is a Bad Thing, and that rename() is sufficient. No, it's more like closing the snack bar while you clean it, rather than closing the first steam tray while cleaning that, then the second steam tray, then the third booth on the left side, then... OK, you can change the analogy, but you're still closing the snack bar, and you turn away customers. Most 24 hour snack bars clean each table at a time as each one frees up. Moral of that story: This would mean for pw(8) that I can't update the system passwords and the passwords in all my jails at the same time. There is no reason to enforce that policy. Back to the situation at hand. You make a few points that should be clarified, I'll summarize them in QA format: Q: rename() is good, but you can't flock() the file in question and handle it easily, can you? A: Indeed, you can. flocks() are even preserved across renames. Consider: lockf -k foo sleep 60 mv foo bar lockf -t 1 -k bar true The second lockf will error. Q: What about other programs not written by us that modify things? A: Agreed, this is a problem. However, no system in place will prevent at any time anyone with the proper credentials from doing: echo junk /etc/master.passwd At least if the programs in the base agree on a system (be it flock() or /var/run lock) it will work for the base system. This doesn't speak for or against any one method. Yes, a lock on changes to the auth information is the only way to ensure consistency, which is what this is all about, but HOW to accomplish this issue at hand. Commit? How would I do that? :P Sorry, it wasn't as much directed at you as it was more of a general plea. Kind of like this one: Why use lockfiles, when we can lock files? Let's lock files and keep the snack bar open! :-) -Paul. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
OK, this is the end for me today. I'm fairly sure, somehow, it's time to get some sleep ;p On Sun, Jun 23, 2002 at 03:51:46PM -0700 I heard the voice of Paul Herman, and lo! it spake thus: Moral of that story: This would mean for pw(8) that I can't update the system passwords and the passwords in all my jails at the same time. There is no reason to enforce that policy. No, it means that (with configuration) you can only update them once at a time in each jail. I'm all in favor of adding options, either command line, or pw.conf (or both) to put the lockfile somewhere else; -V ${dir} already makes pw(8) read ${dir}/pw.conf instead of the default /etc/pw.conf, so we're 90% of the way there already. Heck, I wouldn't be too worried about adjusting the lockfile location based on -V automagically. I'm also not opposed to adding an option to not bother with this locking; I already added one to pwd_mkdb(8) (primarily because it's called from libutil's pw_mkdb(3), which is called by vipw(8)... deadlocks suck). I just haven't done either NOW; nor have I yet added options to 'block on the lock and keep trying' (or decided to make blocking the default and have a switch for non-blocking) - this is experimental implementation stage, after all. I don't claim this is perfect for every case; I'm just taking aim at the common case, where it's currently far too easy to accidentally screw yourself in the foot. Q: rename() is good, but you can't flock() the file in question and handle it easily, can you? A: Indeed, you can. flocks() are even preserved across renames. Yes, I've seen that. A: Agreed, this is a problem. However, no system in place will prevent at any time anyone with the proper credentials from doing: I was aiming more at: User A adds user with pw(8) * pw(8) gets together stuff, internally adds user and group * pw(8) builds homedirs, skel files, changes ownership * pw(8) opens and locks master.passwd (and assorted temp files) and starts rewriting - User B adds group % User B's pw(8) (or other) invocation adds group with GID that User A's pw(8) had already assigned and chmod'd for user above * User A's pw(8) either errors out leaving job half done, or adds group with duplicate GID, or (not coded) backs out everything it's done and starts over, or (ditto) backs out everything it's done and errors back to User A, who screams and bitches and dumps User B into the East River after sundown. Similar situations can be thought up. My POV on this is that more than 1 file has to be coordinated here, so why not just make 1 file responsible for the coordination? I don't care THAT much if it's called /var/run/pwd.lock or /etc/passwd.lock or /kernel (I suggest testing the latter on a scratch system). It just seems to me the simplest and least error-prone way of doing it. Yes, a lock on changes to the auth information is the only way to ensure consistency, which is what this is all about, but HOW to accomplish this issue at hand. Now, mind you; were this a high-traffic subsystem, where one would normally have 2 or 3 or 4 requests (changes) outstanding, with a constant stream coming in keeping the queue full, I'd be out there screaming for fine-grained locking of individual pieces of it. But really; how many systems are there (that are using stock passwd and getpw* and pw/pwd_mkdb/etc) that are adding users at even a significant fraction of the speed the system can process them? A coarse lock is easier on the programmer (especially being as we have quickie functions that do the job for them), the admin (easier to see an extra lock file, then a hung process holding a flock()), and pretty much everything else. Why use lockfiles, when we can lock files? Let's lock files and keep the snack bar open! :-) We are locking a file :) We're just locking one file for the whole subsystem, so the pieces can be assured of being in sync. So, where's a snack bar with no CVS repository or text editor or diff(1); I need a vacation! -- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
Paul Herman wrote: Fine, then lock them both, and use rename(). Patch attached. In fact, if you look at fileupdate(), you see that it already gains an exclusive lock on the temp file, but not the original /etc/master.passwd (if you will.) I think this is a bug, because the original is getting modified (at least in name space), so that should be locked while pw(8) is operating. What do you think? Don't use an exclusive open. Use a standard lock file lock mechanism. The problem with your proposed patch is that it breaks the ability to allow authentication against the database while it is undergoing modification, which may be a prolonged period. The correct way to deal with this is to make a hard link against the file, verify that the link count is exactly 2, and if it's not, back off and fail. You are only permitted to write the file if you are the one that made the second link. Since both rename and link are atomic operations in the directory, you are safe. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, 23 Jun 2002, Matthew D. Fuller wrote: I don't claim this is perfect for every case; I'm just taking aim at the common case, where it's currently far too easy to accidentally screw yourself in the foot. ...and have the screw (or bullet) delivered as precisely, quickly and efficiently as possible. Clearly, at the root of our disagreement is what we both perceive the problem to be. I don't see problems in the current implementation, aside from bugs that lead to unexpected behavior, i.e. passwd file corruption. You see the problem as a deficiency in the implementation itself, and wish to protect the user from shooting themselves in the foot. Not only do I think that's impossible[*], I choose to fight for my right to shoot myself in the foot as quickly and efficiently as possible, but that's where we'll disagree, and I'll just leave it at that and wish you a good night's sleep. -Paul. [*] Patch to vi refusing to edit filenames containing master.passwd withheld by request. ;-) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
Matthew D. Fuller writes: ... Why use lockfiles, when we can lock files? Let's lock files and keep the snack bar open! :-) We are locking a file :) We're just locking one file for the whole subsystem, so the pieces can be assured of being in sync. Time to implement hierarchic lock. hlock dir lead to fail hlock every file in that dir with ref count == 1 ?? Extremly unefficient -- @BABOLO http://links.ru/ To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sun, 23 Jun 2002, Paul Herman wrote: You see the problem as a deficiency in the implementation itself, and wish to protect the user from shooting themselves in the foot. Not only do I think that's impossible[*], I choose to fight for my right to shoot myself in the foot as quickly and efficiently as possible, but that's where we'll disagree, and I'll just leave it at that and wish you a good night's sleep. Not to be snooty, but there will always be plenty of ways for you to shoot yourself in the foot if you want to play that way. For regular users of pw(8) who don't want to get shot, it needs to be fixed when used within its intended bounds. As far as I'm concerned, it comes down to what is most feasible to implement correctly without an unreasonably large effort. In reality, it comes down to how a committer would like to see it work, since the committers are the ones who ultimately have to deal with and are responsible for the change. So we either need to have a compelling solution or get a committer to step in and make up our minds for us. Geoff To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
[I'm bringing the discussion back to -hackers, since it was omitted in your original reply, Paul. To get everyone up to speed, Paul suggested that calling unlink() _before_ close() should solve the race condition mentioned in my original message. However, that still leads to corruption, and we're trying to figure out why.] On Fri, 17 May 2002, Paul Herman wrote: The file isn't actually unlinked until p1 closes the fd Sure it is. Try a little C: fd = open(watchme, O_CREAT|O_EXLOCK); sleep(10); unlink(watchme); sleep(10); close(fd); You'll see the file disappear after 10 seconds, not 20. Hmm. Rechecking the man page for unlink(2) I see that I worded the above incorrectly. The file is unlinked when you unlink(), but not actually removed until close(). Also, open() looks like it blocks until it can get the lock But when does open() block in the following scenario? p1:open() - open lock file p2:open() - BLOCK p1:unlink()- remove link only, do not remove file p2:open() - WAKEUP p1:close() - close fd Does it block before it opens a fd, or after then but before it gains an exclusive lock? In other words, did p2 open a new file after it woke up, or did it open the same file that p1 had opened? If the latter is true then we could extend the above: p3:open() - open lock file (since p1 unlinked p1 p2's copy) p2/p3:update master.passwd - BOOM If the former is true, then I'm pretty much out of ideas. I don't think the corruption is due to something else, since simply removing the call to unlink() does in fact fix the problem. Geoff To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bug in pw, -STABLE [patch]
On Sat, 18 May 2002, Geoffrey C. Speicher wrote: The file isn't actually unlinked until p1 closes the fd Hmm. Rechecking the man page for unlink(2) I see that I worded the above incorrectly. The file is unlinked when you unlink(), but not actually removed until close(). Well, unlink and remove are the same in my vocabulary. The file is removed when you unlink it, but the will lock remain until you close it. But when does open() block in the following scenario? p1:open()- open lock file p2:open()- BLOCK p1:unlink() - remove link only, do not remove file p2:open()- WAKEUP p1:close() - close fd p2 blocks until p1 closes the fd. The problem in this example is, by the time p2 gains the lock, the file in non-existant in /var/lock, which allows p3 to grab a new 2nd lock, which gives you the corruption. You should open() with O_NONBLOCK in order to solve your problem, but the calling process will fail. In that case, pw should exit with a sysexits(3) error to notify the caller. Having pw loop for a few seconds while trying to obtain the lock (like lockf(1) does) might be nice. -Paul. To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message