Re: bleh. Re: ufs_rename panic
As pointed by Ken - we do have alot of file renames (qmail). But 2-nd solution, directory-only rename serialization, probably won't affect performance as much. But i believe it's not only us who's gonna have problem when exploit code will be known by everybody sooner or later Thanks! Kirk McKusick wrote: The potentially slow, but utterly effective way to fix this race is to only allow one rename at a time per filesystem. It is implemented by adding a flag in the mount structure and using it to serialize calls to rename. When only one rename can happen at a time, the race cannot occur. If this proves to be too much of a slow down, it would be possible to only serialize directory renames. As these are (presumably) much rarer the slow down would be less noticable. Kirk McKusick =-=-=-=-=-= Date: Wed, 19 Feb 2003 15:10:09 -0800 From: Yevgeniy Aleynikov [EMAIL PROTECTED] To: Matt Dillon [EMAIL PROTECTED] CC: Kirk McKusick [EMAIL PROTECTED], Ian Dowse [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Ken Pizzini [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: bleh. Re: ufs_rename panic X-ASK-Info: Confirmed by User Just reminder that this problem is local kernel panic DoS (which can do filesystem corruption) with very simple trigger code and it still exists. And it's been almost 2 years since i wrote about it. Workaround (commenting out panic call) doesnt fix the problem. Server still crashes (not so often though) from virtual memory failures like this: panic: vm_fault: fault on nofault entry, addr: d0912000 mp_lock = 0102; cpuid = 1; lapic.id = boot() called on cpu#1 (kgdb) bt #0 0xc0175662 in dumpsys () #1 0xc017542c in boot () #2 0xc0175894 in poweroff_wait () #3 0xc01e7c18 in vm_fault () #4 0xc0219d32 in trap_pfault () #5 0xc021990b in trap () #6 0xc01e008a in ufs_dirrewrite () #7 0xc01e31a4 in ufs_rename () #8 0xc01e4645 in ufs_vnoperate () #9 0xc01a9121 in rename () #10 0xc021a44d in syscall2 () #11 0xc02077cb in Xint0x80_syscall () How can i help to resolve this problem ASAP? Thanks! Matt Dillon wrote: Well, I've gone through hell trying to fix the rename()/rmdir()/remove() races and failed utterly. There are far more race conditions then even my last posting indicated, and there are *severe* problems fixing NFS to deal with even Ian's suggestion... it turns out that NFS's nfs_namei() permanently adjusts the mbuf while processing the path name, making restarts impossible. The only solution is to implement namei cache path locking and formalize the 'nameidata' structure, which means ripping up a lot of code because nearly the entire code base currently plays with the contents of 'nameidata' willy-nilly. Nothing else will work. It's not something that I can consider doing now. In the mean time I am going to remove the panic()'s in question. This means that in ufs_rename() the machine will silently ignore the race (not do the rename) instead of panic. It's all that can be done for the moment. It solve the security/attack issue. We'll have to attack the races as a separate issue. The patch to remove the panics is utterly trivial and I will commit it after I test it. -Matt -- Yevgeniy Aleynikov | Sr. Systems Engineer - USE InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004 Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924 [EMAIL PROTECTED] | http://www.infospaceinc.com Discover what you can do.TM To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Yevgeniy Aleynikov wrote: As pointed by Ken - we do have alot of file renames (qmail). But 2-nd solution, directory-only rename serialization, probably won't affect performance as much. But i believe it's not only us who's gonna have problem when exploit code will be known by everybody sooner or later Dan's non-atomicity assumption on renames is incorrect. Even if it's were correct, it's possible to recover fully following a failure, because metadata updates are ordered (there is a real synchronization between dependent operations). I think that a workaround would be to comment the directory fsync() code out of qmail, which apparently thinks it's running on extfs or an async mounted FFS. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Date: Fri, 21 Feb 2003 15:26:01 -0800 From: Terry Lambert [EMAIL PROTECTED] To: Yevgeniy Aleynikov [EMAIL PROTECTED] CC: Kirk McKusick [EMAIL PROTECTED], Matt Dillon [EMAIL PROTECTED], Ian Dowse [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Ken Pizzini [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: bleh. Re: ufs_rename panic Yevgeniy Aleynikov wrote: As pointed by Ken - we do have alot of file renames (qmail). But 2-nd solution, directory-only rename serialization, probably won't affect performance as much. But i believe it's not only us who's gonna have problem when exploit code will be known by everybody sooner or later Dan's non-atomicity assumption on renames is incorrect. Even if it's were correct, it's possible to recover fully following a failure, because metadata updates are ordered (there is a real synchronization between dependent operations). I think that a workaround would be to comment the directory fsync() code out of qmail, which apparently thinks it's running on extfs or an async mounted FFS. -- Terry You cannot get rid of the fsync calls in qmail. You have to distinguish between a filesystem that is recoverable and one which loses data. When receiving an incoming message, SMTP requires that the receiver have the message in stable store before acknowledging receipt. The only way to know that it is in stable store is to fsync it before responding. Kirk McKusick To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
On Fri, 21 Feb 2003, Terry Lambert wrote: Yevgeniy Aleynikov wrote: As pointed by Ken - we do have alot of file renames (qmail). But 2-nd solution, directory-only rename serialization, probably won't affect performance as much. But i believe it's not only us who's gonna have problem when exploit code will be known by everybody sooner or later Dan's non-atomicity assumption on renames is incorrect. Even if it's were correct, it's possible to recover fully following a failure, because metadata updates are ordered (there is a real synchronization between dependent operations). I think that a workaround would be to comment the directory fsync() code out of qmail, which apparently thinks it's running on extfs or an async mounted FFS. If you don't want to lose mail then qmail needs to do a fsync after it does the rename. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Kirk McKusick wrote: Yevgeniy Aleynikov wrote: As pointed by Ken - we do have alot of file renames (qmail). But 2-nd solution, directory-only rename serialization, probably won't affect performance as much. But i believe it's not only us who's gonna have problem when exploit code will be known by everybody sooner or later Dan's non-atomicity assumption on renames is incorrect. Even if it's were correct, it's possible to recover fully following a failure, because metadata updates are ordered (there is a real synchronization between dependent operations). I think that a workaround would be to comment the directory fsync() code out of qmail, which apparently thinks it's running on extfs or an async mounted FFS. -- Terry You cannot get rid of the fsync calls in qmail. You have to distinguish between a filesystem that is recoverable and one which loses data. When receiving an incoming message, SMTP requires that the receiver have the message in stable store before acknowledging receipt. The only way to know that it is in stable store is to fsync it before responding. The issue is specifically with the rename code, which is a metadata operation, not with the storing of application data. The fsync's in question are those to the fd of the directory, not the fd of the application data. Sorry if it wasn't clear from my statement that Dan's non-atomicity assumption on renames is incorrect meant that it only applied to the fsync() calls dealing with the rename. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Julian Elischer wrote: On Fri, 21 Feb 2003, Terry Lambert wrote: Dan's non-atomicity assumption on renames is incorrect. [ ... ] I think that a workaround would be to comment the directory fsync() code out of qmail, which apparently thinks it's running on extfs or an async mounted FFS. If you don't want to lose mail then qmail needs to do a fsync after it does the rename. See other mail. I was referring only to fsync() of directory data. Sorry if that wasn't clear because of the intervening paragraph. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Just reminder that this problem is local kernel panic DoS (which can do filesystem corruption) with very simple trigger code and it still exists. And it's been almost 2 years since i wrote about it. Workaround (commenting out panic call) doesnt fix the problem. Server still crashes (not so often though) from virtual memory failures like this: panic: vm_fault: fault on nofault entry, addr: d0912000 mp_lock = 0102; cpuid = 1; lapic.id = boot() called on cpu#1 (kgdb) bt #0 0xc0175662 in dumpsys () #1 0xc017542c in boot () #2 0xc0175894 in poweroff_wait () #3 0xc01e7c18 in vm_fault () #4 0xc0219d32 in trap_pfault () #5 0xc021990b in trap () #6 0xc01e008a in ufs_dirrewrite () #7 0xc01e31a4 in ufs_rename () #8 0xc01e4645 in ufs_vnoperate () #9 0xc01a9121 in rename () #10 0xc021a44d in syscall2 () #11 0xc02077cb in Xint0x80_syscall () How can i help to resolve this problem ASAP? Thanks! Matt Dillon wrote: Well, I've gone through hell trying to fix the rename()/rmdir()/remove() races and failed utterly. There are far more race conditions then even my last posting indicated, and there are *severe* problems fixing NFS to deal with even Ian's suggestion... it turns out that NFS's nfs_namei() permanently adjusts the mbuf while processing the path name, making restarts impossible. The only solution is to implement namei cache path locking and formalize the 'nameidata' structure, which means ripping up a lot of code because nearly the entire code base currently plays with the contents of 'nameidata' willy-nilly. Nothing else will work. It's not something that I can consider doing now. In the mean time I am going to remove the panic()'s in question. This means that in ufs_rename() the machine will silently ignore the race (not do the rename) instead of panic. It's all that can be done for the moment. It solve the security/attack issue. We'll have to attack the races as a separate issue. The patch to remove the panics is utterly trivial and I will commit it after I test it. -Matt -- Yevgeniy Aleynikov | Sr. Systems Engineer - USE InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004 Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924 [EMAIL PROTECTED] | http://www.infospaceinc.com Discover what you can do.TM To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
The potentially slow, but utterly effective way to fix this race is to only allow one rename at a time per filesystem. It is implemented by adding a flag in the mount structure and using it to serialize calls to rename. When only one rename can happen at a time, the race cannot occur. If this proves to be too much of a slow down, it would be possible to only serialize directory renames. As these are (presumably) much rarer the slow down would be less noticable. Kirk McKusick =-=-=-=-=-= Date: Wed, 19 Feb 2003 15:10:09 -0800 From: Yevgeniy Aleynikov [EMAIL PROTECTED] To: Matt Dillon [EMAIL PROTECTED] CC: Kirk McKusick [EMAIL PROTECTED], Ian Dowse [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Ken Pizzini [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: bleh. Re: ufs_rename panic X-ASK-Info: Confirmed by User Just reminder that this problem is local kernel panic DoS (which can do filesystem corruption) with very simple trigger code and it still exists. And it's been almost 2 years since i wrote about it. Workaround (commenting out panic call) doesnt fix the problem. Server still crashes (not so often though) from virtual memory failures like this: panic: vm_fault: fault on nofault entry, addr: d0912000 mp_lock = 0102; cpuid = 1; lapic.id = boot() called on cpu#1 (kgdb) bt #0 0xc0175662 in dumpsys () #1 0xc017542c in boot () #2 0xc0175894 in poweroff_wait () #3 0xc01e7c18 in vm_fault () #4 0xc0219d32 in trap_pfault () #5 0xc021990b in trap () #6 0xc01e008a in ufs_dirrewrite () #7 0xc01e31a4 in ufs_rename () #8 0xc01e4645 in ufs_vnoperate () #9 0xc01a9121 in rename () #10 0xc021a44d in syscall2 () #11 0xc02077cb in Xint0x80_syscall () How can i help to resolve this problem ASAP? Thanks! Matt Dillon wrote: Well, I've gone through hell trying to fix the rename()/rmdir()/remove() races and failed utterly. There are far more race conditions then even my last posting indicated, and there are *severe* problems fixing NFS to deal with even Ian's suggestion... it turns out that NFS's nfs_namei() permanently adjusts the mbuf while processing the path name, making restarts impossible. The only solution is to implement namei cache path locking and formalize the 'nameidata' structure, which means ripping up a lot of code because nearly the entire code base currently plays with the contents of 'nameidata' willy-nilly. Nothing else will work. It's not something that I can consider doing now. In the mean time I am going to remove the panic()'s in question. This means that in ufs_rename() the machine will silently ignore the race (not do the rename) instead of panic. It's all that can be done for the moment. It solve the security/attack issue. We'll have to attack the races as a separate issue. The patch to remove the panics is utterly trivial and I will commit it after I test it. -Matt -- Yevgeniy Aleynikov | Sr. Systems Engineer - USE InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004 Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924 [EMAIL PROTECTED] | http://www.infospaceinc.com Discover what you can do.TM To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
McKusick wrote: The potentially slow, but utterly effective way to fix this race is to only allow one rename at a time per filesystem. Can we serialize unprivileged renames per mount as an alternate work around? Later Mark Hittinger [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
FYI: http://www.securityfocus.com/cgi-bin/archive.pl?id=1mid=221337start=2001-10-15end=2001-10-21 (about how things done in Linux). Zhihui Zhang wrote: (1) I am always wondering why not use a global rename lock so that there is only one rename operation in progress at any time. This method is used by GFS and probably Linux. This could make the code simply. Maybe we can even get rid of the relookup() stuff. This may reduce concurrency, but rename should not be a frequent operation. -- Yevgeniy Aleynikov Infospace, Inc. SysAdmin, USE Work: (206)357-4594 To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Here's another stable panic (not very often but on different boxes too). -- Yevgeniy Aleynikov Infospace, Inc. SysAdmin, USE Work: (206)357-4594 SMP 2 cpus IdlePTD 3039232 initial pcb at 2666a0 panicstr: ffs_valloc: dup alloc panic messages: --- panic: ffs_valloc: dup alloc mp_lock = 0101; cpuid = 1; lapic.id = boot() called on cpu#1 syncing disks... 2 #0 dumpsys () at ../../kern/kern_shutdown.c:473 473 if (dumping++) { (kgdb) bt #0 dumpsys () at ../../kern/kern_shutdown.c:473 #1 0xc015e9df in boot (howto=256) at ../../kern/kern_shutdown.c:313 #2 0xc015ede0 in poweroff_wait (junk=0xc0232e01, howto=-1071436320) at ../../kern/kern_shutdown.c:581 #3 0xc01b6600 in ffs_valloc (pvp=0xdf902600, mode=33188, cred=0xc5b83380, vpp=0xdf541c9c) at ../../ufs/ffs/ffs_alloc.c:609 #4 0xc01c32ef in ufs_makeinode (mode=33188, dvp=0xdf902600, vpp=0xdf541edc, cnp=0xdf541ef0) at ../../ufs/ufs/ufs_vnops.c:2097 #5 0xc01c0978 in ufs_create (ap=0xdf541df8) at ../../ufs/ufs/ufs_vnops.c:194 #6 0xc01c363d in ufs_vnoperate (ap=0xdf541df8) at ../../ufs/ufs/ufs_vnops.c:2382 #7 0xc0192c88 in vn_open (ndp=0xdf541ec4, fmode=1538, cmode=420) at vnode_if.h:106 #8 0xc018ee6c in open (p=0xdf4c9a00, uap=0xdf541f80) at ../../kern/vfs_syscalls.c:1077 #9 0xc0205011 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 1, tf_esi = 0, tf_ebp = -1077939600, tf_isp = -548134956, tf_ebx = -1077939604, tf_edx = 1537, tf_ecx = 137102023, tf_eax = 5, tf_trapno = 7, tf_err = 2, tf_eip = 674553728, tf_cs = 31, tf_eflags = 514, tf_esp = -1077939656, tf_ss = 47}) at ../../i386/i386/trap.c:1155 #10 0xc01f291b in Xint0x80_syscall () --- fs-fs_contigdirs[cg]--; } ino = (ino_t)ffs_hashalloc(pip, cg, (long)ipref, mode, (allocfcn_t *)ffs_nodealloccg); if (ino == 0) goto noinodes; error = VFS_VGET(pvp-v_mount, ino, vpp); if (error) { UFS_VFREE(pvp, ino, mode); return (error); } ip = VTOI(*vpp); if (ip-i_mode) { printf(mode = 0%o, inum = %lu, fs = %s\n, ip-i_mode, (u_long)ip-i_number, fs-fs_fsmnt); panic(ffs_valloc: dup alloc); } #3 0xc01b6600 in ffs_valloc (pvp=0xdf902600, mode=33188, cred=0xc5b83380, vpp=0xdf541c9c) at ../../ufs/ffs/ffs_alloc.c:609 (kgdb) print *pvp $6 = {v_flag = 0, v_usecount = 1, v_writecount = 0, v_holdcnt = 2, v_id = 69744400, v_mount = 0xc4174e00, v_op = 0xc4078700, v_freelist = { tqe_next = 0x0, tqe_prev = 0xe01f999c}, v_mntvnodes = { le_next = 0xdf949f00, le_prev = 0xe02ef924}, v_cleanblkhd = { tqh_first = 0xce8697ac, tqh_last = 0xce8697b4}, v_dirtyblkhd = { tqh_first = 0x0, tqh_last = 0xdf902634}, v_synclist = { le_next = 0xe034c7c0, le_prev = 0xe04904bc}, v_numoutput = 0, v_type = VDIR, v_un = {vu_mountedhere = 0x0, vu_socket = 0x0, vu_spec = { vu_specinfo = 0x0, vu_specnext = {sle_next = 0x0}}, vu_fifoinfo = 0x0}, v_lease = 0x0, v_lastw = 0, v_cstart = 0, v_lasta = 0, v_clen = 0, v_object = 0x0, v_interlock = {lock_data = 0}, v_vnlock = 0xc448d400, v_tag = VT_UFS, v_data = 0xc448d400, v_cache_src = {lh_first = 0xc5630f40}, v_cache_dst = {tqh_first = 0xc5dcfb80, tqh_last = 0xc5dcfb90}, v_dd = 0xdfac9b00, v_ddid = 69744310, v_pollinfo = {vpi_lock = { lock_data = 0}, vpi_selinfo = {si_pid = 0, si_note = {slh_first = 0x0}, si_flags = 0}, vpi_events = 0, vpi_revents = 0}, v_vxproc = 0x0} (kgdb) print **vpp $2 = {v_flag = 0, v_usecount = 1, v_writecount = 0, v_holdcnt = 0, v_id = 69818641, v_mount = 0xc4174e00, v_op = 0xc4078700, v_freelist = { tqe_next = 0xdf941dc0, tqe_prev = 0xc02670d0}, v_mntvnodes = { le_next = 0xdfd79a80, le_prev = 0xc4174e18}, v_cleanblkhd = { tqh_first = 0x0, tqh_last = 0xdfa89e2c}, v_dirtyblkhd = {tqh_first = 0x0, tqh_last = 0xdfa89e34}, v_synclist = {le_next = 0x0, le_prev = 0xc407b0cc}, v_numoutput = 0, v_type = VDIR, v_un = { vu_mountedhere = 0x0, vu_socket = 0x0, vu_spec = {vu_specinfo = 0x0, vu_specnext = {sle_next = 0x0}}, vu_fifoinfo = 0x0}, v_lease = 0x0, v_lastw = 0, v_cstart = 0, v_lasta = 0, v_clen = 0, v_object = 0x0, v_interlock = {lock_data = 0}, v_vnlock = 0xc55f1200, v_tag = VT_UFS, v_data = 0xc55f1200, v_cache_src = {lh_first = 0x0}, v_cache_dst = { tqh_first = 0x0, tqh_last = 0xdfa89e80}, v_dd = 0xdfa89e00, v_ddid = 0, v_pollinfo = {vpi_lock = {lock_data = 0}, vpi_selinfo = {si_pid = 0, si_note = {slh_first = 0x0}, si_flags = 0}, vpi_events = 0, vpi_revents = 0}, v_vxproc = 0x0} (kgdb) print mode $3
Re: bleh. Re: ufs_rename panic
On Sun, Oct 07, 2001 at 06:09:45PM -0700, Kris Kennaway wrote: On Fri, Oct 05, 2001 at 04:17:14PM -0700, Yevgeniy Aleynikov wrote: And why FreeBSD security officer's email address always bounces my mail? Don't know, it's apparently still working fine for others. Wild guess: FreeBSD.org's anti-spam getting in your way? -- | / o / /_ _ email: [EMAIL PROTECTED] |/|/ / / /( (_) Bulte Arnhem, The Netherlands To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Well, I've gone through hell trying to fix the rename()/rmdir()/remove() races and failed utterly. There are far more race conditions then even my last posting indicated, and there are *severe* problems fixing NFS to deal with even Ian's suggestion... it turns out that NFS's nfs_namei() permanently adjusts the mbuf while processing the path name, making restarts impossible. The only solution is to implement namei cache path locking and formalize the 'nameidata' structure, which means ripping up a lot of code because nearly the entire code base currently plays with the contents of 'nameidata' willy-nilly. Nothing else will work. It's not something that I can consider doing now. In the mean time I am going to remove the panic()'s in question. This means that in ufs_rename() the machine will silently ignore the race (not do the rename) instead of panic. It's all that can be done for the moment. It solve the security/attack issue. We'll have to attack the races as a separate issue. The patch to remove the panics is utterly trivial and I will commit it after I test it. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
On Fri, Oct 05, 2001 at 04:17:14PM -0700, Yevgeniy Aleynikov wrote: And why FreeBSD security officer's email address always bounces my mail? Don't know, it's apparently still working fine for others. Kris msg29046/pgp0.pgp Description: PGP signature
Re: bleh. Re: ufs_rename panic
: :It seems like there's no activity on this subject. :This is local DoS, guys. if it gets on public (which is probably gonna :be soon) everything everywhere will be crashing, and there's no stable :fix ready. :How can i help to accelerate this problem solution? : : And why FreeBSD security officer's email address always bounces my :mail? : :Thanks! My most recently posted patch will solve your problem. I am reworking it as per Ian's suggestions before I commit, and will also implement the same feature in rename() (for files). Then I will do a standard commit-to-current-wait-commit-to-stable sequence. But due to the complexity of the changes (even after simplifying them), it is going to be another few days before anything gets into -stable officially. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
In message [EMAIL PROTECTED], Matt Dillon writes: :This seems rather large compared to Ian Dowse's version.. Are you sure that :you're doing this the right way? Adding a whole new locking mechanism :when the simple VRENAME flag to be enough seems like a bit of overkill.. Matt addresses the problem more completely than my patch does, so the differences in patch size and files touched are to be expected. In particular, the NFS server and unionfs code need to be changed in the same way as the syscalls, and the IN_RENAME flag can be removed from the ufs code, both of which are included in Matt's patch. Ian's doesn't fix any of the filesystem semantics bugs, it only prevents the panic from occuring. This is certainly correct, though the IN_RENAME flag in the UFS code currently has a few such semantics bugs where EINVAL can be returned in cases that would succeed if rename() was atomic. When a vnode cannot be renamed/unlinked/rmdir'd because it is being renamed, the operation should be retried until it succeeds, sleeping as necessary. As I understand it, this is mostly dealt with by Matt's patch, but not at all by mine. If you remove the filesystem semantics fixes from my patch you essentially get Ian's patch except that I integrated the vnode flag in namei/lookup whereas Ian handles it manually in the syscall code. The addition of the SOFTLOCKLEAF code is quite a major change, so it would be very useful if you could describe exactly what it does, what its semantics are, and how it fits into the rename problem. My understanding of the problem is that VOP_RENAME is quite unique in that it is the only VOP that must modify entries in two separate directories. To avoid deadlock, it is not possible (very hard anyway) to lock all 4 vnodes (source node, source parent, target node, target parent) before calling VOP_RENAME. Instead, the approach taken is to lock only the target node and its parent, and have the VOP_RENAME implementation jump back and forth between locking the source and locking the target as necessary. Hence VOP_RENAME is the only VOP that must modify a node that is passed in unlocked. Because the source node and parent are not locked, there is the possibility that the source node could be renamed or removed at any time before VOP_RENAME finally gets around to locking it and removing it. Something needs to protect the source node against being renamed/removed between the point that the source node is initially looked up and the point that it is finally locked. Both Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this protection. It is the fact that this problem is entirely unique to VOP_RENAME that leads me to think that adding the generic SOFTLOCKLEAF code is overkill. The following fragment also suggests that maybe the approach doesn't actually fit in that well: fromnd.ni_cnd.cn_flags = ~SOFTLOCKLEAF;/* XXX hack */ error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, fromnd.ni_cnd, tond.ni_dvp, tond.ni_vp, tond.ni_cnd); fromnd.ni_cnd.cn_flags |= SOFTLOCKLEAF; NDFREE(fromnd, NDF_ONLY_PNBUF NDF_ONLY_SOFTLOCKLEAF); The way that vclearsoftlock() is used to clear a flag in an unlocked vnode is also not ideal. This should probably be protected at least by v_interlock as other flags are. The syscalls that need to be changed (rename, unlink, rmdir) could possibly use vn_* style wrapper functions to reduce the amount of code that must understand the new locking mechanism, although I'm not sure if this is practical for the NFS case. It might also be a good time to remove the WILLRELE from VOP_RENAME, which would simplify some of the surrounding code. Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
:The rename routine is probably the most convoluted in the entire file :system code (FFS). Now that everybody's memory is fresh, I would like to :ask something about it: : :(1) I am always wondering why not use a global rename lock so that there :is only one rename operation in progress at any time. This method is :used by GFS and probably Linux. This could make the code simply. Maybe :we can even get rid of the relookup() stuff. : :This may reduce concurrency, but rename should not be a frequent :operation. Well, you could say that about virtually any filesystem operation. Bitmaps are shared, for example. It is a bad idea to try to code simplistic solutions to complex problems. Throughout the code history of BSDs we have had to constantly make adjustments to algorithms that were originally not designed to scale past what the authors originally believed was reasonable. :(2) In the code of 4.3-release, we grab the source inode while holding the :locks of target inodes. In ufs_rename(), we have: : :if ((error = vn_lock(fvp, LK_EXCLUSIVE, p)) != 0) : goto abortit; : :I wonder whether this could cause deadlock. I think locking more than :one inode should be done in some sequence (ie. order them by inode :number). Hmm. Yes, there might possibly be a problem there. We may be safe due to the fact that only directory scans and rename hold multiple vnodes locked, and in this case the destination directory holding the destination file is already locked. However, if the source directory/file gets ripped out from under rename() the 'new' location of the source directory/file could cause a deadlock against another process. It would be very difficult to generate it though. :(4) This is not related to rename(). But ufs_mknod() reload the inode :through VFS_VGET() to avoid duplicate aliases. I can not see why it :is necessary. I asked this before, but have not got any satisfactory :responses. :... :Any ideas are welcome. Thanks, : :-Zhihui I don't know the answer to this at the moment. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
:The addition of the SOFTLOCKLEAF code is quite a major change, so :it would be very useful if you could describe exactly what it does, :what its semantics are, and how it fits into the rename problem. Setting SOFTLOCKLEAF in namei will set the VSOFTLOCK flag in the returned vnode (whether the returned vnode is locked or not), and namei() will fail with EAGAIN if the VSOFTLOCK flag is already set. An extra reference is added to the returned vnode which either the caller must free or NDFREE must free (note that VOP_RENAME is not responsible for freeing this extra reference, so the API itself does not actually change). The caller must either call vclearsoftlock() or call NDFREE() with the appropriate flags to clear the flag and dereference the vnode. int gc = 0; vagain(gc); vagain() is a routine that falls through the first time, initializing 'gc' to a global counter. If later on you get an EAGAIN from a SOFTLOCK failure and loop back up to the vagain(), it will block the process until whomever owns the softlock has 'probably' released it. This allows the system call to restart internally and attempt the operation again from scratch. :Because the source node and parent are not locked, there is the :possibility that the source node could be renamed or removed at :any time before VOP_RENAME finally gets around to locking it and :removing it. Something needs to protect the source node against :being renamed/removed between the point that the source node is :initially looked up and the point that it is finally locked. Both :Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this :protection. : :It is the fact that this problem is entirely unique to VOP_RENAME :that leads me to think that adding the generic SOFTLOCKLEAF code :is overkill. The following fragment also suggests that maybe the :approach doesn't actually fit in that well: Well, maybe. I have my eye on possibly seeing a way to fix the race-to-root directory scanning program too, so I decided to implement VSOFTLOCK formally rather then as a hack. :The way that vclearsoftlock() is used to clear a flag in an unlocked :vnode is also not ideal. This should probably be protected at least :by v_interlock as other flags are. : :Ian In -current, definitely. I'm not sure why v_interlock even exists in -stable. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
On Wed, 3 Oct 2001, Ken Pizzini wrote: Zhihui Zhang [EMAIL PROTECTED] wrote: (3) Matt says For example, if you have two hardlinked files residing in different directories both get renamed simultaniously, one of the rename()s can fail even though there is no conflict Can you explain this a little bit more? Consider: mkdir foo bar echo fubar foo/a ln foo/a bar/a Should it be: ln foo/a bar/b instead? mv foo/a foo/b mv bar/a bar/b There is no reason why that last line should fail, though it could return EINVAL under some situations using some of the proposed patches. --Ken Pizzini To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
Ok, I'm adding -hackers... another email thread got going in -committers. Here is a patch set for -stable. It isn't perfect but it does appear to solve the problem. The one case I don't handle right is if you have a hardlinked file and two renames in two different directories on the same file occur at the same time... that will (improperly) return an error code when, in fact, it's perfectly acceptable to do that. This patch appears to fix the utterly trivial crash reproduction that Yevgeniy was able to produce with a couple of simple race scripts running in the background. What I've done is add a SOFTLOCKLEAF capability to namei(). If set, and the file/directory exists, namei() will generate an extra VREF() on the vnode and set the VSOFTLOCK flag in vp-v_flag. If the vnode already has VSOFTLOCK set, namei() will return EINVAL. Then in rename() and rmdir() I set SOFTLOCKLEAF for the namei resolution and, of course, clean things up when everything is done. The ufs_rename() and ufs_rmdir() code no longer have to do the IN_RENAME hack at all, because it's now handled. This patch set does not yet include fixes to unionfs or the nfs server and is for informational purposes only. Comments? -Matt Index: kern/vfs_lookup.c === RCS file: /home/ncvs/src/sys/kern/vfs_lookup.c,v retrieving revision 1.38.2.3 diff -u -r1.38.2.3 vfs_lookup.c --- kern/vfs_lookup.c 2001/08/31 19:36:49 1.38.2.3 +++ kern/vfs_lookup.c 2001/10/02 19:04:21 @@ -372,11 +372,20 @@ error = EISDIR; goto bad; } + if ((cnp-cn_flags SOFTLOCKLEAF) + (dp-v_flag VSOFTLOCK)) { + error = EINVAL; + goto bad; + } if (wantparent) { ndp-ni_dvp = dp; VREF(dp); } ndp-ni_vp = dp; + if (cnp-cn_flags SOFTLOCKLEAF) { + VREF(dp); + vsetsoftlock(dp); + } if (!(cnp-cn_flags (LOCKPARENT | LOCKLEAF))) VOP_UNLOCK(dp, 0, p); /* XXX This should probably move to the top of function. */ @@ -565,13 +574,20 @@ error = EROFS; goto bad2; } + if ((cnp-cn_flags SOFTLOCKLEAF) (dp-v_flag VSOFTLOCK)) { + error = EINVAL; + goto bad2; + } if (cnp-cn_flags SAVESTART) { ndp-ni_startdir = ndp-ni_dvp; VREF(ndp-ni_startdir); } if (!wantparent) vrele(ndp-ni_dvp); - + if (cnp-cn_flags SOFTLOCKLEAF) { + VREF(dp); + vsetsoftlock(dp); + } if ((cnp-cn_flags LOCKLEAF) == 0) VOP_UNLOCK(dp, 0, p); return (0); @@ -654,6 +670,15 @@ error = ENOTDIR; goto bad; } + if ((cnp-cn_flags SOFTLOCKLEAF) + (dp-v_flag VSOFTLOCK)) { + error = EINVAL; + goto bad; + } + if (cnp-cn_flags SOFTLOCKLEAF) { + VREF(dp); + vsetsoftlock(dp); + } if (!(cnp-cn_flags LOCKLEAF)) VOP_UNLOCK(dp, 0, p); *vpp = dp; @@ -707,6 +732,10 @@ error = EROFS; goto bad2; } + if ((cnp-cn_flags SOFTLOCKLEAF) (dp-v_flag VSOFTLOCK)) { + error = EINVAL; + goto bad2; + } /* ASSERT(dvp == ndp-ni_startdir) */ if (cnp-cn_flags SAVESTART) VREF(dvp); @@ -718,6 +747,10 @@ ((cnp-cn_flags (NOOBJ|LOCKLEAF)) == LOCKLEAF)) vfs_object_create(dp, cnp-cn_proc, cnp-cn_cred); + if (cnp-cn_flags SOFTLOCKLEAF) { + VREF(dp); + vsetsoftlock(dp); + } if ((cnp-cn_flags LOCKLEAF) == 0) VOP_UNLOCK(dp, 0, p); return (0); Index: kern/vfs_subr.c === RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.249.2.11 diff -u -r1.249.2.11 vfs_subr.c --- kern/vfs_subr.c 2001/09/11 09:49:53 1.249.2.11 +++ kern/vfs_subr.c 2001/10/02 18:45:55 @@ -2927,6 +2961,12 @@ struct nameidata *ndp; const uint flags; { + if (!(flags NDF_NO_FREE_SOFTLOCKLEAF) + (ndp-ni_cnd.cn_flags SOFTLOCKLEAF) + ndp-ni_vp) { + vclearsoftlock(ndp-ni_vp); + vrele(ndp-ni_vp); + } if (!(flags NDF_NO_FREE_PNBUF)
Re: bleh. Re: ufs_rename panic
In message [EMAIL PROTECTED], Matt Dillon writes: What I've done is add a SOFTLOCKLEAF capability to namei(). If set, and the file/directory exists, namei() will generate an extra VREF() on the vnode and set the VSOFTLOCK flag in vp-v_flag. If the vnode already has VSOFTLOCK set, namei() will return EINVAL. I just tried a more direct approach, which is to implement a flag at the vnode layer that is roughly equivalent to UFS's IN_RENAME flag. This keeps the changes local to vfs_syscalls.c except for the addition of a new vnode flag in vnode.h. A patch is below. It doesn't include the changes to remove IN_RENAME etc, but these could be done later anyway. The basic idea is that the rename syscall locks the source node just for long enough to mark it with VRENAME. It then keeps an extra reference on the source node so that it can clear VRENAME before returning. The syscalls unlink(), rmdir() and rename() also check for VRENAME before proceeding with the operation, and act appropriately if it is found set. One case that is not being handled well is where the target of a rename has VRENAME set; the patch just causes rename to return EINVAL, but a better approach would be to unlock everything and try again. I don't know how to deal with the case of vn_lock(fvp, ...) failing at the end of rename() either. Only lightly tested, so expect lots of bugs... Ian Index: sys/vnode.h === RCS file: /dump/FreeBSD-CVS/src/sys/sys/vnode.h,v retrieving revision 1.157 diff -u -r1.157 vnode.h --- sys/vnode.h 13 Sep 2001 22:52:42 - 1.157 +++ sys/vnode.h 2 Oct 2001 19:06:41 - @@ -163,8 +163,8 @@ #defineVXLOCK 0x00100 /* vnode is locked to change underlying type */ #defineVXWANT 0x00200 /* thread is waiting for vnode */ #defineVBWAIT 0x00400 /* waiting for output to complete */ +#defineVRENAME 0x00800 /* rename operation on progress */ #defineVNOSYNC 0x01000 /* unlinked, stop syncing */ -/* open for business0x01000 */ #defineVOBJBUF 0x02000 /* Allocate buffers in VM object */ #defineVCOPYONWRITE0x04000 /* vnode is doing copy-on-write */ #defineVAGE0x08000 /* Insert vnode at head of free list */ Index: kern/vfs_syscalls.c === RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.206 diff -u -r1.206 vfs_syscalls.c --- kern/vfs_syscalls.c 22 Sep 2001 03:07:41 - 1.206 +++ kern/vfs_syscalls.c 2 Oct 2001 20:29:54 - @@ -1573,6 +1573,9 @@ if (vp-v_flag VROOT) error = EBUSY; } + /* Claim that the node is already gone if it is being renamed. */ + if (vp-v_flag VRENAME) + error = ENOENT; if (vn_start_write(nd.ni_dvp, mp, V_NOWAIT) != 0) { NDFREE(nd, NDF_ONLY_PNBUF); vrele(vp); @@ -2879,20 +2882,29 @@ struct mount *mp; struct vnode *tvp, *fvp, *tdvp; struct nameidata fromnd, tond; - int error; + int err1, error; bwillwrite(); - NDINIT(fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE, - SCARG(uap, from), td); + NDINIT(fromnd, DELETE, WANTPARENT | LOCKLEAF | SAVESTART, + UIO_USERSPACE, SCARG(uap, from), td); if ((error = namei(fromnd)) != 0) return (error); fvp = fromnd.ni_vp; - if ((error = vn_start_write(fvp, mp, V_WAIT | PCATCH)) != 0) { + if (fvp-v_flag VRENAME) + /* The node is being renamed; claim it has already gone. */ + error = ENOENT; + if (!error) + error = vn_start_write(fvp, mp, V_WAIT | PCATCH); + if (error) { NDFREE(fromnd, NDF_ONLY_PNBUF); vrele(fromnd.ni_dvp); - vrele(fvp); + vput(fvp); + fvp = NULL; goto out1; } + fvp-v_flag |= VRENAME; + vref(fvp); + VOP_UNLOCK(fvp, 0, td); NDINIT(tond, RENAME, LOCKPARENT | LOCKLEAF | NOCACHE | SAVESTART | NOOBJ, UIO_USERSPACE, SCARG(uap, to), td); if (fromnd.ni_vp-v_type == VDIR) @@ -2929,6 +2941,10 @@ !bcmp(fromnd.ni_cnd.cn_nameptr, tond.ni_cnd.cn_nameptr, fromnd.ni_cnd.cn_namelen)) error = -1; + if (tvp != NULL (tvp-v_flag VRENAME)) { + /* XXX, should just unlock everything and retry. */ + error = EINVAL; + } out: if (!error) { VOP_LEASE(tdvp, td, td-td_proc-p_ucred, LEASE_WRITE); @@ -2961,6 +2977,18 @@ ASSERT_VOP_UNLOCKED(tond.ni_dvp, rename); ASSERT_VOP_UNLOCKED(tond.ni_vp, rename); out1: + if (fvp != NULL) { + /* We set the VRENAME flag and did an extra vref(fvp) above.
patch #3 (was Re: bleh. Re: ufs_rename panic)
Here is the latest patch I have. It appears to completely solve the problem. I have shims in unionfs and nfs for the moment. The patch is against -stable. * Implements SOFTLOCKLEAF namei() option * Implements EAGAIN error appropriate tsleep/retry code * Universal for rename() rmdir(). Final patch will also probably implement it on unlink() to solve (pre-existing) unlink/rename regular file race. * Tested very well w/ UFS, should be compatible with and work for direct access to other filesystems that still use IN_RENAME. * Tested for collision probability. Answer: Very low even when one tries on purpose. There is no need to implement a more sophisticated fine-grained tsleep. -Matt Index: kern/vfs_lookup.c === RCS file: /home/ncvs/src/sys/kern/vfs_lookup.c,v retrieving revision 1.38.2.3 diff -u -r1.38.2.3 vfs_lookup.c --- kern/vfs_lookup.c 2001/08/31 19:36:49 1.38.2.3 +++ kern/vfs_lookup.c 2001/10/02 20:06:33 @@ -372,6 +372,11 @@ error = EISDIR; goto bad; } + if (cnp-cn_flags SOFTLOCKLEAF) { + if ((error = vsetsoftlock(dp)) != 0) + goto bad; + VREF(dp); + } if (wantparent) { ndp-ni_dvp = dp; VREF(dp); @@ -565,13 +570,17 @@ error = EROFS; goto bad2; } + if (cnp-cn_flags SOFTLOCKLEAF) { + if ((error = vsetsoftlock(dp)) != 0) + goto bad2; + VREF(dp); + } if (cnp-cn_flags SAVESTART) { ndp-ni_startdir = ndp-ni_dvp; VREF(ndp-ni_startdir); } if (!wantparent) vrele(ndp-ni_dvp); - if ((cnp-cn_flags LOCKLEAF) == 0) VOP_UNLOCK(dp, 0, p); return (0); @@ -654,6 +663,11 @@ error = ENOTDIR; goto bad; } + if (cnp-cn_flags SOFTLOCKLEAF) { + if ((error = vsetsoftlock(dp)) != 0) + goto bad; + VREF(dp); + } if (!(cnp-cn_flags LOCKLEAF)) VOP_UNLOCK(dp, 0, p); *vpp = dp; @@ -707,6 +721,11 @@ error = EROFS; goto bad2; } + if (cnp-cn_flags SOFTLOCKLEAF) { + if ((error = vsetsoftlock(dp)) != 0) + goto bad2; + VREF(dp); + } /* ASSERT(dvp == ndp-ni_startdir) */ if (cnp-cn_flags SAVESTART) VREF(dvp); @@ -715,8 +734,9 @@ vrele(dvp); if (vn_canvmio(dp) == TRUE - ((cnp-cn_flags (NOOBJ|LOCKLEAF)) == LOCKLEAF)) + ((cnp-cn_flags (NOOBJ|LOCKLEAF)) == LOCKLEAF)) { vfs_object_create(dp, cnp-cn_proc, cnp-cn_cred); + } if ((cnp-cn_flags LOCKLEAF) == 0) VOP_UNLOCK(dp, 0, p); Index: kern/vfs_subr.c === RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v retrieving revision 1.249.2.11 diff -u -r1.249.2.11 vfs_subr.c --- kern/vfs_subr.c 2001/09/11 09:49:53 1.249.2.11 +++ kern/vfs_subr.c 2001/10/02 22:55:38 @@ -130,6 +132,8 @@ #endif struct nfs_public nfs_pub; /* publicly exported FS */ static vm_zone_t vnode_zone; +static int vagain_count = 1; +static int vagain_waiting = 0; /* * The workitem queue. @@ -2927,6 +2963,13 @@ struct nameidata *ndp; const uint flags; { + if (!(flags NDF_NO_FREE_SOFTLOCKLEAF) + (ndp-ni_cnd.cn_flags SOFTLOCKLEAF) + ndp-ni_vp) { + vclearsoftlock(ndp-ni_vp); + ndp-ni_cnd.cn_flags = ~SOFTLOCKLEAF; + vrele(ndp-ni_vp); + } if (!(flags NDF_NO_FREE_PNBUF) (ndp-ni_cnd.cn_flags HASBUF)) { zfree(namei_zone, ndp-ni_cnd.cn_pnbuf); @@ -2955,3 +2998,55 @@ ndp-ni_startdir = NULL; } } + +/* + * vsetsoftlock() -set the VSOFTLOCK flag on the vnode, return + * EAGAIN if it has already been set by someone else. + * + * note: we could further refine the collision by setting a VSOFTLOCKCOLL + * flag and then only waking up waiters when the colliding vnode is + * released. However, this sort of collision does not happen often + * enough for such an addition to yield any improvement in performance. + */ + +int +vsetsoftlock(struct vnode *vp) +{ + int s; + int error = 0; + + s = splbio(); + if (vp-v_flag VSOFTLOCK) + error = EAGAIN; + else +
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
: :Matt Dillon wrote: : Here is the latest patch I have. It appears to completely solve the : problem. I have shims in unionfs and nfs for the moment. : :This seems rather large compared to Ian Dowse's version.. Are you sure that :you're doing this the right way? Adding a whole new locking mechanism :when the simple VRENAME flag to be enough seems like a bit of overkill.. Ian's doesn't fix any of the filesystem semantics bugs, it only prevents the panic from occuring. For example, if you have two hardlinked files residing in different directories both get renamed simultaniously, one of the rename()s can fail even though there is no conflict. If you have a simultanious rmdir() and rename(), the rename() can return an unexpected error code. And so forth. If you remove the filesystem semantics fixes from my patch you essentially get Ian's patch except that I integrated the vnode flag in namei/lookup whereas Ian handles it manually in the syscall code. Also, Ian's patch only effects system calls. It doesn't do the same fixes for nfs (server side) or unionfs. -Matt :I'm not criticizing your work, I am just wondering if you have considered :Ian's work and feel that it is wrong or the wrong direction.. His certainly :appears more elegant than yours so I want to understand why you feel yours :is better than his. : :freebsd-hackers :Message-id: [EMAIL PROTECTED] : :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] :All of this is for nothing if we don't go to the stars - JMS/B5 To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
The problems all arise from the fact that we unlock the source while we look up the destination, and when we return to relookup the source, it may have changed/moved/disappeared. The reason to unlock the source before looking up the destination was to avoid deadlocking against ourselves on a lock that we held associated with the source. Since we now allow recursive locks on vnodes, it is no longer necessary to release the source before looking up the destination. So, it seems to me that the correct fix is to *not* release the source after looking it up, but rather hold it locked while we look up the destination. We can completely get rid of relookup and lots of other hairy code and generally make rename much simpler. Am I missing something here? ~Kirk To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: bleh. Re: ufs_rename panic
:The problems all arise from the fact that we unlock the source :while we look up the destination, and when we return to relookup :the source, it may have changed/moved/disappeared. The reason to :unlock the source before looking up the destination was to avoid :deadlocking against ourselves on a lock that we held associated :with the source. Since we now allow recursive locks on vnodes, it :is no longer necessary to release the source before looking up :the destination. So, it seems to me that the correct fix is to :*not* release the source after looking it up, but rather hold it :locked while we look up the destination. We can completely get :rid of relookup and lots of other hairy code and generally make :rename much simpler. Am I missing something here? : : ~Kirk That was the first thing I thought of, but unfortunately it is still possible to deadlock against another process... for example, a process doing an (unrelated) rename in the reverse direction. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message