Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Yevgeniy Aleynikov
As pointed by Ken - we do have alot of file renames (qmail).
But 2-nd solution, directory-only rename serialization, probably won't 
affect performance as much.

But i believe it's not only us who's gonna have problem when exploit 
code will be known by everybody sooner or later

Thanks!

Kirk McKusick wrote:
The potentially slow, but utterly effective way to fix this race
is to only allow one rename at a time per filesystem. It is
implemented by adding a flag in the mount structure and using
it to serialize calls to rename. When only one rename can happen
at a time, the race cannot occur.
If this proves to be too much of a slow down, it would be possible
to only serialize directory renames. As these are (presumably) much
rarer the slow down would be less noticable.
	Kirk McKusick

=-=-=-=-=-=

Date: Wed, 19 Feb 2003 15:10:09 -0800
From: Yevgeniy Aleynikov [EMAIL PROTECTED]
To: Matt Dillon [EMAIL PROTECTED]
CC: Kirk McKusick [EMAIL PROTECTED], Ian Dowse [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], Ken Pizzini [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: bleh. Re: ufs_rename panic
X-ASK-Info: Confirmed by User
Just reminder that this problem is local kernel panic DoS (which can do 
filesystem corruption) with very simple trigger code and it still exists.

And it's been almost 2 years since i wrote about it.

Workaround (commenting out panic call) doesnt fix the problem.
Server still crashes (not so often though) from virtual memory failures 
like this:

panic: vm_fault: fault on nofault entry, addr: d0912000
mp_lock = 0102; cpuid = 1; lapic.id = 
boot() called on cpu#1
(kgdb) bt
#0  0xc0175662 in dumpsys ()
#1  0xc017542c in boot ()
#2  0xc0175894 in poweroff_wait ()
#3  0xc01e7c18 in vm_fault ()
#4  0xc0219d32 in trap_pfault ()
#5  0xc021990b in trap ()
#6  0xc01e008a in ufs_dirrewrite ()
#7  0xc01e31a4 in ufs_rename ()
#8  0xc01e4645 in ufs_vnoperate ()
#9  0xc01a9121 in rename ()
#10 0xc021a44d in syscall2 ()
#11 0xc02077cb in Xint0x80_syscall ()
How can i help to resolve this problem ASAP?

Thanks!

Matt Dillon wrote:

   Well, I've gone through hell trying to fix the rename()/rmdir()/remove()
   races and failed utterly.  There are far more race conditions then even
   my last posting indicated, and there are *severe* problems fixing NFS
   to deal with even Ian's suggestion... it turns out that NFS's nfs_namei()
   permanently adjusts the mbuf while processing the path name, making
   restarts impossible.
   The only solution is to implement namei cache path locking and formalize
   the 'nameidata' structure, which means ripping up a lot of code because
   nearly the entire code base currently plays with the contents of 
   'nameidata' willy-nilly.  Nothing else will work.  It's not something
   that I can consider doing now.

   In the mean time I am going to remove the panic()'s in question.  This
   means that in ufs_rename() the machine will silently ignore the race 
   (not do the rename) instead of panic.  It's all that can be done for
   the moment.  It solve the security/attack issue.  We'll have to attack
   the races as a separate issue.  The patch to remove the panics is utterly
   trivial and I will commit it after I test it.

		-Matt





--
Yevgeniy Aleynikov | Sr. Systems Engineer - USE
InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004
Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924
[EMAIL PROTECTED] | http://www.infospaceinc.com
Discover what you can do.TM

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Terry Lambert
Yevgeniy Aleynikov wrote:
 As pointed by Ken - we do have alot of file renames (qmail).
 But 2-nd solution, directory-only rename serialization, probably won't
 affect performance as much.
 
 But i believe it's not only us who's gonna have problem when exploit
 code will be known by everybody sooner or later

Dan's non-atomicity assumption on renames is incorrect.

Even if it's were correct, it's possible to recover fully following
a failure, because metadata updates are ordered (there is a real
synchronization between dependent operations).

I think that a workaround would be to comment the directory fsync()
code out of qmail, which apparently thinks it's running on extfs
or an async mounted FFS.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Kirk McKusick
Date: Fri, 21 Feb 2003 15:26:01 -0800
From: Terry Lambert [EMAIL PROTECTED]
To: Yevgeniy Aleynikov [EMAIL PROTECTED]
CC: Kirk McKusick [EMAIL PROTECTED],
   Matt Dillon [EMAIL PROTECTED],
   Ian Dowse [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED],
   Ken Pizzini [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: bleh. Re: ufs_rename panic

Yevgeniy Aleynikov wrote:
 As pointed by Ken - we do have alot of file renames (qmail).
 But 2-nd solution, directory-only rename serialization, probably
 won't affect performance as much.
 
 But i believe it's not only us who's gonna have problem when exploit
 code will be known by everybody sooner or later

Dan's non-atomicity assumption on renames is incorrect.

Even if it's were correct, it's possible to recover fully following
a failure, because metadata updates are ordered (there is a real
synchronization between dependent operations).

I think that a workaround would be to comment the directory fsync()
code out of qmail, which apparently thinks it's running on extfs
or an async mounted FFS.

-- Terry

You cannot get rid of the fsync calls in qmail. You have to distinguish
between a filesystem that is recoverable and one which loses data.
When receiving an incoming message, SMTP requires that the receiver
have the message in stable store before acknowledging receipt. The
only way to know that it is in stable store is to fsync it before
responding.

Kirk McKusick

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Julian Elischer


On Fri, 21 Feb 2003, Terry Lambert wrote:

 Yevgeniy Aleynikov wrote:
  As pointed by Ken - we do have alot of file renames (qmail).
  But 2-nd solution, directory-only rename serialization, probably won't
  affect performance as much.
  
  But i believe it's not only us who's gonna have problem when exploit
  code will be known by everybody sooner or later
 
 Dan's non-atomicity assumption on renames is incorrect.
 
 Even if it's were correct, it's possible to recover fully following
 a failure, because metadata updates are ordered (there is a real
 synchronization between dependent operations).
 
 I think that a workaround would be to comment the directory fsync()
 code out of qmail, which apparently thinks it's running on extfs
 or an async mounted FFS.

If you don't want to lose mail then qmail needs to do a fsync after it
does the rename.


 
 -- Terry
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-hackers in the body of the message
 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Terry Lambert
Kirk McKusick wrote:
 Yevgeniy Aleynikov wrote:
  As pointed by Ken - we do have alot of file renames (qmail).
  But 2-nd solution, directory-only rename serialization, probably
  won't affect performance as much.
 
  But i believe it's not only us who's gonna have problem when exploit
  code will be known by everybody sooner or later
 
 Dan's non-atomicity assumption on renames is incorrect.
 
 Even if it's were correct, it's possible to recover fully following
 a failure, because metadata updates are ordered (there is a real
 synchronization between dependent operations).
 
 I think that a workaround would be to comment the directory fsync()
 code out of qmail, which apparently thinks it's running on extfs
 or an async mounted FFS.
 
 -- Terry
 
 You cannot get rid of the fsync calls in qmail. You have to distinguish
 between a filesystem that is recoverable and one which loses data.
 When receiving an incoming message, SMTP requires that the receiver
 have the message in stable store before acknowledging receipt. The
 only way to know that it is in stable store is to fsync it before
 responding.

The issue is specifically with the rename code, which is a metadata
operation, not with the storing of application data.

The fsync's in question are those to the fd of the directory, not
the fd of the application data.  Sorry if it wasn't clear from my
statement that Dan's non-atomicity assumption on renames is
incorrect meant that it only applied to the fsync() calls dealing
with the rename.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-21 Thread Terry Lambert
Julian Elischer wrote:
 On Fri, 21 Feb 2003, Terry Lambert wrote:
  Dan's non-atomicity assumption on renames is incorrect.
[ ... ]
  I think that a workaround would be to comment the directory fsync()
  code out of qmail, which apparently thinks it's running on extfs
  or an async mounted FFS.
 
 If you don't want to lose mail then qmail needs to do a fsync after it
 does the rename.

See other mail.  I was referring only to fsync() of directory
data.  Sorry if that wasn't clear because of the intervening
paragraph.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: bleh. Re: ufs_rename panic

2003-02-19 Thread Yevgeniy Aleynikov
Just reminder that this problem is local kernel panic DoS (which can do 
filesystem corruption) with very simple trigger code and it still exists.

And it's been almost 2 years since i wrote about it.


Workaround (commenting out panic call) doesnt fix the problem.
Server still crashes (not so often though) from virtual memory failures 
like this:

panic: vm_fault: fault on nofault entry, addr: d0912000
mp_lock = 0102; cpuid = 1; lapic.id = 
boot() called on cpu#1


(kgdb) bt
#0  0xc0175662 in dumpsys ()
#1  0xc017542c in boot ()
#2  0xc0175894 in poweroff_wait ()
#3  0xc01e7c18 in vm_fault ()
#4  0xc0219d32 in trap_pfault ()
#5  0xc021990b in trap ()
#6  0xc01e008a in ufs_dirrewrite ()
#7  0xc01e31a4 in ufs_rename ()
#8  0xc01e4645 in ufs_vnoperate ()
#9  0xc01a9121 in rename ()
#10 0xc021a44d in syscall2 ()
#11 0xc02077cb in Xint0x80_syscall ()


How can i help to resolve this problem ASAP?

Thanks!

Matt Dillon wrote:
Well, I've gone through hell trying to fix the rename()/rmdir()/remove()
races and failed utterly.  There are far more race conditions then even
my last posting indicated, and there are *severe* problems fixing NFS
to deal with even Ian's suggestion... it turns out that NFS's nfs_namei()
permanently adjusts the mbuf while processing the path name, making
restarts impossible.

The only solution is to implement namei cache path locking and formalize
the 'nameidata' structure, which means ripping up a lot of code because
nearly the entire code base currently plays with the contents of 
'nameidata' willy-nilly.  Nothing else will work.  It's not something
that I can consider doing now.

In the mean time I am going to remove the panic()'s in question.  This
means that in ufs_rename() the machine will silently ignore the race 
(not do the rename) instead of panic.  It's all that can be done for
the moment.  It solve the security/attack issue.  We'll have to attack
the races as a separate issue.  The patch to remove the panics is utterly
trivial and I will commit it after I test it.

		-Matt




--
Yevgeniy Aleynikov | Sr. Systems Engineer - USE
InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004
Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924
[EMAIL PROTECTED] | http://www.infospaceinc.com

Discover what you can do.TM


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2003-02-19 Thread Kirk McKusick
The potentially slow, but utterly effective way to fix this race
is to only allow one rename at a time per filesystem. It is
implemented by adding a flag in the mount structure and using
it to serialize calls to rename. When only one rename can happen
at a time, the race cannot occur.

If this proves to be too much of a slow down, it would be possible
to only serialize directory renames. As these are (presumably) much
rarer the slow down would be less noticable.

Kirk McKusick

=-=-=-=-=-=

Date: Wed, 19 Feb 2003 15:10:09 -0800
From: Yevgeniy Aleynikov [EMAIL PROTECTED]
To: Matt Dillon [EMAIL PROTECTED]
CC: Kirk McKusick [EMAIL PROTECTED], Ian Dowse [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], Ken Pizzini [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: bleh. Re: ufs_rename panic
X-ASK-Info: Confirmed by User

Just reminder that this problem is local kernel panic DoS (which can do 
filesystem corruption) with very simple trigger code and it still exists.

And it's been almost 2 years since i wrote about it.


Workaround (commenting out panic call) doesnt fix the problem.
Server still crashes (not so often though) from virtual memory failures 
like this:

panic: vm_fault: fault on nofault entry, addr: d0912000
mp_lock = 0102; cpuid = 1; lapic.id = 
boot() called on cpu#1


(kgdb) bt
#0  0xc0175662 in dumpsys ()
#1  0xc017542c in boot ()
#2  0xc0175894 in poweroff_wait ()
#3  0xc01e7c18 in vm_fault ()
#4  0xc0219d32 in trap_pfault ()
#5  0xc021990b in trap ()
#6  0xc01e008a in ufs_dirrewrite ()
#7  0xc01e31a4 in ufs_rename ()
#8  0xc01e4645 in ufs_vnoperate ()
#9  0xc01a9121 in rename ()
#10 0xc021a44d in syscall2 ()
#11 0xc02077cb in Xint0x80_syscall ()


How can i help to resolve this problem ASAP?

Thanks!

Matt Dillon wrote:
 Well, I've gone through hell trying to fix the rename()/rmdir()/remove()
 races and failed utterly.  There are far more race conditions then even
 my last posting indicated, and there are *severe* problems fixing NFS
 to deal with even Ian's suggestion... it turns out that NFS's nfs_namei()
 permanently adjusts the mbuf while processing the path name, making
 restarts impossible.
 
 The only solution is to implement namei cache path locking and formalize
 the 'nameidata' structure, which means ripping up a lot of code because
 nearly the entire code base currently plays with the contents of 
 'nameidata' willy-nilly.  Nothing else will work.  It's not something
 that I can consider doing now.
 
 In the mean time I am going to remove the panic()'s in question.  This
 means that in ufs_rename() the machine will silently ignore the race 
 (not do the rename) instead of panic.  It's all that can be done for
 the moment.  It solve the security/attack issue.  We'll have to attack
 the races as a separate issue.  The patch to remove the panics is utterly
 trivial and I will commit it after I test it.
 
   -Matt
 
 
 

-- 
Yevgeniy Aleynikov | Sr. Systems Engineer - USE
InfoSpace INC 601 108th Ave NE | Suite 1200 | Bellevue, WA 98004
Tel 425.709.8214 | Fax 425.201.6160 | Mobile 425.418.8924
[EMAIL PROTECTED] | http://www.infospaceinc.com

Discover what you can do.TM


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2003-02-19 Thread Mark Hittinger

 McKusick wrote:
 The potentially slow, but utterly effective way to fix this race
 is to only allow one rename at a time per filesystem.

Can we serialize unprivileged renames per mount as an alternate work around?

Later

Mark Hittinger
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-19 Thread Yevgeniy Aleynikov


FYI:

http://www.securityfocus.com/cgi-bin/archive.pl?id=1mid=221337start=2001-10-15end=2001-10-21

(about how things done in Linux).


Zhihui Zhang wrote:

 (1) I am always wondering why not use a global rename lock so that there
 is only one rename operation in progress at any time. This method is
 used by GFS and probably Linux.  This could make the code simply. Maybe
 we can even get rid of the relookup() stuff.
 
 This may reduce concurrency, but rename should not be a frequent
 operation.
 

-- 
Yevgeniy Aleynikov
Infospace, Inc.
SysAdmin, USE
Work: (206)357-4594

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-11 Thread Yevgeniy Aleynikov

Here's another stable panic (not very often but on different boxes too).

-- 
Yevgeniy Aleynikov
Infospace, Inc.
SysAdmin, USE
Work: (206)357-4594

SMP 2 cpus
IdlePTD 3039232
initial pcb at 2666a0
panicstr: ffs_valloc: dup alloc
panic messages:
---
panic: ffs_valloc: dup alloc
mp_lock = 0101; cpuid = 1; lapic.id = 
boot() called on cpu#1

syncing disks... 2 


#0  dumpsys () at ../../kern/kern_shutdown.c:473
473 if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:473
#1  0xc015e9df in boot (howto=256) at ../../kern/kern_shutdown.c:313
#2  0xc015ede0 in poweroff_wait (junk=0xc0232e01, howto=-1071436320)
at ../../kern/kern_shutdown.c:581
#3  0xc01b6600 in ffs_valloc (pvp=0xdf902600, mode=33188, cred=0xc5b83380, 
vpp=0xdf541c9c) at ../../ufs/ffs/ffs_alloc.c:609
#4  0xc01c32ef in ufs_makeinode (mode=33188, dvp=0xdf902600, vpp=0xdf541edc, 
cnp=0xdf541ef0) at ../../ufs/ufs/ufs_vnops.c:2097
#5  0xc01c0978 in ufs_create (ap=0xdf541df8) at ../../ufs/ufs/ufs_vnops.c:194
#6  0xc01c363d in ufs_vnoperate (ap=0xdf541df8)
at ../../ufs/ufs/ufs_vnops.c:2382
#7  0xc0192c88 in vn_open (ndp=0xdf541ec4, fmode=1538, cmode=420)
at vnode_if.h:106
#8  0xc018ee6c in open (p=0xdf4c9a00, uap=0xdf541f80)
at ../../kern/vfs_syscalls.c:1077
#9  0xc0205011 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = 1, tf_esi = 0, tf_ebp = -1077939600, tf_isp = -548134956, 
  tf_ebx = -1077939604, tf_edx = 1537, tf_ecx = 137102023, tf_eax = 5, 
  tf_trapno = 7, tf_err = 2, tf_eip = 674553728, tf_cs = 31, 
  tf_eflags = 514, tf_esp = -1077939656, tf_ss = 47})
at ../../i386/i386/trap.c:1155
#10 0xc01f291b in Xint0x80_syscall ()



---
 fs-fs_contigdirs[cg]--;
 }
 ino = (ino_t)ffs_hashalloc(pip, cg, (long)ipref, mode,
 (allocfcn_t *)ffs_nodealloccg);
 if (ino == 0)
 goto noinodes;
 error = VFS_VGET(pvp-v_mount, ino, vpp);
 if (error) {
 UFS_VFREE(pvp, ino, mode);
 return (error);
 }
 ip = VTOI(*vpp);
 if (ip-i_mode) {
 printf(mode = 0%o, inum = %lu, fs = %s\n,
 ip-i_mode, (u_long)ip-i_number, fs-fs_fsmnt);
 panic(ffs_valloc: dup alloc);
 }


#3  0xc01b6600 in ffs_valloc (pvp=0xdf902600, mode=33188, cred=0xc5b83380, 
vpp=0xdf541c9c) at ../../ufs/ffs/ffs_alloc.c:609

(kgdb) print *pvp
$6 = {v_flag = 0, v_usecount = 1, v_writecount = 0, v_holdcnt = 2, 
  v_id = 69744400, v_mount = 0xc4174e00, v_op = 0xc4078700, v_freelist = {
tqe_next = 0x0, tqe_prev = 0xe01f999c}, v_mntvnodes = {
le_next = 0xdf949f00, le_prev = 0xe02ef924}, v_cleanblkhd = {
tqh_first = 0xce8697ac, tqh_last = 0xce8697b4}, v_dirtyblkhd = {
tqh_first = 0x0, tqh_last = 0xdf902634}, v_synclist = {
le_next = 0xe034c7c0, le_prev = 0xe04904bc}, v_numoutput = 0, 
  v_type = VDIR, v_un = {vu_mountedhere = 0x0, vu_socket = 0x0, vu_spec = {
  vu_specinfo = 0x0, vu_specnext = {sle_next = 0x0}}, vu_fifoinfo = 0x0}, 
  v_lease = 0x0, v_lastw = 0, v_cstart = 0, v_lasta = 0, v_clen = 0, 
  v_object = 0x0, v_interlock = {lock_data = 0}, v_vnlock = 0xc448d400, 
  v_tag = VT_UFS, v_data = 0xc448d400, v_cache_src = {lh_first = 0xc5630f40}, 
  v_cache_dst = {tqh_first = 0xc5dcfb80, tqh_last = 0xc5dcfb90}, 
  v_dd = 0xdfac9b00, v_ddid = 69744310, v_pollinfo = {vpi_lock = {
  lock_data = 0}, vpi_selinfo = {si_pid = 0, si_note = {slh_first = 0x0}, 
  si_flags = 0}, vpi_events = 0, vpi_revents = 0}, v_vxproc = 0x0}



(kgdb) print **vpp
$2 = {v_flag = 0, v_usecount = 1, v_writecount = 0, v_holdcnt = 0, 
  v_id = 69818641, v_mount = 0xc4174e00, v_op = 0xc4078700, v_freelist = {
tqe_next = 0xdf941dc0, tqe_prev = 0xc02670d0}, v_mntvnodes = {
le_next = 0xdfd79a80, le_prev = 0xc4174e18}, v_cleanblkhd = {
tqh_first = 0x0, tqh_last = 0xdfa89e2c}, v_dirtyblkhd = {tqh_first = 0x0, 
tqh_last = 0xdfa89e34}, v_synclist = {le_next = 0x0, 
le_prev = 0xc407b0cc}, v_numoutput = 0, v_type = VDIR, v_un = {
vu_mountedhere = 0x0, vu_socket = 0x0, vu_spec = {vu_specinfo = 0x0, 
  vu_specnext = {sle_next = 0x0}}, vu_fifoinfo = 0x0}, v_lease = 0x0, 
  v_lastw = 0, v_cstart = 0, v_lasta = 0, v_clen = 0, v_object = 0x0, 
  v_interlock = {lock_data = 0}, v_vnlock = 0xc55f1200, v_tag = VT_UFS, 
  v_data = 0xc55f1200, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
tqh_first = 0x0, tqh_last = 0xdfa89e80}, v_dd = 0xdfa89e00, v_ddid = 0, 
  v_pollinfo = {vpi_lock = {lock_data = 0}, vpi_selinfo = {si_pid = 0, 
  si_note = {slh_first = 0x0}, si_flags = 0}, vpi_events = 0, 
vpi_revents = 0}, v_vxproc = 0x0}

(kgdb) print mode
$3 

Re: bleh. Re: ufs_rename panic

2001-10-08 Thread Wilko Bulte

On Sun, Oct 07, 2001 at 06:09:45PM -0700, Kris Kennaway wrote:
 On Fri, Oct 05, 2001 at 04:17:14PM -0700, Yevgeniy Aleynikov wrote:
 
   And why FreeBSD security officer's email address always bounces my
  mail?
 
 Don't know, it's apparently still working fine for others.

Wild guess: FreeBSD.org's anti-spam getting in your way?

-- 
|   / o / /_  _ email:  [EMAIL PROTECTED]
|/|/ / / /(  (_)  Bulte Arnhem, The Netherlands 

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-07 Thread Matt Dillon

Well, I've gone through hell trying to fix the rename()/rmdir()/remove()
races and failed utterly.  There are far more race conditions then even
my last posting indicated, and there are *severe* problems fixing NFS
to deal with even Ian's suggestion... it turns out that NFS's nfs_namei()
permanently adjusts the mbuf while processing the path name, making
restarts impossible.

The only solution is to implement namei cache path locking and formalize
the 'nameidata' structure, which means ripping up a lot of code because
nearly the entire code base currently plays with the contents of 
'nameidata' willy-nilly.  Nothing else will work.  It's not something
that I can consider doing now.

In the mean time I am going to remove the panic()'s in question.  This
means that in ufs_rename() the machine will silently ignore the race 
(not do the rename) instead of panic.  It's all that can be done for
the moment.  It solve the security/attack issue.  We'll have to attack
the races as a separate issue.  The patch to remove the panics is utterly
trivial and I will commit it after I test it.

-Matt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-07 Thread Kris Kennaway

On Fri, Oct 05, 2001 at 04:17:14PM -0700, Yevgeniy Aleynikov wrote:

  And why FreeBSD security officer's email address always bounces my
 mail?

Don't know, it's apparently still working fine for others.

Kris



msg29046/pgp0.pgp
Description: PGP signature


Re: bleh. Re: ufs_rename panic

2001-10-05 Thread Matt Dillon


:
:It seems like there's no activity on this subject.
:This is local DoS, guys. if it gets on public (which is probably gonna
:be soon) everything everywhere will be crashing, and there's no stable
:fix ready.
:How can i help to accelerate this problem solution? 
:
: And why FreeBSD security officer's email address always bounces my
:mail?
:
:Thanks!

My most recently posted patch will solve your problem.  I am reworking
it as per Ian's suggestions before I commit, and will also implement
the same feature in rename() (for files).  Then I will do a standard
commit-to-current-wait-commit-to-stable sequence.  But due to the
complexity of the changes (even after simplifying them), it is going to
be another few days before anything gets into -stable officially.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-03 Thread Ian Dowse

In message [EMAIL PROTECTED], Matt Dillon writes:

:This seems rather large compared to Ian Dowse's version..  Are you sure that
:you're doing this the right way?  Adding a whole new locking mechanism
:when the simple VRENAME flag to be enough seems like a bit of overkill..

Matt addresses the problem more completely than my patch does, so
the differences in patch size and files touched are to be expected.
In particular, the NFS server and unionfs code need to be changed
in the same way as the syscalls, and the IN_RENAME flag can be
removed from the ufs code, both of which are included in Matt's
patch.

Ian's doesn't fix any of the filesystem semantics bugs, it only prevents
the panic from occuring.

This is certainly correct, though the IN_RENAME flag in the UFS
code currently has a few such semantics bugs where EINVAL can be
returned in cases that would succeed if rename() was atomic. When
a vnode cannot be renamed/unlinked/rmdir'd because it is being
renamed, the operation should be retried until it succeeds, sleeping
as necessary. As I understand it, this is mostly dealt with by
Matt's patch, but not at all by mine.

If you remove the filesystem semantics fixes from my patch you 
essentially get Ian's patch except that I integrated the vnode flag
in namei/lookup whereas Ian handles it manually in the syscall code.

The addition of the SOFTLOCKLEAF code is quite a major change, so
it would be very useful if you could describe exactly what it does,
what its semantics are, and how it fits into the rename problem.

My understanding of the problem is that VOP_RENAME is quite unique
in that it is the only VOP that must modify entries in two separate
directories. To avoid deadlock, it is not possible (very hard anyway)
to lock all 4 vnodes (source node, source parent, target node,
target parent) before calling VOP_RENAME. Instead, the approach
taken is to lock only the target node and its parent, and have the
VOP_RENAME implementation jump back and forth between locking the
source and locking the target as necessary. Hence VOP_RENAME is
the only VOP that must modify a node that is passed in unlocked.

Because the source node and parent are not locked, there is the
possibility that the source node could be renamed or removed at
any time before VOP_RENAME finally gets around to locking it and
removing it. Something needs to protect the source node against
being renamed/removed between the point that the source node is
initially looked up and the point that it is finally locked. Both
Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this
protection.

It is the fact that this problem is entirely unique to VOP_RENAME
that leads me to think that adding the generic SOFTLOCKLEAF code
is overkill. The following fragment also suggests that maybe the
approach doesn't actually fit in that well:

fromnd.ni_cnd.cn_flags = ~SOFTLOCKLEAF;/* XXX hack */
error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, fromnd.ni_cnd,
tond.ni_dvp, tond.ni_vp, tond.ni_cnd);
fromnd.ni_cnd.cn_flags |= SOFTLOCKLEAF;
NDFREE(fromnd, NDF_ONLY_PNBUF  NDF_ONLY_SOFTLOCKLEAF);

The way that vclearsoftlock() is used to clear a flag in an unlocked
vnode is also not ideal. This should probably be protected at least
by v_interlock as other flags are.


The syscalls that need to be changed (rename, unlink, rmdir) could
possibly use vn_* style wrapper functions to reduce the amount of
code that must understand the new locking mechanism, although I'm
not sure if this is practical for the NFS case. It might also be
a good time to remove the WILLRELE from VOP_RENAME, which would
simplify some of the surrounding code.

Ian

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-03 Thread Matt Dillon


:The rename routine is probably the most convoluted in the entire file
:system code (FFS). Now that everybody's memory is fresh, I would like to
:ask something about it:
:
:(1) I am always wondering why not use a global rename lock so that there
:is only one rename operation in progress at any time. This method is
:used by GFS and probably Linux.  This could make the code simply. Maybe 
:we can even get rid of the relookup() stuff.
:
:This may reduce concurrency, but rename should not be a frequent
:operation.

Well, you could say that about virtually any filesystem operation.
Bitmaps are shared, for example.  It is a bad idea to try to code
simplistic solutions to complex problems.  Throughout the code history
of BSDs we have had to constantly make adjustments to algorithms that
were originally not designed to scale past what the authors originally
believed was reasonable.

:(2) In the code of 4.3-release, we grab the source inode while holding the
:locks of target inodes.  In ufs_rename(), we have:
:
:if ((error = vn_lock(fvp, LK_EXCLUSIVE, p)) != 0)
: goto abortit;
:
:I wonder whether this could cause deadlock. I think locking more than
:one inode should be done in some sequence (ie. order them by inode 
:number).

Hmm.  Yes, there might possibly be a problem there.  We may be
safe due to the fact that only directory scans and rename hold multiple
vnodes locked, and in this case the destination directory holding the
destination file is already locked.  However, if the source directory/file
gets ripped out from under rename() the 'new' location of the source
directory/file could cause a deadlock against another process.  It
would be very difficult to generate it though.

:(4) This is not related to rename().  But ufs_mknod() reload the inode 
:through VFS_VGET() to avoid duplicate aliases.  I can not see why it
:is necessary. I asked this before, but have not got any satisfactory
:responses.
:...
:Any ideas are welcome. Thanks,
:
:-Zhihui

I don't know the answer to this at the moment.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-03 Thread Matt Dillon

:The addition of the SOFTLOCKLEAF code is quite a major change, so
:it would be very useful if you could describe exactly what it does,
:what its semantics are, and how it fits into the rename problem.

Setting SOFTLOCKLEAF in namei will set the VSOFTLOCK flag in the
returned vnode (whether the returned vnode is locked or not), and
namei() will fail with EAGAIN if the VSOFTLOCK flag is already set.
An extra reference is added to the returned vnode which either
the caller must free or NDFREE must free (note that VOP_RENAME is
not responsible for freeing this extra reference, so the API itself
does not actually change).  The caller must either call vclearsoftlock()
or call NDFREE() with the appropriate flags to clear the flag and
dereference the vnode.

int gc = 0;
vagain(gc);

vagain() is a routine that falls through the first time, initializing
'gc' to a global counter.  If later on you get an EAGAIN from a
SOFTLOCK failure and loop back up to the vagain(), it will block the
process until whomever owns the softlock has 'probably' released it.
This allows the system call to restart internally and attempt the
operation again from scratch.

:Because the source node and parent are not locked, there is the
:possibility that the source node could be renamed or removed at
:any time before VOP_RENAME finally gets around to locking it and
:removing it. Something needs to protect the source node against
:being renamed/removed between the point that the source node is
:initially looked up and the point that it is finally locked. Both
:Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this
:protection.
:
:It is the fact that this problem is entirely unique to VOP_RENAME
:that leads me to think that adding the generic SOFTLOCKLEAF code
:is overkill. The following fragment also suggests that maybe the
:approach doesn't actually fit in that well:

Well, maybe.  I have my eye on possibly seeing a way to fix
the race-to-root directory scanning program too, so I decided
to implement VSOFTLOCK formally rather then as a hack.

:The way that vclearsoftlock() is used to clear a flag in an unlocked
:vnode is also not ideal. This should probably be protected at least
:by v_interlock as other flags are.
:
:Ian

In -current, definitely.  I'm not sure why v_interlock even exists
in -stable.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-03 Thread Zhihui Zhang



On Wed, 3 Oct 2001, Ken Pizzini wrote:

 Zhihui Zhang [EMAIL PROTECTED] wrote:
  (3) Matt says For example, if you have two hardlinked files residing in
  different directories both get renamed simultaniously, one of the 
  rename()s can fail even though there is no conflict
  
  Can you explain this a little bit more?
 
 Consider:
   mkdir foo bar
   echo fubar  foo/a
   ln foo/a bar/a

Should it be:   ln foo/a bar/b   instead?

   mv foo/a foo/b  mv bar/a bar/b
 
 There is no reason why that last line should fail, though it could
 return EINVAL under some situations using some of the proposed
 patches.
 
   --Ken Pizzini
 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-02 Thread Matt Dillon


Ok, I'm adding -hackers... another email thread got going in -committers.

Here is a patch set for -stable.  It isn't perfect but it does appear
to solve the problem.  The one case I don't handle right is if you have
a hardlinked file and two renames in two different directories on the same
file occur at the same time... that will (improperly) return an error 
code when, in fact, it's perfectly acceptable to do that.

This patch appears to fix the utterly trivial crash reproduction that
Yevgeniy was able to produce with a couple of simple race scripts running
in the background.

What I've done is add a SOFTLOCKLEAF capability to namei().  If set, and
the file/directory exists, namei() will generate an extra VREF() on 
the vnode and set the VSOFTLOCK flag in vp-v_flag.  If the vnode already
has VSOFTLOCK set, namei() will return EINVAL.

Then in rename() and rmdir() I set SOFTLOCKLEAF for the namei resolution
and, of course, clean things up when everything is done.

The ufs_rename() and ufs_rmdir() code no longer have to do the IN_RENAME
hack at all, because it's now handled.

This patch set does not yet include fixes to unionfs or the nfs server
and is for informational purposes only.  Comments?

-Matt

Index: kern/vfs_lookup.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_lookup.c,v
retrieving revision 1.38.2.3
diff -u -r1.38.2.3 vfs_lookup.c
--- kern/vfs_lookup.c   2001/08/31 19:36:49 1.38.2.3
+++ kern/vfs_lookup.c   2001/10/02 19:04:21
@@ -372,11 +372,20 @@
error = EISDIR;
goto bad;
}
+   if ((cnp-cn_flags  SOFTLOCKLEAF) 
+   (dp-v_flag  VSOFTLOCK)) {
+   error = EINVAL;
+   goto bad;
+   }
if (wantparent) {
ndp-ni_dvp = dp;
VREF(dp);
}
ndp-ni_vp = dp;
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   VREF(dp);
+   vsetsoftlock(dp);
+   }
if (!(cnp-cn_flags  (LOCKPARENT | LOCKLEAF)))
VOP_UNLOCK(dp, 0, p);
/* XXX This should probably move to the top of function. */
@@ -565,13 +574,20 @@
error = EROFS;
goto bad2;
}
+   if ((cnp-cn_flags  SOFTLOCKLEAF)  (dp-v_flag  VSOFTLOCK)) {
+   error = EINVAL;
+   goto bad2;
+   }
if (cnp-cn_flags  SAVESTART) {
ndp-ni_startdir = ndp-ni_dvp;
VREF(ndp-ni_startdir);
}
if (!wantparent)
vrele(ndp-ni_dvp);
-
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   VREF(dp);
+   vsetsoftlock(dp);
+   }
if ((cnp-cn_flags  LOCKLEAF) == 0)
VOP_UNLOCK(dp, 0, p);
return (0);
@@ -654,6 +670,15 @@
error = ENOTDIR;
goto bad;
}
+   if ((cnp-cn_flags  SOFTLOCKLEAF) 
+   (dp-v_flag  VSOFTLOCK)) {
+   error = EINVAL;
+   goto bad;
+   }
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   VREF(dp);
+   vsetsoftlock(dp);
+   }
if (!(cnp-cn_flags  LOCKLEAF))
VOP_UNLOCK(dp, 0, p);
*vpp = dp;
@@ -707,6 +732,10 @@
error = EROFS;
goto bad2;
}
+   if ((cnp-cn_flags  SOFTLOCKLEAF)  (dp-v_flag  VSOFTLOCK)) {
+   error = EINVAL;
+   goto bad2;
+   }
/* ASSERT(dvp == ndp-ni_startdir) */
if (cnp-cn_flags  SAVESTART)
VREF(dvp);
@@ -718,6 +747,10 @@
((cnp-cn_flags  (NOOBJ|LOCKLEAF)) == LOCKLEAF))
vfs_object_create(dp, cnp-cn_proc, cnp-cn_cred);
 
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   VREF(dp);
+   vsetsoftlock(dp);
+   }
if ((cnp-cn_flags  LOCKLEAF) == 0)
VOP_UNLOCK(dp, 0, p);
return (0);
Index: kern/vfs_subr.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.249.2.11
diff -u -r1.249.2.11 vfs_subr.c
--- kern/vfs_subr.c 2001/09/11 09:49:53 1.249.2.11
+++ kern/vfs_subr.c 2001/10/02 18:45:55
@@ -2927,6 +2961,12 @@
  struct nameidata *ndp;
  const uint flags;
 {
+   if (!(flags  NDF_NO_FREE_SOFTLOCKLEAF) 
+   (ndp-ni_cnd.cn_flags  SOFTLOCKLEAF) 
+   ndp-ni_vp) {
+   vclearsoftlock(ndp-ni_vp);
+   vrele(ndp-ni_vp);
+   }
if (!(flags  NDF_NO_FREE_PNBUF) 

Re: bleh. Re: ufs_rename panic

2001-10-02 Thread Ian Dowse

In message [EMAIL PROTECTED], Matt Dillon writes:
What I've done is add a SOFTLOCKLEAF capability to namei().  If set, and
the file/directory exists, namei() will generate an extra VREF() on 
the vnode and set the VSOFTLOCK flag in vp-v_flag.  If the vnode already
has VSOFTLOCK set, namei() will return EINVAL.

I just tried a more direct approach, which is to implement a flag
at the vnode layer that is roughly equivalent to UFS's IN_RENAME
flag. This keeps the changes local to vfs_syscalls.c except for
the addition of a new vnode flag in vnode.h.

A patch is below. It doesn't include the changes to remove IN_RENAME
etc, but these could be done later anyway.

The basic idea is that the rename syscall locks the source node
just for long enough to mark it with VRENAME. It then keeps an
extra reference on the source node so that it can clear VRENAME
before returning. The syscalls unlink(), rmdir() and rename() also
check for VRENAME before proceeding with the operation, and act
appropriately if it is found set. One case that is not being handled
well is where the target of a rename has VRENAME set; the patch just
causes rename to return EINVAL, but a better approach would be to
unlock everything and try again. I don't know how to deal with the
case of vn_lock(fvp, ...) failing at the end of rename() either.

Only lightly tested, so expect lots of bugs...

Ian

Index: sys/vnode.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/sys/vnode.h,v
retrieving revision 1.157
diff -u -r1.157 vnode.h
--- sys/vnode.h 13 Sep 2001 22:52:42 -  1.157
+++ sys/vnode.h 2 Oct 2001 19:06:41 -
@@ -163,8 +163,8 @@
 #defineVXLOCK  0x00100 /* vnode is locked to change underlying type */
 #defineVXWANT  0x00200 /* thread is waiting for vnode */
 #defineVBWAIT  0x00400 /* waiting for output to complete */
+#defineVRENAME 0x00800 /* rename operation on progress */
 #defineVNOSYNC 0x01000 /* unlinked, stop syncing */
-/* open for business0x01000 */
 #defineVOBJBUF 0x02000 /* Allocate buffers in VM object */
 #defineVCOPYONWRITE0x04000 /* vnode is doing copy-on-write */
 #defineVAGE0x08000 /* Insert vnode at head of free list */
Index: kern/vfs_syscalls.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.206
diff -u -r1.206 vfs_syscalls.c
--- kern/vfs_syscalls.c 22 Sep 2001 03:07:41 -  1.206
+++ kern/vfs_syscalls.c 2 Oct 2001 20:29:54 -
@@ -1573,6 +1573,9 @@
if (vp-v_flag  VROOT)
error = EBUSY;
}
+   /* Claim that the node is already gone if it is being renamed. */
+   if (vp-v_flag  VRENAME)
+   error = ENOENT;
if (vn_start_write(nd.ni_dvp, mp, V_NOWAIT) != 0) {
NDFREE(nd, NDF_ONLY_PNBUF);
vrele(vp);
@@ -2879,20 +2882,29 @@
struct mount *mp;
struct vnode *tvp, *fvp, *tdvp;
struct nameidata fromnd, tond;
-   int error;
+   int err1, error;
 
bwillwrite();
-   NDINIT(fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE,
-   SCARG(uap, from), td);
+   NDINIT(fromnd, DELETE, WANTPARENT | LOCKLEAF | SAVESTART,
+   UIO_USERSPACE, SCARG(uap, from), td);
if ((error = namei(fromnd)) != 0)
return (error);
fvp = fromnd.ni_vp;
-   if ((error = vn_start_write(fvp, mp, V_WAIT | PCATCH)) != 0) {
+   if (fvp-v_flag  VRENAME)
+   /* The node is being renamed; claim it has already gone. */
+   error = ENOENT;
+   if (!error)
+   error = vn_start_write(fvp, mp, V_WAIT | PCATCH);
+   if (error) {
NDFREE(fromnd, NDF_ONLY_PNBUF);
vrele(fromnd.ni_dvp);
-   vrele(fvp);
+   vput(fvp);
+   fvp = NULL;
goto out1;
}
+   fvp-v_flag |= VRENAME;
+   vref(fvp);
+   VOP_UNLOCK(fvp, 0, td);
NDINIT(tond, RENAME, LOCKPARENT | LOCKLEAF | NOCACHE | SAVESTART | NOOBJ,
UIO_USERSPACE, SCARG(uap, to), td);
if (fromnd.ni_vp-v_type == VDIR)
@@ -2929,6 +2941,10 @@
!bcmp(fromnd.ni_cnd.cn_nameptr, tond.ni_cnd.cn_nameptr,
  fromnd.ni_cnd.cn_namelen))
error = -1;
+   if (tvp != NULL  (tvp-v_flag  VRENAME)) {
+   /* XXX, should just unlock everything and retry. */
+   error = EINVAL;
+   }
 out:
if (!error) {
VOP_LEASE(tdvp, td, td-td_proc-p_ucred, LEASE_WRITE);
@@ -2961,6 +2977,18 @@
ASSERT_VOP_UNLOCKED(tond.ni_dvp, rename);
ASSERT_VOP_UNLOCKED(tond.ni_vp, rename);
 out1:
+   if (fvp != NULL) {
+   /* We set the VRENAME flag and did an extra vref(fvp) above. 

patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-02 Thread Matt Dillon

Here is the latest patch I have.  It appears to completely solve the
problem.  I have shims in unionfs and nfs for the moment.

The patch is against -stable.

* Implements SOFTLOCKLEAF namei() option

* Implements EAGAIN error  appropriate tsleep/retry code

* Universal for rename()  rmdir(). Final patch will also probably
  implement it on unlink() to solve (pre-existing) unlink/rename regular
  file race.

* Tested very well w/ UFS, should be compatible with and work for
  direct access to other filesystems that still use IN_RENAME.

* Tested for collision probability.  Answer: Very low even when
  one tries on purpose.  There is no need to implement a more
  sophisticated fine-grained tsleep.

-Matt


Index: kern/vfs_lookup.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_lookup.c,v
retrieving revision 1.38.2.3
diff -u -r1.38.2.3 vfs_lookup.c
--- kern/vfs_lookup.c   2001/08/31 19:36:49 1.38.2.3
+++ kern/vfs_lookup.c   2001/10/02 20:06:33
@@ -372,6 +372,11 @@
error = EISDIR;
goto bad;
}
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   if ((error = vsetsoftlock(dp)) != 0)
+   goto bad;
+   VREF(dp);
+   }
if (wantparent) {
ndp-ni_dvp = dp;
VREF(dp);
@@ -565,13 +570,17 @@
error = EROFS;
goto bad2;
}
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   if ((error = vsetsoftlock(dp)) != 0)
+   goto bad2;
+   VREF(dp);
+   }
if (cnp-cn_flags  SAVESTART) {
ndp-ni_startdir = ndp-ni_dvp;
VREF(ndp-ni_startdir);
}
if (!wantparent)
vrele(ndp-ni_dvp);
-
if ((cnp-cn_flags  LOCKLEAF) == 0)
VOP_UNLOCK(dp, 0, p);
return (0);
@@ -654,6 +663,11 @@
error = ENOTDIR;
goto bad;
}
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   if ((error = vsetsoftlock(dp)) != 0)
+   goto bad;
+   VREF(dp);
+   }
if (!(cnp-cn_flags  LOCKLEAF))
VOP_UNLOCK(dp, 0, p);
*vpp = dp;
@@ -707,6 +721,11 @@
error = EROFS;
goto bad2;
}
+   if (cnp-cn_flags  SOFTLOCKLEAF) {
+   if ((error = vsetsoftlock(dp)) != 0)
+   goto bad2;
+   VREF(dp);
+   }
/* ASSERT(dvp == ndp-ni_startdir) */
if (cnp-cn_flags  SAVESTART)
VREF(dvp);
@@ -715,8 +734,9 @@
vrele(dvp);
 
if (vn_canvmio(dp) == TRUE 
-   ((cnp-cn_flags  (NOOBJ|LOCKLEAF)) == LOCKLEAF))
+   ((cnp-cn_flags  (NOOBJ|LOCKLEAF)) == LOCKLEAF)) {
vfs_object_create(dp, cnp-cn_proc, cnp-cn_cred);
+   }
 
if ((cnp-cn_flags  LOCKLEAF) == 0)
VOP_UNLOCK(dp, 0, p);
Index: kern/vfs_subr.c
===
RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.249.2.11
diff -u -r1.249.2.11 vfs_subr.c
--- kern/vfs_subr.c 2001/09/11 09:49:53 1.249.2.11
+++ kern/vfs_subr.c 2001/10/02 22:55:38
@@ -130,6 +132,8 @@
 #endif
 struct nfs_public nfs_pub; /* publicly exported FS */
 static vm_zone_t vnode_zone;
+static int vagain_count = 1;
+static int vagain_waiting = 0;
 
 /*
  * The workitem queue.
@@ -2927,6 +2963,13 @@
  struct nameidata *ndp;
  const uint flags;
 {
+   if (!(flags  NDF_NO_FREE_SOFTLOCKLEAF) 
+   (ndp-ni_cnd.cn_flags  SOFTLOCKLEAF) 
+   ndp-ni_vp) {
+   vclearsoftlock(ndp-ni_vp);
+   ndp-ni_cnd.cn_flags = ~SOFTLOCKLEAF;
+   vrele(ndp-ni_vp);
+   }
if (!(flags  NDF_NO_FREE_PNBUF) 
(ndp-ni_cnd.cn_flags  HASBUF)) {
zfree(namei_zone, ndp-ni_cnd.cn_pnbuf);
@@ -2955,3 +2998,55 @@
ndp-ni_startdir = NULL;
}
 }
+
+/*
+ * vsetsoftlock() -set the VSOFTLOCK flag on the vnode, return
+ * EAGAIN if it has already been set by someone else.
+ *
+ * note: we could further refine the collision by setting a VSOFTLOCKCOLL
+ * flag and then only waking up waiters when the colliding vnode is
+ * released.  However, this sort of collision does not happen often
+ * enough for such an addition to yield any improvement in performance.
+ */
+
+int
+vsetsoftlock(struct vnode *vp)
+{
+   int s;
+   int error = 0;
+
+   s = splbio();
+   if (vp-v_flag  VSOFTLOCK)
+   error = EAGAIN;
+   else
+   

Re: patch #3 (was Re: bleh. Re: ufs_rename panic)

2001-10-02 Thread Matt Dillon


:
:Matt Dillon wrote:
: Here is the latest patch I have.  It appears to completely solve the
: problem.  I have shims in unionfs and nfs for the moment.
:
:This seems rather large compared to Ian Dowse's version..  Are you sure that
:you're doing this the right way?  Adding a whole new locking mechanism
:when the simple VRENAME flag to be enough seems like a bit of overkill..

Ian's doesn't fix any of the filesystem semantics bugs, it only prevents
the panic from occuring.  For example, if you have two hardlinked files
residing in different directories both get renamed simultaniously, one
of the rename()s can fail even though there is no conflict.  If you
have a simultanious rmdir() and rename(), the rename() can return an
unexpected error code.  And so forth.

If you remove the filesystem semantics fixes from my patch you 
essentially get Ian's patch except that I integrated the vnode flag
in namei/lookup whereas Ian handles it manually in the syscall code.

Also, Ian's patch only effects system calls.  It doesn't do the same
fixes for nfs (server side) or unionfs.

-Matt

:I'm not criticizing your work, I am just wondering if you have considered
:Ian's work and feel that it is wrong or the wrong direction..  His certainly
:appears more elegant than yours so I want to understand why you feel yours
:is better than his.
:
:freebsd-hackers
:Message-id: [EMAIL PROTECTED]
:
:Cheers,
:-Peter
:--
:Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
:All of this is for nothing if we don't go to the stars - JMS/B5

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-02 Thread Kirk McKusick

The problems all arise from the fact that we unlock the source
while we look up the destination, and when we return to relookup
the source, it may have changed/moved/disappeared. The reason to
unlock the source before looking up the destination was to avoid
deadlocking against ourselves on a lock that we held associated 
with the source. Since we now allow recursive locks on vnodes, it
is no longer necessary to release the source before looking up
the destination. So, it seems to me that the correct fix is to
*not* release the source after looking it up, but rather hold it
locked while we look up the destination. We can completely get
rid of relookup and lots of other hairy code and generally make
rename much simpler. Am I missing something here?

~Kirk

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: bleh. Re: ufs_rename panic

2001-10-02 Thread Matt Dillon


:The problems all arise from the fact that we unlock the source
:while we look up the destination, and when we return to relookup
:the source, it may have changed/moved/disappeared. The reason to
:unlock the source before looking up the destination was to avoid
:deadlocking against ourselves on a lock that we held associated 
:with the source. Since we now allow recursive locks on vnodes, it
:is no longer necessary to release the source before looking up
:the destination. So, it seems to me that the correct fix is to
:*not* release the source after looking it up, but rather hold it
:locked while we look up the destination. We can completely get
:rid of relookup and lots of other hairy code and generally make
:rename much simpler. Am I missing something here?
:
:   ~Kirk

   That was the first thing I thought of, but unfortunately it
   is still possible to deadlock against another process...
   for example, a process doing an (unrelated) rename in the reverse 
   direction.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message