Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-08-01 Thread Mouse
>> It's worth remembering that -o tcp is an option.
> Not for NFS through a (stateful) filtering router, no.

True.

But then, not over network hops that drop port 2049, either.  Break the
assumptions underlying the 'net and you have to expect breakage from
stuff built atop it.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-08-01 Thread Hauke Fath
At 10:45 +0200 on 31.07.2019, Edgar Fuß wrote:
>Thanks to riastradh@, this turned out to be caused by a (UDP, hard) NFS
>mount combined with a mis-configured IPFilter that blocked all but the
>first fragment of a fragmented NFS reply (e.g., readdir), combined with a
>NetBSD design error (or so Taylor says) that a vnode lock may be held
>across I/O, in this case, network I/O.

I ran into a similar issue around 2004, connecting Red Hat Linux clients to a
(NetBSD) NFS (UDP) server through a (NetBSD, ipfilter) filtering router.
Darren told me back then that Linux sends fragmented packets tail-first, which
ipfilter was not prepared to deal with.
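
For anyone wondering why fragment order would matter to a filter at all,
here is a minimal sketch in Python. It is a toy model, not ipfilter's
actual logic: I am assuming a filter that only passes a later fragment
once it has seen the first one (offset 0, the one carrying the UDP
header). The names Fragment, fragment(), naive_filter() and reassemble()
are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    ip_id: int       # IP identification shared by all fragments of a datagram
    offset: int      # byte offset of this fragment within the datagram
    more: bool       # "more fragments" flag; False on the last fragment
    payload: bytes


def fragment(ip_id, data, size):
    """Split data into fragments of at most `size` bytes (size must be > 0)."""
    frags = []
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        frags.append(Fragment(ip_id, off, off + size < len(data), chunk))
    return frags


def naive_filter(frags):
    """Hypothetical filter: a non-first fragment passes only if the first
    fragment of the same datagram was seen earlier."""
    seen_first = set()
    passed = []
    for f in frags:
        if f.offset == 0:
            seen_first.add(f.ip_id)
            passed.append(f)
        elif f.ip_id in seen_first:
            passed.append(f)
        # otherwise the fragment is dropped: there is no state to match it
    return passed


def reassemble(frags):
    """Return the full datagram, or None if any byte range is missing."""
    by_offset = {f.offset: f for f in frags}
    data = b""
    while len(data) in by_offset:
        f = by_offset[len(data)]
        data += f.payload
        if not f.more:
            return data
    return None


reply = bytes(range(256)) * 20                 # a 5120-byte stand-in reply
frags = fragment(ip_id=7, data=reply, size=1480)

head_first = reassemble(naive_filter(frags))
tail_first = reassemble(naive_filter(list(reversed(frags))))

print("head-first delivered:", head_first == reply)
print("tail-first delivered:", tail_first is not None)
```

With head-first ordering every fragment gets through and the reply is
reassembled. With tail-first ordering only the offset-0 fragment survives
the toy filter, so the receiver never completes the datagram, and a hard
UDP mount will keep retrying.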

I switched to pf, which was able to deal with the scenario just fine, and
didn't look back.

Cheerio,
hauke


--
"It's never straight up and down" (DEVO)




Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread maya
On Wed, Jul 31, 2019 at 07:11:54AM -0700, Jason Thorpe wrote:
> 
> > On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
> > 
> > NetBSD design error (or so Taylor says) that a vnode lock may be held 
> > across I/O
> 
> 100%
> 
> NetBSD's VFS locking protocol needs a serious overhaul.  At least one other 
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at 
> the VFS layer (it's all pushed into the file system back-ends, which now have 
> more control over their own locking requirements).  It does have some 
> additional complexities around reference / busy counting and vnode identity, 
> but it works very well in practice.
> 
> I don't know what FreeBSD has done in this area.
> 
> -- thorpej
> 

IMNT_MPSAFE, which NFS isn't?


Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread maya
On Wed, Jul 31, 2019 at 11:42:26AM -0500, Don Lee wrote:
> If you go back a few years, you can find a thread where I reported tstile 
> lockups on PPC. I don’t remember the details, but it was back in 6.1 as I 
> recall. This is not a new problem, and not limited to NFS. I still have a 
> similar problem with my 7.2 system, usually triggered when I do backups 
> (dump/restore). The dump operation locks up and cannot be killed. The system 
> continues, except any process that trips over the tstile also locks up. 
> Eventually, the system grinds to a complete halt. (can’t even log in) If I 
> catch it before that point, I can almost reboot, but I have to power cycle to 
> kill the tstile process(es), or the reboot also hangs.

It's worth remembering that -o tcp is an option.


Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread Don Lee
If you go back a few years, you can find a thread where I reported tstile 
lockups on PPC. I don’t remember the details, but it was back in 6.1 as I 
recall. This is not a new problem, and not limited to NFS. I still have a 
similar problem with my 7.2 system, usually triggered when I do backups 
(dump/restore). The dump operation locks up and cannot be killed. The system 
continues, except any process that trips over the tstile also locks up. 
Eventually, the system grinds to a complete halt. (can’t even log in) If I 
catch it before that point, I can almost reboot, but I have to power cycle to 
kill the tstile process(es), or the reboot also hangs.

-dgl-

> On Jul 31, 2019, at 9:11 AM, Jason Thorpe wrote:
> 
> 
>> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>> 
>> NetBSD design error (or so Taylor says) that a vnode lock may be held 
>> across I/O
> 
> 100%
> 
> NetBSD's VFS locking protocol needs a serious overhaul.  At least one other 
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at 
> the VFS layer (it's all pushed into the file system back-ends, which now have 
> more control over their own locking requirements).  It does have some 
> additional complexities around reference / busy counting and vnode identity, 
> but it works very well in practice.
> 
> I don't know what FreeBSD has done in this area.
> 
> -- thorpej
> 



Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread Jason Thorpe


> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
> 
> NetBSD design error (or so Taylor says) that a vnode lock may be held across 
> I/O

100%

NetBSD's VFS locking protocol needs a serious overhaul.  At least one other 
BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at the 
VFS layer (it's all pushed into the file system back-ends, which now have more 
control over their own locking requirements).  It does have some additional 
complexities around reference / busy counting and vnode identity, but it works 
very well in practice.
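
To make the "lock held across I/O" problem concrete, here is a rough
user-space analogy in Python. It is not kernel code and not how NetBSD or
XNU implement anything; vnode_lock, reply_arrived and the three functions
are hypothetical names for the sketch, and the timeouts stand in for a
wait that a hard mount would never give up on.

```python
import threading

vnode_lock = threading.Lock()       # stands in for a per-vnode lock
reply_arrived = threading.Event()   # never set: models the lost NFS reply


def read_holding_lock(started):
    """Hold the lock for the whole operation, including the network wait."""
    with vnode_lock:
        started.set()
        reply_arrived.wait(timeout=0.5)   # a hard mount would wait forever


def read_dropping_lock(started):
    """Take the lock only around in-memory work, not around the wait."""
    with vnode_lock:
        pass                              # e.g. set up the request
    started.set()
    reply_arrived.wait(timeout=0.5)


def second_process_gets_lock(reader):
    """Start `reader`, then report whether another thread can take the lock."""
    started = threading.Event()
    t = threading.Thread(target=reader, args=(started,))
    t.start()
    started.wait()
    got = vnode_lock.acquire(timeout=0.1)
    if got:
        vnode_lock.release()
    t.join()
    return got


blocked_case = second_process_gets_lock(read_holding_lock)
free_case = second_process_gets_lock(read_dropping_lock)
print("lock available while reader waits (held across I/O):", blocked_case)
print("lock available while reader waits (dropped before I/O):", free_case)
```

In the first case the second thread cannot get the lock while the reader
sits waiting for a reply that never comes, which is the pile-up behind the
tstile hangs described in this thread. In the second case the wait happens
outside the lock, so other threads are not stuck behind it.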

I don't know what FreeBSD has done in this area.

-- thorpej