Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
>> It's worth reminding that -o tcp is an option.
> Not for NFS through a (stateful) filtering router, no.

True.  But then, not over network hops that drop port 2049, either.
Break the assumptions underlying the 'net and you have to expect
breakage from stuff built atop it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML	mo...@rodents-montreal.org
/ \ Email!		7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
At 19:14 Uhr +0200 31.07.2019, m...@netbsd.org wrote:
>It's worth reminding that -o tcp is an option.

Not for NFS through a (stateful) filtering router, no.  Reboot the
router, and you will have to walk up to every client and reboot it.
With NFS over UDP, the clients will recover.

Cheerio,
hauke
--
"It's never straight up and down"     (DEVO)
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
At 10:45 Uhr +0200 31.07.2019, Edgar Fuß wrote:
>Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS
>mount combined with a misconfigured IPFilter that blocked all but the
>first fragment of a fragmented NFS reply (e.g., readdir), combined with a
>NetBSD design error (or so Taylor says) that a vnode lock may be held
>across I/O, in this case, network I/O.

I ran into a similar issue 2004ish, connecting Red Hat Linux clients to a
(NetBSD) nfs (udp) server through a (NetBSD, ipfilter) filtering router.
Darren back then told me Linux sends fragmented packets tail-first, which
ipfilter was not prepared to deal with.

I switched to pf, which was able to deal with the scenario just fine, and
didn't look back.

Cheerio,
hauke
--
"It's never straight up and down"     (DEVO)
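For readers who hit the same thing: pf avoids the out-of-order-fragment
problem by reassembling fragments before filtering, so the rules always see
complete packets.  A minimal pf.conf sketch for a filtering router (the
addresses and the macro names are placeholders, and the scrub syntax is the
older form NetBSD's pf uses; check pf.conf(5) on your release):

```shell
# /etc/pf.conf on the filtering router (illustrative addresses)
clients = "10.0.0.0/24"
server  = "10.0.1.5"

# Reassemble inbound fragments before the rules run, so a tail-first
# fragment train is filtered as one complete packet.
scrub in all fragment reassemble

# Let NFS traffic (portmapper + nfsd) through to the server.
pass in proto udp from $clients to $server port { 111, 2049 }
```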
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
On Wed, Jul 31, 2019 at 07:11:54AM -0700, Jason Thorpe wrote:
> > On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
> >
> > NetBSD design error (or so Taylor says) that a vnode lock may be held
> > across I/O
>
> 100%
>
> NetBSD's VFS locking protocol needs a serious overhaul.  At least one other
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at
> the VFS layer (it's all pushed into the file system back-ends, which now have
> more control over their own locking requirements).  It does have some
> additional complexities around reference / busy counting and vnode identity,
> but it works very well in practice.
>
> I don't know what FreeBSD has done in this area.
>
> -- thorpej

IMNT_MPSAFE, which NFS isn't?
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
On Wed, Jul 31, 2019 at 11:42:26AM -0500, Don Lee wrote:
> If you go back a few years, you can find a thread where I reported tstile
> lockups on PPC. I don’t remember the details, but it was back in 6.1 as I
> recall. This is not a new problem, and not limited to NFS. I still have a
> similar problem with my 7.2 system, usually triggered when I do backups
> (dump/restore). The dump operation locks up and cannot be killed. The system
> continues, except any process that trips over the tstile also locks up.
> Eventually, the system grinds to a complete halt. (can’t even log in) If I
> catch it before that point, I can almost reboot, but I have to power cycle to
> kill the tstile process(es), or the reboot also hangs.

It's worth reminding that -o tcp is an option.
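For the record, switching a mount to TCP looks something like this (server
name, export path, and mount point are placeholders; see mount_nfs(8) for
the options your release accepts):

```shell
# One-off mount over TCP instead of the historic UDP default:
mount_nfs -o tcp fileserver:/export/home /mnt/home

# Or persistently, as an /etc/fstab entry:
# fileserver:/export/home  /mnt/home  nfs  rw,tcp  0 0
```

TCP's own retransmission and stream reassembly mean a lost fragment can't
wedge a request, at the cost of the stateful-router caveat raised later in
this thread.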
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
If you go back a few years, you can find a thread where I reported tstile
lockups on PPC. I don’t remember the details, but it was back in 6.1 as I
recall. This is not a new problem, and not limited to NFS. I still have a
similar problem with my 7.2 system, usually triggered when I do backups
(dump/restore). The dump operation locks up and cannot be killed. The system
continues, except any process that trips over the tstile also locks up.
Eventually, the system grinds to a complete halt. (can’t even log in) If I
catch it before that point, I can almost reboot, but I have to power cycle to
kill the tstile process(es), or the reboot also hangs.

-dgl-

> On Jul 31, 2019, at 9:11 AM, Jason Thorpe wrote:
>
>> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>>
>> NetBSD design error (or so Taylor says) that a vnode lock may be held
>> across I/O
>
> 100%
>
> NetBSD's VFS locking protocol needs a serious overhaul.  At least one other
> BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at
> the VFS layer (it's all pushed into the file system back-ends, which now have
> more control over their own locking requirements).  It does have some
> additional complexities around reference / busy counting and vnode identity,
> but it works very well in practice.
>
> I don't know what FreeBSD has done in this area.
>
> -- thorpej
Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>
> NetBSD design error (or so Taylor says) that a vnode lock may be held
> across I/O

100%

NetBSD's VFS locking protocol needs a serious overhaul.  At least one other
BSD-family VFS (the one in XNU) completely eliminated locking of vnodes at
the VFS layer (it's all pushed into the file system back-ends, which now have
more control over their own locking requirements).  It does have some
additional complexities around reference / busy counting and vnode identity,
but it works very well in practice.

I don't know what FreeBSD has done in this area.

-- thorpej
NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)
Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS
mount combined with a misconfigured IPFilter that blocked all but the first
fragment of a fragmented NFS reply (e.g., readdir), combined with a NetBSD
design error (or so Taylor says) that a vnode lock may be held across I/O,
in this case, network I/O.

It should be reproducible with a default NFS mount and a "block in all with
frag-body" IPFilter rule, and then trying to readdir.

Now, in some cases, the machine in question recovered after fixing the
filter rules; in others, it didn't, forcing a reboot.  This strikes me as a
bug, because the same lock-up could just as well have been caused by network
problems instead of IPFilter misconfiguration.  It looks like the operation
whose reply was lost sometimes doesn't get retried.  Do we have some weird
bug where the arrival of the first fragment stops the timeout, but the
blocking of the remaining fragments causes it to wedge?
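For anyone trying to reproduce this, a sketch of the setup (server name,
export, and mount point are placeholders; requires root on both the client
and the filtering host, and I haven't re-verified the exact option spelling
against every release):

```shell
# On the filtering host: pass first fragments, drop the rest.
# /etc/ipf.conf -- "with frag-body" matches non-first fragments,
# which is exactly the misconfiguration described above.
block in all with frag-body

# On the client: a default (UDP, hard) NFS mount.
mount_nfs -o udp fileserver:/export /mnt

# Trigger a reply large enough to fragment, e.g. readdir of a
# directory whose listing exceeds one MTU:
ls /mnt/some-large-directory
```

With the rule in place the first fragment of the readdir reply arrives and
the rest are dropped, which should wedge the client as described.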