Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000

2008-09-15 Thread John Baldwin
On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> I am currently running a mail server using a NetApp filer as backend
> storage. From time to time, the whole system gets stuck for 3-5 minutes,
> but after that everything recovers normally. During the "stuck" period,
> ps auxw shows 200-300 mail delivery agent (MDA) processes in "D" state.
> The df command does not respond either.

Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck threads 
when they hang?  If it is "lockf", then make sure you have an up-to-date 
RELENG_6 kernel as there was a recent fix for a "lockf" hang.
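
To see at a glance where the stuck processes are waiting, the 'ps axl' output can be tallied by wait channel. The following is only a rough sketch, not part of the original advice: the function name is hypothetical, and it assumes the output starts with a header line containing an MWCHAN (or WCHAN) column and that no column before it contains embedded spaces.

```python
from collections import Counter


def count_wait_channels(ps_output):
    """Tally the wait-channel column of `ps axl`-style text output."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()

    # Locate the wait-channel column from the header line.
    # Assumption: it is headed MWCHAN or WCHAN, depending on the system.
    header = lines[0].split()
    col = next(
        (i for i, name in enumerate(header) if name in ("MWCHAN", "WCHAN")),
        None,
    )
    if col is None:
        raise ValueError("no MWCHAN/WCHAN column found in header")

    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) > col:
            counts[fields[col]] += 1
    return counts
```

Passing it the text captured from 'ps axl' during a hang and calling most_common() on the result shows whether "lockf", "nfsreq", or something else dominates.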

Alternatively, if things are stuck in "nfsreq", it may be useful to use 
tcpdump to look at the NFS requests your client is making.  nfsstat can also 
be useful as you can see which counters are increasing during a hang.

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000

2008-09-15 Thread Tim Chen
On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <[EMAIL PROTECTED]> wrote:

> On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > I am currently running a mail server using a NetApp filer as backend
> > storage. From time to time, the whole system gets stuck for 3-5 minutes,
> > but after that everything recovers normally. During the "stuck" period,
> > ps auxw shows 200-300 mail delivery agent (MDA) processes in "D" state.
> > The df command does not respond either.
>
> Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> threads when they hang?  If it is "lockf", then make sure you have an
> up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
>

Thanks for your suggestion. After running 'ps axl', it seems all the
processes in "D" state were in nfs, nfsreq, nfsreq. Can you give me some
hints on how to keep digging into the problem?

My system is RELENG_7 from within the last week; I always run make world to
keep it up to date.


>
> Alternatively, if things are stuck in "nfsreq", it may be useful to use
> tcpdump to look at the NFS requests your client is making.  nfsstat can
> also be useful as you can see which counters are increasing during a hang.
>

When the system was stuck, the nfsstat counters grew slowly. It seems only
the read, write, create, and remove RPC counts increased.

As for tcpdump, since I am not familiar with it, I will read some
documentation and run some tests.

Thanks very much for your kind help. I hope the problem can be solved soon.

Sincerely,
Tim Chen


Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000

2008-09-16 Thread John Baldwin
On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <[EMAIL PROTECTED]> wrote:
> 
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > I am currently running a mail server using a NetApp filer as backend
> > > storage. From time to time, the whole system gets stuck for 3-5
> > > minutes, but after that everything recovers normally. During the
> > > "stuck" period, ps auxw shows 200-300 mail delivery agent (MDA)
> > > processes in "D" state.
> > > The df command does not respond either.
> >
> > Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> > threads when they hang?  If it is "lockf", then make sure you have an
> > up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
> >
> 
> Thanks for your suggestion. After running 'ps axl', it seems all the
> processes in "D" state were in nfs, nfsreq, nfsreq. Can you give me some
> hints on how to keep digging into the problem?
> 
> My system is RELENG_7 from within the last week; I always run make world
> to keep it up to date.
> 
> 
> >
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making.  nfsstat can
> > also be useful as you can see which counters are increasing during a hang.
> >
> When the system was stuck, the nfsstat counters grew slowly. It seems only
> the read, write, create, and remove RPC counts increased.
> 
> As for tcpdump, since I am not familiar with it, I will read some
> documentation and run some tests.
> 
> Thanks very much for your kind help. I hope the problem can be solved soon.

Also, do the nfsstat thing I suggested.  During a hang, you can do something 
like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one' 
and 'two' files to see which counters (if any) are being bumped during the 
hang.
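
Comparing the two snapshot files by eye is tedious, so here is a rough sketch of how the comparison could be automated. It is only an illustration under stated assumptions: the function names are hypothetical, and it assumes nfsstat prints rows of column names followed by rows of integers in the same layout in both snapshots. Where a header row does not split cleanly into one name per value (some headers contain spaces), it falls back to the whole header text plus a column index.

```python
def parse_snapshot(text):
    """Return a list of (label, value) pairs, one per counter, in output order."""
    counters = []
    header = []  # tokens of the most recent non-numeric line
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(f.isdigit() for f in fields):
            for i, f in enumerate(fields):
                if len(header) == len(fields):
                    # Header and value rows line up: use the column name.
                    label = header[i]
                else:
                    # Ambiguous header: keep its full text plus the position.
                    label = "%s[%d]" % (" ".join(header), i)
                counters.append((label, int(f)))
        else:
            header = fields
    return counters


def changed_counters(before_text, after_text):
    """List (label, before, after) for every counter whose value differs."""
    before = parse_snapshot(before_text)
    after = parse_snapshot(after_text)
    # Assumption: both snapshots list the same counters in the same order.
    return [
        (label, b, a)
        for (label, b), (_, a) in zip(before, after)
        if b != a
    ]
```

With the files from the command above, something like changed_counters(open("one").read(), open("two").read()) would list only the counters that moved during the one-second interval.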

-- 
John Baldwin