On Sat, 29 Dec 2007, Dan Williams wrote:

> On Dec 29, 2007 1:58 PM, dean gaudet <[EMAIL PROTECTED]> wrote:
> > On Sat, 29 Dec 2007, Dan Williams wrote:
> >
> > > On Dec 29, 2007 9:48 AM, dean gaudet <[EMAIL PROTECTED]> wrote:
> > > > hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on
> > > > the same 64k chunk array and had raised the stripe_cache_size to 1024...
> > > > and got a hang.  this time i grabbed stripe_cache_active before bumping
> > > > the size again -- it was only 905 active.  as i recall the bug we were
> > > > debugging a year+ ago the active was at the size when it would hang.  so
> > > > this is probably something new.
> > >
> > > I believe I am seeing the same issue and am trying to track down
> > > whether XFS is doing something unexpected, i.e. I have not been able
> > > to reproduce the problem with EXT3.  MD tries to increase throughput
> > > by letting some stripe work build up in batches.  It looks like every
> > > time your system has hung it has been in the 'inactive_blocked' state
> > > i.e. > 3/4 of stripes active.  This state should automatically
> > > clear...
> >
> > cool, glad you can reproduce it :)
> >
> > i have a bit more data... i'm seeing the same problem on debian's
> > 2.6.22-3-amd64 kernel, so it's not new in 2.6.24.
> >
> 
> This is just brainstorming at this point, but it looks like xfs can
> submit more requests in the bi_end_io path such that it can lock
> itself out of the RAID array.  The sequence that concerns me is:
> 
> return_io->xfs_buf_end_io->xfs_buf_io_end->xfs_buf_iodone_work->xfs_buf_iorequest->make_request-><hang>
> 
> I need to verify whether this path is actually triggering, but if we are
> in an inactive_blocked condition this new request will be put on a
> wait queue and we'll never get to the release_stripe() call after
> return_io().  It would be interesting to see if this is new XFS
> behavior in recent kernels.
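
fwiw, here's how i picture that scenario.  this is just a toy userspace
model i wrote to keep it straight in my head (the names and the 3/4
threshold are lifted from your description, not from the actual raid5.c
code), but it shows the shape of the lockout: once the cache is past
3/4 active nothing new gets in until release_stripe() runs, so if the
completion path resubmits i/o before its own release happens, the
submitter ends up waiting on itself:

/* toy model, userspace only; not the real md code */
#include <stdio.h>
#include <stdbool.h>

#define NR_STRIPES 256                 /* think stripe_cache_size */

static int active_stripes;
static bool inactive_blocked;

/* stand-in for get_active_stripe(): returns false where the real code
 * would go to sleep waiting for a free stripe */
static bool get_active_stripe(void)
{
    if (4 * active_stripes > 3 * NR_STRIPES)
        inactive_blocked = true;
    if (inactive_blocked)
        return false;                  /* real code: wait -> this is the hang */
    active_stripes++;
    return true;
}

static void release_stripe(void)
{
    active_stripes--;
    inactive_blocked = false;          /* real code also wakes the waiters */
}

/* the path in question: the end_io handler turns around and submits
 * more i/o before the stripe it completed on has been released */
static void end_io_that_resubmits(void)
{
    if (!get_active_stripe())
        printf("would hang: %d stripes active, nothing released yet\n",
               active_stripes);
    /* the release_stripe() for the request we just finished only runs
     * after this handler returns -- too late if we already blocked */
}

int main(void)
{
    while (get_active_stripe())        /* fill the cache past the 3/4 mark */
        ;
    end_io_that_resubmits();
    release_stripe();                  /* the call that never comes in the hang */
    return 0;
}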


i have evidence pointing to d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d89d87965dcbe6fe4f96a2a7e8421b3a75f634d1

which was Neil's change in 2.6.22 that defers recursive
generic_make_request calls (they get queued per task and run only once
the outermost call unwinds) to keep stack usage down.
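
for reference, here's my (possibly wrong) reading of what that commit
does, again boiled down to a userspace toy rather than the actual
patch: a bio submitted from inside a make_request function no longer
recurses into generic_make_request, it gets parked on a per-task list
and is only processed once the outermost call unwinds.  so i/o that
used to be issued immediately is now issued later, which is the kind
of ordering change i could imagine tickling a latent raid5 assumption:

#include <stdio.h>
#include <stddef.h>

struct bio { int id; struct bio *next; };

/* stand-ins for the per-task pending list the commit introduces */
static struct bio *pending_head, *pending_tail;
static int in_make_request;

static void make_request(struct bio *b);

static void generic_make_request(struct bio *b)
{
    if (in_make_request) {             /* submitted from inside make_request: defer */
        b->next = NULL;
        if (pending_tail)
            pending_tail->next = b;
        else
            pending_head = b;
        pending_tail = b;
        printf("bio %d deferred\n", b->id);
        return;
    }
    in_make_request = 1;
    make_request(b);
    while (pending_head) {             /* drain the deferred bios iteratively */
        struct bio *next = pending_head;
        pending_head = next->next;
        if (!pending_head)
            pending_tail = NULL;
        make_request(next);
    }
    in_make_request = 0;
}

/* pretend stacked driver: handling bio 1 remaps and resubmits bio 2,
 * which before the change would have been handled recursively, i.e.
 * before the submission of bio 1 returned */
static void make_request(struct bio *b)
{
    static struct bio remapped = { .id = 2 };
    printf("handling bio %d\n", b->id);
    if (b->id == 1)
        generic_make_request(&remapped);
}

int main(void)
{
    struct bio first = { .id = 1 };
    generic_make_request(&first);
    return 0;
}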

with my git tree sync'd to that commit my test cases fail in under 20
minutes of uptime (i rebooted and tested 3x).  sync'd to the commit just
before it i've now got 8h of run-time without the problem.

this isn't definitive of course, since the problem does seem to be
timing-dependent, but every failure i've seen so far has occurred well
inside that 8h window, so i think this change either causes the problem
or exacerbates an existing raid5 problem.

given that this looks a lot like a very rare hang i saw with 2.6.18
(raid5+xfs there too) i'm thinking Neil's commit may just exacerbate an
existing problem... not that i have evidence either way.

i've attached a new kernel log with a hang at d89d87965d... and the
reduced config file i was using for the bisect.  hopefully the hang
looks the same as what we were seeing at 2.6.24-rc6.  let me know.

-dean

Attachment: kern.log.d89d87965d.bz2
Description: Binary data

Attachment: config-2.6.21-b1.bz2
Description: Binary data
