Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, Sep 21, 2007 at 16:58:15 +0100, Hugh Dickins <[EMAIL PROTECTED]> wrote:
> But once I look harder at it, I wonder what would have kept
> 2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from
> the global vm stats too low to get synched back to global, yet
> adding up to something which misleads balance_dirty_pages into
> an indefinite loop e.g. total nr_writeback actually 0, but
> appearing more than dirty_thresh in the global approximation.

I have been seeing this problem since Fedora Core 5's 2.6.20-1.2300.fc5
kernel. The last Fedora kernel that didn't have this problem was
2.6.19-1.2288.2.4.fc5, which I am still running on one machine.

I have put up vmstat and /proc/meminfo data at:
http://wolff.to/bruno/vmlog
There are a number of iterations of the output and they are mixed
together, though the vmstat info was written when a buffer was full, so
there are chunks of lines together.

This was from a machine running Fedora 7's 2.6.23.1-4.fc7 kernel. The
machine is a single-processor P3 with 512 MB of memory. There are three
hard drives, two of which are used in software RAID 1 arrays. I have ext3
file systems mounted noatime. I can provide more details if you need
anything else.

I can get lockups to happen fairly often. Often lots of firefox activity
and/or some other disk-heavy task such as yum update will get it to
happen.

I am not subscribed to LKML, so please keep me copied on replies.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 2007-09-21 at 03:33 -0700, Andrew Morton wrote:
> On Fri, 21 Sep 2007 11:25:41 +0100 richard kennedy <[EMAIL PROTECTED]> wrote:
>
> > > That's all a bit crappy if the wrong races happen and some other task is
> > > somehow exceeding the dirty limits each time this task polls them.  Seems
> > > unlikely that such a condition would persist forever.
> > >
> > > So the question is, why do we have large amounts of dirty pages for one
> > > disk which appear to be sitting there not getting written?
> >
> > The lockup I'm seeing intermittently occurs when I have 2+ tasks copying
> > large files (1GB+) on sda & a small read-mainly mysql db app running on
> > sdb. The lockup seems to happen just after the copies finish -- there
> > are lots of dirty pages but nothing left to write them until kupdate
> > gets round to it.
>
> Then what happens?  The system recovers?

When my system is locked up I get this from sysrq-w (2.6.23-rc7):

SysRq : Show Blocked State
auditd        D 8100b422fd28     0  1999      1
 8100b422fd78 0086 8100b4103020 0286
 8100b4103020 8100bf8af020 8100b41032b8 0001
 0001 0003
Call Trace:
 [8802c8e9] :jbd:log_wait_commit+0xa3/0xf5
 [810482d9] autoremove_wake_function+0x0/0x2e
 [8802770b] :jbd:journal_stop+0x1be/0x1ee
 [810a79f7] __writeback_single_inode+0x1f4/0x332
 [8108c0b0] vfs_statfs_native+0x29/0x34
 [810a8362] sync_inode+0x24/0x36
 [8803c350] :ext3:ext3_sync_file+0xb4/0xc8
 [81240730] mutex_lock+0x1e/0x2f
 [810aa7fa] do_fsync+0x52/0xa4
 [810aa86f] __do_fsync+0x23/0x36
 [8100bc8e] system_call+0x7e/0x83
syslogd       D 8100b3f67d28     0  2022      1
 8100b3f67d78 0086 0001 0003
 810037c66810 8100bf8af020 810037c66aa8 0001
 0001 0003
Call Trace:
 [8802c8e9] :jbd:log_wait_commit+0xa3/0xf5
 [810482d9] autoremove_wake_function+0x0/0x2e
 [8802770b] :jbd:journal_stop+0x1be/0x1ee
 [810a79f7] __writeback_single_inode+0x1f4/0x332
 [810a8362] sync_inode+0x24/0x36
 [8803c350] :ext3:ext3_sync_file+0xb4/0xc8
 [81240730] mutex_lock+0x1e/0x2f
 [810aa7fa] do_fsync+0x52/0xa4
 [810aa86f] __do_fsync+0x23/0x36
 [8100be0c] tracesys+0xdc/0xe1
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 21 Sep 2007 16:58:15 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote:

> But once I look harder at it, I wonder what would have kept
> 2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from
> the global vm stats too low to get synched back to global, yet
> adding up to something which misleads balance_dirty_pages into
> an indefinite loop e.g. total nr_writeback actually 0, but
> appearing more than dirty_thresh in the global approximation.

This could only happen when:

  dirty_thresh < nr_cpus * per_cpu_max_delta

> Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead.
> There is a refresh_vm_stats function which we could call (then retest
> the break condition) just before resorting to congestion_wait. But
> the big NUMA people might get very upset with me calling that too
> often: causing a thundering herd of bouncing cachelines which that
> was all designed to avoid. And it's not obvious to me what condition
> to test for dirty_thresh "too low".

That could be modeled on the error limit I have. For this particular
case that would end up looking like:

  nr_online_cpus * pcp->stat_threshold

> I believe Peter gave all this quite a lot of thought when he was
> making the rc6-mm1 changes, and I'd rather defer to him for a
> suggestion of what best to do in earlier releases. Or maybe he'll
> just point out how this couldn't have been a problem before.

As outlined above, and I don't think we'll ever have such a low
dirty_limit. But who knows :-)

> Or there is Richard's patch, which I haven't considered, but
> Andrew was not quite satisfied with it - partly because he'd like
> to understand how the situation could come about first; perhaps
> we have now got an explanation.

I'm with Andrew on this, that is, quite puzzled on how all this arises.
Testing those writeback-fix-* patches might help rule out (or point to)
a mis-function of pdflush.

The theory that one task will spin in balance_dirty_pages() on a bdi
that does not actually have many dirty pages doesn't sound plausible,
because eventually the total dirty count (well, actually
dirty+unstable+writeback) should subside again. This theory can cause
crappy latencies, but should not 'hang' the machine.

> (The original bug report was indeed on SMP, but I haven't seen
> anyone say that's a necessary condition for the hang: it would
> be if this is the issue. And Richard writes at one point of the
> system only responding to AltSysRq: that would be surprising for
> this issue, though it's possible that a task in balance_dirty_pages
> is holding an i_mutex that everybody else comes to need.)

Are we actually holding i_mutex on paths that lead into
balance_dirty_pages()? That does (from my admittedly limited knowledge
of the vfs) sound like trouble, since we'd need it to complete
writeback.

All quite puzzling.
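The bound Peter describes is easy to sanity-check numerically. Below is
a minimal userspace C sketch (not kernel code) of the arithmetic: each
CPU buffers counter updates in a local delta and only folds it into the
global counter once the delta reaches stat_threshold, so the global
approximation can be off by up to nr_cpus * stat_threshold. All the
numbers in it are made up for illustration.

#include <stdio.h>

int main(void)
{
        long nr_cpus = 4;               /* hypothetical SMP box */
        long stat_threshold = 32;       /* hypothetical per-CPU delta limit */
        long max_error = nr_cpus * stat_threshold;

        /*
         * Pathological case: true nr_writeback is 0, but every CPU still
         * holds an unfolded delta, so the global counter can read up to
         * max_error.  If dirty_thresh is below that bound,
         * balance_dirty_pages() could keep seeing "over the limit".
         */
        long dirty_thresh = 100;        /* hypothetical, unrealistically low */

        if (dirty_thresh < max_error)
                printf("global counter alone cannot prove we are under "
                       "the limit: error bound %ld >= dirty_thresh %ld\n",
                       max_error, dirty_thresh);
        return 0;
}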
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/21/2007 11:58 AM, Hugh Dickins wrote:

> Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead.
> There is a refresh_vm_stats function which we could call (then retest
> the break condition) just before resorting to congestion_wait. But
> the big NUMA people might get very upset with me calling that too
> often: causing a thundering herd of bouncing cachelines which that
> was all designed to avoid. And it's not obvious to me what condition
> to test for dirty_thresh "too low".
>
> I believe Peter gave all this quite a lot of thought when he was
> making the rc6-mm1 changes, and I'd rather defer to him for a
> suggestion of what best to do in earlier releases. Or maybe he'll
> just point out how this couldn't have been a problem before.
>
> Or there is Richard's patch, which I haven't considered, but
> Andrew was not quite satisfied with it - partly because he'd like
> to understand how the situation could come about first; perhaps
> we have now got an explanation.

My patch is just Richard's with the added requirement that we loop some
minimum number of times before exiting balance_dirty_pages() with
unfinished work.
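For illustration, the combination Chuck describes might look like this
inside the balance_dirty_pages() loop (an uncompiled sketch against the
2.6.22-era code, not the actual posted patch; MIN_POLLS is a
hypothetical name for the minimum iteration count):

        for (i = 0; ; i++) {
                /* ... writeback attempt as in mainline ... */
                pages_written += write_chunk - wbc.nr_to_write;
                if (pages_written >= write_chunk)
                        break;          /* We've done our duty */
                /*
                 * Richard's bail-out -- no congestion and nothing found
                 * to write -- but only after a minimum number of polls,
                 * so one transient empty scan cannot end the throttling.
                 */
                if (i >= MIN_POLLS &&
                    !wbc.encountered_congestion && wbc.nr_to_write > 0)
                        break;
                congestion_wait(WRITE, HZ/10);
        }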
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 21 Sep 2007, Andy Whitcroft wrote:

> This sounds an awful lot like the same problem I reported with fsck
> hanging. I believe that Hugh had a candidate patch for that, which was
> related to dirty tracking limits. It seems that that patch was tested,
> and acked by Peter. All on lkml under:
>
>   2.6.23-rc6-mm1 -- mkfs stuck in 'D'

You may well be right. My initial reaction was to point out that my
patch was to a 2.6.23-rc6-mm1 implementation detail, whereas this
thread is about a problem seen since 2.6.20 or perhaps earlier.

But once I look harder at it, I wonder what would have kept
2.6.18 to 2.6.23 safe from the same issue: per-cpu deltas from
the global vm stats too low to get synched back to global, yet
adding up to something which misleads balance_dirty_pages into
an indefinite loop e.g. total nr_writeback actually 0, but
appearing more than dirty_thresh in the global approximation.

Certainly my rc6-mm1 patch won't work here, and all it was doing was
apply some safety already added by Peter to one further case.

Looking at the 2.6.18-2.6.23 code, I'm uncertain what to try instead.
There is a refresh_vm_stats function which we could call (then retest
the break condition) just before resorting to congestion_wait. But
the big NUMA people might get very upset with me calling that too
often: causing a thundering herd of bouncing cachelines which that
was all designed to avoid. And it's not obvious to me what condition
to test for dirty_thresh "too low".

I believe Peter gave all this quite a lot of thought when he was
making the rc6-mm1 changes, and I'd rather defer to him for a
suggestion of what best to do in earlier releases. Or maybe he'll
just point out how this couldn't have been a problem before.

Or there is Richard's patch, which I haven't considered, but
Andrew was not quite satisfied with it - partly because he'd like
to understand how the situation could come about first; perhaps
we have now got an explanation.

(The original bug report was indeed on SMP, but I haven't seen
anyone say that's a necessary condition for the hang: it would
be if this is the issue. And Richard writes at one point of the
system only responding to AltSysRq: that would be surprising for
this issue, though it's possible that a task in balance_dirty_pages
is holding an i_mutex that everybody else comes to need.)

Hugh
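Concretely, the fallback Hugh sketches would sit just before the
congestion_wait() in the 2.6.18-2.6.23 balance_dirty_pages() loop,
along these lines (uncompiled sketch, not a tested patch; it assumes
refresh_vm_stats() and the global_page_state() counters as present in
those kernels, and pays the cacheline-bouncing cost only on the path
where we are about to sleep anyway):

                /*
                 * Sketch of Hugh's suggestion: before sleeping, fold the
                 * per-CPU deltas back into the global counters and retest
                 * the break condition, so a stale global approximation
                 * cannot pin us in this loop forever.
                 */
                refresh_vm_stats();
                if (global_page_state(NR_FILE_DIRTY) +
                    global_page_state(NR_UNSTABLE_NFS) +
                    global_page_state(NR_WRITEBACK) <= dirty_thresh)
                        break;
                congestion_wait(WRITE, HZ/10);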
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/21/2007 05:39 AM, Andy Whitcroft wrote:

> This sounds an awful lot like the same problem I reported with fsck
> hanging. I believe that Hugh had a candidate patch for that, which was
> related to dirty tracking limits. It seems that that patch was tested,
> and acked by Peter. All on lkml under:
>
>   2.6.23-rc6-mm1 -- mkfs stuck in 'D'

Unfortunately that whole thread is about fixing bugs in an already
much-changed balance_dirty_pages() in -mm.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 2007-09-21 at 03:33 -0700, Andrew Morton wrote:
> On Fri, 21 Sep 2007 11:25:41 +0100 richard kennedy <[EMAIL PROTECTED]> wrote:
>
> > > That's all a bit crappy if the wrong races happen and some other task is
> > > somehow exceeding the dirty limits each time this task polls them.  Seems
> > > unlikely that such a condition would persist forever.
> > >
> > > So the question is, why do we have large amounts of dirty pages for one
> > > disk which appear to be sitting there not getting written?
> >
> > The lockup I'm seeing intermittently occurs when I have 2+ tasks copying
> > large files (1GB+) on sda & a small read-mainly mysql db app running on
> > sdb. The lockup seems to happen just after the copies finish -- there
> > are lots of dirty pages but nothing left to write them until kupdate
> > gets round to it.
>
> Then what happens?  The system recovers?

Nothing -- it stays stuck forever. I don't think kupdate is getting
started; I added some debug in there but haven't found out anything
useful yet. But I am trying to build a better test case, as the one
I've got at the moment can take hours to trigger this problem.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 21 Sep 2007 11:25:41 +0100 richard kennedy <[EMAIL PROTECTED]> wrote:

> > That's all a bit crappy if the wrong races happen and some other task is
> > somehow exceeding the dirty limits each time this task polls them.  Seems
> > unlikely that such a condition would persist forever.
> >
> > So the question is, why do we have large amounts of dirty pages for one
> > disk which appear to be sitting there not getting written?
>
> The lockup I'm seeing intermittently occurs when I have 2+ tasks copying
> large files (1GB+) on sda & a small read-mainly mysql db app running on
> sdb. The lockup seems to happen just after the copies finish -- there
> are lots of dirty pages but nothing left to write them until kupdate
> gets round to it.

Then what happens?  The system recovers?
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, 2007-09-20 at 15:36 -0700, Andrew Morton wrote:
> On Thu, 20 Sep 2007 18:04:38 -0400
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
> > >> Can we get some kind of band-aid, like making the endless 'for' loop in
> > >> balance_dirty_pages() terminate after some number of iterations?  Clearly
> > >> if we haven't written "write_chunk" pages after a few tries, *and* we
> > >> haven't encountered congestion, there's no point in trying forever...
> > >
> > > Did my above questions get looked at?
> > >
> > > Is anyone able to reproduce this?
> > >
> > > Do we have a clue what's happening?
> >
> > There are a ton of dirty pages for one disk, and zero or close to zero dirty
> > for a different one. Kernel spins forever trying to write some arbitrary
> > minimum amount of data ("write_chunk" pages) to the second disk...
>
> That should be OK.  The caller will sit in that loop, sleeping in
> congestion_wait(), polling the correct backing-dev occasionally and waiting
> until the dirty limits subside to an acceptable limit, at which stage this:
>
> 	if (nr_reclaimable +
> 		global_page_state(NR_WRITEBACK)
> 			<= dirty_thresh)
> 				break;
>
> will happen and we leave balance_dirty_pages().
>
> That's all a bit crappy if the wrong races happen and some other task is
> somehow exceeding the dirty limits each time this task polls them.  Seems
> unlikely that such a condition would persist forever.
>
> So the question is, why do we have large amounts of dirty pages for one
> disk which appear to be sitting there not getting written?

The lockup I'm seeing intermittently occurs when I have 2+ tasks copying
large files (1GB+) on sda & a small read-mainly mysql db app running on
sdb. The lockup seems to happen just after the copies finish -- there
are lots of dirty pages but nothing left to write them until kupdate
gets round to it.

BTW, kupdate can loop for long periods of time when a disk is under this
kind of load -- I regularly see it take over 20 seconds, and often it's
unable to start as there are no pdflush threads available.

> Do we know if there's any writeout at all happening when the system is in
> this state?

No, there doesn't seem to be any activity at all -- my machine is
completely unresponsive; only sysrq works.

> I guess it's possible that the dirty inodes on the "other" disk got
> themselves onto the wrong per-sb inode list, or are on the correct list,
> but in the wrong place.  If so, these:
>
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
> writeback-fix-comment-use-helper-function.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch
>
> from 2.6.23-rc6-mm1 should help.
>
> Did anyone try running /bin/sync when the system is in this state?

I'm not able to run anything in this state, but sysrq-s doesn't make any
difference.

Richard
Re: Processes spinning forever, apparently in lock_timer_base()?
This sounds an awful lot like the same problem I reported with fsck
hanging. I believe that Hugh had a candidate patch for that, which was
related to dirty tracking limits. It seems that that patch was tested,
and acked by Peter. All on lkml under:

  2.6.23-rc6-mm1 -- mkfs stuck in 'D'

-apw
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, 21 Sep 2007 10:08:08 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 20, 2007 at 03:36:54PM -0700, Andrew Morton wrote:
> > That's all a bit crappy if the wrong races happen and some other task
> > is somehow exceeding the dirty limits each time this task polls them.
> > Seems unlikely that such a condition would persist forever.
>
> How exactly do you define forever? It looks to me that this condition
> never resolves on its own, at least not in a window of several hours
> (the system used to get stuck around 3am and was normally rebooted
> between 7am and 8am, so at least hours are not enough to have the
> problem resolve on its own).

That's forever.

> > So the question is, why do we have large amounts of dirty pages for
> > one disk which appear to be sitting there not getting written?
>
> Unfortunately I have no idea. The full stacktrace for all processes was
> attached to the original bugreport, maybe that can give a clue to that.
>
> > Do we know if there's any writeout at all happening when the system is
> > in this state?
>
> Not sure about that. The system is responsive on a still-open SSH
> session, allowing several tasks still to be executed. However, that SSH
> session gets stuck very fast if the wrong commands are executed. New
> logins are not possible (the connection is accepted but results in a
> timeout 60 seconds later, so most likely /bin/login is not able to log
> into wtmp).
>
> From all that I suspect that there is no more write activity on that
> system, but cannot say for sure.

Easiest would be to run `vmstat 1' in a separate ssh session then just
leave it running.

> > Did anyone try running /bin/sync when the system is in this state?
>
> I did not, no.

Would be interesting if poss, please.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, Sep 20, 2007 at 03:36:54PM -0700, Andrew Morton wrote:
> That's all a bit crappy if the wrong races happen and some other task
> is somehow exceeding the dirty limits each time this task polls them.
> Seems unlikely that such a condition would persist forever.

How exactly do you define forever? It looks to me that this condition
never resolves on its own, at least not in a window of several hours
(the system used to get stuck around 3am and was normally rebooted
between 7am and 8am, so at least hours are not enough to have the
problem resolve on its own).

> So the question is, why do we have large amounts of dirty pages for
> one disk which appear to be sitting there not getting written?

Unfortunately I have no idea. The full stacktrace for all processes was
attached to the original bugreport, maybe that can give a clue to that.

> Do we know if there's any writeout at all happening when the system is
> in this state?

Not sure about that. The system is responsive on a still-open SSH
session, allowing several tasks still to be executed. However, that SSH
session gets stuck very fast if the wrong commands are executed. New
logins are not possible (the connection is accepted but results in a
timeout 60 seconds later, so most likely /bin/login is not able to log
into wtmp).

From all that I suspect that there is no more write activity on that
system, but cannot say for sure.

> Did anyone try running /bin/sync when the system is in this state?

I did not, no.

Regards,
Matthias
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/20/2007 06:36 PM, Andrew Morton wrote:
>
> So the question is, why do we have large amounts of dirty pages for one
> disk which appear to be sitting there not getting written?
>
> Do we know if there's any writeout at all happening when the system is in
> this state?
>
> I guess it's possible that the dirty inodes on the "other" disk got
> themselves onto the wrong per-sb inode list, or are on the correct list,
> but in the wrong place.  If so, these:
>
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
> writeback-fix-comment-use-helper-function.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch
>
> from 2.6.23-rc6-mm1 should help.

Yikes! Simple fixes would be better. The patch that is confirmed to fix
the problem for this user is below, but that one could cause other
problems. I was looking for some band-aid that could be shown to be
harmless...

http://lkml.org/lkml/2007/8/2/89:
--
--- linux-2.6.22.1/mm/page-writeback.c.orig	2007-07-30 16:36:09.000000000 +0100
+++ linux-2.6.22.1/mm/page-writeback.c	2007-07-31 16:26:43.000000000 +0100
@@ -250,6 +250,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
+				break;		/* didn't find enough to do */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}

> Did anyone try running /bin/sync when the system is in this state?

Reporter is in the CC: list.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, 20 Sep 2007 18:04:38 -0400
Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> >> Can we get some kind of band-aid, like making the endless 'for' loop in
> >> balance_dirty_pages() terminate after some number of iterations?  Clearly
> >> if we haven't written "write_chunk" pages after a few tries, *and* we
> >> haven't encountered congestion, there's no point in trying forever...
> >
> > Did my above questions get looked at?
> >
> > Is anyone able to reproduce this?
> >
> > Do we have a clue what's happening?
>
> There are a ton of dirty pages for one disk, and zero or close to zero dirty
> for a different one. Kernel spins forever trying to write some arbitrary
> minimum amount of data ("write_chunk" pages) to the second disk...

That should be OK.  The caller will sit in that loop, sleeping in
congestion_wait(), polling the correct backing-dev occasionally and waiting
until the dirty limits subside to an acceptable limit, at which stage this:

	if (nr_reclaimable +
		global_page_state(NR_WRITEBACK)
			<= dirty_thresh)
				break;

will happen and we leave balance_dirty_pages().

That's all a bit crappy if the wrong races happen and some other task is
somehow exceeding the dirty limits each time this task polls them.  Seems
unlikely that such a condition would persist forever.

So the question is, why do we have large amounts of dirty pages for one
disk which appear to be sitting there not getting written?

Do we know if there's any writeout at all happening when the system is in
this state?

I guess it's possible that the dirty inodes on the "other" disk got
themselves onto the wrong per-sb inode list, or are on the correct list,
but in the wrong place.  If so, these:

writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-comment-use-helper-function.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
writeback-fix-periodic-superblock-dirty-inode-flushing.patch

from 2.6.23-rc6-mm1 should help.

Did anyone try running /bin/sync when the system is in this state?
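For readers following along, the loop Andrew is describing looks
roughly like this in the 2.6.22-era mm/page-writeback.c (a condensed
paraphrase for orientation, not a verbatim copy of the source):

        for (;;) {
                get_dirty_limits(&background_thresh, &dirty_thresh, mapping);
                nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
                                 global_page_state(NR_UNSTABLE_NFS);
                /* the exit condition quoted above */
                if (nr_reclaimable + global_page_state(NR_WRITEBACK) <=
                                                        dirty_thresh)
                        break;
                if (nr_reclaimable) {
                        /* try to push out ~write_chunk pages */
                        writeback_inodes(&wbc);
                        pages_written += write_chunk - wbc.nr_to_write;
                        if (pages_written >= write_chunk)
                                break;          /* We've done our duty */
                }
                /* otherwise sleep ~100ms and poll the limits again */
                congestion_wait(WRITE, HZ/10);
        }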
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/20/2007 05:29 PM, Andrew Morton wrote:
> On Thu, 20 Sep 2007 17:07:15 -0400
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
>> On 08/09/2007 12:55 PM, Andrew Morton wrote:
>>> On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
>>>>> On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
>>>>> [...]
>>>>> I am also willing to try the patch posted by Richard.
>>>>
>>>> I want to give some update here:
>>>>
>>>> 1. We finally hit the problem on a third system, with a total different
>>>>    setup and hardware. However, again high I/O load caused the problem
>>>>    and the affected filesystems were mounted with noatime.
>>>>
>>>> 2. I installed a recompiled kernel with just the two line patch from
>>>>    Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
>>>>    days uptime now and counting. I believe the patch fixed the problem.
>>>>    However, I will continue running "vmstat 1" and the endless loop of
>>>>    "cat /proc/meminfo", just in case I am wrong.
>>>
>>> Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?
>>>
>>> If Richard's patch has indeed fixed it then this confirms that we're seeing
>>> contention over the dirty-memory limits.  Richard's patch isn't really the
>>> right one because it allows unlimited dirty-memory windup in some situations
>>> (large number of disks with small writes, or when we perform queue
>>> congestion avoidance).
>>>
>>> As you're seeing this happening when multiple disks are being written to it
>>> is possible that the per-device-dirty-threshold patches which recently went
>>> into -mm (and which appear to have a bug) will fix it.
>>>
>>> But I worry that the stall appears to persist *forever*.  That would
>>> indicate that we have a dirty-memory accounting leak, or that for some
>>> reason the system has decided to stop doing writeback to one or more queues
>>> (might be caused by an error in a lower-level driver's queue congestion
>>> state management).
>>>
>>> If it is the latter, then it could be that running "sync" will clear the
>>> problem.  Temporarily, at least.  Because sync will ignore the queue
>>> congestion state.
>>
>> This is still a problem for people, and no fix is in sight until 2.6.24.
>
> Any bugzilla urls or anything like that?

https://bugzilla.redhat.com/show_bug.cgi?id=249563

>> Can we get some kind of band-aid, like making the endless 'for' loop in
>> balance_dirty_pages() terminate after some number of iterations?  Clearly
>> if we haven't written "write_chunk" pages after a few tries, *and* we
>> haven't encountered congestion, there's no point in trying forever...
>
> Did my above questions get looked at?
>
> Is anyone able to reproduce this?
>
> Do we have a clue what's happening?

There are a ton of dirty pages for one disk, and zero or close to zero
dirty for a different one. Kernel spins forever trying to write some
arbitrary minimum amount of data ("write_chunk" pages) to the second
disk...

> Is that function just spinning around, failing to start writeout against
> any pages at all?  If so, how come?

Yes, it spins forever. Just removing the "noatime" mount option for the
second disk generates enough dirty data to keep the system functional.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, 20 Sep 2007 17:07:15 -0400
Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 08/09/2007 12:55 PM, Andrew Morton wrote:
> > On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:
> >
> >> On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
> >>> On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> >>> [...]
> >>> I am also willing to try the patch posted by Richard.
> >>
> >> I want to give some update here:
> >>
> >> 1. We finally hit the problem on a third system, with a total different
> >>    setup and hardware. However, again high I/O load caused the problem
> >>    and the affected filesystems were mounted with noatime.
> >>
> >> 2. I installed a recompiled kernel with just the two line patch from
> >>    Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
> >>    days uptime now and counting. I believe the patch fixed the problem.
> >>    However, I will continue running "vmstat 1" and the endless loop of
> >>    "cat /proc/meminfo", just in case I am wrong.
> >
> > Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?
> >
> > If Richard's patch has indeed fixed it then this confirms that we're seeing
> > contention over the dirty-memory limits.  Richard's patch isn't really the
> > right one because it allows unlimited dirty-memory windup in some situations
> > (large number of disks with small writes, or when we perform queue
> > congestion avoidance).
> >
> > As you're seeing this happening when multiple disks are being written to it
> > is possible that the per-device-dirty-threshold patches which recently went
> > into -mm (and which appear to have a bug) will fix it.
> >
> > But I worry that the stall appears to persist *forever*.  That would
> > indicate that we have a dirty-memory accounting leak, or that for some
> > reason the system has decided to stop doing writeback to one or more queues
> > (might be caused by an error in a lower-level driver's queue congestion
> > state management).
> >
> > If it is the latter, then it could be that running "sync" will clear the
> > problem.  Temporarily, at least.  Because sync will ignore the queue
> > congestion state.
>
> This is still a problem for people, and no fix is in sight until 2.6.24.

Any bugzilla urls or anything like that?

> Can we get some kind of band-aid, like making the endless 'for' loop in
> balance_dirty_pages() terminate after some number of iterations?  Clearly
> if we haven't written "write_chunk" pages after a few tries, *and* we
> haven't encountered congestion, there's no point in trying forever...

Did my above questions get looked at?

Is anyone able to reproduce this?

Do we have a clue what's happening?

Is that function just spinning around, failing to start writeout against
any pages at all?  If so, how come?
Re: Processes spinning forever, apparently in lock_timer_base()?
On 08/09/2007 12:55 PM, Andrew Morton wrote:
> On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:
>
>> On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
>>> On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
>>> [...]
>>> I am also willing to try the patch posted by Richard.
>>
>> I want to give some update here:
>>
>> 1. We finally hit the problem on a third system, with a total different
>>    setup and hardware. However, again high I/O load caused the problem
>>    and the affected filesystems were mounted with noatime.
>>
>> 2. I installed a recompiled kernel with just the two line patch from
>>    Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
>>    days uptime now and counting. I believe the patch fixed the problem.
>>    However, I will continue running "vmstat 1" and the endless loop of
>>    "cat /proc/meminfo", just in case I am wrong.
>
> Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?
>
> If Richard's patch has indeed fixed it then this confirms that we're seeing
> contention over the dirty-memory limits.  Richard's patch isn't really the
> right one because it allows unlimited dirty-memory windup in some situations
> (large number of disks with small writes, or when we perform queue congestion
> avoidance).
>
> As you're seeing this happening when multiple disks are being written to it is
> possible that the per-device-dirty-threshold patches which recently went into
> -mm (and which appear to have a bug) will fix it.
>
> But I worry that the stall appears to persist *forever*.  That would indicate
> that we have a dirty-memory accounting leak, or that for some reason the
> system has decided to stop doing writeback to one or more queues (might be
> caused by an error in a lower-level driver's queue congestion state
> management).
>
> If it is the latter, then it could be that running "sync" will clear the
> problem.  Temporarily, at least.  Because sync will ignore the queue
> congestion state.

This is still a problem for people, and no fix is in sight until 2.6.24.

Can we get some kind of band-aid, like making the endless 'for' loop in
balance_dirty_pages() terminate after some number of iterations? Clearly
if we haven't written "write_chunk" pages after a few tries, *and* we
haven't encountered congestion, there's no point in trying forever...

[not even compile tested patch follows]

---
 mm/page-writeback.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- linux-2.6.22.noarch.orig/mm/page-writeback.c
+++ linux-2.6.22.noarch/mm/page-writeback.c
@@ -208,11 +208,12 @@ static void balance_dirty_pages(struct a
 	long background_thresh;
 	long dirty_thresh;
 	unsigned long pages_written = 0;
+	int i;
 	unsigned long write_chunk = sync_writeback_pages();
 
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
-	for (;;) {
+	for (i = 0; ; i++) {
 		struct writeback_control wbc = {
 			.bdi		= bdi,
 			.sync_mode	= WB_SYNC_NONE,
@@ -250,6 +251,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (i >= write_chunk && !wbc.encountered_congestion)
+				break;		/* nothing to write? */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/20/2007 05:29 PM, Andrew Morton wrote:
> On Thu, 20 Sep 2007 17:07:15 -0400 Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> [...]
>> This is still a problem for people, and no fix is in sight until 2.6.24.
>
> Any bugzilla urls or anything like that?

https://bugzilla.redhat.com/show_bug.cgi?id=249563

>> Can we get some kind of band-aid, like making the endless 'for' loop in
>> balance_dirty_pages() terminate after some number of iterations? Clearly
>> if we haven't written "write_chunk" pages after a few tries, *and* we
>> haven't encountered congestion, there's no point in trying forever...
>
> Did my above questions get looked at? Is anyone able to reproduce this?
> Do we have a clue what's happening?

There are a ton of dirty pages for one disk, and zero or close to zero
dirty for a different one. Kernel spins forever trying to write some
arbitrary minimum amount of data (write_chunk pages) to the second disk...

> Is that function just spinning around, failing to start writeout against
> any pages at all? If so, how come?

Yes, it spins forever. Just removing the noatime mount option for the
second disk generates enough dirty data to keep the system functional.
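The situation Chuck describes can be modelled in a few lines of plain C,
purely for illustration and with invented numbers: the throttled task is
judged against the global dirty count, while writeback against its own
device can never make the required progress.

#include <stdio.h>

int main(void)
{
	const long dirty_thresh = 1000;	/* global limit (made-up) */
	long dirty_sda = 1500;		/* backlog from the heavy writer */
	long dirty_sdb = 0;		/* our device: nothing to write back */

	/* A task writing to sdb polls the *global* count; writeback
	 * against sdb can never write write_chunk pages because the
	 * dirty pages all belong to sda. */
	for (int iter = 0; iter < 3; iter++) {
		long global_dirty = dirty_sda + dirty_sdb;

		if (global_dirty <= dirty_thresh) {
			puts("below threshold, writer may proceed");
			return 0;
		}
		printf("iter %d: global=%ld > thresh=%ld, but sdb has only "
		       "%ld dirty pages\n",
		       iter, global_dirty, dirty_thresh, dirty_sdb);
	}
	puts("writer to sdb is still throttled by sda's backlog");
	return 0;
}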
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, 20 Sep 2007 18:04:38 -0400 Chuck Ebbert <[EMAIL PROTECTED]> wrote:

>>> Can we get some kind of band-aid, like making the endless 'for' loop in
>>> balance_dirty_pages() terminate after some number of iterations? Clearly
>>> if we haven't written "write_chunk" pages after a few tries, *and* we
>>> haven't encountered congestion, there's no point in trying forever...
>>
>> Did my above questions get looked at? Is anyone able to reproduce this?
>> Do we have a clue what's happening?
>
> There are a ton of dirty pages for one disk, and zero or close to zero
> dirty for a different one. Kernel spins forever trying to write some
> arbitrary minimum amount of data (write_chunk pages) to the second disk...

That should be OK. The caller will sit in that loop, sleeping in
congestion_wait(), polling the correct backing-dev occasionally and
waiting until the dirty limits subside to an acceptable limit, at which
stage this:

		if (nr_reclaimable + global_page_state(NR_WRITEBACK) <=
			dirty_thresh)
				break;

will happen and we leave balance_dirty_pages().

That's all a bit crappy if the wrong races happen and some other task is
somehow exceeding the dirty limits each time this task polls them. Seems
unlikely that such a condition would persist forever.

So the question is, why do we have large amounts of dirty pages for one
disk which appear to be sitting there not getting written?

Do we know if there's any writeout at all happening when the system is in
this state?

I guess it's possible that the dirty inodes on the other disk got
themselves onto the wrong per-sb inode list, or are on the correct list,
but in the wrong place. If so, these:

writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-comment-use-helper-function.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
writeback-fix-periodic-superblock-dirty-inode-flushing.patch

from 2.6.23-rc6-mm1 should help.

Did anyone try running /bin/sync when the system is in this state?
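The happy path Andrew describes can be sketched in userspace under
obviously simplified assumptions: a background thread stands in for
writeback completing elsewhere, and the throttled task's polling loop
exits once the count subsides. Illustration only, not kernel code:

#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>
#include <unistd.h>

/* stand-in for nr_reclaimable + global_page_state(NR_WRITEBACK) */
static atomic_long dirty = 2000;
static const long dirty_thresh = 1000;

/* stand-in for pdflush/kupdate completing writeback elsewhere */
static void *flusher(void *arg)
{
	(void)arg;
	while (atomic_load(&dirty) > 0) {
		atomic_fetch_sub(&dirty, 100);
		usleep(50 * 1000);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, flusher, NULL);
	for (;;) {
		if (atomic_load(&dirty) <= dirty_thresh) {
			puts("limits subsided, leaving balance_dirty_pages()");
			break;
		}
		usleep(100 * 1000);	/* congestion_wait(WRITE, HZ/10) */
	}
	pthread_join(t, NULL);
	return 0;
}

The reported hang is the case where nothing ever plays the flusher role
for the pages in question, so the poll never sees the count subside.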
Re: Processes spinning forever, apparently in lock_timer_base()?
On 09/20/2007 06:36 PM, Andrew Morton wrote:
> So the question is, why do we have large amounts of dirty pages for one
> disk which appear to be sitting there not getting written?
>
> Do we know if there's any writeout at all happening when the system is in
> this state?
>
> I guess it's possible that the dirty inodes on the other disk got
> themselves onto the wrong per-sb inode list, or are on the correct list,
> but in the wrong place. If so, these:
>
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
> writeback-fix-comment-use-helper-function.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
> writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
> writeback-fix-periodic-superblock-dirty-inode-flushing.patch
>
> from 2.6.23-rc6-mm1 should help.

Yikes! Simple fixes would be better. The patch that is confirmed to fix
the problem for this user is below, but that one could cause other
problems. I was looking for some band-aid that could be shown to be
harmless...

http://lkml.org/lkml/2007/8/2/89:
--
--- linux-2.6.22.1/mm/page-writeback.c.orig	2007-07-30 16:36:09.000000000 +0100
+++ linux-2.6.22.1/mm/page-writeback.c	2007-07-31 16:26:43.000000000 +0100
@@ -250,6 +250,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
+				break;		/* didn't find enough to do */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}

> Did anyone try running /bin/sync when the system is in this state?

Reporter is in the CC: list.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, Aug 09, 2007 at 09:55:34AM -0700, Andrew Morton wrote:
> On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:
> > On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
> > > On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> > > [...]
> > > I am also willing to try the patch posted by Richard.
> >
> > I want to give some update here:
>
> Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?

So far not, sorry. The problem did not reoccur since I am running the
kernel with that patch now.

> As you're seeing this happening when multiple disks are being written
> to it is possible that the per-device-dirty-threshold patches which
> recently went into -mm (and which appear to have a bug) will fix it.

All affected systems have two devices which are cross mounted, e.g. one
has /home on disk 1 and /var/spool/imap on disk 2. On the respective
other device there is the mirror partition (mirror of /home on disk 2 and
mirror of /var/spool/imap on disk 1).

Normally the system hangs when running rsync from /var/spool/imap to its
corresponding mirror partition. We have around 10 different partitions
(all in LVM), but so far the hang mostly started when running rsync on
the imap spool (which has by far the most files on the system).

_However_: we also had this hang while running a backup to an external
FTP server (so only reading the filesystems, apart from the usual system
activity). And: the third system we had this problem on has no IMAP spool
at all. The hang occurred while running a backup together with "yum
update".

It might be related that all the systems use these crossmounts over two
different disks. I wasn't able to reproduce that on my home system with
only one disk, either because the activity pattern was different, or it
really needs two disks to run into that issue.

> But I worry that the stall appears to persist *forever*. That would
> indicate that we have a dirty-memory accounting leak, or that for some
> reason the system has decided to stop doing writeback to one or more
> queues (might be caused by an error in a lower-level driver's queue
> congestion state management).

Well, all the time I was still connected to one of these machines it was
possible to clear the situation by killing a lot of processes. That is,
mostly killing all httpd, smtpd, imapd, amavisd, crond, spamassassin and
rsync processes. Eventually the system responded normally again (and I
could cleanly reboot it). However, the final killed process which
resolved the issue was non-deterministic so far.

The workload is different on all the servers: server 1 processes around
10-20k mails per day but also serves a lot of HTTP requests. Server 2 is
only a mailserver processing around 5-10k mails per day. Server 3 just
serves HTTP requests (and a bit of DNS).

> If it is the latter, then it could be that running "sync" will clear
> the problem. Temporarily, at least. Because sync will ignore the
> queue congestion state.

I have to admit that I never tried to sync in such a case, since mostly I
had only one open SSH session and tried to find the root cause.

So far the problem occurred first on server 2 around March or April
without changes to the machine (we have a changelog for every server:
there was a single kernel update at that time, however we reverted back
after the first issue and ran a kernel which was stable for several weeks
before, and encountered the problem again).

In the beginning we encountered the problem maybe twice a week, getting
worse within the next weeks. Several major updates (kernel and
distribution) were made without resolving the problem. Since end of April
server 1 began showing the same problem (running a different Fedora
version and kernel at that time), first slowly (once a week or so), then
more regularly. Around July we had a hang nearly every day. Server 3 has
had the problem only once now, but that server has no high workload.

We spent a lot of time investigating the issue, but since all servers use
different hardware, different setups (beside the crossmounts with
noatime) and even different base systems (Fedora Core 5+6, Fedora 7,
different kernels: 2.6.18, .19, .20, .21 and .22), I think that we can
rule out hardware problems. I think the issue might have been there for
some time now, but is hit more often now since the workload increased a
lot over the last months.

We ruled a lot out over the months (e.g. syslog was replaced, many not so
important services were stopped, the scheduler was changed), without
change. Just reverting from "noatime" in the fstab to "default" fixed it
reliably so far.

As said, I am still running "vmstat 1" and catting /proc/meminfo, just in
case. If there is anything I can do beside that to clarify the problem I
will try to help.

Regards,
Matthias
Re: Processes spinning forever, apparently in lock_timer_base()?
On Thu, 9 Aug 2007 11:59:43 +0200 Matthias Hensler <[EMAIL PROTECTED]> wrote:

> On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
> > On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> > [...]
> > I am also willing to try the patch posted by Richard.
>
> I want to give some update here:
>
> 1. We finally hit the problem on a third system, with a total different
>    setup and hardware. However, again high I/O load caused the problem
>    and the affected filesystems were mounted with noatime.
>
> 2. I installed a recompiled kernel with just the two line patch from
>    Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
>    days uptime now and counting. I believe the patch fixed the problem.
>    However, I will continue running "vmstat 1" and the endless loop of
>    "cat /proc/meminfo", just in case I am wrong.

Did we ever see the /proc/meminfo and /proc/vmstat output during the stall?

If Richard's patch has indeed fixed it then this confirms that we're
seeing contention over the dirty-memory limits. Richard's patch isn't
really the right one because it allows unlimited dirty-memory windup in
some situations (large number of disks with small writes, or when we
perform queue congestion avoidance).

As you're seeing this happening when multiple disks are being written to
it is possible that the per-device-dirty-threshold patches which recently
went into -mm (and which appear to have a bug) will fix it.

But I worry that the stall appears to persist *forever*. That would
indicate that we have a dirty-memory accounting leak, or that for some
reason the system has decided to stop doing writeback to one or more
queues (might be caused by an error in a lower-level driver's queue
congestion state management).

If it is the latter, then it could be that running "sync" will clear the
problem. Temporarily, at least. Because sync will ignore the queue
congestion state.
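The idea behind the per-device-dirty-threshold work can be sketched
roughly as follows. This is a conceptual illustration only, not the
actual -mm code, and the numbers are invented: the global limit is split
between devices in proportion to their recent writeout, so one device's
backlog no longer throttles writers to another.

#include <stdio.h>

int main(void)
{
	const long dirty_thresh = 1000;		/* global limit (made-up) */
	const char *bdi[] = { "sda", "sdb" };
	long recent_writeout[] = { 900, 100 };	/* invented proportions */
	long total = recent_writeout[0] + recent_writeout[1];

	for (int i = 0; i < 2; i++) {
		/* each device gets a share of the global threshold */
		long share = dirty_thresh * recent_writeout[i] / total;
		printf("%s: may dirty up to %ld of %ld pages\n",
		       bdi[i], share, dirty_thresh);
	}
	return 0;
}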
Re: Processes spinning forever, apparently in lock_timer_base()?
On Sat, Aug 04, 2007 at 10:44:26AM +0200, Matthias Hensler wrote:
> On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> [...]
> I am also willing to try the patch posted by Richard.

I want to give some update here:

1. We finally hit the problem on a third system, with a total different
   setup and hardware. However, again high I/O load caused the problem
   and the affected filesystems were mounted with noatime.

2. I installed a recompiled kernel with just the two line patch from
   Richard Kennedy (http://lkml.org/lkml/2007/8/2/89). That system has 5
   days uptime now and counting. I believe the patch fixed the problem.
   However, I will continue running "vmstat 1" and the endless loop of
   "cat /proc/meminfo", just in case I am wrong.

Regards,
Matthias
Re: Processes spinning forever, apparently in lock_timer_base()?
On Fri, Aug 03, 2007 at 11:34:07AM -0700, Andrew Morton wrote:
> (attempting to cc Matthias. If I have the wrong one, please fix it up)

You got the correct one.

> > Looks like the same problem with spinlock unfairness we've seen
> > elsewhere: it seems to be looping here? Or is everyone stuck
> > just waiting for writeout?
> [...]
> I think. Or perhaps lock_timer_base() really has gone and got stuck. One
> possibility is that gcc has decided to cache timer->base in a register
> rather than rereading it around that loop, which would be bad. Do:
>
> gdb vmlinux
> (gdb) x/100i lock_timer_base

This is from an affected kernel, but not the kernel matching my
stack-trace. Hope it is useful anyway. If not, maybe Chuck can provide
the vmlinux file for the 2.6.22.1-27.fc7 kernel (the debuginfo seems to
be deleted from all mirrors).

(gdb) x/100i lock_timer_base
0xc102ffda <lock_timer_base>:	push   %ebp
0xc102ffdb <lock_timer_base+1>:	mov    %edx,%ebp
0xc102ffdd <lock_timer_base+3>:	push   %edi
0xc102ffde <lock_timer_base+4>:	mov    %eax,%edi
0xc102ffe0 <lock_timer_base+6>:	push   %esi
0xc102ffe1 <lock_timer_base+7>:	push   %ebx
0xc102ffe2 <lock_timer_base+8>:	mov    0x14(%edi),%ebx
0xc102ffe5 <lock_timer_base+11>:	mov    %ebx,%esi
0xc102ffe7 <lock_timer_base+13>:	and    $0xfffffffe,%esi
0xc102ffea <lock_timer_base+16>:	je     0xc1030004 <lock_timer_base+42>
0xc102ffec <lock_timer_base+18>:	mov    %esi,%eax
0xc102ffee <lock_timer_base+20>:	call   0xc122ff35 <_spin_lock_irqsave>
0xc102fff3 <lock_timer_base+25>:	mov    %eax,0x0(%ebp)
0xc102fff6 <lock_timer_base+28>:	cmp    0x14(%edi),%ebx
0xc102fff9 <lock_timer_base+31>:	je     0xc1030008 <lock_timer_base+46>
0xc102fffb <lock_timer_base+33>:	mov    %eax,%edx
0xc102fffd <lock_timer_base+35>:	mov    %esi,%eax
0xc102ffff <lock_timer_base+37>:	call   0xc122ff95 <_spin_unlock_irqrestore>
0xc1030004 <lock_timer_base+42>:	pause
0xc1030006 <lock_timer_base+44>:	jmp    0xc102ffe2 <lock_timer_base+8>
0xc1030008 <lock_timer_base+46>:	mov    %esi,%eax
0xc103000a <lock_timer_base+48>:	pop    %ebx
0xc103000b <lock_timer_base+49>:	pop    %esi
0xc103000c <lock_timer_base+50>:	pop    %edi
0xc103000d <lock_timer_base+51>:	pop    %ebp
0xc103000e <lock_timer_base+52>:	ret
0xc103000f <try_to_del_timer_sync>:	push   %esi
0xc1030010 <try_to_del_timer_sync+1>:	or     $0xffffffff,%esi
0xc1030013 <try_to_del_timer_sync+4>:	push   %ebx
0xc1030014 <try_to_del_timer_sync+5>:	mov    %eax,%ebx
0xc1030016 <try_to_del_timer_sync+7>:	sub    $0x4,%esp
0xc1030019 <try_to_del_timer_sync+10>:	mov    %esp,%edx
0xc103001b <try_to_del_timer_sync+12>:	call   0xc102ffda <lock_timer_base>
0xc1030020 <try_to_del_timer_sync+17>:	cmp    %ebx,0x4(%eax)
0xc1030023 <try_to_del_timer_sync+20>:	mov    %eax,%ecx
0xc1030025 <try_to_del_timer_sync+22>:	je     0xc1030049 <try_to_del_timer_sync+58>
0xc1030027 <try_to_del_timer_sync+24>:	xor    %esi,%esi
0xc1030029 <try_to_del_timer_sync+26>:	cmpl   $0x0,(%ebx)
0xc103002c <try_to_del_timer_sync+29>:	je     0xc1030049 <try_to_del_timer_sync+58>
0xc103002e <try_to_del_timer_sync+31>:	mov    (%ebx),%edx
0xc1030030 <try_to_del_timer_sync+33>:	mov    $0x1,%si
0xc1030034 <try_to_del_timer_sync+37>:	mov    0x4(%ebx),%eax
0xc1030037 <try_to_del_timer_sync+40>:	mov    %eax,0x4(%edx)
0xc103003a <try_to_del_timer_sync+43>:	mov    %edx,(%eax)
0xc103003c <try_to_del_timer_sync+45>:	movl   $0x200200,0x4(%ebx)
0xc1030043 <try_to_del_timer_sync+52>:	movl   $0x0,(%ebx)
0xc1030049 <try_to_del_timer_sync+58>:	mov    (%esp),%edx
0xc103004c <try_to_del_timer_sync+61>:	mov    %ecx,%eax
0xc103004e <try_to_del_timer_sync+63>:	call   0xc122ff95 <_spin_unlock_irqrestore>
0xc1030053 <try_to_del_timer_sync+68>:	mov    %esi,%eax
0xc1030055 <try_to_del_timer_sync+70>:	pop    %ebx
---Type <return> to continue, or q <return> to quit---
0xc1030056 <try_to_del_timer_sync+71>:	pop    %ebx
0xc1030057 <try_to_del_timer_sync+72>:	pop    %esi
0xc1030058 <try_to_del_timer_sync+73>:	ret
0xc1030059 <del_timer_sync>:	push   %ebx
0xc103005a <del_timer_sync+1>:	mov    %eax,%ebx
0xc103005c <del_timer_sync+3>:	mov    %ebx,%eax
0xc103005e <del_timer_sync+5>:	call   0xc103000f <try_to_del_timer_sync>
0xc1030063 <del_timer_sync+10>:	test   %eax,%eax
0xc1030065 <del_timer_sync+12>:	jns    0xc103006b <del_timer_sync+18>
0xc1030067 <del_timer_sync+14>:	pause
0xc1030069 <del_timer_sync+16>:	jmp    0xc103005c <del_timer_sync+3>
0xc103006b <del_timer_sync+18>:	pop    %ebx
0xc103006c <del_timer_sync+19>:	ret
0xc103006d <__mod_timer>:	push   %ebp
0xc103006e <__mod_timer+1>:	mov    %edx,%ebp
0xc1030070 <__mod_timer+3>:	push   %edi
0xc1030071 <__mod_timer+4>:	push   %esi
0xc1030072 <__mod_timer+5>:	mov    %eax,%esi
0xc1030074 <__mod_timer+7>:	push   %ebx
0xc1030075 <__mod_timer+8>:	sub    $0x8,%esp
0xc1030078 <__mod_timer+11>:	mov    0x18(%esp),%edx
0xc103007c <__mod_timer+15>:	call   0xc102f693 <__timer_stats_timer_set_start_info>
0xc1030081 <__mod_timer+20>:	cmpl   $0x0,0xc(%esi)
0xc1030085 <__mod_timer+24>:	jne    0xc103008b <__mod_timer+30>
0xc1030087 <__mod_timer+26>:	ud2a
0xc1030089 <__mod_timer+28>:	jmp    0xc1030089 <__mod_timer+28>
0xc103008b <__mod_timer+30>:	lea    0x4(%esp),%edx
0xc103008f <__mod_timer+34>:	mov    %esi,%eax
0xc1030091 <__mod_timer+36>:	call   0xc102ffda <lock_timer_base>
0xc1030096 <__mod_timer+41>:	movl   $0x0,(%esp)
0xc103009d <__mod_timer+48>:	mov    %eax,%ebx
0xc103009f <__mod_timer+50>:	cmpl   $0x0,(%esi)
0xc10300a2 <__mod_timer+53>:	je     0xc10300bc <__mod_timer+79>
0xc10300a4 <__mod_timer+55>:	mov    0x4(%esi),%eax
0xc10300a7 <__mod_timer+58>:	mov    (%esi),%edx
0xc10300a9 <__mod_timer+60>:	mov    %eax,0x4(%edx)
0xc10300ac <__mod_timer+63>:	mov    %edx,(%eax)
0xc10300ae <__mod_timer+65>:	movl   $0x200200,0x4(%esi)
0xc10300b5 <__mod_timer+72>:	movl   $0x1,(%esp)
0xc10300bc <__mod_timer+79>:	mov    %fs:0xc13a6104,%edx
0xc10300c3 <__mod_timer+86>:	mov    $0xc13a75dc,%eax
0xc10300c8 <__mod_timer+91>:	mov    (%edx,%eax,1),%edi
0xc10300cb <__mod_timer+94>:	cmp    %edi,%ebx
0xc10300cd <__mod_timer+96>:	je     0xc10300f0 <__mod_timer+131>
0xc10300cf <__mod_timer+98>:	cmp    %esi,0x4(%ebx)
0xc10300d2 <__mod_timer+101>:	je     0xc10300f0 <__mod_timer+131>
0xc10300d4 <__mod_timer+103>:	andl   $0x1,0x14(%esi)
0xc10300d8 <__mod_timer+107>:	mov    $0x1,%al
0xc10300da <__mod_timer+109>:	xchg
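Andrew's worry, gcc caching timer->base in a register, is the classic
missing-barrier pattern; note that in the listing above the load of
0x14(%edi) is redone each iteration (the jmp at lock_timer_base+44 goes
back to +8), so at least this build does re-read the field. A userspace C
sketch of the concern, with made-up type and macro names (LOAD_BASE_ONCE
mimics the kernel's later ACCESS_ONCE()/READ_ONCE() idiom and is not a
kernel API):

#include <stdio.h>

struct tvec_base { int cpu; };
struct timer { struct tvec_base *base; };

/* Forces a fresh load of the field on every read; without a volatile
 * access like this, the compiler may hoist the load out of the loop
 * and spin on a stale NULL forever. */
#define LOAD_BASE_ONCE(t) (*(struct tvec_base * volatile *)&(t)->base)

static struct tvec_base *wait_for_base(struct timer *t)
{
	for (;;) {
		struct tvec_base *base = LOAD_BASE_ONCE(t);

		if (base != NULL)
			return base;
		/* cpu_relax() would go here in kernel code */
	}
}

int main(void)
{
	struct tvec_base b = { .cpu = 0 };
	struct timer t = { .base = &b };

	printf("got base of cpu %d\n", wait_for_base(&t)->cpu);
	return 0;
}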
Re: Processes spinning forever, apparently in lock_timer_base()?
Chuck Ebbert wrote:
>
> Looks like the same problem with spinlock unfairness we've seen
> elsewhere: it seems to be looping here? Or is everyone stuck
> just waiting for writeout?
>
> lock_timer_base():
>	for (;;) {
>		tvec_base_t *prelock_base = timer->base;
>		base = tbase_get_base(prelock_base);
>		if (likely(base != NULL)) {
>			spin_lock_irqsave(&base->lock, *flags);
>			if (likely(prelock_base == timer->base))
>				return base;
>			/* The timer has migrated to another CPU */
>			spin_unlock_irqrestore(&base->lock, *flags);
>		}
>		cpu_relax();
>	}

I don't think there is an unfairness problem. We are looping only if
timer->base changes in between. IOW, there is no "lock + unlock +
lock(same_lock)" here, we take another lock on each iteration.

And:

> [<c0407208>] do_IRQ+0xbd/0xd1
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
>
> ...
>
> [<c0407208>] do_IRQ+0xbd/0xd1
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
>
> ...
>
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f

All traces start from schedule_timeout()->mod_timer(). This timer is
local, nobody can see it, its ->base can't be NULL or changed. This means
that lock_timer_base() can't loop; it could only be failing to take the
tvec_t_base_s.lock. But in that case the trace should be different,
strange.

Oleg.
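The pattern Oleg is describing, lock whatever base the timer currently
points at, then re-check the pointer under the lock and retry if the timer
migrated, works outside the kernel too. A pthreads sketch with our own
type names, not the kernel's:

#include <pthread.h>
#include <stddef.h>

struct base { pthread_mutex_t lock; };
struct timer { struct base *base; };	/* changed only under base->lock */

static struct base *lock_timer_base(struct timer *t)
{
	for (;;) {
		struct base *b = __atomic_load_n(&t->base, __ATOMIC_ACQUIRE);

		if (b != NULL) {
			pthread_mutex_lock(&b->lock);
			if (b == t->base)
				return b;	/* caller must unlock */
			/* timer migrated to another base meanwhile */
			pthread_mutex_unlock(&b->lock);
		}
		/* cpu_relax() analogue: fall through and reload */
	}
}

int main(void)
{
	struct base b = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct timer t = { .base = &b };
	struct base *locked = lock_timer_base(&t);

	pthread_mutex_unlock(&locked->lock);
	return 0;
}

Note that each retry takes a different lock, which is Oleg's point about
why this loop cannot be a spinlock-fairness problem.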
Re: Processes spinning forever, apparently in lock_timer_base()?
(attempting to cc Matthias. If I have the wrong one, please fix it up)

(please generally cc reporters when forwarding their bug reports)

On Wed, 01 Aug 2007 18:39:51 -0400 Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> Looks like the same problem with spinlock unfairness we've seen
> elsewhere: it seems to be looping here? Or is everyone stuck
> just waiting for writeout?
>
> lock_timer_base():
>	for (;;) {
>		tvec_base_t *prelock_base = timer->base;
>		base = tbase_get_base(prelock_base);
>		if (likely(base != NULL)) {
>			spin_lock_irqsave(&base->lock, *flags);
>			if (likely(prelock_base == timer->base))
>				return base;
>			/* The timer has migrated to another CPU */
>			spin_unlock_irqrestore(&base->lock, *flags);
>		}
>		cpu_relax();
>	}
>
> The problem goes away completely if filesystems are mounted
> *without* noatime. Has happened in 2.6.20 through 2.6.22...
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249563
>
> Part of sysrq-t listing:
>
> mysqld        D 000017c0  2196 23162   1562
>        e383fcb8 00000082 61650954 000017c0 e383fc9c c0407208 e383f000 a12b0434
>        00004d1d c6ed2c00 c6ed2d9c c200fa80 c0724640 f6c60540 c4ff3c70 00000508
>        00000286 c042ffcb e383fcc8 00014926 00000286
> Call Trace:
> [<c0407208>] do_IRQ+0xbd/0xd1
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
> [<c042fd37>] process_timeout+0x0/0x5
> [<c060bb50>] schedule_timeout+0x6b/0x8f
> [<c060b67c>] io_schedule_timeout+0x39/0x5d
> [<c0465eea>] congestion_wait+0x50/0x64
> [<c0438539>] autoremove_wake_function+0x0/0x35
> [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
> [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3

I expect the lock_timer_base() here is just stack gunk. Matthias's trace
also includes

mysqld        S 000017c0  2524  1623   1562
       f6ce3b44 00000082 60ca34b2 000017c0 f6ce3b28 f6ce3b54 f6ce3000 57c63d9c
       00004d1d f6c90000 f6c9019c c200fa80 c0724640 f6c60540 000007d0 c07e1f00
       00000286 c042ffcb f6ce3b54 000290ef 00000286
Call Trace:
[<c042ffcb>] lock_timer_base+0x19/0x35
[<c04300df>] __mod_timer+0x9a/0xa4
[<c060bb55>] schedule_timeout+0x70/0x8f
[<c042fd37>] process_timeout+0x0/0x5
[<c060bb50>] schedule_timeout+0x6b/0x8f
[<c04857c7>] do_select+0x36d/0x3c4
[<c0485da0>] __pollwait+0x0/0xac
[<c04e934d>] __next_cpu+0x12/0x1e
[<c0420130>] find_busiest_group+0x1c4/0x553
[<c0420ada>] update_curr+0x23b/0x25c
[<c04eb700>] rb_insert_color+0x8c/0xad
[<c042128f>] enqueue_entity+0x276/0x294

and it appears that schedule_timeout() always leaves a copy of
lock_timer_base+0x19 on the stack. Enabling CONFIG_FRAME_POINTER might
help sort that out.

I think. Or perhaps lock_timer_base() really has gone and got stuck. One
possibility is that gcc has decided to cache timer->base in a register
rather than rereading it around that loop, which would be bad. Do:

gdb vmlinux
(gdb) x/100i lock_timer_base

Is the machine really completely dead? Or are some tasks running? If the
latter, it might be dirty-memory windup - perhaps some device driver has
died and we're not getting writes out to disk.

Are all the CPUs running flat-out? If so, yup, maybe it's
lock_timer_base(). Hit sysrq-P ten times, see where things are stuck.

Please leave `vmstat 1' running in an ssh session next time, let's see
the output just prior to the hang. And do this:

while true
do
	echo
	cat /proc/meminfo
	sleep 1
done

in another ssh session so we can see what the memory looked like when it
died too.
Re: Processes spinning forever, apparently in lock_timer_base()?
On Wed, 01 Aug 2007 18:39:51 -0400, Chuck Ebbert wrote:

> Looks like the same problem with spinlock unfairness we've seen
> elsewhere: it seems to be looping here? Or is everyone stuck just
> waiting for writeout?
>
> lock_timer_base():
>	for (;;) {
>		tvec_base_t *prelock_base = timer->base;
>		base = tbase_get_base(prelock_base);
>		if (likely(base != NULL)) {
>			spin_lock_irqsave(&base->lock, *flags);
>			if (likely(prelock_base == timer->base))
>				return base;
>			/* The timer has migrated to another CPU */
>			spin_unlock_irqrestore(&base->lock, *flags);
>		}
>		cpu_relax();
>	}
>
> The problem goes away completely if filesystems are mounted *without*
> noatime. Has happened in 2.6.20 through 2.6.22...
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249563
>
> Part of sysrq-t listing:
>
> mysqld        D 000017c0  2196 23162   1562
>        e383fcb8 00000082 61650954 000017c0 e383fc9c c0407208 e383f000 a12b0434
>        00004d1d c6ed2c00 c6ed2d9c c200fa80 c0724640 f6c60540 c4ff3c70 00000508
>        00000286 c042ffcb e383fcc8 00014926 00000286
> Call Trace:
> [<c0407208>] do_IRQ+0xbd/0xd1
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
> [<c042fd37>] process_timeout+0x0/0x5
> [<c060bb50>] schedule_timeout+0x6b/0x8f
> [<c060b67c>] io_schedule_timeout+0x39/0x5d
> [<c0465eea>] congestion_wait+0x50/0x64
> [<c0438539>] autoremove_wake_function+0x0/0x35
> [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
> [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3
>
> named         D 000017c0  2024  1454      1
>        f722acb0 00000082 6165ed96 000017c0 c1523e80 c16f0c00 c16f20e0 f722a000
>        a12be87d 00004d1d f768ac00 f768ad9c c200fa80 f75bda80 c0407208 00000508
>        00000286 c042ffcb f722acc0 00020207 00000286
> Call Trace:
> [<c0407208>] do_IRQ+0xbd/0xd1
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
> [<c042fd37>] process_timeout+0x0/0x5
> [<c060bb50>] schedule_timeout+0x6b/0x8f
> [<c060b67c>] io_schedule_timeout+0x39/0x5d
> [<c0465eea>] congestion_wait+0x50/0x64
> [<c0438539>] autoremove_wake_function+0x0/0x35
> [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
> [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3
>
> mysqld        D 000017c0  2196 23456   1562
>        e9293cb8 00000082 616692ed 000017c0 e9293c9c e9293cc8 e9293000 a12c8dd0
>        00004d1d c3d5ac00 c3d5ad9c c200fa80 c0724640 f6c60540 e9293d10 c07e1f00
>        00000286 c042ffcb e9293cc8 0002b57f 00000286
> Call Trace:
> [<c042ffcb>] lock_timer_base+0x19/0x35
> [<c04300df>] __mod_timer+0x9a/0xa4
> [<c060bb55>] schedule_timeout+0x70/0x8f
> [<c042fd37>] process_timeout+0x0/0x5
> [<c060bb50>] schedule_timeout+0x6b/0x8f
> [<c060b67c>] io_schedule_timeout+0x39/0x5d
> [<c0465eea>] congestion_wait+0x50/0x64
> [<c0438539>] autoremove_wake_function+0x0/0x35
> [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
> [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3

I'm not sure if this is related to your problem, but I posted a patch the
other day that fixes an issue in balance_dirty_pages. If a drive is under
light load it can get stuck looping, waiting for enough pages to be
available to be written. In my test case this occurs when one drive is
under heavy load and the other isn't.

I sent it with the subject "[PATCH] balance_dirty_pages - exit loop when
no more pages available". Here's the patch again, in case it helps.

Cheers
Richard
--
--- linux-2.6.22.1/mm/page-writeback.c.orig	2007-07-30 16:36:09.000000000 +0100
+++ linux-2.6.22.1/mm/page-writeback.c	2007-07-31 16:26:43.000000000 +0100
@@ -250,6 +250,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
+				break;		/* didn't find enough to do */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}
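What the two added lines key off can be modelled in plain C for
illustration; writeback_inodes() here is a made-up stand-in that
decrements nr_to_write for each page it manages to start, so returning
with nr_to_write still positive and no congestion seen means the queue
simply had too little dirty data, and waiting longer cannot help:

#include <stdio.h>
#include <stdbool.h>

struct writeback_control {
	long nr_to_write;
	bool encountered_congestion;
};

/* made-up stand-in: this queue only has 3 dirty pages to offer */
static void writeback_inodes(struct writeback_control *wbc)
{
	long available = 3;

	while (wbc->nr_to_write > 0 && available > 0) {
		wbc->nr_to_write--;
		available--;
	}
}

int main(void)
{
	struct writeback_control wbc = {
		.nr_to_write = 12,		/* write_chunk analogue */
		.encountered_congestion = false,
	};

	writeback_inodes(&wbc);
	if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
		puts("didn't find enough to do -- break instead of spinning");
	return 0;
}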
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/