Bug#557262: Processed: Re: Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2010-02-02 Thread maximilian attems
tags 557262 moreinfo
stop

On Mon, 21 Dec 2009, Debian Bug Tracking System wrote:

 Processing commands for cont...@bugs.debian.org:
 
  reassign 557262 linux-2.6
 Bug #557262 [asterisk] 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state 
 after 24-48 hours (sysrq-t+w available) - root cause found = asterisk
 Bug reassigned from package 'asterisk' to 'linux-2.6'.
 Bug No longer marked as found in versions 1.6.2.0~dfsg~rc1-1.
  thanks
 Stopping processing here.


Please follow up on this bug report using reportbug, so that the
relevant information from the box gets collected:
reportbug -N 557262

Also, is this bug still reproducible on 2.6.32?




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2009-12-20 Thread Faidon Liambotis
reassign 557262 linux-2.6
thanks

Justin,

Justin Piszcz wrote:
 Justin Piszcz wrote:
 Found root cause-- root cause is asterisk PBX software.  I use an SPA3102.
 When someone called me, they accidentally dropped the connection, I called
 them back in a short period.  It is during this time (and the last time)
 this happened that the box froze under multiple(!) kernels, always when
 someone was calling.
 snip
 I don't know what asterisk is doing but top did run before the crash
 and asterisk was using 100% CPU and as I noted before all other processes
 were in D-state.

 When this bug occurs, it freezes I/O to all devices and the only way to recover
 is to reboot the system.
 That's obviously *not* the root cause.

 It's not normal for an application that isn't even privileged to hang
 all I/O and, subsequently, everything on a system.

 This is most probably a kernel issue and asterisk just does something
 that triggers this bug.

 Regards,
 Faidon

 
 It is possible, although I tried with several kernels (2.6.30.[0-9],
 2.6.31+).  I never had a crash with earlier versions (I installed asterisk
 long ago), but it always used to be 1.4.x until recently.  Nasty bug :\
I am reassigning the bug to linux-2.6 since it seems to me this is a
kernel issue. This is, indeed, a nasty bug and I'm not sure you and the
kernel maintainers will have any luck debugging it further; I'll leave
it up to them.

Thanks,
Faidon






Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2009-11-21 Thread Roger Heflin

Faidon Liambotis wrote:

Justin Piszcz wrote:
Found root cause-- root cause is asterisk PBX software.  I use an SPA3102.

When someone called me, they accidentally dropped the connection, I called
them back in a short period.  It is during this time (and the last time)
this happened that the box froze under multiple(!) kernels, always when
someone was calling.

snip

I don't know what asterisk is doing but top did run before the crash
and asterisk was using 100% CPU and as I noted before all other processes
were in D-state.

When this bug occurs, it freezes I/O to all devices and the only way to recover
is to reboot the system.

That's obviously *not* the root cause.

It's not normal for an application that isn't even privileged to hang
all I/O and, subsequently, everything on a system.

This is most probably a kernel issue and asterisk just does something
that triggers this bug.

Regards,
Faidon



I had an application on 2.6.5 (SLES9) that would hang XFS.

The underlying application was multi-threaded and both threads were
doing full disk syncs every so often, and sometimes when doing the
full disk sync the XFS subsystem would deadlock.  It appeared to me that
one sync had a lock and was waiting for another, and the other process
had the second lock and was waiting for the first.  We were able to
disable the full disk sync from the application and the deadlock went
away.  All non-XFS filesystems still worked and could still be
accessed.  I did report the bug with some traces, but I don't believe
anyone ever determined where the underlying issue was.
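
The lock pattern described above (one side holds the first lock and waits
for the second, the other holds the second and waits for the first) can be
shown in a few lines of userspace Python.  This is only a sketch of the
general pattern, not XFS code: the two locks, the helper names and the
timings are all made up for illustration, and the timeout exists only so
the demo ends instead of hanging the way a real deadlock would.

```python
import threading
import time


def take_both(first, second, start, results, idx, timeout=0.3):
    """Take `first`, then try `second`; record whether both were obtained."""
    start.wait()            # line both threads up before any locking
    with first:
        time.sleep(0.1)     # give the peer time to take its own first lock
        got_second = second.acquire(timeout=timeout)
        if got_second:
            second.release()
    results[idx] = got_second


def run_pair(order_1, order_2):
    """Run two threads with the given lock orders; True means neither stalled."""
    start = threading.Barrier(2)
    results = [None, None]
    threads = [
        threading.Thread(target=take_both, args=(*order_1, start, results, 0)),
        threading.Thread(target=take_both, args=(*order_2, start, results, 1)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return all(results)


lock_a, lock_b = threading.Lock(), threading.Lock()

# Opposite orders: each thread ends up holding what the other one wants.
opposite = run_pair((lock_a, lock_b), (lock_b, lock_a))

# Same order in both threads: the second thread simply waits its turn.
same = run_pair((lock_a, lock_b), (lock_a, lock_b))

print("opposite order completes:", opposite)
print("same order completes:", same)
```

Taking the locks in one agreed order everywhere is the usual way out of
this pattern; removing one of the lock users, as we did by disabling the
sync, also avoids it but does not fix the ordering.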









Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2009-11-20 Thread Justin Piszcz

Package: asterisk
Version: 1.6.2.0~dfsg~rc1-1

See below for issue:

On Wed, 21 Oct 2009, Justin Piszcz wrote:

On Tue, 20 Oct 2009, Justin Piszcz wrote:

On Tue, 20 Oct 2009, Dave Chinner wrote:

On Mon, Oct 19, 2009 at 06:18:58AM -0400, Justin Piszcz wrote:

On Mon, 19 Oct 2009, Dave Chinner wrote:

On Sun, Oct 18, 2009 at 04:17:42PM -0400, Justin Piszcz wrote:

It has happened again, all sysrq-X output was saved this time.

.

All pointing to log IO not completing.




So far I do not have a reproducible test case,


Ok. What sort of load is being placed on the machine?

Hello, generally the load is low, it mainly serves out some samba shares.



It appears that both the xfslogd and the xfsdatad on CPU 0 are in
the running state but don't appear to be consuming any significant
CPU time. If they remain like this then I think that means they are
stuck waiting on the run queue.  Do these XFS threads always appear
like this when the hang occurs? If so, is there something else that
is hogging CPU 0 preventing these threads from getting the CPU?
Yes, the XFS threads show up like this each time the kernel crashed.  So far
with 2.6.30.9 after ~48hrs+ it has not crashed.  So it appears to be some issue
between 2.6.30.9 and 2.6.31.x when this began happening.  Any recommendations
on how to catch this bug w/certain options enabled/etc?




Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com





Uptime with 2.6.30.9:

06:18:41 up 2 days, 14:10, 14 users,  load average: 0.41, 0.21, 0.07

No issues yet, so it first started happening in 2.6.(31).(x).

Any further recommendations on how to debug this issue?  BTW: Do you view this
as an XFS bug or MD/VFS layer issue based on the logs/output thus far?

Justin.
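
For anyone trying to capture the same data: the sysrq-t and sysrq-w dumps
mentioned in the bug title can also be requested without a console
keyboard, by writing the command letter to /proc/sysrq-trigger as root.
A minimal Python sketch, assuming a Linux box; the helper names are
hypothetical and the dumps land in the kernel log, not on stdout.

```python
def trigger_sysrq(command, trigger_path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the kernel's trigger file."""
    if len(command) != 1:
        raise ValueError("SysRq commands are single characters")
    with open(trigger_path, "w") as handle:
        handle.write(command)


def dump_task_state(trigger_path="/proc/sysrq-trigger"):
    """Request the two dumps named in the bug title.

    't' asks the kernel to list all tasks, 'w' lists only the blocked
    (uninterruptible, D-state) ones.
    """
    for command in ("t", "w"):
        trigger_sysrq(command, trigger_path)


# Usage (as root, on the affected machine):
#     dump_task_state()
# then read the result from the kernel log, e.g. with dmesg.
```

If the log is kept only on a filesystem that is itself hung, the dump may
never reach disk, so a serial console or netconsole is worth considering.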




Found root cause-- root cause is asterisk PBX software.  I use an SPA3102.
When someone called me, they accidentally dropped the connection, I called
them back in a short period.  It is during this time (and the last time)
this happened that the box froze under multiple(!) kernels, always when
someone was calling.

I have removed asterisk but this is the version I was running:
~$ dpkg -l | grep -i asterisk
rc  asterisk 1:1.6.2.0~dfsg~rc1-1 Open S

I don't know what asterisk is doing but top did run before the crash
and asterisk was using 100% CPU and as I noted before all other processes
were in D-state.

When this bug occurs, it freezes I/O to all devices and the only way to recover
is to reboot the system.
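
In case it helps others check for the same symptom before the box becomes
unusable, here is a rough Python sketch that lists processes currently in
D-state by reading /proc/<pid>/stat.  The function names are my own, it
assumes a Linux /proc, and it only looks at each process's main thread.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited, or the line could not be parsed
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A few entries in D-state for a moment are normal under I/O load; what I
saw was every process except asterisk staying there.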

Just FYI if anyone else out there has their system crash when running asterisk.

Just out of curiosity, has anyone else running asterisk had such an issue? 
I was not running any special VoIP PCI cards/etc.


Justin.







Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2009-11-20 Thread Justin Piszcz



On Sat, 21 Nov 2009, Faidon Liambotis wrote:


Justin Piszcz wrote:
Found root cause-- root cause is asterisk PBX software.  I use an SPA3102.

When someone called me, they accidentally dropped the connection, I called
them back in a short period.  It is during this time (and the last time)
this happened that the box froze under multiple(!) kernels, always when
someone was calling.

snip

I don't know what asterisk is doing but top did run before the crash
and asterisk was using 100% CPU and as I noted before all other processes
were in D-state.

When this bug occurs, it freezes I/O to all devices and the only way to recover
is to reboot the system.

That's obviously *not* the root cause.

It's not normal for an application that isn't even privileged to hang
all I/O and, subsequently, everything on a system.

This is most probably a kernel issue and asterisk just does something
that triggers this bug.

Regards,
Faidon



It is possible, although I tried with several kernels (2.6.30.[0-9],
2.6.31+).  I never had a crash with earlier versions (I installed asterisk
long ago), but it always used to be 1.4.x until recently.  Nasty bug :\

Justin.






Bug#557262: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk

2009-11-20 Thread Faidon Liambotis
Justin Piszcz wrote:
 Found root cause-- root cause is asterisk PBX software.  I use an SPA3102.
 When someone called me, they accidentally dropped the connection, I called
 them back in a short period.  It is during this time (and the last time)
 this happened that the box froze under multiple(!) kernels, always when
 someone was calling.
snip
 I don't know what asterisk is doing but top did run before the crash
 and asterisk was using 100% CPU and as I noted before all other processes
 were in D-state.
 
 When this bug occurs, it freezes I/O to all devices and the only way to recover
 is to reboot the system.
That's obviously *not* the root cause.

It's not normal for an application that isn't even privileged to hang
all I/O and, subsequently, everything on a system.

This is most probably a kernel issue and asterisk just does something
that triggers this bug.

Regards,
Faidon


