Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-04-03 Thread Paul Cassella

On Wed, 28 Mar 2001, Paul Cassella wrote:

> Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I've been running -ac27 for over 5 days, and it's been fine, so this seems
to have been fixed.

-- 
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Leonid Mamtchenkov

Hello Paul Cassella,

Once you wrote about "Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.":
PC> [1.] One line summary of the problem:    
PC> Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I have a similar problem with 2.4.0, 2.4.1, and 2.4.2.  I tried running -ac24, -ac25, -ac26
and 2.4.3-pre6, and I haven't had any problems so far.

PC> [2.] Full description of the problem/report:
PC> I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
PC> 36 hours, 12 hours, and 10 hours, respectively.  -ac12 has twice run
PC> for a week without crashing.  I didn't see anything in the later -ac
PC> changelogs that looks responsible, but I haven't actually tried them.

My uptimes were longer, but each crash came after 16 days plus X hours (X being between 0 and 20).

PC> All the crashes were under X.  The machine did not respond to pings,
PC> and no sysrq keys other than B worked; I didn't hear disk activity
PC> after S, and the disks weren't unmounted.  Nothing made it to the
PC> logs.  In the -ac19 crash, I had run at the console for about 12
PC> hours, and then started X; it crashed within 15 minutes.

I also have all these troubles under X.

PC> In the one crash that happened while I was at the console, X
PC> completely froze, and sound output stopped.  In the others, the
PC> monitor was in power-save mode and didn't wake up.

I had it twice.

PC> The hangs don't appear to be related to IO load or anything else I can
PC> think of besides X.  Each time, there was a distributed.net client
PC> running, and nothing else that was in any way intensive.  I don't
PC> believe any sort of updatedb or makewhatis was running during the
PC> crashes, and it never hung overnight when these jobs run.

No distributed.net client here ;)

PC> I ran with -ac12 with nearly 1300 lines of diff narrowed down from
PC> [...skip...]
PC> - i810, (Debian unstable) X 4.0.2, with DRI

I think the problem might be somewhere here.  I am running i810 with
(RedHat 7...not original anymore :)) X 4.0.1.

PC> I'll be happy to try out patches, configuration changes, and other
PC> suggestions, but I won't be able to tell for three or four days
PC> whether or not it helped.

With a typical uptime of 16 days before a crash, I will be very slow to respond during the testing
phase, though I am willing to try too ;)

-- 
 Best regards,
 Leonid Mamtchenkov
 System Administrator




Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

On Thu, 29 Mar 2001, Alan Cox wrote:

> Was anything between 12 and 18 stable ?

I didn't actually try them; I jumped right from 12 to 18, and when that
and 19 died, I went back to 12. 

But a quick look suggests that the entire patch I applied to 12, which
still hung, was already in 13, including the pm.c change.

I also haven't tried anything after 24; is it likely to have been fixed?

-- 
Paul Cassella




Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Alan Cox

> I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
> 36 hours, 12 hours, and 10 hours, respectively.  -ac12 has twice run
> for a week without crashing.  I didn't see anything in the later -ac
> changelogs that looks responsible, but I haven't actually tried them.

Was anything between 12 and 18 stable ?

> A few lines earlier in this function, inode->i_op->truncate() is called
> without lock_kernel().  Should it also have a lock_kernel(), or is it not
> needed there? 

Absolutely correct. The lock is missing. Bizarrely, Al Viro just noticed this
about 15 minutes ago.





Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

Earlier today, I wrote

> and no sysrq keys other than B worked; I didn't hear disk activity
> after S, and the disks weren't unmounted.  Nothing made it to the

Of course, when I rebooted this time (after SysRq S, U, B), all the
filesystems were clean.

Nothing in the logs this time either though.

> When I get home and reboot (following this most recent hang :( ), I'll
> put the diff, .config, and more stuff from /proc at

>   http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This is now there.

-- 
Paul Cassella





Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

2001-03-28 Thread Paul Cassella

[1.] One line summary of the problem:

Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

[2.] Full description of the problem/report:

I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
36 hours, 12 hours, and 10 hours, respectively.  -ac12 has twice run
for a week without crashing.  I didn't see anything in the later -ac
changelogs that looks responsible, but I haven't actually tried them.

All the crashes were under X.  The machine did not respond to pings,
and no sysrq keys other than B worked; I didn't hear disk activity
after S, and the disks weren't unmounted.  Nothing made it to the
logs.  In the -ac19 crash, I had run at the console for about 12
hours, and then started X; it crashed within 15 minutes.

In the one crash that happened while I was at the console, X
completely froze, and sound output stopped.  In the others, the
monitor was in power-save mode and didn't wake up.

The hangs don't appear to be related to IO load or anything else I can
think of besides X.  Each time, there was a distributed.net client
running, and nothing else that was in any way intensive.  I don't
believe any sort of updatedb or makewhatis was running during the
crashes, and it never hung overnight when these jobs run.


I ran with -ac12 with nearly 1300 lines of diff narrowed down from
"interdiff -h ac12 ac18" for about 36 hours in console mode; it hung
within 3 hours of starting X.

When I get home and reboot (following this most recent hang :( ), I'll
put the diff, .config, and more stuff from /proc at

  http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This should be sometime around 8PM CST.  (If someone wants the diff
now, email me.  I have it here, but I don't want to spam the list with
it.)

This diff wasn't "complete"; some modules (ide-cd, at least) weren't
able to load due to missing symbols.


The diff included all the changes referencing bust_spinlocks(), and
everything to do with the console_sem and the console tasklet/tq.  This
included all the changes to printk.c. 

It also included the following.  In -ac18, this is a BUG(), not a
printk(), but I wanted something I could see while X was running.  The
message never showed up.  I didn't look to see what the effect of
returning -1 here is, though.


diff -u linux.ac/kernel/pm.c linux.ac/kernel/pm.c
--- linux.ac/kernel/pm.c
+++ linux.ac/kernel/pm.c
@@ -150,6 +154,10 @@
 {
 	int status = 0;
 	int prev_state, next_state;
+
+	if (in_interrupt())
+	{printk("pm_send called from interrupt (0x%p)!\n", 
+__builtin_return_address(0)); return -1; }
+
 	switch (rqst) {
 	case PM_SUSPEND:
 	case PM_RESUME:

AFAICT there was nothing else in the diff.


[7.1.] Software (add the output of the ver_linux script here)

Linux manetheren 2.4.2-ac12 #8 Mon Mar 5 20:02:30 CST 2001 i686 unknown
 
Gnu C                  2.95.2
Gnu make               3.79.1
binutils               2.11.90.0.1
util-linux             2.11a
modutils               2.4.2
e2fsprogs              1.19
Linux C Library        2.2.2
Dynamic linker (ldd)   2.2.2
Procps                 2.0.7
Net-tools              1.59
Console-tools          0.2.3
Sh-utils               2.0.11
Modules Loaded         usb-uhci parport_pc lp parport binfmt_misc rtc usbcore

Since I didn't think to copy my .config off the machine, I won't be
able to get to it until tonight.  In the meantime, I do remember that

- It's a UP kernel on a UP box
- Celeron kernel and processor
- The hang happens with USB completely disabled
   (Though I don't think I ever turned off hotplugging.)
- VTs, console on VT, and console on serial configured
   (console was not on serial)
- i810, (Debian unstable) X 4.0.2, with DRI
- PIIX tuning enabled
- Auto-DMA
- No kernel debugging other than SysRq
- No SCSI
- APM was off; don't remember the other pm stuff.
- ecn was on, syncookies off.
- no ip masquerading or firewalling or anything fancy.
- 128M RAM; no HIGHMEM stuff.


I'll be happy to try out patches, configuration changes, and other
suggestions, but I won't be able to tell for three or four days
whether or not it helped.


[7.2.] Processor information (from /proc/cpuinfo):

Single processor,
cpu family      : 6
model           : 6
model name      : Celeron (Mendocino) (466Mhz/66Mhz FSB)
stepping        : 5
cpu MHz         : 465.265
cache size      : 128 KB


[7.3.] Module information (from /proc/modules):

The modules loaded at the -ac24 crash appear to have been

visor                   8400   1
usbserial              17488   1 [visor]
parport_pc             18480   1 (autoclean)
lp                      6096   1 (autoclean)
parport                24704   1 (autoclean) [parport_pc lp]
uhci                   21920   0 (unused)
binfmt_misc             5600   0
rtc                     5056   0 (autoclean)
usbcore                50480