Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
On Wed, 28 Mar 2001, Paul Cassella wrote:

> Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I've been running -ac27 for over 5 days, and it's been fine, so this
seems to have been fixed.

--
Paul Cassella

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
Hello Paul Cassella,

Once you wrote about "Hangs under 2.4.2-ac{18,19,24} that do not happen
under -ac12.":

PC> [1.] One line summary of the problem:
PC> Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

I have a similar problem with 2.4.0, 2.4.1, and 2.4.2. I tried running
-ac24,25,26 and 2.4.3-pre6, and I haven't had any problems so far.

PC> [2.] Full description of the problem/report:
PC> I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
PC> 36 hours, 12 hours, and 10 hours, respectively. -ac12 has twice run
PC> for a week without crashing. I didn't see anything in the later -ac
PC> changelogs that looks responsible, but I haven't actually tried them.

My uptimes were longer, but each of them was 16 days + X hours (X being
0-20).

PC> All the crashes were under X. The machine did not respond to pings,
PC> and no sysrq keys other than B worked; I didn't hear disk activity
PC> after S, and the disks weren't unmounted. Nothing made it to the
PC> logs. In the -ac19 crash, I had run at the console for about 12
PC> hours, and then started X; it crashed within 15 minutes.

I also have all these troubles under X.

PC> In the one crash that happened while I was at the console, X
PC> completely froze, and sound output stopped. In the others, the
PC> monitor was in power-save mode and didn't wake up.

I had it twice.

PC> The hangs don't appear to be related to IO load or anything else I can
PC> think of besides X. Each time, there was a distributed.net client
PC> running, and nothing else that was in any way intensive. I don't
PC> believe any sort of updatedb or makewhatis was running during the
PC> crashes, and it never hung overnight when these jobs run.

No distributed.net client here ;)

PC> I ran with -ac12 with nearly 1300 lines of diff narrowed down from
PC> [...skip...]
PC> - i810, (Debian unstable) X 4.0.2, with DRI

I think that the problem might be somewhere here. I am running i810,
(RedHat 7...not original anymore :)) X 4.0.1.

PC> I'll be happy to try out patches, configuration changes, and other
PC> suggestions, but I won't be able to tell for three or four days
PC> whether or not it helped.

With a regular uptime of 16 days I will be very slow to respond during
the testing phase, though I am willing to try too ;)

--
Best regards,
Leonid Mamtchenkov
System Administrator
Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
On Thu, 29 Mar 2001, Alan Cox wrote:

> Was anything between 12 and 18 stable ?

I didn't actually try them; I jumped right from 12 to 18, and when that
and 19 died, I went back to 12. But a quick look suggests that the
entire patch I'd applied to 12 and got a hang with was in 13, including
the pm.c change.

I also haven't tried anything after 24; is it likely to have been fixed?

--
Paul Cassella
Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
> I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
> 36 hours, 12 hours, and 10 hours, respectively. -ac12 has twice run
> for a week without crashing. I didn't see anything in the later -ac
> changelogs that looks responsible, but I haven't actually tried them.

Was anything between 12 and 18 stable ?

> A few lines earlier in this function, inode->i_op->truncate() is called
> without lock_kernel(). Should it also have a lock_kernel(), or is it not
> needed there?

Absolutely correct. The lock is missing. Bizarrely Al Viro just noticed
this about 15 minutes ago.
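The truncate question in the message above comes down to a call path that reaches an operation without taking the lock the operation expects. A minimal user-space sketch of that pattern follows. Everything in it is hypothetical and for illustration only: `lock_kernel_stub()`, `unlock_kernel_stub()`, `truncate_stub()` and the two `call_truncate_*` helpers are stand-ins, not kernel functions, and the "lock" is just a nesting counter rather than a real lock.

```c
#include <stdio.h>

/* Hypothetical stand-in for the big kernel lock: a plain nesting counter. */
int lock_depth = 0;

void lock_kernel_stub(void)   { lock_depth++; }
void unlock_kernel_stub(void) { lock_depth--; }

/* Counts calls that arrived without the lock held. */
int unlocked_calls = 0;

/* Hypothetical truncate operation that expects its caller to hold the lock. */
void truncate_stub(void)
{
	if (lock_depth == 0) {
		unlocked_calls++;
		fprintf(stderr, "truncate called without the lock held\n");
	}
}

/* Call path as described in the question: no lock around the call. */
void call_truncate_unlocked(void (*truncate)(void))
{
	truncate();
}

/* The same call path with the missing lock added. */
void call_truncate_locked(void (*truncate)(void))
{
	lock_kernel_stub();
	truncate();
	unlock_kernel_stub();
}
```

Calling `call_truncate_locked(truncate_stub)` leaves `unlocked_calls` untouched, while `call_truncate_unlocked(truncate_stub)` increments it. The sketch only models the calling convention; it says nothing about which kernel function was affected or how the real fix looked.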
Re: Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
Earlier today, I wrote

> and no sysrq keys other than B worked; I didn't hear disk activity
> after S, and the disks weren't unmounted. Nothing made it to the

Of course, when I rebooted this time (after SysRQ S,U,B), all the
filesystems were clean. Nothing in the logs this time either though.

> When I get home and reboot (following this most recent hang :( ), I'll
> put the diff, .config, and more stuff from /proc at
> http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This is now there.

--
Paul Cassella
Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.
[1.] One line summary of the problem:

Hangs under 2.4.2-ac{18,19,24} that do not happen under -ac12.

[2.] Full description of the problem/report:

I have had hangs under 2.4.2-ac18, -ac19, and -ac24, after uptimes of
36 hours, 12 hours, and 10 hours, respectively. -ac12 has twice run
for a week without crashing. I didn't see anything in the later -ac
changelogs that looks responsible, but I haven't actually tried them.

All the crashes were under X. The machine did not respond to pings,
and no sysrq keys other than B worked; I didn't hear disk activity
after S, and the disks weren't unmounted. Nothing made it to the
logs. In the -ac19 crash, I had run at the console for about 12
hours, and then started X; it crashed within 15 minutes.

In the one crash that happened while I was at the console, X
completely froze, and sound output stopped. In the others, the
monitor was in power-save mode and didn't wake up.

The hangs don't appear to be related to IO load or anything else I can
think of besides X. Each time, there was a distributed.net client
running, and nothing else that was in any way intensive. I don't
believe any sort of updatedb or makewhatis was running during the
crashes, and it never hung overnight when these jobs run.

I ran with -ac12 with nearly 1300 lines of diff narrowed down from
"interdiff -h ac12 ac18" for about 36 hours in console mode; it hung
within 3 hours of starting X.

When I get home and reboot (following this most recent hang :( ), I'll
put the diff, .config, and more stuff from /proc at
http://manetheren.eigenray.com/~fortytwo/crash-12-18.2

This should be sometime around 8PM CST. (If someone wants the diff
now, email me. I have it here, but I don't want to spam the list with
it.)

This diff wasn't "complete"; some modules (ide-cd, at least) weren't
able to load due to missing symbols. The diff included all the changes
referencing bust_spinlocks(), and everything to do with the
console_sem and the console tasklet/tq.
This included all the changes to printk.c.

It also included the following. In -ac18, this is a BUG(), not a
printk(), but I wanted something I could see while X was running. The
message never showed up. I didn't look to see what the effect of
returning -1 here is, though.

diff -u linux.ac/kernel/pm.c linux.ac/kernel/pm.c
--- linux.ac/kernel/pm.c
+++ linux.ac/kernel/pm.c
@@ -150,6 +154,10 @@
 {
 	int status = 0;
 	int prev_state, next_state;
+
+	if (in_interrupt())
+	{printk("pm_send called from interrupt (0x%p)!\n",
+__builtin_return_address(0)); return -1; }
+
 	switch (rqst) {
 	case PM_SUSPEND:
 	case PM_RESUME:

AFAICT there was nothing else in the diff.

[7.1.] Software (add the output of the ver_linux script here)

Linux manetheren 2.4.2-ac12 #8 Mon Mar 5 20:02:30 CST 2001 i686 unknown

Gnu C                  2.95.2
Gnu make               3.79.1
binutils               2.11.90.0.1
util-linux             2.11a
modutils               2.4.2
e2fsprogs              1.19
Linux C Library        2.2.2
Dynamic linker (ldd)   2.2.2
Procps                 2.0.7
Net-tools              1.59
Console-tools          0.2.3
Sh-utils               2.0.11
Modules Loaded         usb-uhci parport_pc lp parport binfmt_misc rtc usbcore

Since I didn't think to copy my .config off the machine, I won't be
able to get to it until tonight. In the meantime, I do remember that

- It's a UP kernel on a UP box
- Celeron kernel and processor
- The hang happens with USB completely disabled (Though I don't think
  I ever turned off hotplugging.)
- VTs, console on VT, and console on serial configured (console was
  not on serial)
- i810, (Debian unstable) X 4.0.2, with DRI
- PIIX tuning enabled
- Auto-DMA
- No kernel debugging other than SysRq
- No SCSI
- APM was off; don't remember the other pm stuff.
- ecn was on, syncookies off.
- no ip masquerading or firewalling or anything fancy.
- 128M RAM; no HIGHMEM stuff.

I'll be happy to try out patches, configuration changes, and other
suggestions, but I won't be able to tell for three or four days
whether or not it helped.

[7.2.] Processor information (from /proc/cpuinfo):

Single processor,

cpu family      : 6
model           : 6
model name      : Celeron (Mendocino) (466Mhz/66Mhz FSB)
stepping        : 5
cpu MHz         : 465.265
cache size      : 128 KB

[7.3.] Module information (from /proc/modules):

The modules loaded at the -ac24 crash appear to have been

visor                   8400   1
usbserial              17488   1 [visor]
parport_pc             18480   1 (autoclean)
lp                      6096   1 (autoclean)
parport                24704   1 (autoclean) [parport_pc lp]
uhci                   21920   0 (unused)
binfmt_misc             5600   0
rtc                     5056   0 (autoclean)
usbcore                50480
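
The pm.c change in the report swaps a fatal BUG() for a logged message and an early return of -1 when pm_send() is entered from interrupt context. The user-space sketch below shows that guard pattern in isolation, as an illustration only. The names `fake_in_interrupt`, `in_interrupt_stub()`, `pm_send_stub()` and the `*_STUB` request codes are hypothetical stand-ins and not kernel symbols; `fprintf` takes the place of `printk`, and `__builtin_return_address(0)` is the GCC builtin also used in the original patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's in_interrupt() state. */
int fake_in_interrupt = 0;

int in_interrupt_stub(void)
{
	return fake_in_interrupt;
}

/* Hypothetical request codes, loosely mirroring PM_SUSPEND / PM_RESUME. */
enum pm_request_stub { PM_SUSPEND_STUB, PM_RESUME_STUB };

/*
 * Sketch of the guard: report the caller's return address and refuse
 * the request instead of halting, so the message has a chance to be seen.
 */
int pm_send_stub(enum pm_request_stub rqst)
{
	if (in_interrupt_stub()) {
		fprintf(stderr, "pm_send called from interrupt (%p)!\n",
			__builtin_return_address(0));
		return -1;
	}

	switch (rqst) {
	case PM_SUSPEND_STUB:
	case PM_RESUME_STUB:
		return 0;	/* the real state handling is omitted here */
	}
	return 0;
}
```

With `fake_in_interrupt` left at 0 the stub returns 0; setting it to 1 makes every request print the message and return -1. As the report notes, what a -1 return does to real callers of pm_send() was not examined, and this sketch does not model that either.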