Re: 2.6.24-rc5-mm1 - SCSI/blkdev probing hang
On Thu, 20 Dec 2007 13:22:12 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 20 Dec 2007 15:57:45 -0500
> Rik van Riel <[EMAIL PROTECTED]> wrote:
>
> > 2.6.24-rc5-mm1 seems to have a hang related to the SCSI or block
> > device probing code.

> It could be a scsi problem, or it could be all the kobject changes in
> Greg's driver tree. Or a combination of the two.
>
> Don't know, sorry.

Whatever it was, it's gone now. 2.6.24-rc6-mm1 boots on my system.

-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Fri, 21 Dec 2007 22:51:45 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> > Here's a test patch:
>
> Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.
>
> Thanks a lot to both of you.

Thank you for testing -mm (especially on sparc64) and for reporting the bug
and for testing the fix.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> > > 005119b0 Y: Not tainted
> > > [ 145.128940] TPC: <kpagecount_read+0x94/0xe0>
> >
> > My suspicion at this point is that with certain RAM layouts, simply
> > iterating over PFNs is simply not working out.
>
> That was my original suspicion, which is why I asked Mariusz to
> effectively comment out the actual PFN lookup up-thread. I didn't send
> him a patch to do that, so I guess my instructions on how to hack it
> may have been misunderstood.

No. I just made a trivial mistake :-/ Sorry for the confusion. I guess I need
to verify things three times before sending an email next time.

> > pfn_to_page() seems to be doing no range checking, and with sparsemem
> > vmemmap, which sparc64 always uses, this can be problematic.
> >
> > It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> > particular when the physical RAM layout really is sparse.
> >
> > Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
> > means there is a vmemmap translation set up for that page struct too,
> > it would work.
>
> Here's a test patch:

Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.

Thanks a lot to both of you.

Mariusz
Re: 2.6.24-rc5-mm1
Andrew Morton wrote:
> On Thu, 20 Dec 2007 10:55:51 -0600
> Jason Wessel <[EMAIL PROTECTED]> wrote:
>
>> Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
>>>
>>> - git-kgdb.patch is still dropped for the same reason
>>>
[snip]

Regarding the merge output targeting -mm:
> Conflicts with the arm, ia64, mips, sh and driver trees (at least). I
> fixed most of them but gave up on sh, where there has been major code
> motion.

Andrew,

Given the churn in patches, I think the best approach is to put kgdb in
after you have cut a -mm1, so it can go in -mm2 or as a fix, or however you
would like to manage it. The churn should be a whole lot less once the new
kgdb arch support gets merged.

I updated the for_mm branch to be against 2.6.24-rc5-mm1, and it will merge
cleanly. I can update again once the next mm branch is available.

Thanks,
Jason.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 19:06:55 -0600

> @@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
>  		return -EIO;
>  
>  	while (count > 0) {
> -		ppage = pfn_to_page(pfn++);
> +		ppage = 0;
> +		if (pfn_valid(pfn))
> +			ppage = pfn_to_page(pfn);
> +		pfn++;
>  		if (!ppage)
>  			pcount = 0;
>  		else

Yes, that should work. Please use "NULL" in the final version of the
patch instead of "0" so that sparse is happy.

Thanks.
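For illustration only, here is a userspace sketch of the loop shape David is asking for, with the page pointer initialised to NULL rather than 0. Everything in it is hypothetical: a toy memory map with two holes stands in for the real memmap, model_pfn_valid() and model_pfn_to_page() stand in for pfn_valid() and pfn_to_page(), and read_page_counts() fills an array instead of copying u64 values to user space as kpagecount_read() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the field this sketch reads. */
struct page {
	unsigned int count;
};

#define PAGES_PER_SECTION 4UL	/* hypothetical, tiny for illustration */
#define NR_SECTIONS       4UL

/* Toy memory map: sections 1 and 3 are holes with no struct pages at all. */
static struct page section0[PAGES_PER_SECTION] = { {1}, {2}, {0}, {5} };
static struct page section2[PAGES_PER_SECTION] = { {3}, {3}, {1}, {0} };
static struct page *sections[NR_SECTIONS] = { section0, NULL, section2, NULL };

/* Stand-in for pfn_valid(): true only if the PFN's section has a memmap. */
static bool model_pfn_valid(unsigned long pfn)
{
	unsigned long sec = pfn / PAGES_PER_SECTION;

	return sec < NR_SECTIONS && sections[sec] != NULL;
}

/*
 * Stand-in for pfn_to_page(). The real one does no range checking; the
 * assert here only documents the precondition the caller must establish.
 */
static struct page *model_pfn_to_page(unsigned long pfn)
{
	assert(model_pfn_valid(pfn));
	return &sections[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* Write one count per PFN starting at pfn; holes are reported as 0. */
static size_t read_page_counts(unsigned long pfn, uint64_t *out, size_t count)
{
	size_t n = 0;

	while (count > 0) {
		struct page *ppage = NULL;	/* NULL rather than 0 keeps sparse quiet */

		if (model_pfn_valid(pfn))
			ppage = model_pfn_to_page(pfn);
		pfn++;
		out[n++] = ppage ? ppage->count : 0;
		count--;
	}
	return n;
}
```

Calling read_page_counts(0, buf, 16) on this toy layout yields the counts of the two present sections and zeros for the two holes, which is the behaviour the pfn_valid() check is meant to give /proc/kpagecount.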
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:17:26PM -0800, David Miller wrote:
> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 20:47:55 +0100
>
> > [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> > 005119b0 Y: Not tainted
> > [ 145.128940] TPC: <kpagecount_read+0x94/0xe0>
>
> My suspicion at this point is that with certain RAM layouts, simply
> iterating over PFNs is simply not working out.

That was my original suspicion, which is why I asked Mariusz to
effectively comment out the actual PFN lookup up-thread. I didn't send
him a patch to do that, so I guess my instructions on how to hack it
may have been misunderstood.

> pfn_to_page() seems to be doing no range checking, and with sparsemem
> vmemmap, which sparc64 always uses, this can be problematic.
>
> It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> particular when the physical RAM layout really is sparse.
>
> Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
> means there is a vmemmap translation set up for that page struct too,
> it would work.

Here's a test patch:

Index: mm/fs/proc/proc_misc.c
===
--- mm.orig/fs/proc/proc_misc.c	2007-12-20 19:04:35.0 -0600
+++ mm/fs/proc/proc_misc.c	2007-12-20 19:06:01.0 -0600
@@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
 		return -EIO;
 
 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			pcount = 0;
 		else
@@ -773,7 +776,10 @@ static ssize_t kpageflags_read(struct fi
 		return -EIO;
 
 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			kflags = 0;
 		else

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 20:47:55 +0100

> [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> 005119b0 Y: Not tainted
> [ 145.128940] TPC: <kpagecount_read+0x94/0xe0>

My suspicion at this point is that with certain RAM layouts, simply
iterating over PFNs is simply not working out.

pfn_to_page() seems to be doing no range checking, and with sparsemem
vmemmap, which sparc64 always uses, this can be problematic.

It just blindly goes "vmemmap + pfn" which is asking for trouble, in
particular when the physical RAM layout really is sparse.

Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
means there is a vmemmap translation set up for that page struct too,
it would work.
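A toy model of the point about sparse layouts, under stated assumptions: with SPARSEMEM_VMEMMAP, pfn_to_page() is plain pointer arithmetic, so nothing stops it from producing the address of a struct page that was never backed, whereas the generic SPARSEMEM pfn_valid() checks whether the PFN's memory section is valid. The section size, the section_present bitmap and every function below are hypothetical and only mimic that check; they are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical geometry: 2^PFN_SECTION_SHIFT pages per memory section. */
#define PFN_SECTION_SHIFT 3
#define NR_MEM_SECTIONS   8UL

/* One bit per section, set if RAM (and so a backed memmap) exists there. */
static const unsigned int section_present = 0x0B;	/* sections 0, 1 and 3 */

static unsigned long model_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Simplified pfn_valid(): the PFN's section must exist and be present. */
static bool model_pfn_valid(unsigned long pfn)
{
	unsigned long nr = model_section_nr(pfn);

	return nr < NR_MEM_SECTIONS && (section_present & (1u << nr)) != 0;
}

/*
 * Count PFNs in [start, end) for which an unchecked "vmemmap + pfn" would
 * land in memmap that has nothing behind it.
 */
static size_t count_unbacked(unsigned long start, unsigned long end)
{
	size_t holes = 0;

	assert(start <= end);
	for (unsigned long pfn = start; pfn < end; pfn++)
		if (!model_pfn_valid(pfn))
			holes++;
	return holes;
}
```

Here count_unbacked(0, 64) reports 40 of 64 PFNs whose memmap an unchecked "vmemmap + pfn" would reach without any backing, which is the kind of situation a sparse physical RAM layout can create.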
Re: 2.6.24-rc5-mm1
On Thu, 20 Dec 2007 10:55:51 -0600 Jason Wessel <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
> >
> > - If something goes wrong with a PCI device's probing or initialisation, try
> >   reverting pci-disable-decoding-during-sizing-of-bars.patch.
> >
> > - git-sched was dropped due to breaking suspend-to-RAM.
> >
> > - git-block has been restored after having had a few problems
> >
> > - git-newsetup.patch was dropped due to conflicts with git-x86
> >
> > - git-perfmon.patch is still dropped for the same reason
> >
> > - git-kgdb.patch is still dropped for the same reason
> >
>
> Andrew,
>
> I re-based the for_mm branch at:
> http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_mm
> against the git-x86/mm branch from the x86-git tree. If there are other
> patch trees I need to pull in and patch against to allow for kgdb to be
> included into -mm please let me know.

The x86 merge worked OK. Here's what it looks like:

patching file Documentation/DocBook/Makefile
Hunk #1 FAILED at 11.
1 out of 1 hunk FAILED -- saving rejects to file Documentation/DocBook/Makefile.rej
patching file Documentation/DocBook/kgdb.tmpl
patching file Documentation/kernel-parameters.txt
Hunk #1 succeeded at 816 (offset 7 lines).
patching file MAINTAINERS
Hunk #1 succeeded at 2279 (offset 52 lines).
patching file Makefile
patching file arch/arm/kernel/Makefile
patching file arch/arm/kernel/kgdb-jmp.S
patching file arch/arm/kernel/kgdb.c
patching file arch/arm/kernel/setup.c
patching file arch/arm/kernel/traps.c
patching file arch/arm/mach-ixp2000/core.c
patching file arch/arm/mach-ixp2000/ixdp2x01.c
patching file arch/arm/mach-ixp4xx/coyote-setup.c
patching file arch/arm/mach-ixp4xx/ixdp425-setup.c
patching file arch/arm/mach-omap1/serial.c
patching file arch/arm/mach-omap2/serial.c
patching file arch/arm/mach-pnx4008/core.c
patching file arch/arm/mach-pxa/Makefile
Hunk #1 FAILED at 43.
1 out of 1 hunk FAILED -- saving rejects to file arch/arm/mach-pxa/Makefile.rej
patching file arch/arm/mach-pxa/kgdb-serial.c
patching file arch/arm/mach-versatile/core.c
patching file arch/arm/mm/extable.c
patching file arch/ia64/kernel/Makefile
patching file arch/ia64/kernel/kgdb-jmp.S
patching file arch/ia64/kernel/kgdb.c
patching file arch/ia64/kernel/smp.c
patching file arch/ia64/kernel/traps.c
Hunk #1 FAILED at 155.
1 out of 1 hunk FAILED -- saving rejects to file arch/ia64/kernel/traps.c.rej
patching file arch/ia64/mm/extable.c
patching file arch/ia64/mm/fault.c
patching file arch/mips/Kconfig
Hunk #2 succeeded at 323 (offset -6 lines).
Hunk #4 succeeded at 419 (offset -7 lines).
Hunk #5 succeeded at 531 (offset 21 lines).
Hunk #6 succeeded at 608 (offset -21 lines).
Hunk #7 succeeded at 670 (offset 21 lines).
Hunk #8 succeeded at 914 (offset -24 lines).
patching file arch/mips/Kconfig.debug
patching file arch/mips/au1000/common/Makefile
patching file arch/mips/au1000/common/dbg_io.c
patching file arch/mips/basler/excite/Makefile
patching file arch/mips/basler/excite/excite_dbg_io.c
patching file arch/mips/basler/excite/excite_irq.c
patching file arch/mips/basler/excite/excite_setup.c
patching file arch/mips/jmr3927/rbhma3100/Makefile
patching file arch/mips/jmr3927/rbhma3100/kgdb_io.c
patching file arch/mips/kernel/Makefile
patching file arch/mips/kernel/gdb-low.S
patching file arch/mips/kernel/gdb-stub.c
patching file arch/mips/kernel/irq.c
patching file arch/mips/kernel/kgdb-jmp.c
patching file arch/mips/kernel/kgdb-setjmp.S
patching file arch/mips/kernel/kgdb.c
patching file arch/mips/kernel/kgdb_handler.S
patching file arch/mips/kernel/traps.c
patching file arch/mips/mips-boards/atlas/Makefile
patching file arch/mips/mips-boards/atlas/atlas_gdb.c
patching file arch/mips/mips-boards/atlas/atlas_setup.c
patching file arch/mips/mips-boards/generic/Makefile
patching file arch/mips/mips-boards/generic/gdb_hook.c
patching file arch/mips/mips-boards/generic/init.c
patching file arch/mips/mips-boards/malta/malta_setup.c
patching file arch/mips/mm/extable.c
patching file arch/mips/pci/fixup-atlas.c
patching file arch/mips/philips/pnx8550/common/Makefile
patching file arch/mips/philips/pnx8550/common/gdb_hook.c
patching file arch/mips/philips/pnx8550/common/setup.c
patching file arch/mips/pmc-sierra/yosemite/Makefile
patching file arch/mips/pmc-sierra/yosemite/dbg_io.c
patching file arch/mips/pmc-sierra/yosemite/irq.c
patching file arch/mips/sgi-ip22/ip22-setup.c
patching file arch/mips/sgi-ip27/Makefile
patching file arch/mips/sgi-ip27/ip27-dbgio.c
patching file arch/mips/sibyte/bcm1480/irq.c
patching file arch/mips/sibyte/cfe/setup.c
Hunk #3 succeeded at 298 (offset -3 lines).
patching file arch/mips/sibyte/sb1250/irq.c
patching file arch/mips/sibyte/sb1250/kgdb_sibyte.c
patching file arch/mips/sibyte/swarm/Makefile
patching file arch/mips/sibyte/swarm/dbg_io.c
patching file arch/mips/tx4927/common/Makefile
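When patch output like this scrolls past, the hunks that need manual attention are the ones reported with "saving rejects to file". A small helper sketch for pulling those names out of patch(1) output one line at a time; the function names and the 256-byte buffer are arbitrary choices for illustration, and the marker string is simply taken from the output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REJECT_MARKER "saving rejects to file "

/*
 * If one line of patch(1) output reports a rejected hunk, copy the name of
 * the .rej file into buf and return buf; otherwise return NULL.
 */
static const char *reject_file(const char *line, char *buf, size_t len)
{
	const char *p;
	size_t n;

	assert(line != NULL && buf != NULL);
	p = strstr(line, REJECT_MARKER);
	if (p == NULL || len == 0)
		return NULL;
	p += strlen(REJECT_MARKER);
	n = strcspn(p, " \t\r\n");	/* the file name ends at whitespace */
	if (n >= len)
		n = len - 1;		/* truncate rather than overflow buf */
	memcpy(buf, p, n);
	buf[n] = '\0';
	return buf;
}

/* Count how many lines in an array of patch output report a reject. */
static size_t count_rejects(const char *const *lines, size_t nlines)
{
	char name[256];
	size_t rejects = 0;

	for (size_t i = 0; i < nlines; i++)
		if (reject_file(lines[i], name, sizeof(name)) != NULL)
			rejects++;
	return rejects;
}
```

Run over the log above, it would pick out the .rej files for Documentation/DocBook/Makefile, arch/arm/mach-pxa/Makefile and arch/ia64/kernel/traps.c.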
Re: 2.6.24-rc5-mm1 - SCSI/blkdev probing hang
On Thu, 20 Dec 2007 15:57:45 -0500 Rik van Riel <[EMAIL PROTECTED]> wrote:

> 2.6.24-rc5-mm1 seems to have a hang related to the SCSI or block
> device probing code.
>
> This is on a dual quad-core x86-64 system with megaraid_sas controller.
>
> scsi 0:2:0:0: Direct-Access DELL PERC 5/i 1.03 PQ: 0 ANSI: 5
> general protection fault: [1] SMP
> last sysfs file: /sys/class/firmware/timeout
> CPU 7
> Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod
> shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd
> ehci_hcd
> Pid: 678, comm: scsi_scan_0 Not tainted 2.6.24-rc5-mm1 #1
> RIP: 0010:[] [] mark_lock+0x1b/0x472

Could be that someone passed a garbage pointer into lockdep.

> RSP: 0018:81043ba29c20 EFLAGS: 00010002
> RAX: 0010 RBX: 81043b9ee8f0 RCX: 81043b9ee804
> RDX: 6b6b6b6b6b6b6b6b RSI: 81043b9ee8f0 RDI: 81043b9ee000
> RBP: 81043b9ee000 R08: 0002 R09:
> R10: 81129055 R11: 000281128c8d R12: 0004
> R13: 0001 R14: 0002 R15: 81043e508028
> FS: () GS:81043e4e6a28() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 00361969afa0 CR3: 00201000 CR4: 06e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process scsi_scan_0 (pid: 678, threadinfo 81043ba28000, task
> 81043b9ee000)
> Stack: 81043b9ee8f0 6b6b6b6b6b6b6b6b 81043b9ee000<6>ata1.00: ATAPI:
> HL-DT-STCD-RW/DVD-ROM GCC-T10N, A102, max UDMA/33
> 81059139
> 3ba29c50 0002 81058623
> 81043b504660 0246 81043e508028 81043b504660
> Call Trace:
> [] __lock_acquire+0x4d7/0xc8e
> [] mark_held_locks+0x49/0x67
> [] lock_acquire+0x5a/0x73
> [] kobject_add+0xca/0x194
> [] mutex_lock_nested+0x2a1/0x2b0
> [] _spin_lock+0x26/0x52
> [] kobject_add+0xca/0x194
> [] device_add+0x9a/0x56e
> [] :scsi_mod:scsi_alloc_target+0x2cd/0x343
> [] :scsi_mod:__scsi_scan_target+0x66/0x5c6
> [] trace_hardirqs_on+0x115/0x138
> [] :scsi_mod:scsi_scan_channel+0x45/0x70
> [] :scsi_mod:scsi_scan_host_selected+0xd5/0x110
> ata1.00: configured for UDMA/33
> ata2: port disabled. ignoring.
> [] :scsi_mod:do_scan_async+0x0/0x152
> [] :scsi_mod:do_scan_async+0x14/0x152
> [] :scsi_mod:do_scan_async+0x0/0x152
> [] kthread+0x47/0x73
> [] trace_hardirqs_on_thunk+0x35/0x3a
> [] child_rip+0xa/0x12
> [] restore_args+0x0/0x30
> [] menu_reflect+0x0/0x75
> [] kthreadd+0x115/0x13a
> [] kthread+0x0/0x73
> [] child_rip+0x0/0x12
>

It could be a scsi problem, or it could be all the kobject changes in
Greg's driver tree. Or a combination of the two.

Don't know, sorry.
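One hedged reading of the garbage-pointer guess: RDX in this oops (and RAX/RDX in the second oops Rik reported) holds 6b6b6b6b6b6b6b6b, and 0x6b is POISON_FREE from include/linux/poison.h, the byte slab debugging fills freed objects with. A pointer loaded from a freed, poisoned object therefore reads as 0x6b6b6b6b6b6b6b6b, which is not a canonical x86-64 address, so dereferencing it raises a general protection fault rather than a page fault. The sketch below only encodes those two checks, assuming 48-bit virtual addresses; it is a triage aid, not proof of a use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POISON_FREE_BYTE 0x6b	/* POISON_FREE in include/linux/poison.h */

/* True if every byte of v equals the slab free-poison byte. */
static bool looks_free_poisoned(uint64_t v)
{
	for (int i = 0; i < 8; i++)
		if (((v >> (8 * i)) & 0xff) != POISON_FREE_BYTE)
			return false;
	return true;
}

/*
 * With 48-bit virtual addresses, bits 63..47 of a canonical x86-64 address
 * are all copies of bit 47. Dereferencing anything else raises #GP.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

Both checks fire on the RDX value above: it is all poison bytes and it is non-canonical, which fits the "general protection fault" line at the top of the report.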
Re: 2.6.24-rc5-mm1 - SCSI/blkdev probing hang
On Thu, 13 Dec 2007 02:40:50 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

>

2.6.24-rc5-mm1 seems to have a hang related to the SCSI or block device
probing code.

This is on a dual quad-core x86-64 system with megaraid_sas controller.

scsi 0:2:0:0: Direct-Access DELL PERC 5/i 1.03 PQ: 0 ANSI: 5
general protection fault: [1] SMP
last sysfs file: /sys/class/firmware/timeout
CPU 7
Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 678, comm: scsi_scan_0 Not tainted 2.6.24-rc5-mm1 #1
RIP: 0010:[] [] mark_lock+0x1b/0x472
RSP: 0018:81043ba29c20 EFLAGS: 00010002
RAX: 0010 RBX: 81043b9ee8f0 RCX: 81043b9ee804
RDX: 6b6b6b6b6b6b6b6b RSI: 81043b9ee8f0 RDI: 81043b9ee000
RBP: 81043b9ee000 R08: 0002 R09:
R10: 81129055 R11: 000281128c8d R12: 0004
R13: 0001 R14: 0002 R15: 81043e508028
FS: () GS:81043e4e6a28() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00361969afa0 CR3: 00201000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process scsi_scan_0 (pid: 678, threadinfo 81043ba28000, task 81043b9ee000)
Stack: 81043b9ee8f0 6b6b6b6b6b6b6b6b 81043b9ee000<6>ata1.00: ATAPI: HL-DT-STCD-RW/DVD-ROM GCC-T10N, A102, max UDMA/33
 81059139 3ba29c50 0002 81058623
 81043b504660 0246 81043e508028 81043b504660
Call Trace:
 [] __lock_acquire+0x4d7/0xc8e
 [] mark_held_locks+0x49/0x67
 [] lock_acquire+0x5a/0x73
 [] kobject_add+0xca/0x194
 [] mutex_lock_nested+0x2a1/0x2b0
 [] _spin_lock+0x26/0x52
 [] kobject_add+0xca/0x194
 [] device_add+0x9a/0x56e
 [] :scsi_mod:scsi_alloc_target+0x2cd/0x343
 [] :scsi_mod:__scsi_scan_target+0x66/0x5c6
 [] trace_hardirqs_on+0x115/0x138
 [] :scsi_mod:scsi_scan_channel+0x45/0x70
 [] :scsi_mod:scsi_scan_host_selected+0xd5/0x110
ata1.00: configured for UDMA/33
ata2: port disabled. ignoring.
 [] :scsi_mod:do_scan_async+0x0/0x152
 [] :scsi_mod:do_scan_async+0x14/0x152
 [] :scsi_mod:do_scan_async+0x0/0x152
 [] kthread+0x47/0x73
 [] trace_hardirqs_on_thunk+0x35/0x3a
 [] child_rip+0xa/0x12
 [] restore_args+0x0/0x30
 [] menu_reflect+0x0/0x75
 [] kthreadd+0x115/0x13a
 [] kthread+0x0/0x73
 [] child_rip+0x0/0x12

Code: 48 85 42 30 0f 85 2e 04 00 00 f0 ff 0d 2c ce 34 00 79 0d f3
RIP [] mark_lock+0x1b/0x472
 RSP
general protection fault: [2] SMP
last sysfs file: /sys/class/firmware/timeout
CPU 3
Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 743, comm: insmod Tainted: G D 2.6.24-rc5-mm1 #1
RIP: 0010:[] [] __list_add+0x2b/0x5b
RSP: :81043b4319c8 EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 81043bec4a68 RCX:
RDX: 6b6b6b6b6b6b6b6b RSI: 81043e508000 RDI: 81043bec4a78
RBP: 81043ba794b0 R08: 0002 R09:
R10: 81129055 R11: 8102093a R12: 81043bec4aa8
R13: fffe R14: R15: 81043ba79090
FS: 7fc3239ae6f0() GS:81043fc01d48() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0036196d5140 CR3: 00043bb4c000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process insmod (pid: 743, threadinfo 81043b43, task 81043b42e000)
Stack: 81043ba79090 81129066 81043ba79098 81043bec48b8
 81043bec4aa8 81043bec48b8 811a318c 81043ba79098
 81043ba79300 81043bec4a68 81043ba79098
Call Trace:
 [] kobject_add+0xdb/0x194
 [] device_add+0x9a/0x56e
 [] :scsi_mod:scsi_alloc_target+0x2cd/0x343
 [] :scsi_mod:__scsi_add_device+0x5b/0xd9
 [] :libata:ata_scsi_scan_host+0xa8/0x28b
 [] :libata:ata_host_register+0x256/0x280
 [] :libata:ata_pci_init_one+0x231/0x285
 [] :ata_piix:piix_init_one+0x512/0x53d
 [] native_sched_clock+0x47/0x70
 [] _spin_unlock+0x17/0x20
 [] pci_device_probe+0xb3/0xfd
 [] driver_probe_device+0xee/0x16b
 [] __driver_attach+0x90/0xcc
 [] __driver_attach+0x0/0xcc
 [] __driver_attach+0x0/0xcc
 [] bus_for_each_dev+0x47/0x72
 [] bus_add_driver+0xc4/0x20b
 [] driver_register+0x59/0xcd
 [] __pci_register_driver+0x57/0x8b
 [] :ata_piix:piix_init+0x1e/0x32
 [] sys_init_module+0x15e5/0x173b
 [] system_call+0x7e/0x83
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > Actually, you may only need these two:
> > >
> > > > maps4-add-proc-kpagecount-interface.patch
> > > > maps4-add-proc-kpageflags-interface.patch
> >
> > Yes these two were enough, and exporting fs/proc/base.c's
> > mem_lseek().
> >
> > As hard as I try, I can't reproduce this at all. I tried
> > both on my workstation and my niagara boxes.
>
> That's good to know, I was having a very hard time imagining how the
> kpagecount code could be going south.
>
> > It must be another needle in the 30MB+ -mm haystack. :-(

I'm afraid you are wrong. Earlier kernels are affected as well. After reading
your mail I was thinking of applying those two patches to 2.6.24-rc5 and
bisecting the rest of the -mm series. Unfortunately, clean 2.6.24-rc5 with
these two patches is affected as well (new processes stuck in D state, etc.).

So I tried vanilla 2.6.23 with these two patches (and the mem_lseek() export
from fs/proc/base.c). Now at least I got a trace produced by
'cat /proc/kpagecount', which you can find below. Also, in spite of the oops,
the box doesn't lock up (as it does with -mm) and is still usable.

[ 126.060976] TSTATE: 009980009603 TPC: 00428a84 TNPC: 00428a88 Y: Not tainted
[ 126.063486] TPC:
[ 126.065986] g0: 0009 g1: 04804000 g2: 000f g3: 007204c0
[ 126.068636] g4: 007244c0 g5: f8007f878000 g6: 007204c0 g7: 00724958
[ 126.071232] o0: 0001 o1: 007204c8 o2: 0001 o3:
[ 126.073924] o4: 6000 o5: 0078f140 sp: 007239b1 ret_pc: 00428a78
[ 126.076569] RPC:
[ 126.079185] l0: 0072 l1: 0002 l2: 0001 l3: 0075d400
[ 126.081934] l4: 0075d400 l5: f80080015b10 l6: f80080005b08 l7: 0001
[ 126.084637] i0: 0001 i1: 00720094 i2: i3:
[ 126.087375] i4: 007204c0 i5: 0002 i6: 00723a71 i7: 00665a24
[ 126.090135] I7:
[ 145.121228] Unable to handle kernel NULL pointer dereference
[ 145.124515] tsk->{mm,active_mm}->context = 0d41
[ 145.127778] tsk->{mm,active_mm}->pgd = f800bd8d2000
[ 145.127801] \|/ \|/
[ 145.127808] "@'/ .. \`@"
[ 145.127815] /_| \__/ |_\
[ 145.127821] \__U_/
[ 145.127831] cat(3111): Oops [#1]
[ 145.127849]
[ 145.127853] =
[ 145.127861] [ INFO: inconsistent lock state ]
[ 145.127873] 2.6.23 #1
[ 145.127880] -
[ 145.127891] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 145.127906] cat/3111 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 145.127918] (regdump_lock){+...}, at: [<004281d0>] __show_regs+0x18/0x320
[ 145.127951] {in-hardirq-W} state was registered at:
[ 145.127960] [<00669780>] _spin_lock+0x28/0x40
[ 145.127983] [<004281d0>] __show_regs+0x18/0x320
[ 145.128000] [<004284e4>] show_regs+0xc/0x20
[ 145.128016] [<005ac9d8>] sysrq_handle_showregs+0x20/0x40
[ 145.128041] [<005ac7fc>] __handle_sysrq+0x84/0x160
[ 145.128060] [<005ac8f8>] handle_sysrq+0x20/0x40
[ 145.128078] [<005a4f08>] kbd_event+0x670/0xb60
[ 145.128110] [<005ea0c0>] input_event+0x1e8/0x560
[ 145.128140] [<005efa2c>] sunkbd_interrupt+0x114/0x140
[ 145.128167] [<005e6270>] serio_interrupt+0x38/0xa0
[ 145.128186] [<005b2e58>] sunsu_kbd_ms_interrupt+0xa0/0x140
[ 145.128212] [<0049f6f8>] handle_IRQ_event+0x20/0x80
[ 145.128251] [<0049f808>] __do_IRQ+0xb0/0x140
[ 145.128268] [<0042f48c>] handler_irq+0x94/0xc0
[ 145.128306] [<00426f30>] sunos_sys_table+0x560/0x728
[ 145.128324] [<00428a78>] cpu_idle+0x20/0xe0
[ 145.128341] [<00665a24>] rest_init+0x6c/0x80
[ 145.128375] [<0076ec24>] start_kernel+0x2ec/0x340
[ 145.128405] [<0066599c>] tlb_fixup_done+0xa0/0xbc
[ 145.128425] [<>] 0x8
[ 145.128443] irq event stamp: 1209
[ 145.128451] hardirqs last enabled at (1209): [<00404b74>] __handle_softirq_continue+0x20/0x24
[ 145.128480] hardirqs last disabled at (1207): [<00474494>] __do_softirq+0xbc/0x140
[ 145.128506] softirqs last enabled at (1208): [<004744dc>] __do_softirq+0x104/0x140
[ 145.128526] softirqs last disabled at (1203): [<004745a0>] do_softirq+0x88/0xa0
[ 145.128546]
[ 145.128551] other info that might help us debug this:
[ 145.128562] no locks held by cat/3111.
[ 145.128570]
[ 145.128574] stack backtrace:
[ 145.128582] Call Trace:
[ 145.128590] [004907a0] print_usage_bug+0x148/0x160
[ 145.128624] [004917f4] mark_lock+0x6dc/0x780
[ 145.128641] [0049286c] __lock_acquire+0x734/0x12a0
[ 145.128659]
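The lockdep complaint in this trace says regdump_lock was once taken in hardirq context (the sysrq show-regs path under sunkbd_interrupt) and is now being taken by cat with hardirqs enabled (HE1) at __show_regs. A much simplified model of that rule, for illustration: each lock class remembers how it has been used, and the combination "used in hardirq" plus "taken with hardirqs enabled" is reported, because an interrupt arriving on the CPU that already holds the lock would spin on it forever. The struct and function names are hypothetical; real lockdep tracks more states (softirq usage, read versus write acquisitions) and lock ordering as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified usage bits for one lock class. */
enum {
	USED_IN_HARDIRQ = 1 << 0,	/* acquired from hardirq context */
	ENABLED_HARDIRQ = 1 << 1,	/* acquired with hardirqs enabled */
};

struct lock_class_model {
	const char *name;
	unsigned int usage;
};

/*
 * Record one acquisition and return false if the class now has both usages.
 * If a lock is taken in a hardirq handler and elsewhere with hardirqs on, the
 * interrupt can arrive on the CPU that already holds it and spin forever.
 */
static bool note_acquire(struct lock_class_model *c, bool in_hardirq,
			 bool hardirqs_enabled)
{
	assert(c != NULL);
	if (in_hardirq)
		c->usage |= USED_IN_HARDIRQ;
	if (hardirqs_enabled)
		c->usage |= ENABLED_HARDIRQ;
	return !((c->usage & USED_IN_HARDIRQ) && (c->usage & ENABLED_HARDIRQ));
}
```

Replaying the trace, a first call for the sysrq path (in hardirq, with hardirqs off, as the {in-hardirq-W} state implies) succeeds, and a second call for cat (not in hardirq, hardirqs on) returns false, mirroring the {in-hardirq-W} -> {hardirq-on-W} report.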
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:53:59AM -0800, David Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Mon, 17 Dec 2007 08:55:54 -0600
>
> > On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> > Actually, you may only need these two:
> >
> > > maps4-add-proc-kpagecount-interface.patch
> > > maps4-add-proc-kpageflags-interface.patch
>
> Yes these two were enough, and exporting fs/proc/base.c's
> mem_lseek().
>
> As hard as I try, I can't reproduce this at all. I tried
> both on my workstation and my niagara boxes.

That's good to know, I was having a very hard time imagining how the
kpagecount code could be going south.

> It must be another needle in the 30MB+ -mm haystack. :-(

Have we seen a config for the broken machine? Perhaps that'll help us
make a guess.

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1
Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
>
> - If something goes wrong with a PCI device's probing or initialisation, try
>   reverting pci-disable-decoding-during-sizing-of-bars.patch.
>
> - git-sched was dropped due to breaking suspend-to-RAM.
>
> - git-block has been restored after having had a few problems
>
> - git-newsetup.patch was dropped due to conflicts with git-x86
>
> - git-perfmon.patch is still dropped for the same reason
>
> - git-kgdb.patch is still dropped for the same reason
>

Andrew,

I re-based the for_mm branch at:
http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_mm
against the git-x86/mm branch from the x86-git tree. If there are other
patch trees I need to pull in and patch against to allow for kgdb to be
included into -mm, please let me know.

I would like to submit another review request for kgdb into the mainline,
as well as resolve the issues with the -mm tree + kgdb.

Thanks,
Jason.
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thursday, 20 of December 2007, Miles Lane wrote:
> On Dec 19, 2007 8:31 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> >
> > On Thursday, 20 of December 2007, Miles Lane wrote:
> > > On Dec 19, 2007 7:09 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Thursday, 20 of December 2007, Christoph Lameter wrote:
> > > > > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:
> > > > >
> > > > > > > We could reexport drain_local_pages() again but then I do not understand
> > > > > > > why we would only drain the pages of this processor and not of all other
> > > > > > > processors as well. It seems that software suspend's intent was to flush
> > > > > > > them all, right?
> > > > > >
> > > > > > Well, not exactly. We are on one CPU at this point, the others have been
> > > > > > disabled.
> > > > >
> > > > > OK, so the others are flushed. Here is a patch to re-export
> > > > > drain_local_pages() again and use it for software suspend:
> > > > >
> > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> > > > >
> > > > > ---
> > > > >  include/linux/gfp.h     |    1 +
> > > > >  kernel/power/snapshot.c |    2 +-
> > > > >  mm/page_alloc.c         |    2 +-
> > > > >  3 files changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
> > > > > ===
> > > > > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
> > > > > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
> > > > > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
> > > > >  
> > > > >  	printk(KERN_INFO "PM: Creating hibernation image: \n");
> > > > >  
> > > > > -	drain_all_pages();
> > > > > +	drain_local_pages(NULL);
> > > > >  	nr_pages = count_data_pages();
> > > > >  	nr_highmem = count_highmem_pages();
> > > > >  	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);
> > > >
> > > > You've omitted the second instance, right before the copy_data_pages()
> > > > call.
> > > >
> > >
> > > I guess I will wait for a revised patch.
> >
> > There's a fix from Andrew on top of this one in -mm:
> > http://marc.info/?l=linux-mm-commits=119810866812965=2
> >
> > > >
> > > > > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
> > > > > ===
> > > > > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
> > > > > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
> > > > > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
> > > > >  /*
> > > > >   * Spill all of this CPU's per-cpu pages back into the buddy allocator.
> > > > >   */
> > > > > -static void drain_local_pages(void *arg)
> > > > > +void drain_local_pages(void *arg)
> > > > >  {
> > > > >  	drain_pages(smp_processor_id());
> > > > >  }
> > > > > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
> > > > > ===
> > > > > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
> > > > > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
> > > > > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
> > > > >  void page_alloc_init(void);
> > > > >  void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> > > > >  void drain_all_pages(void);
> > > > > +void drain_local_pages(void *dummy);
> > > > >  
> > > > >  #endif /* __LINUX_GFP_H */
> > > > >
> > > >
> > >
>
> I applied Christoph and Andrew's patches and recompiled. I suspended
> to disk and to RAM several times and all looks good.

OK, thanks for testing!

Rafael
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Mon, 17 Dec 2007 08:55:54 -0600

> On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> Actually, you may only need these two:
>
> > maps4-add-proc-kpagecount-interface.patch
> > maps4-add-proc-kpageflags-interface.patch

Yes, these two were enough, along with exporting fs/proc/base.c's
mem_lseek().

As hard as I try, I can't reproduce this at all. I tried both on my
workstation and my niagara boxes.

It must be another needle in the 30MB+ -mm haystack. :-(
Re: 2.6.24-rc5-mm1
Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
>
> - If something goes wrong with a PCI device's probing or initialisation,
>   try reverting pci-disable-decoding-during-sizing-of-bars.patch.
>
> - git-sched was dropped due to breaking suspend-to-RAM.
>
> - git-block has been restored after having had a few problems
>
> - git-newsetup.patch was dropped due to conflicts with git-x86
>
> - git-perfmon.patch is still dropped for the same reason
>
> - git-kgdb.patch is still dropped for the same reason

Andrew,

I re-based the for_mm branch at:

http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_mm

against the git-x86/mm branch from the x86-git tree. If there are other
patch trees I need to pull in and patch against so that kgdb can be
included in -mm, please let me know.

I would like to submit another review request for kgdb into mainline, as
well as resolve the issues with the -mm tree + kgdb.

Thanks,
Jason.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:53:59AM -0800, David Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Mon, 17 Dec 2007 08:55:54 -0600
>
> > On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> > Actually, you may only need these two:
> >
> > > maps4-add-proc-kpagecount-interface.patch
> > > maps4-add-proc-kpageflags-interface.patch
>
> Yes these two were enough, and exporting fs/proc/base.c's mem_lseek().
>
> As hard as I try, I can't reproduce this at all. I tried both on my
> workstation and my niagara boxes.

That's good to know; I was having a very hard time imagining how the
kpagecount code could be going south.

> It must be other needle in the 30MB+ -mm haystack. :-(

Have we seen a config for the broken machine? Perhaps that'll help us
make a guess.

--
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > Actually, you may only need these two:
> > >
> > > > maps4-add-proc-kpagecount-interface.patch
> > > > maps4-add-proc-kpageflags-interface.patch
> >
> > Yes these two were enough, and exporting fs/proc/base.c's mem_lseek().
> >
> > As hard as I try, I can't reproduce this at all. I tried both on my
> > workstation and my niagara boxes.
>
> That's good to know, I was having a very hard time imagining how the
> kpagecount code could be going south.
>
> > It must be other needle in the 30MB+ -mm haystack. :-(

I'm afraid you are wrong. Earlier kernels are affected as well.

After reading your mail I was thinking of applying those two patches to
2.6.24-rc5 and bisecting the rest of the -mm series. Unfortunately, a
clean 2.6.24-rc5 with just these two patches is affected as well (new
processes stuck in D state, etc.).

So I tried vanilla 2.6.23 with these two patches (and the mem_lseek
export from fs/proc/base.c). Now at least I got a trace produced by
'cat /proc/kpagecount', which you can find below. Also, in spite of the
oops, the box doesn't lock up (as it does with -mm) and is still usable.

[ 126.060976] TSTATE: 009980009603 TPC: 00428a84 TNPC: 00428a88 Y: Not tainted
[ 126.063486] TPC: cpu_idle+0x2c/0xe0
[ 126.065986] g0: 0009 g1: 04804000 g2: 000f g3: 007204c0
[ 126.068636] g4: 007244c0 g5: f8007f878000 g6: 007204c0 g7: 00724958
[ 126.071232] o0: 0001 o1: 007204c8 o2: 0001 o3:
[ 126.073924] o4: 6000 o5: 0078f140 sp: 007239b1 ret_pc: 00428a78
[ 126.076569] RPC: cpu_idle+0x20/0xe0
[ 126.079185] l0: 0072 l1: 0002 l2: 0001 l3: 0075d400
[ 126.081934] l4: 0075d400 l5: f80080015b10 l6: f80080005b08 l7: 0001
[ 126.084637] i0: 0001 i1: 00720094 i2: i3:
[ 126.087375] i4: 007204c0 i5: 0002 i6: 00723a71 i7: 00665a24
[ 126.090135] I7: rest_init+0x6c/0x80
[ 145.121228] Unable to handle kernel NULL pointer dereference
[ 145.124515] tsk->{mm,active_mm}->context = 0d41
[ 145.127778] tsk->{mm,active_mm}->pgd = f800bd8d2000
[ 145.127801] \|/ \|/
[ 145.127808] @'/ .. \`@
[ 145.127815] /_| \__/ |_\
[ 145.127821] \__U_/
[ 145.127831] cat(3111): Oops [#1]
[ 145.127849]
[ 145.127853] =================================
[ 145.127861] [ INFO: inconsistent lock state ]
[ 145.127873] 2.6.23 #1
[ 145.127880] ---------------------------------
[ 145.127891] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 145.127906] cat/3111 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 145.127918] (regdump_lock){+...}, at: [004281d0] __show_regs+0x18/0x320
[ 145.127951] {in-hardirq-W} state was registered at:
[ 145.127960] [00669780] _spin_lock+0x28/0x40
[ 145.127983] [004281d0] __show_regs+0x18/0x320
[ 145.128000] [004284e4] show_regs+0xc/0x20
[ 145.128016] [005ac9d8] sysrq_handle_showregs+0x20/0x40
[ 145.128041] [005ac7fc] __handle_sysrq+0x84/0x160
[ 145.128060] [005ac8f8] handle_sysrq+0x20/0x40
[ 145.128078] [005a4f08] kbd_event+0x670/0xb60
[ 145.128110] [005ea0c0] input_event+0x1e8/0x560
[ 145.128140] [005efa2c] sunkbd_interrupt+0x114/0x140
[ 145.128167] [005e6270] serio_interrupt+0x38/0xa0
[ 145.128186] [005b2e58] sunsu_kbd_ms_interrupt+0xa0/0x140
[ 145.128212] [0049f6f8] handle_IRQ_event+0x20/0x80
[ 145.128251] [0049f808] __do_IRQ+0xb0/0x140
[ 145.128268] [0042f48c] handler_irq+0x94/0xc0
[ 145.128306] [00426f30] sunos_sys_table+0x560/0x728
[ 145.128324] [00428a78] cpu_idle+0x20/0xe0
[ 145.128341] [00665a24] rest_init+0x6c/0x80
[ 145.128375] [0076ec24] start_kernel+0x2ec/0x340
[ 145.128405] [0066599c] tlb_fixup_done+0xa0/0xbc
[ 145.128425] [] 0x8
[ 145.128443] irq event stamp: 1209
[ 145.128451] hardirqs last enabled at (1209): [00404b74] __handle_softirq_continue+0x20/0x24
[ 145.128480] hardirqs last disabled at (1207): [00474494] __do_softirq+0xbc/0x140
[ 145.128506] softirqs last enabled at (1208): [004744dc] __do_softirq+0x104/0x140
[ 145.128526] softirqs last disabled at (1203): [004745a0] do_softirq+0x88/0xa0
[ 145.128546]
[ 145.128551] other info that might help us debug this:
[ 145.128562] no locks held by cat/3111.
[ 145.128570]
[ 145.128574] stack backtrace:
[ 145.128582] Call Trace:
[ 145.128590] [004907a0] print_usage_bug+0x148/0x160
[ 145.128624] [004917f4] mark_lock+0x6dc/0x780
[ 145.128641] [0049286c] __lock_acquire+0x734/0x12a0
[ 145.128659] [00493430] lock_acquire+0x58/0x80
[
Re: 2.6.24-rc5-mm1 - SCSI/blkdev probing hang
On Thu, 13 Dec 2007 02:40:50 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

2.6.24-rc5-mm1 seems to have a hang related to the SCSI or block
device probing code.

This is on a dual quad-core x86-64 system with a megaraid_sas controller.

scsi 0:2:0:0: Direct-Access DELL PERC 5/i 1.03 PQ: 0 ANSI: 5
general protection fault: [1] SMP
last sysfs file: /sys/class/firmware/timeout
CPU 7
Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 678, comm: scsi_scan_0 Not tainted 2.6.24-rc5-mm1 #1
RIP: 0010:[81058183] [81058183] mark_lock+0x1b/0x472
RSP: 0018:81043ba29c20 EFLAGS: 00010002
RAX: 0010 RBX: 81043b9ee8f0 RCX: 81043b9ee804
RDX: 6b6b6b6b6b6b6b6b RSI: 81043b9ee8f0 RDI: 81043b9ee000
RBP: 81043b9ee000 R08: 0002 R09:
R10: 81129055 R11: 000281128c8d R12: 0004
R13: 0001 R14: 0002 R15: 81043e508028
FS: () GS:81043e4e6a28() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00361969afa0 CR3: 00201000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process scsi_scan_0 (pid: 678, threadinfo 81043ba28000, task 81043b9ee000)
Stack: 81043b9ee8f0 6b6b6b6b6b6b6b6b 81043b9ee000
<6>ata1.00: ATAPI: HL-DT-STCD-RW/DVD-ROM GCC-T10N, A102, max UDMA/33
 81059139 3ba29c50 0002 81058623 81043b504660 0246 81043e508028 81043b504660
Call Trace:
 [81059139] __lock_acquire+0x4d7/0xc8e
 [81058623] mark_held_locks+0x49/0x67
 [81059ce2] lock_acquire+0x5a/0x73
 [81129055] kobject_add+0xca/0x194
 [8126d56c] mutex_lock_nested+0x2a1/0x2b0
 [8126e997] _spin_lock+0x26/0x52
 [81129055] kobject_add+0xca/0x194
 [811a318c] device_add+0x9a/0x56e
 [8805c327] :scsi_mod:scsi_alloc_target+0x2cd/0x343
 [8805c492] :scsi_mod:__scsi_scan_target+0x66/0x5c6
 [810587f0] trace_hardirqs_on+0x115/0x138
 [8805ca37] :scsi_mod:scsi_scan_channel+0x45/0x70
 [8805cb37] :scsi_mod:scsi_scan_host_selected+0xd5/0x110
ata1.00: configured for UDMA/33
ata2: port disabled. ignoring.
 [8805cbe5] :scsi_mod:do_scan_async+0x0/0x152
 [8805cbf9] :scsi_mod:do_scan_async+0x14/0x152
 [8805cbe5] :scsi_mod:do_scan_async+0x0/0x152
 [8104d4e8] kthread+0x47/0x73
 [8126e418] trace_hardirqs_on_thunk+0x35/0x3a
 [8100cee8] child_rip+0xa/0x12
 [8100c5ff] restore_args+0x0/0x30
 [811e0908] menu_reflect+0x0/0x75
 [8104d371] kthreadd+0x115/0x13a
 [8104d4a1] kthread+0x0/0x73
 [8100cede] child_rip+0x0/0x12

Code: 48 85 42 30 0f 85 2e 04 00 00 f0 ff 0d 2c ce 34 00 79 0d f3
RIP [81058183] mark_lock+0x1b/0x472
 RSP 81043ba29c20
general protection fault: [2] SMP
last sysfs file: /sys/class/firmware/timeout
CPU 3
Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 743, comm: insmod Tainted: G D 2.6.24-rc5-mm1 #1
RIP: 0010:[81130cca] [81130cca] __list_add+0x2b/0x5b
RSP: :81043b4319c8 EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 81043bec4a68 RCX:
RDX: 6b6b6b6b6b6b6b6b RSI: 81043e508000 RDI: 81043bec4a78
RBP: 81043ba794b0 R08: 0002 R09:
R10: 81129055 R11: 8102093a R12: 81043bec4aa8
R13: fffe R14: R15: 81043ba79090
FS: 7fc3239ae6f0() GS:81043fc01d48() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0036196d5140 CR3: 00043bb4c000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process insmod (pid: 743, threadinfo 81043b43, task 81043b42e000)
Stack: 81043ba79090 81129066 81043ba79098 81043bec48b8 81043bec4aa8 81043bec48b8 811a318c 81043ba79098 81043ba79300 81043bec4a68 81043ba79098
Call Trace:
 [81129066] kobject_add+0xdb/0x194
 [811a318c] device_add+0x9a/0x56e
 [8805c327] :scsi_mod:scsi_alloc_target+0x2cd/0x343
 [8805cd92] :scsi_mod:__scsi_add_device+0x5b/0xd9
 [880c78d4] :libata:ata_scsi_scan_host+0xa8/0x28b
 [880c4698] :libata:ata_host_register+0x256/0x280
 [880c9bfd] :libata:ata_pci_init_one+0x231/0x285
 [880e38cc] :ata_piix:piix_init_one+0x512/0x53d
 [81012f31] native_sched_clock+0x47/0x70
 [8126e8ae]
Re: 2.6.24-rc5-mm1 - SCSI/blkdev probing hang
On Thu, 20 Dec 2007 15:57:45 -0500
Rik van Riel <[EMAIL PROTECTED]> wrote:

> 2.6.24-rc5-mm1 seems to have a hang related to the SCSI or block
> device probing code.
>
> This is on a dual quad-core x86-64 system with megaraid_sas controller.
>
> scsi 0:2:0:0: Direct-Access DELL PERC 5/i 1.03 PQ: 0 ANSI: 5
> general protection fault: [1] SMP
> last sysfs file: /sys/class/firmware/timeout
> CPU 7
> Modules linked in: ata_piix libata dm_snapshot dm_zero dm_mirror dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
> Pid: 678, comm: scsi_scan_0 Not tainted 2.6.24-rc5-mm1 #1
> RIP: 0010:[81058183] [81058183] mark_lock+0x1b/0x472

Could be that someone passed a garbage pointer into lockdep.

> RSP: 0018:81043ba29c20 EFLAGS: 00010002
> RAX: 0010 RBX: 81043b9ee8f0 RCX: 81043b9ee804
> RDX: 6b6b6b6b6b6b6b6b RSI: 81043b9ee8f0 RDI: 81043b9ee000
> RBP: 81043b9ee000 R08: 0002 R09:
> R10: 81129055 R11: 000281128c8d R12: 0004
> R13: 0001 R14: 0002 R15: 81043e508028
> FS: () GS:81043e4e6a28() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 00361969afa0 CR3: 00201000 CR4: 06e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process scsi_scan_0 (pid: 678, threadinfo 81043ba28000, task 81043b9ee000)
> Stack: 81043b9ee8f0 6b6b6b6b6b6b6b6b 81043b9ee000
> <6>ata1.00: ATAPI: HL-DT-STCD-RW/DVD-ROM GCC-T10N, A102, max UDMA/33
>  81059139 3ba29c50 0002 81058623 81043b504660 0246 81043e508028 81043b504660
> Call Trace:
>  [81059139] __lock_acquire+0x4d7/0xc8e
>  [81058623] mark_held_locks+0x49/0x67
>  [81059ce2] lock_acquire+0x5a/0x73
>  [81129055] kobject_add+0xca/0x194
>  [8126d56c] mutex_lock_nested+0x2a1/0x2b0
>  [8126e997] _spin_lock+0x26/0x52
>  [81129055] kobject_add+0xca/0x194
>  [811a318c] device_add+0x9a/0x56e
>  [8805c327] :scsi_mod:scsi_alloc_target+0x2cd/0x343
>  [8805c492] :scsi_mod:__scsi_scan_target+0x66/0x5c6
>  [810587f0] trace_hardirqs_on+0x115/0x138
>  [8805ca37] :scsi_mod:scsi_scan_channel+0x45/0x70
>  [8805cb37] :scsi_mod:scsi_scan_host_selected+0xd5/0x110
> ata1.00: configured for UDMA/33
> ata2: port disabled. ignoring.
>  [8805cbe5] :scsi_mod:do_scan_async+0x0/0x152
>  [8805cbf9] :scsi_mod:do_scan_async+0x14/0x152
>  [8805cbe5] :scsi_mod:do_scan_async+0x0/0x152
>  [8104d4e8] kthread+0x47/0x73
>  [8126e418] trace_hardirqs_on_thunk+0x35/0x3a
>  [8100cee8] child_rip+0xa/0x12
>  [8100c5ff] restore_args+0x0/0x30
>  [811e0908] menu_reflect+0x0/0x75
>  [8104d371] kthreadd+0x115/0x13a
>  [8104d4a1] kthread+0x0/0x73
>  [8100cede] child_rip+0x0/0x12

It could be a scsi problem, or it could be all the kobject changes in
Greg's driver tree. Or a combination of the two.

Don't know, sorry.
Re: 2.6.24-rc5-mm1
On Thu, 20 Dec 2007 10:55:51 -0600
Jason Wessel <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
> >
> > - If something goes wrong with a PCI device's probing or initialisation,
> >   try reverting pci-disable-decoding-during-sizing-of-bars.patch.
> >
> > - git-sched was dropped due to breaking suspend-to-RAM.
> >
> > - git-block has been restored after having had a few problems
> >
> > - git-newsetup.patch was dropped due to conflicts with git-x86
> >
> > - git-perfmon.patch is still dropped for the same reason
> >
> > - git-kgdb.patch is still dropped for the same reason
>
> Andrew,
>
> I re-based the for_mm branch at:
>
> http://git.kernel.org/?p=linux/kernel/git/jwessel/linux-2.6-kgdb.git;a=shortlog;h=for_mm
>
> against the git-x86/mm branch from the x86-git tree. If there are other
> patch trees I need to pull in and patch against to allow for kgdb to be
> included into -mm please let me know.

The x86 merge worked OK. Here's what it looks like:

patching file Documentation/DocBook/Makefile
Hunk #1 FAILED at 11.
1 out of 1 hunk FAILED -- saving rejects to file Documentation/DocBook/Makefile.rej
patching file Documentation/DocBook/kgdb.tmpl
patching file Documentation/kernel-parameters.txt
Hunk #1 succeeded at 816 (offset 7 lines).
patching file MAINTAINERS
Hunk #1 succeeded at 2279 (offset 52 lines).
patching file Makefile
patching file arch/arm/kernel/Makefile
patching file arch/arm/kernel/kgdb-jmp.S
patching file arch/arm/kernel/kgdb.c
patching file arch/arm/kernel/setup.c
patching file arch/arm/kernel/traps.c
patching file arch/arm/mach-ixp2000/core.c
patching file arch/arm/mach-ixp2000/ixdp2x01.c
patching file arch/arm/mach-ixp4xx/coyote-setup.c
patching file arch/arm/mach-ixp4xx/ixdp425-setup.c
patching file arch/arm/mach-omap1/serial.c
patching file arch/arm/mach-omap2/serial.c
patching file arch/arm/mach-pnx4008/core.c
patching file arch/arm/mach-pxa/Makefile
Hunk #1 FAILED at 43.
1 out of 1 hunk FAILED -- saving rejects to file arch/arm/mach-pxa/Makefile.rej
patching file arch/arm/mach-pxa/kgdb-serial.c
patching file arch/arm/mach-versatile/core.c
patching file arch/arm/mm/extable.c
patching file arch/ia64/kernel/Makefile
patching file arch/ia64/kernel/kgdb-jmp.S
patching file arch/ia64/kernel/kgdb.c
patching file arch/ia64/kernel/smp.c
patching file arch/ia64/kernel/traps.c
Hunk #1 FAILED at 155.
1 out of 1 hunk FAILED -- saving rejects to file arch/ia64/kernel/traps.c.rej
patching file arch/ia64/mm/extable.c
patching file arch/ia64/mm/fault.c
patching file arch/mips/Kconfig
Hunk #2 succeeded at 323 (offset -6 lines).
Hunk #4 succeeded at 419 (offset -7 lines).
Hunk #5 succeeded at 531 (offset 21 lines).
Hunk #6 succeeded at 608 (offset -21 lines).
Hunk #7 succeeded at 670 (offset 21 lines).
Hunk #8 succeeded at 914 (offset -24 lines).
patching file arch/mips/Kconfig.debug
patching file arch/mips/au1000/common/Makefile
patching file arch/mips/au1000/common/dbg_io.c
patching file arch/mips/basler/excite/Makefile
patching file arch/mips/basler/excite/excite_dbg_io.c
patching file arch/mips/basler/excite/excite_irq.c
patching file arch/mips/basler/excite/excite_setup.c
patching file arch/mips/jmr3927/rbhma3100/Makefile
patching file arch/mips/jmr3927/rbhma3100/kgdb_io.c
patching file arch/mips/kernel/Makefile
patching file arch/mips/kernel/gdb-low.S
patching file arch/mips/kernel/gdb-stub.c
patching file arch/mips/kernel/irq.c
patching file arch/mips/kernel/kgdb-jmp.c
patching file arch/mips/kernel/kgdb-setjmp.S
patching file arch/mips/kernel/kgdb.c
patching file arch/mips/kernel/kgdb_handler.S
patching file arch/mips/kernel/traps.c
patching file arch/mips/mips-boards/atlas/Makefile
patching file arch/mips/mips-boards/atlas/atlas_gdb.c
patching file arch/mips/mips-boards/atlas/atlas_setup.c
patching file arch/mips/mips-boards/generic/Makefile
patching file arch/mips/mips-boards/generic/gdb_hook.c
patching file arch/mips/mips-boards/generic/init.c
patching file arch/mips/mips-boards/malta/malta_setup.c
patching file arch/mips/mm/extable.c
patching file arch/mips/pci/fixup-atlas.c
patching file arch/mips/philips/pnx8550/common/Makefile
patching file arch/mips/philips/pnx8550/common/gdb_hook.c
patching file arch/mips/philips/pnx8550/common/setup.c
patching file arch/mips/pmc-sierra/yosemite/Makefile
patching file arch/mips/pmc-sierra/yosemite/dbg_io.c
patching file arch/mips/pmc-sierra/yosemite/irq.c
patching file arch/mips/sgi-ip22/ip22-setup.c
patching file arch/mips/sgi-ip27/Makefile
patching file arch/mips/sgi-ip27/ip27-dbgio.c
patching file arch/mips/sibyte/bcm1480/irq.c
patching file arch/mips/sibyte/cfe/setup.c
Hunk #3 succeeded at 298 (offset -3 lines).
patching file arch/mips/sibyte/sb1250/irq.c
patching file arch/mips/sibyte/sb1250/kgdb_sibyte.c
patching file arch/mips/sibyte/swarm/Makefile
patching file arch/mips/sibyte/swarm/dbg_io.c
patching file arch/mips/tx4927/common/Makefile
Hunk #1 succeeded at 9 with fuzz 1.
patching
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 20:47:55 +0100

> [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC: 005119b0 Y: Not tainted
> [ 145.128940] TPC: kpagecount_read+0x94/0xe0

My suspicion at this point is that with certain RAM layouts, simply
iterating over PFNs is not working out.

pfn_to_page() seems to do no range checking, and with sparsemem vmemmap,
which sparc64 always uses, this can be problematic. It just blindly
computes vmemmap + pfn, which is asking for trouble, in particular when
the physical RAM layout really is sparse.

Maybe it's enough to add a pfn_valid() check here? If pfn_valid() also
means there is a vmemmap translation set up for that page struct, it
would work.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:17:26PM -0800, David Miller wrote:
> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 20:47:55 +0100
>
> > [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC: 005119b0 Y: Not tainted
> > [ 145.128940] TPC: kpagecount_read+0x94/0xe0
>
> My suspicion at this point is that with certain RAM layouts, simply
> iterating over PFN's is simply not working out.

That was my original suspicion, which is why I asked Mariusz up-thread
to effectively comment out the actual PFN lookup. I didn't send him a
patch to do that, so I guess my instructions on how to hack it may have
been misunderstood.

> pfn_to_page() seems to be doing no range checking, and with sparsemem
> vmemmap, which sparc64 always uses, this can be problematic. It just
> blindly goes vmemmap + pfn which is asking for trouble, in particular
> when the physical RAM layout really is sparse.
>
> Maybe it's enough to add a pfn_valid() check here? If pfn_valid() means
> there is a vmemmap translation setup for that page struct too, it would
> work.

Here's a test patch:

Index: mm/fs/proc/proc_misc.c
===
--- mm.orig/fs/proc/proc_misc.c  2007-12-20 19:04:35.0 -0600
+++ mm/fs/proc/proc_misc.c       2007-12-20 19:06:01.0 -0600
@@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
         return -EIO;
 
     while (count > 0) {
-        ppage = pfn_to_page(pfn++);
+        ppage = 0;
+        if (pfn_valid(pfn))
+            ppage = pfn_to_page(pfn);
+        pfn++;
         if (!ppage)
             pcount = 0;
         else
@@ -773,7 +776,10 @@ static ssize_t kpageflags_read(struct fi
         return -EIO;
 
     while (count > 0) {
-        ppage = pfn_to_page(pfn++);
+        ppage = 0;
+        if (pfn_valid(pfn))
+            ppage = pfn_to_page(pfn);
+        pfn++;
         if (!ppage)
             kflags = 0;
         else

--
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 19:06:55 -0600

> @@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
>          return -EIO;
>
>      while (count > 0) {
> -        ppage = pfn_to_page(pfn++);
> +        ppage = 0;
> +        if (pfn_valid(pfn))
> +            ppage = pfn_to_page(pfn);
> +        pfn++;
>          if (!ppage)
>              pcount = 0;
>          else

Yes, that should work. Please use NULL instead of 0 in the final version
of the patch so that sparse is happy.

Thanks.
Re: 2.6.24-rc5-mm1
On Dec 20, 2007 11:34 AM, Alan Stern <[EMAIL PROTECTED]> wrote:
> Note carefully. This:
>
> > > > 2. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> > > > disk it reports the part 2 and mount the partition as rw
>
> contradicts this:
>
> > > > 3. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> > > > disk it just mount the partition as ro with nothing more messages.

Oh, sorry, that's a typo. It should be 2.6.24-rc5-mm1.

>
> So which is correct?
>
> > Hi, Alan
> >
> > I'm sure about my post.
>
> But your post contradicts itself. It can't be correct.
>
> > I'm not so famillar with usb.
> > It looks weird. Seems that my device will be firstly recoganized as a
> > mp3 player and then a usb storage, so the system will report part 1 &
> > part 2 under previous kernels.
>
> I think those "part 2" messages aren't caused by the kernel at all, but
> instead by some program running on your computer. You could try
> booting into single-user mode and see if the behavior changes.

I have no doubt about that. Under OS X, plugging in this device pops up
a dialog (I don't remember the content); after pressing OK, the disk
icon goes away and then the disk is mounted again.

>
> Also there's no question -- the device does behave strangely. It
> shouldn't change the write-protect setting all by itself.

Yes, I think so too.

>
> Alan Stern
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Dec 19, 2007 8:31 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > On Thursday, 20 of December 2007, Miles Lane wrote: > > On Dec 19, 2007 7:09 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > > > > On Thursday, 20 of December 2007, Christoph Lameter wrote: > > > > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote: > > > > > > > > > > We could reexport drain_local_pages() again but then I do not > > > understand > > > > > > why we would only drain the pages of this processor and not of all > > > other > > > > > > processors as well. It seems that software suspend intend was to > > > flush > > > > > > them all right? > > > > > > > > > > Well, not exactly. We are on one CPU at this point, the others have > > > been > > > > > disabled. > > > > > > > > Ok so the others are flush. Here is a patch to re-export > > > > drain_local_pages() again and use it for software suspend: > > > > > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > > > > > > > --- > > > > include/linux/gfp.h |1 + > > > > kernel/power/snapshot.c |2 +- > > > > mm/page_alloc.c |2 +- > > > > 3 files changed, 3 insertions(+), 2 deletions(-) > > > > > > > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c > > > > === > > > > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c 2007-12-19 11:59: > > > 25.233961700 -0800 > > > > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c 2007-12-19 15:16: > > > 34.179661929 -0800 > > > > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void) > > > > > > > > printk(KERN_INFO "PM: Creating hibernation image: \n"); > > > > > > > > - drain_all_pages(); > > > > + drain_local_pages(NULL); > > > > nr_pages = count_data_pages(); > > > > nr_highmem = count_highmem_pages(); > > > > printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + > > > nr_highmem); > > > > > > You've omitted the second instance, right before the copy_data_pages() > > > call. > > > > > > > I guess I will wait for a revised patch. 
> > There's an Andrew's fix on top of this one in -mm: > http://marc.info/?l=linux-mm-commits=119810866812965=2 > > > > > > > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c > > > > === > > > > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c 2007-12-19 12:01: > > > 00.630421258 -0800 > > > > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c 2007-12-19 15:12: > > > 19.850545818 -0800 > > > > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu > > > > /* > > > > * Spill all of this CPU's per-cpu pages back into the buddy allocator. > > > > */ > > > > -static void drain_local_pages(void *arg) > > > > +void drain_local_pages(void *arg) > > > > { > > > > drain_pages(smp_processor_id()); > > > > } > > > > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h > > > > === > > > > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h 2007-12-19 15:13: > > > 51.926950065 -0800 > > > > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h 2007-12-19 15:16: > > > 11.951564369 -0800 > > > > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru > > > > void page_alloc_init(void); > > > > void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); > > > > void drain_all_pages(void); > > > > +void drain_local_pages(void *dummy); > > > > > > > > #endif /* __LINUX_GFP_H */ > > > > I applied Christoph and Andrew's patches and recompiled. I suspended to disk and to ram several times and all looks good. Miles
Re: 2.6.24-rc5-mm1
Note carefully. This:

> > > 2. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> > > disk it reports the part 2 and mount the partition as rw

contradicts this:

> > > 3. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> > > disk it just mount the partition as ro with nothing more messages.

So which is correct?

> Hi, Alan
>
> I'm sure about my post.

But your post contradicts itself. It can't be correct.

> I'm not so famillar with usb.
> It looks weird. Seems that my device will be firstly recoganized as a
> mp3 player and then a usb storage, so the system will report part 1 &
> part 2 under previous kernels.

I think those "part 2" messages aren't caused by the kernel at all, but
instead by some program running on your computer. You could try booting
into single-user mode and see if the behavior changes.

Also there's no question -- the device does behave strangely. It
shouldn't change the write-protect setting all by itself.

Alan Stern
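Alan's point that the device should not flip its write-protect setting on its own can be checked mechanically against a captured log. A minimal sketch, assuming the log has already been split into lines; count_wp_flips() is a hypothetical helper, not part of the kernel or any existing tool, and it only matches the "Write Protect is on/off" strings that sd prints in the logs quoted in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count how many times the reported write-protect
 * state changes across a sequence of kernel log lines. Lines that are
 * not "Write Protect is on/off" messages are ignored. */
static int count_wp_flips(const char **lines, size_t n)
{
	int last = -1; /* -1: no write-protect line seen yet */
	int flips = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int cur;

		if (strstr(lines[i], "Write Protect is on") != NULL)
			cur = 1;
		else if (strstr(lines[i], "Write Protect is off") != NULL)
			cur = 0;
		else
			continue;

		if (last != -1 && cur != last)
			flips++;
		last = cur;
	}
	return flips;
}
```

Fed the lines of Dave's 2.6.24-rc2 log it should report one transition (on, then off); fed Alan's 2.6.24-rc5 log it should report none.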
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thursday, 20 of December 2007, Miles Lane wrote: > On Dec 19, 2007 7:09 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > > On Thursday, 20 of December 2007, Christoph Lameter wrote: > > > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote: > > > > > > > > We could reexport drain_local_pages() again but then I do not > > understand > > > > > why we would only drain the pages of this processor and not of all > > other > > > > > processors as well. It seems that software suspend intend was to > > flush > > > > > them all right? > > > > > > > > Well, not exactly. We are on one CPU at this point, the others have > > been > > > > disabled. > > > > > > Ok so the others are flush. Here is a patch to re-export > > > drain_local_pages() again and use it for software suspend: > > > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > > > > > --- > > > include/linux/gfp.h |1 + > > > kernel/power/snapshot.c |2 +- > > > mm/page_alloc.c |2 +- > > > 3 files changed, 3 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c > > > === > > > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c 2007-12-19 11:59: > > 25.233961700 -0800 > > > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c 2007-12-19 15:16: > > 34.179661929 -0800 > > > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void) > > > > > > printk(KERN_INFO "PM: Creating hibernation image: \n"); > > > > > > - drain_all_pages(); > > > + drain_local_pages(NULL); > > > nr_pages = count_data_pages(); > > > nr_highmem = count_highmem_pages(); > > > printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + > > nr_highmem); > > > > You've omitted the second instance, right before the copy_data_pages() > > call. > > > > I guess I will wait for a revised patch. 
There's an Andrew's fix on top of this one in -mm: http://marc.info/?l=linux-mm-commits=119810866812965=2 > > > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c > > > === > > > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c 2007-12-19 12:01: > > 00.630421258 -0800 > > > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c 2007-12-19 15:12: > > 19.850545818 -0800 > > > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu > > > /* > > > * Spill all of this CPU's per-cpu pages back into the buddy allocator. > > > */ > > > -static void drain_local_pages(void *arg) > > > +void drain_local_pages(void *arg) > > > { > > > drain_pages(smp_processor_id()); > > > } > > > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h > > > === > > > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h 2007-12-19 15:13: > > 51.926950065 -0800 > > > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h 2007-12-19 15:16: > > 11.951564369 -0800 > > > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru > > > void page_alloc_init(void); > > > void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); > > > void drain_all_pages(void); > > > +void drain_local_pages(void *dummy); > > > > > > #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Dec 19, 2007 7:09 PM, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote: > > On Thursday, 20 of December 2007, Christoph Lameter wrote: > > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote: > > > > > > We could reexport drain_local_pages() again but then I do not understand > > > > why we would only drain the pages of this processor and not of all other > > > > processors as well. It seems that software suspend intend was to flush > > > > them all right? > > > > > > Well, not exactly. We are on one CPU at this point, the others have been > > > disabled. > > > > Ok so the others are flush. Here is a patch to re-export > > drain_local_pages() again and use it for software suspend: > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > > > --- > > include/linux/gfp.h |1 + > > kernel/power/snapshot.c |2 +- > > mm/page_alloc.c |2 +- > > 3 files changed, 3 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c > > === > > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c 2007-12-19 > > 11:59:25.233961700 -0800 > > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c 2007-12-19 > > 15:16:34.179661929 -0800 > > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void) > > > > printk(KERN_INFO "PM: Creating hibernation image: \n"); > > > > - drain_all_pages(); > > + drain_local_pages(NULL); > > nr_pages = count_data_pages(); > > nr_highmem = count_highmem_pages(); > > printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + > > nr_highmem); > > You've omitted the second instance, right before the copy_data_pages() call. I will wait for a revised patch and then test. (Sorry for the duplicate message. I am resending because I accidentally sent an HTML message the first time. Whoops.) 
> > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c > > === > > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c 2007-12-19 12:01:00.630421258 > > -0800 > > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c 2007-12-19 15:12:19.850545818 > > -0800 > > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu > > /* > > * Spill all of this CPU's per-cpu pages back into the buddy allocator. > > */ > > -static void drain_local_pages(void *arg) > > +void drain_local_pages(void *arg) > > { > > drain_pages(smp_processor_id()); > > } > > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h > > === > > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h 2007-12-19 > > 15:13:51.926950065 -0800 > > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h 2007-12-19 15:16:11.951564369 > > -0800 > > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru > > void page_alloc_init(void); > > void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); > > void drain_all_pages(void); > > +void drain_local_pages(void *dummy); > > > > #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1
On Dec 20, 2007 12:07 AM, Alan Stern <[EMAIL PROTECTED]> wrote: > > On Wed, 19 Dec 2007, Dave Young wrote: > > > I tested on another machine with kernel 2.6.24-rc2. And the result is > > diffrent again. > > Here is the result: > > > > 1. on 2.6.24-rc2, when I plugin the player the kernel reports below > > messages: > > > > usb-storage: waiting for device to settle before scanning > > /*[lets mark the below part as part 1]*/ > > scsi 0:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: > > 0 CCS > > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB) > > sd 0:0:0:0: [sda] Write Protect is on > > sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00 > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB) > > sd 0:0:0:0: [sda] Write Protect is on > > sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00 > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > sda: sda1 > > /*[lets mark the below part as part 2]*/ > > sd 0:0:0:0: [sda] Attached SCSI removable disk > > usb-storage: device scan complete > > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB) > > sd 0:0:0:0: [sda] Write Protect is off > > sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00 > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB) > > sd 0:0:0:0: [sda] Write Protect is off > > sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00 > > sd 0:0:0:0: [sda] Assuming drive cache: write through > > sda: sda1 > > This is not normal. When you plug in a storage device you should get > all of the messages in your part 1 plus the first two lines in your > part 2, but not the rest of part 2. > > > 2. on 2.6.24-rc5 kernel reports only the part 1, after try mount the > > disk it reports the part 2 and mount the partition as rw > > > > 3. on 2.6.24-rc5 kernel reports only the part 1, after try mount the > > disk it just mount the partition as ro with nothing more messages. > > You must have a typo there. 
> Those can't both be true for 2.6.24-rc5.
> In fact you shouldn't see part 2 at all.
>
> Here's what I get when I plug in a USB mass-storage device under
> 2.6.24-rc5:
>
> [ 87.903014] usb-storage: device found at 2
> [ 87.909570] scsi 0:0:0:0: Direct-Access Memorex TD 2B1.09 PQ: 0 ANSI: 0 CCS
> [ 87.913144] usb-storage: device scan complete
> [ 88.804031] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
> [ 88.805507] sd 0:0:0:0: [sda] Write Protect is off
> [ 88.805577] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 88.805639] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 88.809526] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
> [ 88.810421] sd 0:0:0:0: [sda] Write Protect is off
> [ 88.810488] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 88.810575] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 88.810641] sda: sda1
> [ 88.812450] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ 89.041014] sd 0:0:0:0: Attached scsi generic sg0 type 0
>
> Mounting the disk produces no extra output at all. I get the same
> result under 2.6.23 and earlier operating systems. You should see
> approximately the same thing.

Hi, Alan

I'm sure about my post.
I'm not so familiar with USB.
It looks weird. It seems that my device is first recognized as an
mp3 player and then as a USB storage device, so the system reports part 1
and part 2 under previous kernels.

Regards
dave
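The size in parentheses on the sd lines follows from the sector count: 245248 * 512 = 125,566,976 bytes, about 126 MB in decimal units, and the 243712 sectors in Alan's log give about 125 MB. A small sketch of that arithmetic, assuming round-to-nearest decimal megabytes; it reproduces the figures printed in these logs but is not the sd driver's own code.

```c
#include <assert.h>

/* Capacity in decimal megabytes (10^6 bytes), rounded to nearest.
 * An approximation chosen to match the figures printed in these logs. */
static unsigned long long capacity_mb(unsigned long long sectors,
				      unsigned int sector_size)
{
	unsigned long long bytes = sectors * sector_size;

	return (bytes + 500000ULL) / 1000000ULL;
}
```

Both devices in the thread therefore report roughly the same size, so the sector counts alone do not explain the difference in behaviour.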
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thursday, 20 of December 2007, Christoph Lameter wrote: > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote: > > > > We could reexport drain_local_pages() again but then I do not understand > > > why we would only drain the pages of this processor and not of all other > > > processors as well. It seems that software suspend intend was to flush > > > them all right? > > > > Well, not exactly. We are on one CPU at this point, the others have been > > disabled. > > Ok so the others are flush. Here is a patch to re-export > drain_local_pages() again and use it for software suspend: > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > --- > include/linux/gfp.h |1 + > kernel/power/snapshot.c |2 +- > mm/page_alloc.c |2 +- > 3 files changed, 3 insertions(+), 2 deletions(-) > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c > === > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c 2007-12-19 > 11:59:25.233961700 -0800 > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c 2007-12-19 > 15:16:34.179661929 -0800 > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void) > > printk(KERN_INFO "PM: Creating hibernation image: \n"); > > - drain_all_pages(); > + drain_local_pages(NULL); > nr_pages = count_data_pages(); > nr_highmem = count_highmem_pages(); > printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem); You've omitted the second instance, right before the copy_data_pages() call. > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c > === > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c 2007-12-19 12:01:00.630421258 > -0800 > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c 2007-12-19 15:12:19.850545818 > -0800 > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu > /* > * Spill all of this CPU's per-cpu pages back into the buddy allocator. 
> */ > -static void drain_local_pages(void *arg) > +void drain_local_pages(void *arg) > { > drain_pages(smp_processor_id()); > } > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h > === > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h 2007-12-19 > 15:13:51.926950065 -0800 > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h 2007-12-19 15:16:11.951564369 > -0800 > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru > void page_alloc_init(void); > void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); > void drain_all_pages(void); > +void drain_local_pages(void *dummy); > > #endif /* __LINUX_GFP_H */
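Rafael's remark about the second call site can be illustrated with a toy model: if anything between the first drain and copy_data_pages() leaves pages on the per-CPU list, only a second drain right before the copy empties it again. The model assumes, for illustration only, that allocating the image is what refills the list; the counter and helper names are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: pages currently cached on this CPU's per-CPU list.
 * Hypothetical names; not the kernel's data structures. */
static int model_pcp_pages;

static void model_drain_local_pages(void)
{
	model_pcp_pages = 0;
}

/* Assumption for illustration: allocating the image leaves a few
 * pages on the per-CPU list. The count is arbitrary. */
static void model_alloc_image(void)
{
	model_pcp_pages += 3;
}

/* Returns how many per-CPU pages remain when the copy would start. */
static int model_swsusp_save(bool second_drain)
{
	model_pcp_pages = 5;           /* arbitrary starting state */
	model_drain_local_pages();     /* call site changed by the patch */
	/* page counting would happen here */
	model_alloc_image();
	if (second_drain)
		model_drain_local_pages(); /* call site right before the copy */
	return model_pcp_pages;        /* copy_data_pages() would run here */
}
```

With only the first call converted, the model reaches the copy with pages still cached; converting both call sites leaves the list empty.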
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:

> > We could reexport drain_local_pages() again but then I do not understand
> > why we would only drain the pages of this processor and not of all other
> > processors as well. It seems that software suspend intend was to flush
> > them all right?
>
> Well, not exactly. We are on one CPU at this point, the others have been
> disabled.

Ok, so the others are flushed. Here is a patch to re-export
drain_local_pages() again and use it for software suspend:

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/gfp.h     |    1 +
 kernel/power/snapshot.c |    2 +-
 mm/page_alloc.c         |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
===
--- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
+++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
@@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
 
 	printk(KERN_INFO "PM: Creating hibernation image: \n");
 
-	drain_all_pages();
+	drain_local_pages(NULL);
 	nr_pages = count_data_pages();
 	nr_highmem = count_highmem_pages();
 	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);
Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
===
--- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
+++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
@@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  */
-static void drain_local_pages(void *arg)
+void drain_local_pages(void *arg)
 {
 	drain_pages(smp_processor_id());
 }
Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
===
--- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
+++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
@@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(void);
+void drain_local_pages(void *dummy);
 
 #endif /* __LINUX_GFP_H */
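The void *arg parameter in the exported drain_local_pages() matches the callback type a cross-CPU call helper takes, which is presumably why drain_all_pages() can hand it straight to on_each_cpu() (the backtraces in this thread show drain_all_pages() calling on_each_cpu()). A userspace sketch of that relationship; every model_ name, including the stand-in for smp_processor_id(), is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Stand-in for smp_processor_id(): the CPU we are "running" on. */
static int model_smp_processor_id;
static int model_drained[MODEL_NR_CPUS];

static void model_drain_pages(int cpu)
{
	model_drained[cpu]++;
}

/* Same shape as drain_local_pages(void *arg): the unused argument lets
 * the function double as a callback for a cross-call helper. */
void model_drain_local_pages(void *arg)
{
	(void)arg;
	model_drain_pages(model_smp_processor_id);
}

/* Hypothetical stand-in for on_each_cpu(): run func "on" every CPU. */
static void model_on_each_cpu(void (*func)(void *), void *info)
{
	int cpu, self = model_smp_processor_id;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_smp_processor_id = cpu;
		func(info);
	}
	model_smp_processor_id = self;
}

void model_drain_all_pages(void)
{
	model_on_each_cpu(model_drain_local_pages, NULL);
}
```

Calling model_drain_local_pages(NULL) directly, as the patch does in swsusp_save(), drains only the current CPU and involves no cross-call; going through the helper drains every CPU.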
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wednesday, 19 of December 2007, Christoph Lameter wrote:
> On Wed, 19 Dec 2007, Daniel Walker wrote:
>
> > > It looks like the swsusp_save() calls drain_all_pages() , which calls
> > > on_each_cpu() .. On return on_each_cpu() unconditionally enables
> > > interrupts so the rest of the resume process has interrupt enable
> > > (which , it looks like, shouldn't happen) and then you get the lockdep()
> > > warning due to the above..
> > >
> > > Not sure if this has been found already, or not?
>
> Hmmm... It will unconditionally enable interrupts regardless how we call
> this. We could explicity save and restore interrrupts in
> swsusp_save() I guess. Why is swsusp_save() disabling interrupts?

Actually, it's called with interrupts disabled, because its job is to create
the hibernation image. At this point everything is off except for the CPU
running swsusp_save().

> > > Should drain_all_pages() really be drain_local_pages() ?
> >
> > It looks like it was drain_local_pages, but the following patch
> >
> > page-allocator-clean-up-pcp-draining-functions.patch
> >
> > Changes that in -mm .. I added Christoph Lameter to the CC since it's
> > his patch ..
>
> We could reexport drain_local_pages() again but then I do not understand
> why we would only drain the pages of this processor and not of all other
> processors as well. It seems that software suspend intend was to flush
> them all right?

Well, not exactly. We are on one CPU at this point, the others have been
disabled.
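Rafael's answer, and Christoph's conclusion that the other CPUs' lists are already flushed, can be modelled with per-CPU counters: if every secondary CPU emptied its list when it went offline, draining the one remaining CPU leaves nothing behind. This is a sketch under that stated assumption (offlining drains the CPU's list); the arrays and functions are hypothetical, not the kernel's per_cpu_pages.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU page counts and a global free pool. */
static int model_pcp_count[MODEL_NR_CPUS];
static int model_buddy_free;

/* Move one CPU's cached pages back to the global pool. */
static void model_drain_pages(int cpu)
{
	model_buddy_free += model_pcp_count[cpu];
	model_pcp_count[cpu] = 0;
}

/* Assumption: taking a CPU offline flushes its per-CPU list. */
static void model_cpu_offline(int cpu)
{
	model_drain_pages(cpu);
}

/* Total pages still cached on any CPU. */
static int model_pcp_total(void)
{
	int cpu, total = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += model_pcp_count[cpu];
	return total;
}
```

Under that assumption a local drain on the surviving CPU is as complete as an all-CPU drain, without needing a cross-call.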
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 19 Dec 2007, Daniel Walker wrote:

> > It looks like the swsusp_save() calls drain_all_pages() , which calls
> > on_each_cpu() .. On return on_each_cpu() unconditionally enables
> > interrupts so the rest of the resume process has interrupt enable
> > (which , it looks like, shouldn't happen) and then you get the lockdep()
> > warning due to the above..
> >
> > Not sure if this has been found already, or not?

Hmmm... It will unconditionally enable interrupts regardless of how we call
this. We could explicitly save and restore interrupts in
swsusp_save() I guess. Why is swsusp_save() disabling interrupts?

> > Should drain_all_pages() really be drain_local_pages() ?
>
> It looks like it was drain_local_pages, but the following patch
>
> page-allocator-clean-up-pcp-draining-functions.patch
>
> Changes that in -mm .. I added Christoph Lameter to the CC since it's
> his patch ..

We could reexport drain_local_pages() again but then I do not understand
why we would only drain the pages of this processor and not of all other
processors as well. It seems that the software suspend intent was to flush
them all, right?
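Christoph's observation that the callee enables interrupts regardless of how it is called can be seen in a toy model: wrapping the call in a save/restore pair puts the caller's interrupt state back afterwards, but cannot undo the interval during which the callee had interrupts on. The names below are hypothetical stand-ins for local_irq_save()/local_irq_restore(), not the kernel macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; all model_ names are hypothetical. */
static bool model_irqs_enabled = true;
static bool model_window_seen; /* IRQs were on at some point mid-sequence */

static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (flags != 0);
}

/* A callee that, like on_each_cpu() in this thread, returns with IRQs on. */
static void model_callee_enables_irqs(void)
{
	model_irqs_enabled = true;
}

/* Caller wraps the callee in save/restore, as suggested above. */
static void model_swsusp_save_wrapped(void)
{
	unsigned long flags = model_irq_save();

	model_callee_enables_irqs();
	if (model_irqs_enabled)
		model_window_seen = true; /* the callee still opened a window */
	model_irq_restore(flags);
}
```

This is why Rafael's reply (the caller is deliberately running with interrupts off on a single CPU) points toward avoiding the cross-call altogether rather than papering over it.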
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 2007-12-19 at 10:42 -0800, Daniel Walker wrote: > On Wed, 2007-12-19 at 10:06 -0500, Miles Lane wrote: > > [ 11.827653] PM: Creating hibernation image: > > [ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 > > native_smp_call_function_mask() > > [ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted > > 2.6.24-rc5-mm1 #8 > > [ 11.827665] [] show_trace_log_lvl+0x12/0x25 > > [ 11.827673] [] show_trace+0xd/0x10 > > [ 11.827677] [] dump_stack+0x57/0x5f > > [ 11.827681] [] native_smp_call_function_mask+0x41/0x126 > > [ 11.827686] [] smp_call_function+0x18/0x1f > > [ 11.827690] [] on_each_cpu+0x12/0x40 > > [ 11.827695] [] drain_all_pages+0x13/0x16 > > [ 11.827700] [] swsusp_save+0x18/0x46b > > [ 11.827705] [] swsusp_arch_suspend+0x2a/0x2c > > [ 11.827710] [] hibernate+0xba/0x16e > > [ 11.827714] [] state_store+0x45/0xac > > [ 11.827717] [] kobj_attr_store+0x1a/0x22 > > [ 11.827722] [] sysfs_write_file+0xb8/0xe3 > > [ 11.827726] [] vfs_write+0xa4/0x120 > > [ 11.827731] [] sys_write+0x3b/0x60 > > [ 11.827734] [] sysenter_past_esp+0x6b/0xc1 > > [ 11.827738] === > ... > > [ 15.624993] = > > [ 15.624995] [ INFO: inconsistent lock state ] > > [ 15.624998] 2.6.24-rc5-mm1 #8 > > [ 15.624999] - > > [ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > > It looks like the swsusp_save() calls drain_all_pages() , which calls > on_each_cpu() .. On return on_each_cpu() unconditionally enables > interrupts so the rest of the resume process has interrupt enable > (which , it looks like, shouldn't happen) and then you get the lockdep() > warning due to the above.. > > Not sure if this has been found already, or not? > > Should drain_all_pages() really be drain_local_pages() ? It looks like it was drain_local_pages, but the following patch page-allocator-clean-up-pcp-draining-functions.patch Changes that in -mm .. I added Christoph Lameter to the CC since it's his patch .. 
Daniel
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 2007-12-19 at 10:06 -0500, Miles Lane wrote:
> [ 11.827653] PM: Creating hibernation image:
> [ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> [ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
> [ 11.827665] [] show_trace_log_lvl+0x12/0x25
> [ 11.827673] [] show_trace+0xd/0x10
> [ 11.827677] [] dump_stack+0x57/0x5f
> [ 11.827681] [] native_smp_call_function_mask+0x41/0x126
> [ 11.827686] [] smp_call_function+0x18/0x1f
> [ 11.827690] [] on_each_cpu+0x12/0x40
> [ 11.827695] [] drain_all_pages+0x13/0x16
> [ 11.827700] [] swsusp_save+0x18/0x46b
> [ 11.827705] [] swsusp_arch_suspend+0x2a/0x2c
> [ 11.827710] [] hibernate+0xba/0x16e
> [ 11.827714] [] state_store+0x45/0xac
> [ 11.827717] [] kobj_attr_store+0x1a/0x22
> [ 11.827722] [] sysfs_write_file+0xb8/0xe3
> [ 11.827726] [] vfs_write+0xa4/0x120
> [ 11.827731] [] sys_write+0x3b/0x60
> [ 11.827734] [] sysenter_past_esp+0x6b/0xc1
> [ 11.827738] ===
...
> [ 15.624993] =
> [ 15.624995] [ INFO: inconsistent lock state ]
> [ 15.624998] 2.6.24-rc5-mm1 #8
> [ 15.624999] -
> [ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.

It looks like swsusp_save() calls drain_all_pages(), which calls
on_each_cpu(). On return, on_each_cpu() unconditionally enables
interrupts, so the rest of the resume process runs with interrupts
enabled (which, it looks like, shouldn't happen), and then you get the
lockdep warning above.

Not sure if this has been found already, or not?

Should drain_all_pages() really be drain_local_pages()?

Daniel
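Daniel's diagnosis can be reproduced in a userspace model. The sketch assumes on_each_cpu() behaves as he describes: the cross-call path warns when entered with interrupts disabled (the WARNING at smp_32.c:561 in Miles's log), and the local part runs the callback with interrupts off and then unconditionally turns them back on. A boolean stands in for the CPU's interrupt flag; nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model: a global flag stands in for this CPU's
 * interrupt-enable state. All model_ names are hypothetical. */
static bool model_irqs_enabled = true;
static int model_warnings;

static void model_local_irq_disable(void) { model_irqs_enabled = false; }
static void model_local_irq_enable(void) { model_irqs_enabled = true; }

/* Shape of on_each_cpu() as described in the thread. */
static void model_on_each_cpu(void (*func)(void *), void *info)
{
	/* Cross-call to the other CPUs: complains if IRQs are already off,
	 * like the WARNING in native_smp_call_function_mask(). */
	if (!model_irqs_enabled)
		model_warnings++;

	/* Local part: run func with IRQs off, then turn them back on
	 * unconditionally; the caller's previous state is not restored. */
	model_local_irq_disable();
	func(info);
	model_local_irq_enable();
}

/* Stand-in for the per-CPU drain callback: just counts invocations. */
static void model_drain(void *info)
{
	int *calls = info;

	(*calls)++;
}
```

A caller that entered with interrupts disabled gets one warning and then continues with interrupts enabled, which is the state the later lockdep report complains about.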
Re: 2.6.24-rc5-mm1
On Wed, 19 Dec 2007, Dave Young wrote:

> I tested on another machine with kernel 2.6.24-rc2. And the result is
> diffrent again.
> Here is the result:
>
> 1. on 2.6.24-rc2, when I plugin the player the kernel reports below messages:
>
> usb-storage: waiting for device to settle before scanning
> /*[lets mark the below part as part 1]*/
> scsi 0:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is on
> sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is on
> sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1
> /*[lets mark the below part as part 2]*/
> sd 0:0:0:0: [sda] Attached SCSI removable disk
> usb-storage: device scan complete
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1

This is not normal. When you plug in a storage device you should get
all of the messages in your part 1 plus the first two lines in your
part 2, but not the rest of part 2.

> 2. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> disk it reports the part 2 and mount the partition as rw
>
> 3. on 2.6.24-rc5 kernel reports only the part 1, after try mount the
> disk it just mount the partition as ro with nothing more messages.

You must have a typo there. Those can't both be true for 2.6.24-rc5.
In fact you shouldn't see part 2 at all.
Here's what I get when I plug in a USB mass-storage device under
2.6.24-rc5:

[ 87.903014] usb-storage: device found at 2
[ 87.909570] scsi 0:0:0:0: Direct-Access Memorex TD 2B1.09 PQ: 0 ANSI: 0 CCS
[ 87.913144] usb-storage: device scan complete
[ 88.804031] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
[ 88.805507] sd 0:0:0:0: [sda] Write Protect is off
[ 88.805577] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 88.805639] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 88.809526] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
[ 88.810421] sd 0:0:0:0: [sda] Write Protect is off
[ 88.810488] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 88.810575] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 88.810641] sda: sda1
[ 88.812450] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 89.041014] sd 0:0:0:0: Attached scsi generic sg0 type 0

Mounting the disk produces no extra output at all. I get the same
result under 2.6.23 and earlier operating systems. You should see
approximately the same thing.

Alan Stern
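The two "Mode Sense" lines differ only in the third byte (80 versus 00), and that byte is what the "Write Protect is on/off" line reports. In the MODE SENSE(6) parameter header, byte 2 is the device-specific parameter and, for direct-access devices, bit 7 (0x80) is the WP flag. A minimal decoder for the line format seen in these logs; mode_sense_wp() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the "Mode Sense: b0 b1 b2 b3" header in a log line has the
 * WP bit (bit 7 of byte 2, the device-specific parameter) set, 0 if it is
 * clear, and -1 if the line has no parsable Mode Sense header. */
static int mode_sense_wp(const char *line)
{
	unsigned int b[4];
	const char *p = strstr(line, "Mode Sense:");

	if (p == NULL ||
	    sscanf(p, "Mode Sense: %x %x %x %x", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;
	return (b[2] & 0x80) ? 1 : 0;
}
```

Applied to Dave's part 1 it returns 1 (03 00 80 00), and applied to his part 2 or to Alan's log it returns 0 (03 00 00 00), matching the "Write Protect is on/off" lines next to them. So the kernel is only reporting what the device itself answers.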
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
I discovered that I can use IMAP with GMail now, so I can send messages
using Thunderbird and avoid the line wrapping problem.

I tried doing a series: suspend-to-disk, suspend-to-ram and suspend-to-disk.
Here is the result:

[ 11.827653] PM: Creating hibernation image:
[ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
[ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
[ 11.827665] [<c0107d55>] show_trace_log_lvl+0x12/0x25
[ 11.827673] [<c010848a>] show_trace+0xd/0x10
[ 11.827677] [<c0108763>] dump_stack+0x57/0x5f
[ 11.827681] [<c0117db4>] native_smp_call_function_mask+0x41/0x126
[ 11.827686] [<c01192d9>] smp_call_function+0x18/0x1f
[ 11.827690] [<c012c624>] on_each_cpu+0x12/0x40
[ 11.827695] [<c0166ece>] drain_all_pages+0x13/0x16
[ 11.827700] [<c014f7b3>] swsusp_save+0x18/0x46b
[ 11.827705] [<c03103fa>] swsusp_arch_suspend+0x2a/0x2c
[ 11.827710] [<c014e7d8>] hibernate+0xba/0x16e
[ 11.827714] [<c014d56b>] state_store+0x45/0xac
[ 11.827717] [<c01ffe95>] kobj_attr_store+0x1a/0x22
[ 11.827722] [<c01b92c7>] sysfs_write_file+0xb8/0xe3
[ 11.827726] [<c01837eb>] vfs_write+0xa4/0x120
[ 11.827731] [<c0183d5e>] sys_write+0x3b/0x60
[ 11.827734] [<c0106bae>] sysenter_past_esp+0x6b/0xc1
[ 11.827738] ===
[ 11.920363] PM: Need to copy 124108 pages
[ 11.920368] PM: Normal pages needed: 46468 + 1024 + 40, available pages: 182806
[ 15.623893] PM: Hibernation image created (124108 pages copied)
[ 15.624618] Intel machine check architecture supported.
[ 15.624625] Intel machine check reporting enabled on CPU#0.
[ 15.624992]
[ 15.624993] =
[ 15.624995] [ INFO: inconsistent lock state ]
[ 15.624998] 2.6.24-rc5-mm1 #8
[ 15.624999] -
[ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 15.625005] pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 15.625007] (_base->lock_key){++..}, at: [] retrigger_next_event+0x63/0x9f
[ 15.625017] {in-hardirq-W} state was registered at:
[ 15.625019] [] __lock_acquire+0x408/0xbf4
[ 15.625025] [] lock_acquire+0x76/0x9d
[ 15.625029] [] _spin_lock+0x19/0x28
[ 15.625035] [] hrtimer_interrupt+0x72/0x1b0
[ 15.625039] [] smp_apic_timer_interrupt+0x69/0x7c
[ 15.625045] [] apic_timer_interrupt+0x33/0x38
[ 15.625050] [] mwait_idle+0x1b/0x1d
[ 15.625054] [] cpu_idle+0xb3/0xd4
[ 15.625058] [] rest_init+0x49/0x4b
[ 15.625062] [] start_kernel+0x357/0x35f
[ 15.625069] [<>] 0x0
[ 15.625082] [] 0x
[ 15.625087] irq event stamp: 1182359
[ 15.625089] hardirqs last enabled at (1182359): [] restore_nocheck+0x12/0x15
[ 15.625094] hardirqs last disabled at (1182358): [] apic_timer_interrupt+0x29/0x38
[ 15.625098] softirqs last enabled at (933018): [] __rcu_offline_cpu+0x32/0x62
[ 15.625104] softirqs last disabled at (933016): [] _spin_lock_bh+0xb/0x2d
[ 15.625109]
[ 15.625110] other info that might help us debug this:
[ 15.625112] 2 locks held by pm-hibernate/9940:
[ 15.625114] #0: (>mutex){--..}, at: [] sysfs_write_file+0x25/0xe3
[ 15.625121] #1: (pm_mutex){--..}, at: [] hibernate+0x10/0x16e
[ 15.625127]
[ 15.625128] stack backtrace:
[ 15.625131] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
[ 15.625133] [] show_trace_log_lvl+0x12/0x25
[ 15.625138] [] show_trace+0xd/0x10
[ 15.625141] [] dump_stack+0x57/0x5f
[ 15.625144] [] print_usage_bug+0x10a/0x117
[ 15.625148] [] mark_lock+0x1e7/0x3fe
[ 15.625152] [] __lock_acquire+0x475/0xbf4
[ 15.625156] [] lock_acquire+0x76/0x9d
[ 15.625159] [] _spin_lock+0x19/0x28
[ 15.625163] [] retrigger_next_event+0x63/0x9f
[ 15.625167] [] hres_timers_resume+0x4d/0x4f
[ 15.625170] [] timekeeping_resume+0x117/0x11e
[ 15.625175] [] __sysdev_resume+0x14/0x34
[ 15.625179] [] sysdev_resume+0x21/0x57
[ 15.625183] [] device_power_up+0x8/0xf
[ 15.625188] [] hibernation_snapshot+0x13c/0x173
[ 15.625192] [] hibernate+0xba/0x16e
[ 15.625195] [] state_store+0x45/0xac
[ 15.625199] [] kobj_attr_store+0x1a/0x22
[ 15.625203] [] sysfs_write_file+0xb8/0xe3
[ 15.625207] [] vfs_write+0xa4/0x120
[ 15.625211] [] sys_write+0x3b/0x60
[ 15.625214] [] sysenter_past_esp+0x6b/0xc1
[ 15.625217] ===
[ 15.625242] agpgart-intel :00:00.0: EARLY resume
...
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} - {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
I discovered that I can use IMAP with Gmail now, so I can send messages using Thunderbird and avoid the line-wrapping problem.

I tried doing a series: suspend-to-disk, suspend-to-RAM and suspend-to-disk. Here is the result:

[ 11.827653] PM: Creating hibernation image:
[ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
[ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
[ 11.827665] [c0107d55] show_trace_log_lvl+0x12/0x25
[ 11.827673] [c010848a] show_trace+0xd/0x10
[ 11.827677] [c0108763] dump_stack+0x57/0x5f
[ 11.827681] [c0117db4] native_smp_call_function_mask+0x41/0x126
[ 11.827686] [c01192d9] smp_call_function+0x18/0x1f
[ 11.827690] [c012c624] on_each_cpu+0x12/0x40
[ 11.827695] [c0166ece] drain_all_pages+0x13/0x16
[ 11.827700] [c014f7b3] swsusp_save+0x18/0x46b
[ 11.827705] [c03103fa] swsusp_arch_suspend+0x2a/0x2c
[ 11.827710] [c014e7d8] hibernate+0xba/0x16e
[ 11.827714] [c014d56b] state_store+0x45/0xac
[ 11.827717] [c01ffe95] kobj_attr_store+0x1a/0x22
[ 11.827722] [c01b92c7] sysfs_write_file+0xb8/0xe3
[ 11.827726] [c01837eb] vfs_write+0xa4/0x120
[ 11.827731] [c0183d5e] sys_write+0x3b/0x60
[ 11.827734] [c0106bae] sysenter_past_esp+0x6b/0xc1
[ 11.827738] ===
[ 11.920363] PM: Need to copy 124108 pages
[ 11.920368] PM: Normal pages needed: 46468 + 1024 + 40, available pages: 182806
[ 15.623893] PM: Hibernation image created (124108 pages copied)
[ 15.624618] Intel machine check architecture supported.
[ 15.624625] Intel machine check reporting enabled on CPU#0.
[ 15.624992]
[ 15.624993] =
[ 15.624995] [ INFO: inconsistent lock state ]
[ 15.624998] 2.6.24-rc5-mm1 #8
[ 15.624999] -
[ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 15.625005] pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 15.625007] (&cpu_base->lock_key){++..}, at: [c013c453] retrigger_next_event+0x63/0x9f
[ 15.625017] {in-hardirq-W} state was registered at:
[ 15.625019] [c0145432] __lock_acquire+0x408/0xbf4
[ 15.625025] [c0145c94] lock_acquire+0x76/0x9d
[ 15.625029] [c039aa08] _spin_lock+0x19/0x28
[ 15.625035] [c013cd92] hrtimer_interrupt+0x72/0x1b0
[ 15.625039] [c011a2b7] smp_apic_timer_interrupt+0x69/0x7c
[ 15.625045] [c010] apic_timer_interrupt+0x33/0x38
[ 15.625050] [c01054b5] mwait_idle+0x1b/0x1d
[ 15.625054] [c01055e9] cpu_idle+0xb3/0xd4
[ 15.625058] [c03986c5] rest_init+0x49/0x4b
[ 15.625062] [c04f696d] start_kernel+0x357/0x35f
[ 15.625069] [] 0x0
[ 15.625082] [] 0x
[ 15.625087] irq event stamp: 1182359
[ 15.625089] hardirqs last enabled at (1182359): [c0106cb3] restore_nocheck+0x12/0x15
[ 15.625094] hardirqs last disabled at (1182358): [c010776d] apic_timer_interrupt+0x29/0x38
[ 15.625098] softirqs last enabled at (933018): [c0137d89] __rcu_offline_cpu+0x32/0x62
[ 15.625104] softirqs last disabled at (933016): [c039aa22] _spin_lock_bh+0xb/0x2d
[ 15.625109]
[ 15.625110] other info that might help us debug this:
[ 15.625112] 2 locks held by pm-hibernate/9940:
[ 15.625114] #0: (&buffer->mutex){--..}, at: [c01b9234] sysfs_write_file+0x25/0xe3
[ 15.625121] #1: (pm_mutex){--..}, at: [c014e72e] hibernate+0x10/0x16e
[ 15.625127]
[ 15.625128] stack backtrace:
[ 15.625131] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
[ 15.625133] [c0107d55] show_trace_log_lvl+0x12/0x25
[ 15.625138] [c010848a] show_trace+0xd/0x10
[ 15.625141] [c0108763] dump_stack+0x57/0x5f
[ 15.625144] [c0143e45] print_usage_bug+0x10a/0x117
[ 15.625148] [c01447de] mark_lock+0x1e7/0x3fe
[ 15.625152] [c014549f] __lock_acquire+0x475/0xbf4
[ 15.625156] [c0145c94] lock_acquire+0x76/0x9d
[ 15.625159] [c039aa08] _spin_lock+0x19/0x28
[ 15.625163] [c013c453] retrigger_next_event+0x63/0x9f
[ 15.625167] [c013caf7] hres_timers_resume+0x4d/0x4f
[ 15.625170] [c013eed1] timekeeping_resume+0x117/0x11e
[ 15.625175] [c027b2ba] __sysdev_resume+0x14/0x34
[ 15.625179] [c027b752] sysdev_resume+0x21/0x57
[ 15.625183] [c027f426] device_power_up+0x8/0xf
[ 15.625188] [c014e6e7] hibernation_snapshot+0x13c/0x173
[ 15.625192] [c014e7d8] hibernate+0xba/0x16e
[ 15.625195] [c014d56b] state_store+0x45/0xac
[ 15.625199] [c01ffe95] kobj_attr_store+0x1a/0x22
[ 15.625203] [c01b92c7] sysfs_write_file+0xb8/0xe3
[ 15.625207] [c01837eb] vfs_write+0xa4/0x120
[ 15.625211] [c0183d5e] sys_write+0x3b/0x60
[ 15.625214] [c0106bae] sysenter_past_esp+0x6b/0xc1
[ 15.625217] ===
[ 15.625242] agpgart-intel :00:00.0: EARLY resume
...
[ 15.624618] Intel machine check architecture supported.
[ 15.624625] Intel machine check reporting enabled on CPU#0.
[ 15.624992]
[ 15.624993] =
[ 15.624995] [ INFO: inconsistent lock
Re: 2.6.24-rc5-mm1
On Wed, 19 Dec 2007, Dave Young wrote:

> I tested on another machine with kernel 2.6.24-rc2, and the result is different again. Here is the result:
>
> 1. On 2.6.24-rc2, when I plug in the player the kernel reports the messages below:
>
> usb-storage: waiting for device to settle before scanning
> /*[let's mark the part below as part 1]*/
> scsi 0:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is on
> sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is on
> sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1
> /*[let's mark the part below as part 2]*/
> sd 0:0:0:0: [sda] Attached SCSI removable disk
> usb-storage: device scan complete
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1

This is not normal. When you plug in a storage device you should get all of the messages in your part 1 plus the first two lines in your part 2, but not the rest of part 2.

> 2. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it reports part 2 and mounts the partition as rw.
>
> 3. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it just mounts the partition as ro, with no further messages.

You must have a typo there. Those can't both be true for 2.6.24-rc5. In fact you shouldn't see part 2 at all.

Here's what I get when I plug in a USB mass-storage device under 2.6.24-rc5:

[ 87.903014] usb-storage: device found at 2
[ 87.909570] scsi 0:0:0:0: Direct-Access Memorex TD 2B1.09 PQ: 0 ANSI: 0 CCS
[ 87.913144] usb-storage: device scan complete
[ 88.804031] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
[ 88.805507] sd 0:0:0:0: [sda] Write Protect is off
[ 88.805577] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 88.805639] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 88.809526] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
[ 88.810421] sd 0:0:0:0: [sda] Write Protect is off
[ 88.810488] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 88.810575] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 88.810641] sda: sda1
[ 88.812450] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 89.041014] sd 0:0:0:0: Attached scsi generic sg0 type 0

Mounting the disk produces no extra output at all. I get the same result under 2.6.23 and earlier operating systems. You should see approximately the same thing.

Alan Stern
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 2007-12-19 at 10:06 -0500, Miles Lane wrote:
> [ 11.827653] PM: Creating hibernation image:
> [ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> [ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
> [ 11.827665] [c0107d55] show_trace_log_lvl+0x12/0x25
> [ 11.827673] [c010848a] show_trace+0xd/0x10
> [ 11.827677] [c0108763] dump_stack+0x57/0x5f
> [ 11.827681] [c0117db4] native_smp_call_function_mask+0x41/0x126
> [ 11.827686] [c01192d9] smp_call_function+0x18/0x1f
> [ 11.827690] [c012c624] on_each_cpu+0x12/0x40
> [ 11.827695] [c0166ece] drain_all_pages+0x13/0x16
> [ 11.827700] [c014f7b3] swsusp_save+0x18/0x46b
> [ 11.827705] [c03103fa] swsusp_arch_suspend+0x2a/0x2c
> [ 11.827710] [c014e7d8] hibernate+0xba/0x16e
> [ 11.827714] [c014d56b] state_store+0x45/0xac
> [ 11.827717] [c01ffe95] kobj_attr_store+0x1a/0x22
> [ 11.827722] [c01b92c7] sysfs_write_file+0xb8/0xe3
> [ 11.827726] [c01837eb] vfs_write+0xa4/0x120
> [ 11.827731] [c0183d5e] sys_write+0x3b/0x60
> [ 11.827734] [c0106bae] sysenter_past_esp+0x6b/0xc1
> [ 11.827738] ===
> ...
> [ 15.624993] =
> [ 15.624995] [ INFO: inconsistent lock state ]
> [ 15.624998] 2.6.24-rc5-mm1 #8
> [ 15.624999] -
> [ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.

It looks like swsusp_save() calls drain_all_pages(), which calls on_each_cpu(). On return, on_each_cpu() unconditionally enables interrupts, so the rest of the resume process runs with interrupts enabled (which, it looks like, shouldn't happen), and then you get the lockdep warning above.

Not sure if this has been found already, or not?

Should drain_all_pages() really be drain_local_pages()?

Daniel
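The mechanism described above can be modeled in userspace. This is a sketch, not kernel code: a global boolean stands in for the CPU's interrupt flag, and run_locally_enable() and run_locally_restore() are hypothetical helpers. The first mimics a helper that brackets its work with an unconditional disable/enable, as on_each_cpu() is described as doing here; the second preserves the caller's state in the spirit of the kernel's local_irq_save()/local_irq_restore() pair (which, unlike this sketch, store into a flags variable rather than returning it):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_on = true;

static void sim_irq_disable(void) { irqs_on = false; }
static void sim_irq_enable(void)  { irqs_on = true; }

/* Save the current state, then disable (analogous to local_irq_save()). */
static bool sim_irq_save(void)
{
	bool flags = irqs_on;
	irqs_on = false;
	return flags;
}

/* Put back exactly what was saved (analogous to local_irq_restore()). */
static void sim_irq_restore(bool flags) { irqs_on = flags; }

/* Pattern under discussion: always leaves interrupts enabled on return. */
static void run_locally_enable(void (*func)(void))
{
	sim_irq_disable();
	func();
	sim_irq_enable();
}

/* Alternative: returns with the caller's original interrupt state. */
static void run_locally_restore(void (*func)(void))
{
	bool flags = sim_irq_save();
	func();
	sim_irq_restore(flags);
}

/* Hypothetical placeholder for the per-CPU drain work. */
static void drain_stub(void)
{
	assert(!irqs_on); /* both helpers run the work with interrupts off */
}
```

Starting from the state swsusp_save() runs in (interrupts off), run_locally_enable() returns with interrupts on, which is the leak that later trips lockdep; run_locally_restore() returns with them still off.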
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 2007-12-19 at 10:42 -0800, Daniel Walker wrote:
> On Wed, 2007-12-19 at 10:06 -0500, Miles Lane wrote:
> > [ 11.827653] PM: Creating hibernation image:
> > [ 11.827658] WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> > [ 11.827661] Pid: 9940, comm: pm-hibernate Not tainted 2.6.24-rc5-mm1 #8
> > [ 11.827665] [c0107d55] show_trace_log_lvl+0x12/0x25
> > [ 11.827673] [c010848a] show_trace+0xd/0x10
> > [ 11.827677] [c0108763] dump_stack+0x57/0x5f
> > [ 11.827681] [c0117db4] native_smp_call_function_mask+0x41/0x126
> > [ 11.827686] [c01192d9] smp_call_function+0x18/0x1f
> > [ 11.827690] [c012c624] on_each_cpu+0x12/0x40
> > [ 11.827695] [c0166ece] drain_all_pages+0x13/0x16
> > [ 11.827700] [c014f7b3] swsusp_save+0x18/0x46b
> > [ 11.827705] [c03103fa] swsusp_arch_suspend+0x2a/0x2c
> > [ 11.827710] [c014e7d8] hibernate+0xba/0x16e
> > [ 11.827714] [c014d56b] state_store+0x45/0xac
> > [ 11.827717] [c01ffe95] kobj_attr_store+0x1a/0x22
> > [ 11.827722] [c01b92c7] sysfs_write_file+0xb8/0xe3
> > [ 11.827726] [c01837eb] vfs_write+0xa4/0x120
> > [ 11.827731] [c0183d5e] sys_write+0x3b/0x60
> > [ 11.827734] [c0106bae] sysenter_past_esp+0x6b/0xc1
> > [ 11.827738] ===
> > ...
> > [ 15.624993] =
> > [ 15.624995] [ INFO: inconsistent lock state ]
> > [ 15.624998] 2.6.24-rc5-mm1 #8
> > [ 15.624999] -
> > [ 15.625001] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
>
> It looks like swsusp_save() calls drain_all_pages(), which calls on_each_cpu(). On return, on_each_cpu() unconditionally enables interrupts, so the rest of the resume process runs with interrupts enabled (which, it looks like, shouldn't happen), and then you get the lockdep warning above.
>
> Not sure if this has been found already, or not?
>
> Should drain_all_pages() really be drain_local_pages()?

It looks like it was drain_local_pages(), but the patch page-allocator-clean-up-pcp-draining-functions.patch changes that in -mm. I added Christoph Lameter to the CC since it's his patch.

Daniel
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wed, 19 Dec 2007, Daniel Walker wrote:

> > It looks like swsusp_save() calls drain_all_pages(), which calls on_each_cpu(). On return, on_each_cpu() unconditionally enables interrupts, so the rest of the resume process runs with interrupts enabled (which, it looks like, shouldn't happen), and then you get the lockdep warning above.
> >
> > Not sure if this has been found already, or not?

Hmmm... It will unconditionally enable interrupts regardless of how we call this. We could explicitly save and restore interrupts in swsusp_save(), I guess.

Why is swsusp_save() disabling interrupts?

> > Should drain_all_pages() really be drain_local_pages()?
>
> It looks like it was drain_local_pages(), but the patch page-allocator-clean-up-pcp-draining-functions.patch changes that in -mm. I added Christoph Lameter to the CC since it's his patch.

We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Wednesday, 19 of December 2007, Christoph Lameter wrote:
> On Wed, 19 Dec 2007, Daniel Walker wrote:
>
> > > It looks like swsusp_save() calls drain_all_pages(), which calls on_each_cpu(). On return, on_each_cpu() unconditionally enables interrupts, so the rest of the resume process runs with interrupts enabled (which, it looks like, shouldn't happen), and then you get the lockdep warning above.
> > >
> > > Not sure if this has been found already, or not?
>
> Hmmm... It will unconditionally enable interrupts regardless of how we call this. We could explicitly save and restore interrupts in swsusp_save(), I guess.
>
> Why is swsusp_save() disabling interrupts?

Actually, it's called with interrupts disabled, because its job is to create the hibernation image. At this point everything is off except for the CPU running swsusp_save().

> > > Should drain_all_pages() really be drain_local_pages()?
> >
> > It looks like it was drain_local_pages(), but the patch page-allocator-clean-up-pcp-draining-functions.patch changes that in -mm. I added Christoph Lameter to the CC since it's his patch.
>
> We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?

Well, not exactly. We are on one CPU at this point; the others have been disabled.
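Rafael's point explains why draining only the local CPU is enough at this stage. A toy model makes it concrete. It assumes a fixed array of per-CPU page counts and an online mask (hypothetical names; the real per-CPU lists in mm/page_alloc.c are per zone and far more involved), and it assumes the disabled CPUs no longer hold cached pages. Under those assumptions, a local drain and an all-CPU drain end in the same state when only one CPU is online. The all-CPU variant here simply loops instead of making cross-CPU calls, so it does not model the interrupt side effect discussed earlier in the thread:

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

/* Hypothetical per-CPU cached free pages, online mask, and shared pool. */
static int pcp_count[SIM_NR_CPUS];
static bool cpu_online_sim[SIM_NR_CPUS];
static int buddy_free; /* pages handed back to the shared buddy allocator */

/* Spill one CPU's cached pages back into the shared pool. */
static void sim_drain_pages(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	buddy_free += pcp_count[cpu];
	pcp_count[cpu] = 0;
}

/* Drain only the calling CPU, like drain_local_pages(). */
static void sim_drain_local(int this_cpu)
{
	sim_drain_pages(this_cpu);
}

/* Drain every online CPU, which is what drain_all_pages() aims for. */
static void sim_drain_all(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (cpu_online_sim[cpu])
			sim_drain_pages(cpu);
}

/* Pages still cached on any CPU, online or not. */
static int sim_cached_total(void)
{
	int total = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		total += pcp_count[cpu];
	return total;
}
```

With only CPU 0 online and holding cached pages, both drains leave nothing cached and move the same number of pages to the shared pool; the difference only matters when several CPUs are online, which is not the case inside swsusp_save().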
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:

> > We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?
>
> Well, not exactly. We are on one CPU at this point; the others have been disabled.

OK, so the others are flushed. Here is a patch to re-export drain_local_pages() and use it for software suspend:

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 include/linux/gfp.h     |    1 +
 kernel/power/snapshot.c |    2 +-
 mm/page_alloc.c         |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
===
--- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
+++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
@@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
 
 	printk(KERN_INFO "PM: Creating hibernation image: \n");
 
-	drain_all_pages();
+	drain_local_pages(NULL);
 	nr_pages = count_data_pages();
 	nr_highmem = count_highmem_pages();
 	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);
Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
===
--- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
+++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
@@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  */
-static void drain_local_pages(void *arg)
+void drain_local_pages(void *arg)
 {
 	drain_pages(smp_processor_id());
 }
Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
===
--- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
+++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
@@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(void);
+void drain_local_pages(void *dummy);
 
 #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thursday, 20 of December 2007, Christoph Lameter wrote:
> On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:
>
> > > We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?
> >
> > Well, not exactly. We are on one CPU at this point; the others have been disabled.
>
> OK, so the others are flushed. Here is a patch to re-export drain_local_pages() and use it for software suspend:
>
> Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
>
> ---
>  include/linux/gfp.h     |    1 +
>  kernel/power/snapshot.c |    2 +-
>  mm/page_alloc.c         |    2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
> ===
> --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
> +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
> @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
>
>  	printk(KERN_INFO "PM: Creating hibernation image: \n");
>
> -	drain_all_pages();
> +	drain_local_pages(NULL);
>  	nr_pages = count_data_pages();
>  	nr_highmem = count_highmem_pages();
>  	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);

You've omitted the second instance, right before the copy_data_pages() call.

> Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
> ===
> --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
> +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
> @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
>  /*
>   * Spill all of this CPU's per-cpu pages back into the buddy allocator.
>   */
> -static void drain_local_pages(void *arg)
> +void drain_local_pages(void *arg)
>  {
>  	drain_pages(smp_processor_id());
>  }
> Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
> ===
> --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
> +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
> @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
>  void page_alloc_init(void);
>  void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(void);
> +void drain_local_pages(void *dummy);
>
>  #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1
On Dec 20, 2007 12:07 AM, Alan Stern [EMAIL PROTECTED] wrote:
> On Wed, 19 Dec 2007, Dave Young wrote:
>
> > I tested on another machine with kernel 2.6.24-rc2, and the result is different again. Here is the result:
> >
> > 1. On 2.6.24-rc2, when I plug in the player the kernel reports the messages below:
> >
> > usb-storage: waiting for device to settle before scanning
> > /*[let's mark the part below as part 1]*/
> > scsi 0:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
> > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> > sd 0:0:0:0: [sda] Write Protect is on
> > sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> > sd 0:0:0:0: [sda] Assuming drive cache: write through
> > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> > sd 0:0:0:0: [sda] Write Protect is on
> > sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
> > sd 0:0:0:0: [sda] Assuming drive cache: write through
> > sda: sda1
> > /*[let's mark the part below as part 2]*/
> > sd 0:0:0:0: [sda] Attached SCSI removable disk
> > usb-storage: device scan complete
> > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> > sd 0:0:0:0: [sda] Write Protect is off
> > sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> > sd 0:0:0:0: [sda] Assuming drive cache: write through
> > sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
> > sd 0:0:0:0: [sda] Write Protect is off
> > sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> > sd 0:0:0:0: [sda] Assuming drive cache: write through
> > sda: sda1
>
> This is not normal. When you plug in a storage device you should get all of the messages in your part 1 plus the first two lines in your part 2, but not the rest of part 2.
>
> > 2. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it reports part 2 and mounts the partition as rw.
> >
> > 3. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it just mounts the partition as ro, with no further messages.
>
> You must have a typo there. Those can't both be true for 2.6.24-rc5. In fact you shouldn't see part 2 at all.
>
> Here's what I get when I plug in a USB mass-storage device under 2.6.24-rc5:
>
> [ 87.903014] usb-storage: device found at 2
> [ 87.909570] scsi 0:0:0:0: Direct-Access Memorex TD 2B1.09 PQ: 0 ANSI: 0 CCS
> [ 87.913144] usb-storage: device scan complete
> [ 88.804031] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
> [ 88.805507] sd 0:0:0:0: [sda] Write Protect is off
> [ 88.805577] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 88.805639] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 88.809526] sd 0:0:0:0: [sda] 243712 512-byte hardware sectors (125 MB)
> [ 88.810421] sd 0:0:0:0: [sda] Write Protect is off
> [ 88.810488] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 88.810575] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 88.810641] sda: sda1
> [ 88.812450] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ 89.041014] sd 0:0:0:0: Attached scsi generic sg0 type 0
>
> Mounting the disk produces no extra output at all. I get the same result under 2.6.23 and earlier operating systems. You should see approximately the same thing.

Hi, Alan

I'm sure about my post.

I'm not so familiar with USB. It looks weird. It seems that my device is first recognized as an mp3 player and then as USB storage, so the system reports part 1 and part 2 under previous kernels.

Regards
dave
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Dec 19, 2007 7:09 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
> On Thursday, 20 of December 2007, Christoph Lameter wrote:
> > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:
> >
> > > > We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?
> > >
> > > Well, not exactly. We are on one CPU at this point; the others have been disabled.
> >
> > OK, so the others are flushed. Here is a patch to re-export drain_local_pages() and use it for software suspend:
> >
> > Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
> >
> > ---
> >  include/linux/gfp.h     |    1 +
> >  kernel/power/snapshot.c |    2 +-
> >  mm/page_alloc.c         |    2 +-
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
> > ===
> > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
> > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
> > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
> >
> >  	printk(KERN_INFO "PM: Creating hibernation image: \n");
> >
> > -	drain_all_pages();
> > +	drain_local_pages(NULL);
> >  	nr_pages = count_data_pages();
> >  	nr_highmem = count_highmem_pages();
> >  	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);
>
> You've omitted the second instance, right before the copy_data_pages() call.

I will wait for a revised patch and then test.

(Sorry for the duplicate message. I am resending because I accidentally sent an HTML message the first time. Whoops.)

> > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
> > ===
> > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
> > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
> > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
> >  /*
> >   * Spill all of this CPU's per-cpu pages back into the buddy allocator.
> >   */
> > -static void drain_local_pages(void *arg)
> > +void drain_local_pages(void *arg)
> >  {
> >  	drain_pages(smp_processor_id());
> >  }
> > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
> > ===
> > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
> > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
> > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
> >  void page_alloc_init(void);
> >  void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> >  void drain_all_pages(void);
> > +void drain_local_pages(void *dummy);
> >
> >  #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Thursday, 20 of December 2007, Miles Lane wrote:
> On Dec 19, 2007 7:09 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
> > On Thursday, 20 of December 2007, Christoph Lameter wrote:
> > > On Thu, 20 Dec 2007, Rafael J. Wysocki wrote:
> > >
> > > > > We could re-export drain_local_pages() again, but then I do not understand why we would only drain the pages of this processor and not those of all the other processors as well. It seems that software suspend's intent was to flush them all, right?
> > > >
> > > > Well, not exactly. We are on one CPU at this point; the others have been disabled.
> > >
> > > OK, so the others are flushed. Here is a patch to re-export drain_local_pages() and use it for software suspend:
> > >
> > > Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
> > >
> > > ---
> > >  include/linux/gfp.h     |    1 +
> > >  kernel/power/snapshot.c |    2 +-
> > >  mm/page_alloc.c         |    2 +-
> > >  3 files changed, 3 insertions(+), 2 deletions(-)
> > >
> > > Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
> > > ===
> > > --- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
> > > +++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
> > > @@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
> > >
> > >  	printk(KERN_INFO "PM: Creating hibernation image: \n");
> > >
> > > -	drain_all_pages();
> > > +	drain_local_pages(NULL);
> > >  	nr_pages = count_data_pages();
> > >  	nr_highmem = count_highmem_pages();
> > >  	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);
> >
> > You've omitted the second instance, right before the copy_data_pages() call.
>
> I guess I will wait for a revised patch.

There's a fix from Andrew on top of this one in -mm:
http://marc.info/?l=linux-mm-commits&m=119810866812965&w=2

> > > Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
> > > ===
> > > --- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
> > > +++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
> > > @@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
> > >  /*
> > >   * Spill all of this CPU's per-cpu pages back into the buddy allocator.
> > >   */
> > > -static void drain_local_pages(void *arg)
> > > +void drain_local_pages(void *arg)
> > >  {
> > >  	drain_pages(smp_processor_id());
> > >  }
> > > Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
> > > ===
> > > --- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
> > > +++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
> > > @@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
> > >  void page_alloc_init(void);
> > >  void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
> > >  void drain_all_pages(void);
> > > +void drain_local_pages(void *dummy);
> > >
> > >  #endif /* __LINUX_GFP_H */
Re: 2.6.24-rc5-mm1
Note carefully. This:

> > > 2. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it reports part 2 and mounts the partition as rw.

contradicts this:

> > > 3. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it just mounts the partition as ro, with no further messages.

So which is correct?

> Hi, Alan
>
> I'm sure about my post.

But your post contradicts itself. It can't be correct.

> I'm not so familiar with USB. It looks weird. It seems that my device is first recognized as an mp3 player and then as USB storage, so the system reports part 1 and part 2 under previous kernels.

I think those part 2 messages aren't caused by the kernel at all, but instead by some program running on your computer. You could try booting into single-user mode and see if the behavior changes.

Also there's no question -- the device does behave strangely. It shouldn't change the write-protect setting all by itself.

Alan Stern
Re: 2.6.24-rc5-mm1
On Dec 20, 2007 11:34 AM, Alan Stern [EMAIL PROTECTED] wrote:
> Note carefully. This:
>
> > > > 2. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it reports part 2 and mounts the partition as rw.
>
> contradicts this:
>
> > > > 3. On 2.6.24-rc5 the kernel reports only part 1; after trying to mount the disk it just mounts the partition as ro, with no further messages.

Oh, sorry, it's a typo. It should be 2.6.24-rc5-mm1.

> So which is correct?
>
> > Hi, Alan
> >
> > I'm sure about my post.
>
> But your post contradicts itself. It can't be correct.
>
> > I'm not so familiar with USB. It looks weird. It seems that my device is first recognized as an mp3 player and then as USB storage, so the system reports part 1 and part 2 under previous kernels.
>
> I think those part 2 messages aren't caused by the kernel at all, but instead by some program running on your computer. You could try booting into single-user mode and see if the behavior changes.

No doubt for me. Under OS X, plugging in this device pops up a dialog (I don't remember the content); after I press OK, the disk icon goes away and then the disk is mounted again.

> Also there's no question -- the device does behave strangely. It shouldn't change the write-protect setting all by itself.

Yes, I think so too.

> Alan Stern
Re: 2.6.24-rc5-mm1 -- inconsistent {in-hardirq-W} - {hardirq-on-W} usage -- pm-hibernate/9940 [HC0[0]:SC0[0]:HE1:SE1]
On Dec 19, 2007 8:31 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 20 of December 2007, Miles Lane wrote: On Dec 19, 2007 7:09 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 20 of December 2007, Christoph Lameter wrote: On Thu, 20 Dec 2007, Rafael J. Wysocki wrote: We could reexport drain_local_pages() again but then I do not understand why we would only drain the pages of this processor and not of all other processors as well. It seems that software suspend's intent was to flush them all, right? Well, not exactly. We are on one CPU at this point; the others have been disabled. OK, so the others are flushed. Here is a patch to re-export drain_local_pages() again and use it for software suspend:

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 include/linux/gfp.h     |    1 +
 kernel/power/snapshot.c |    2 +-
 mm/page_alloc.c         |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc5-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.24-rc5-mm1.orig/kernel/power/snapshot.c	2007-12-19 11:59:25.233961700 -0800
+++ linux-2.6.24-rc5-mm1/kernel/power/snapshot.c	2007-12-19 15:16:34.179661929 -0800
@@ -1203,7 +1203,7 @@ asmlinkage int swsusp_save(void)
 	printk(KERN_INFO "PM: Creating hibernation image: \n");
-	drain_all_pages();
+	drain_local_pages(NULL);
 	nr_pages = count_data_pages();
 	nr_highmem = count_highmem_pages();
 	printk(KERN_INFO "PM: Need to copy %u pages\n", nr_pages + nr_highmem);

You've omitted the second instance, right before the copy_data_pages() call. I guess I will wait for a revised patch. There's a fix from Andrew on top of this one in -mm: http://marc.info/?l=linux-mm-commits&m=119810866812965&w=2

Index: linux-2.6.24-rc5-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.24-rc5-mm1.orig/mm/page_alloc.c	2007-12-19 12:01:00.630421258 -0800
+++ linux-2.6.24-rc5-mm1/mm/page_alloc.c	2007-12-19 15:12:19.850545818 -0800
@@ -930,7 +930,7 @@ static void drain_pages(unsigned int cpu
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  */
-static void drain_local_pages(void *arg)
+void drain_local_pages(void *arg)
 {
 	drain_pages(smp_processor_id());
 }
Index: linux-2.6.24-rc5-mm1/include/linux/gfp.h
===================================================================
--- linux-2.6.24-rc5-mm1.orig/include/linux/gfp.h	2007-12-19 15:13:51.926950065 -0800
+++ linux-2.6.24-rc5-mm1/include/linux/gfp.h	2007-12-19 15:16:11.951564369 -0800
@@ -229,5 +229,6 @@ extern void FASTCALL(free_cold_page(stru
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(void);
+void drain_local_pages(void *dummy);
 
 #endif /* __LINUX_GFP_H */

I applied Christoph and Andrew's patches and recompiled. I suspended to disk and to RAM several times and all looks good. Miles
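To make Rafael's point concrete: drain_all_pages() is meant to drain the per-CPU page lists of every online CPU, while drain_local_pages() drains only the calling CPU's list. The toy model below is a sketch, not kernel code. All names in it are hypothetical, and it assumes, as Christoph's "the others are flushed" implies, that a CPU's cached pages are already returned when that CPU is taken offline. Under that assumption, once only one CPU is left online, draining the local CPU returns every cached page, the same result as draining all of them.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

/* Toy model: per-CPU caches of free pages plus the shared buddy pool. */
struct pcp_sketch {
	int count[NR_CPUS_SKETCH];    /* pages cached on each CPU */
	bool online[NR_CPUS_SKETCH];
	int buddy_free;               /* pages returned to the buddy allocator */
};

/* Analogue of drain_local_pages(): spill one CPU's cache into the pool. */
static void drain_cpu(struct pcp_sketch *s, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	s->buddy_free += s->count[cpu];
	s->count[cpu] = 0;
}

/* Analogue of drain_all_pages(): drain every online CPU. */
static void drain_all(struct pcp_sketch *s)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (s->online[cpu])
			drain_cpu(s, cpu);
}

/* Assumption taken from the thread: a CPU's cache is drained as it goes offline. */
static void cpu_offline(struct pcp_sketch *s, int cpu)
{
	drain_cpu(s, cpu);
	s->online[cpu] = false;
}
```

After offlining CPUs 1 to 3, a single drain_cpu(&s, 0) leaves buddy_free equal to the total that drain_all() would have produced with all four CPUs online.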
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Tue, Dec 18, 2007 at 06:04:58PM -0800, Andrew Morton wrote: > Nobody seems to look after hppfs. I'll resend the fat and hostfs patches to > maintainers for a review, please. It's mine - I'll take a look at it. Jeff -- Work email - jdike at linux dot intel dot com
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Wed, 19 Dec 2007 01:22:21 + David Howells <[EMAIL PROTECTED]> wrote: > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > - inode = ERR_PTR(ret); > > > + return NULL; > > > } else { > > > unlock_new_inode(inode); > > > } > > > > > > > Yup. > > Nope. The correct fix is to make the various callers use IS_ERR() to check > the result of this function rather than checking for a NULL return. > > > David, this is concerning. More such error-path bugs in that code will take > > years and years to get found and fixed. > > Yes, I know. I've looked over the patches several times, however I know there > may be bugs in there because I may have made assumptions about what I've > written that cause me to overlook things. It's a danger of checking your own > code:-( > > > The best way to eliminate them is a line-by-line re-review of the patchset. > > And ideally by someone other than me. Some of them have been reviewed by > other people, but I'm not sure that all have. > > However, I've just had another look through. ISOFS appears to be the only one > in which I'd missed updating the callers. I've sent you a patch for it. > > Note that I expressed reservations about three filesystems in the cover note > (FAT, HPPFS and HOSTFS), but nothing seems to have come of it. > Nobody seems to look after hppfs. I'll resend the fat and hostfs patches to maintainers for a review, please.
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Dec 19, 2007 9:22 AM, David Howells <[EMAIL PROTECTED]> wrote: > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > - inode = ERR_PTR(ret); > > > + return NULL; > > > } else { > > > unlock_new_inode(inode); > > > } > > > > > > > Yup. > > Nope. The correct fix is to make the various callers use IS_ERR() to check > the result of this function rather than checking for a NULL return. > > > David, this is concerning. More such error-path bugs in that code will take > > years and years to get found and fixed. > > Yes, I know. I've looked over the patches several times, however I know there > may be bugs in there because I may have made assumptions about what I've > written that cause me to overlook things. It's a danger of checking your own > code:-( > > > The best way to eliminate them is a line-by-line re-review of the patchset. > > And ideally by someone other than me. Some of them have been reviewed by > other people, but I'm not sure that all have. > > However, I've just had another look through. ISOFS appears to be the only one > in which I'd missed updating the callers. I've sent you a patch for it. > > Note that I expressed reservations about three filesystems in the cover note > (FAT, HPPFS and HOSTFS), but nothing seems to have come of it. > Hi, The oops is at iput(). I used 'return NULL' because I don't want to change the behaviour of iput() in fs/inode.c. Regards dave
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
Andrew Morton <[EMAIL PROTECTED]> wrote: > > - inode = ERR_PTR(ret); > > + return NULL; > > } else { > > unlock_new_inode(inode); > > } > > > > Yup. Nope. The correct fix is to make the various callers use IS_ERR() to check the result of this function rather than checking for a NULL return. > David, this is concerning. More such error-path bugs in that code will take > years and years to get found and fixed. Yes, I know. I've looked over the patches several times, however I know there may be bugs in there because I may have made assumptions about what I've written that cause me to overlook things. It's a danger of checking your own code:-( > The best way to eliminate them is a line-by-line re-review of the patchset. And ideally by someone other than me. Some of them have been reviewed by other people, but I'm not sure that all have. However, I've just had another look through. ISOFS appears to be the only one in which I'd missed updating the callers. I've sent you a patch for it. Note that I expressed reservations about three filesystems in the cover note (FAT, HPPFS and HOSTFS), but nothing seems to have come of it. David
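David's point rests on the ERR_PTR convention: a function that reports failure as ERR_PTR(-errno) never returns NULL for an error, so a caller that only tests for NULL treats the error value as a real inode (per Dave Young's report, the oops then surfaces in iput()). Below is a minimal userspace sketch of the convention. It re-creates simplified ERR_PTR()/IS_ERR()/PTR_ERR() helpers in the spirit of include/linux/err.h; the inode type, get_inode_sketch() and fill_super_sketch() are hypothetical stand-ins, not the isofs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creation of the err.h helpers: the top
 * MAX_ERRNO addresses are reserved to encode -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical inode type and lookup, standing in for an iget-style call. */
struct inode_sketch { unsigned long ino; };
static struct inode_sketch root_inode = { .ino = 1 };

/* Returns a valid inode or ERR_PTR(-errno); never NULL. */
static struct inode_sketch *get_inode_sketch(unsigned long ino)
{
	if (ino != root_inode.ino)
		return ERR_PTR(-EIO);   /* e.g. "unable to read i-node block" */
	return &root_inode;
}

/* Caller written the way David describes: test with IS_ERR(), not NULL. */
static int fill_super_sketch(unsigned long root_ino)
{
	struct inode_sketch *root = get_inode_sketch(root_ino);

	if (IS_ERR(root))
		return (int)PTR_ERR(root);   /* fail the mount with -EIO */
	assert(root != NULL);            /* this lookup never returns NULL */
	return 0;
}
```

Because get_inode_sketch() returns a non-NULL value on failure, a caller that checks only for NULL would carry on and use the error value as if it were an inode.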
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
(Adding Dave Howells; his name is on iget-stop-isofs-from-using-read_inode.patch) On Tue, 18 Dec 2007 10:37:32 +0800, Dave Young said: > > I don't mind it failing the mount, but the oops seems excessive. I suspect > > that *somewhere* in that stack trace, we're wanting something like a > > > > if (!foo_ptr) > > return -EIO; > > > > but I admit not being competent enough to decide where that should be. > > > > Hi, > Could you please try the below patch: > > Signed-off-by: Dave Young <[EMAIL PROTECTED]> > > --- > fs/isofs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) With that patch applied, I took the ISO image (which I ended up reading on another machine and copying over the net to get a complete usable image), and dd'ed just the first 150M into another file, and tried to loopback mount it. And I got: # mount -o ro,loop /path/to/cd.partial.image /mnt/loop mount: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so And my dmesg says: [ 33.622073] ISO 9660 Extensions: Microsoft Joliet Level 3 [ 33.622125] attempt to access beyond end of device [ 33.622129] loop0: rw=0, want=1284500, limit=30 [ 33.622133] ISOFS: unable to read i-node block Here is where we would oops before - now it errors out more reasonably: [ 33.622140] ISOFS: changing to secondary root [ 33.622148] attempt to access beyond end of device [ 33.622151] loop0: rw=0, want=1284508, limit=30 [ 33.622155] ISOFS: unable to read i-node block [ 33.622159] isofs_fill_super: get root inode failed So that fixes *this* bug. I looked in the -rc5-mm1 broken-out/, saw the vast multitudes of 'iget-stop-foofs-from-using' patches, and decided that somebody else will probably have to audit them for sanity.
In the iget-* series, there are some 184 'return ERR_PTR(-EFOO);' for some FOO, and 3 other uses: % grep ERR_PTR iget* | grep -v return iget-stop-isofs-from-using-read_inode.patch:+ inode = ERR_PTR(ret); iget-stop-jfs-from-using-iget-and-read_inode-try.patch:+parent = ERR_PTR(-ENOMEM); iget-stop-jfs-from-using-iget-and-read_inode-try.patch:-parent = ERR_PTR(-EACCES); iget-stop-jfs-from-using-iget-and-read_inode-try.patch:-parent = ERR_PTR(-ENOMEM); isofs is the only place we don't return a constant 'ERR_PTR(-EFOO)', but cast somebody else's return value. I wish I understood what that tells us. ;)
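On Valdis's closing question, one general property of the helpers is worth spelling out. This is not a diagnosis of the isofs patch. ERR_PTR(ret) is only recognised by IS_ERR() when ret is a negative errno in the range -4095..-1. If ret is 0 the result is simply NULL, and a positive ret yields an ordinary-looking non-NULL pointer, so propagating a variable is less self-evidently safe than returning a constant ERR_PTR(-EFOO). Below is a sketch with the same simplified userspace helpers; classify_err_ptr() and err_ptr_checked() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace versions of the err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* What a caller doing "if (IS_ERR(p)) ... else use p" sees for ERR_PTR(ret). */
enum seen_as { SEEN_ERROR, SEEN_NULL, SEEN_VALID_POINTER };

static enum seen_as classify_err_ptr(long ret)
{
	void *p = ERR_PTR(ret);

	if (IS_ERR(p))
		return SEEN_ERROR;
	if (p == NULL)
		return SEEN_NULL;
	return SEEN_VALID_POINTER;   /* bogus value that would be dereferenced */
}

/* Hypothetical defensive wrapper: only build an error pointer from a
 * negative errno, so a 0 or positive ret is caught during testing. */
static void *err_ptr_checked(long ret)
{
	assert(ret < 0 && ret >= -MAX_ERRNO);
	return ERR_PTR(ret);
}
```

With a constant like ERR_PTR(-ENOMEM) the value is known to be in range at the call site; with ERR_PTR(ret) that guarantee depends on whatever produced ret.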
Re: 2.6.24-rc5-mm1 - IPv6 throws section mismatches.
[EMAIL PROTECTED] wrote: On Thu, 13 Dec 2007 02:40:50 PST, Andrew Morton said: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/ git-net.patch (I'm guessing one of Daniel's commits, but not sure which one) causes some complaints: LD vmlinux.o MODPOST vmlinux.o WARNING: vmlinux.o(.init.text+0x2263f): Section mismatch: reference to .exit.text:tcpv6_exit (between 'inet6_init' and 'ac6_proc_init') WARNING: vmlinux.o(.init.text+0x22644): Section mismatch: reference to .exit.text:udplitev6_exit (between 'inet6_init' and 'ac6_proc_init') WARNING: vmlinux.o(.init.text+0x22649): Section mismatch: reference to .exit.text:udpv6_exit (between 'inet6_init' and 'ac6_proc_init') WARNING: vmlinux.o(.init.text+0x22658): Section mismatch: reference to .exit.text:addrconf_cleanup (between 'inet6_init' and 'ac6_proc_init') WARNING: vmlinux.o(.init.text+0x226bc): Section mismatch: reference to .exit.text:rawv6_exit (between 'inet6_init' and 'ac6_proc_init') Looks like the problem is that tcpv6_exit and friends are called from net/ipv6/af_inet6.c:inet6_init() - which is declared as: static int __init inet6_init(void) I can see how calling an __exit from an __init would be Bad Juju... Yep, thanks Valdis for pointing that out. I sent DaveM a patch fixing that several days ago, and he applied it to the latest net-2.6.25.
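For context on the warning: code marked __init is placed in .init.text and code marked __exit in .exit.text, which may be discarded entirely when the code is built in, so an __init error path must not call __exit functions. The patch applied to net-2.6.25 is not shown in this thread. One common way to satisfy modpost is to drop __exit from cleanup helpers that the init error path also uses, and to unwind with gotos in reverse setup order. Below is a userspace sketch of that shape, with __init/__exit stubbed out as empty macros and hypothetical step_*() functions standing in for the tcpv6/udpv6/rawv6 pieces.

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace stand-ins. In the kernel, __init places code in .init.text and
 * __exit in .exit.text; here they expand to nothing.
 */
#define __init
#define __exit

static int fail_at = -1;       /* test hook: index of the step that fails */
static char cleanup_log[8];    /* records the order cleanups ran in */
static int cleanup_len;

static void note(char c)
{
	assert(cleanup_len + 1 < (int)sizeof(cleanup_log));
	cleanup_log[cleanup_len++] = c;
	cleanup_log[cleanup_len] = '\0';
}

static void reset_sketch(int fail)
{
	fail_at = fail;
	cleanup_len = 0;
	cleanup_log[0] = '\0';
}

/* Hypothetical sub-protocol setup/teardown. The cleanups are deliberately
 * NOT marked __exit, because the __init error path below calls them. */
static int step_a_init(void) { return fail_at == 0 ? -1 : 0; }
static int step_b_init(void) { return fail_at == 1 ? -1 : 0; }
static int step_c_init(void) { return fail_at == 2 ? -1 : 0; }
static void step_a_cleanup(void) { note('a'); }
static void step_b_cleanup(void) { note('b'); }
static void step_c_cleanup(void) { note('c'); }

static int __init proto_init(void)
{
	int err;

	err = step_a_init();
	if (err)
		goto out;
	err = step_b_init();
	if (err)
		goto out_a;
	err = step_c_init();
	if (err)
		goto out_b;
	return 0;

out_b:			/* undo in reverse order of setup */
	step_b_cleanup();
out_a:
	step_a_cleanup();
out:
	return err;
}

static void __exit proto_exit(void)
{
	step_c_cleanup();
	step_b_cleanup();
	step_a_cleanup();
}
```

The design choice is simply that anything reachable from both the module exit path and the init failure path lives in ordinary .text, while only the exit-only wrapper keeps the __exit annotation.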
Re: 2.6.24-rc5-mm1 -- INFO: possible circular locking dependency detected -- pm-suspend/5800 is trying to acquire lock
> Sorry. GMail doesn't support sending unwrapped text, as far as I can > tell. I will send the log segment to you as an attachment. Also, > when I sent my .config inline to Andrew recently, it tripped his spam > filter. I'll attach it as well. Thanks. This is a bug in iwlwifi. The problem is actually another case where my workqueue debugging with lockdep is triggering a warning :)) Here's the thing: iwl3945_cancel_deferred_work() does cancel_delayed_work_sync(&priv->init_alive_start); (which is the "(&(&priv->init_alive_start)->work)" lock) but it is called from within a section locked with mutex_lock(&priv->mutex); (locked from iwl3945_pci_suspend). On the other hand, the work function that runs for init_alive_start is iwl3945_bg_init_alive_start(), which locks the same mutex. So the deadlock condition is that you can be in cancel_delayed_work_sync() above while holding the mutex, waiting for iwl3945_bg_init_alive_start(), which is itself trying to lock the mutex. johannes
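The cycle Johannes describes can be reproduced in miniature outside the kernel. In the userspace analogy below (pthreads, not the workqueue API), pthread_join() stands in only for the "wait for the running work item" half of cancel_delayed_work_sync(), and a worker thread that takes the mutex stands in for iwl3945_bg_init_alive_start(). Taking the mutex first and then waiting can deadlock; waiting first, with the mutex not held, avoids that particular cycle. This reordering is offered as one general way to break such a cycle. The thread does not say how iwlwifi was actually fixed, and struct priv_sketch and the *_sketch() functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private data. */
struct priv_sketch {
	pthread_mutex_t mutex;   /* plays the role of priv->mutex */
	pthread_t work;          /* plays the role of the init_alive_start work */
	bool work_ran;
	bool suspended;
};

/* Stand-in for iwl3945_bg_init_alive_start(): the work takes the mutex. */
static void *bg_init_alive_start_sketch(void *arg)
{
	struct priv_sketch *priv = arg;

	pthread_mutex_lock(&priv->mutex);
	priv->work_ran = true;
	pthread_mutex_unlock(&priv->mutex);
	return NULL;
}

/* Returns 0 on success, nonzero if the mutex or thread could not be set up. */
static int start_sketch(struct priv_sketch *priv)
{
	priv->work_ran = false;
	priv->suspended = false;
	if (pthread_mutex_init(&priv->mutex, NULL) != 0)
		return -1;
	return pthread_create(&priv->work, NULL, bg_init_alive_start_sketch, priv);
}

/*
 * Deadlock-prone order (the shape lockdep flagged), deliberately not used:
 *     pthread_mutex_lock(&priv->mutex);
 *     pthread_join(priv->work, NULL);   // worker may be blocked on the mutex
 *
 * Order used here: wait for the work with the mutex NOT held, then lock.
 */
static void suspend_sketch(struct priv_sketch *priv)
{
	int rc = pthread_join(priv->work, NULL);

	assert(rc == 0);
	pthread_mutex_lock(&priv->mutex);
	priv->suspended = true;           /* rest of "suspend" runs under the mutex */
	pthread_mutex_unlock(&priv->mutex);
}
```

An alternative with the same effect is to make the deferred work not need the mutex at all; which option fits depends on what the work touches, and the sketch does not model that.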
Re: 2.6.24-rc5-mm1 -- INFO: possible circular locking dependency detected -- pm-suspend/5800 is trying to acquire lock
On Tue, 2007-12-18 at 09:03 -0500, Miles Lane wrote: > I have only seen this happen once, and cannot reproduce it. I'll keep > trying, though. > > Dec 16 22:10:48 syntropy kernel: [ 231.718023] > === Do you have a version that isn't line-wrapped before I try to unwrap it? Thanks, johannes
Re: 2.6.24-rc5-mm1
On Dec 17, 2007 9:14 AM, Dave Young [EMAIL PROTECTED] wrote:
On Dec 14, 2007 11:44 PM, Alan Stern [EMAIL PROTECTED] wrote:
On Fri, 14 Dec 2007, Dave Young wrote:

Hi,
The behaviour of my mp3 player (which also acts as a usb-storage device) seems to have changed from rc5 to rc5-mm1.

This can't be considered a bug, right?

I'm not sure. It's just that the player changed from one slightly non-standard behavior to a different slightly non-standard behavior.

dmesg output under rc5:
=
usb 1-7: new high speed USB device using ehci_hcd and address 7
usb 1-7: configuration #1 chosen from 1 choice
scsi4 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 7
usb-storage: waiting for device to settle before scanning
scsi 4:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 4:0:0:0: [sdb] Write Protect is on
sd 4:0:0:0: [sdb] Mode Sense: 03 00 80 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 4:0:0:0: [sdb] Write Protect is on
sd 4:0:0:0: [sdb] Mode Sense: 03 00 80 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1
sd 4:0:0:0: [sdb] Attached SCSI removable disk
sd 4:0:0:0: Attached scsi generic sg1 type 0
usb-storage: device scan complete
==

Try mounting it (or just run blockdev --rereadpt), and write protect becomes off:
==
sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 4:0:0:0: [sdb] Write Protect is off
sd 4:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 4:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1

This output won't appear if you simply mount the device. So how do you know that mounting turns off write protect?

This can be observed by eye: dmesg - mount - dmesg

But under rc5-mm1, after the mount command is executed, the device is just mounted as a read-only partition and write protect is never set to off. I tried blockdev --rereadpt; it does set write protect to off, as on the rc5 kernel.

Below is the output of dmesg under rc5-mm1:
==
usb 1-8: new high speed USB device using ehci_hcd and address 6
usb 1-8: configuration #1 chosen from 1 choice
scsi3 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 6
usb-storage: waiting for device to settle before scanning
scsi 3:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
sd 3:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 3:0:0:0: [sdb] Write Protect is on
sd 3:0:0:0: [sdb] Mode Sense: 03 00 80 00
sd 3:0:0:0: [sdb] Assuming drive cache: write through
sd 3:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB)
sd 3:0:0:0: [sdb] Write Protect is on
sd 3:0:0:0: [sdb] Mode Sense: 03 00 80 00
sd 3:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1

This looks exactly the same as the output above (except for various port, device, and bus numbers).

Yes, but it lacks the 'Write Protect is off' part and the other lines.

If you turn on CONFIG_USB_STORAGE_DEBUG for both kernels and compare the dmesg output for the mount command, that might highlight the difference.

OK, I will test it once I have time, thanks.

There's no useful information with DEBUG on.

I tested on another machine with kernel 2.6.24-rc2, and the result is different again. Here is the result:

1. On 2.6.24-rc2, when I plug in the player, the kernel reports the messages below:

usb-storage: waiting for device to settle before scanning
/*[let's mark the part below as part 1]*/
scsi 0:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS
sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
sd 0:0:0:0: [sda] Write Protect is on
sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
sd 0:0:0:0: [sda] Write Protect is on
sd 0:0:0:0: [sda] Mode Sense: 03 00 80 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1
/*[let's mark the part below as part 2]*/
sd 0:0:0:0: [sda] Attached SCSI removable disk
usb-storage: device scan complete
sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 245248 512-byte hardware sectors (126 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
sd 0:0:0:0: [sda] Assuming drive cache: write through
sda: sda1

2. On the 2.6.24-rc5 kernel, it reports only part 1, after try
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Tue, 18 Dec 2007 10:37:32 +0800 Dave Young <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 17, 2007 at 09:07:56PM -0500, [EMAIL PROTECTED] wrote:
> > On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:
> >
> > (Adding Al Viro to the list, he's listed as "file systems" and MAINTAINERS
> > doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever
> > actually does that code?)
> >
> > > > I try it again, and it reports it died at the same exact place, but in about
> > > > 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't
> > > > think that blocks read from /dev/cdrom would get cached, but OK.
> > >
> > > It'll remain cached if something is holding the device open.
> >
> > Does it need to be "device open", or are there other things as well? If the
> > drop_cache was hosed, that would result in the same symptoms, no?
> >
> > > Something's holding s_umount for writing I guess. Possibly busted error
> > > handling somewhere totally different.
> >
> > Aha - found what was holding it - an attempt to loopback mount the truncated
> > file (before I realized it was truncated) had failed - I had gotten a 'Killed'
> > back from the mount, but I didn't realize it had pulled an actual oops:
> >
> > Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
> > Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
> > Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
> > Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
> > Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
> > ...
> > Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
> > Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
> > Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
> > Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
> > Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
> > Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
> > Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
> > Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
> > Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
> > Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
> > Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
> > Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
> > Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
> > Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
> > Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
> > Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
> > Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
> > Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
> > Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
> > Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
> > Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
> > Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
> > Dec 17 15:54:33 turing-police kernel: [14503.403359]
> > Dec 17 15:54:33 turing-police kernel: [14503.403366]
> > Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75
> >
> > I don't mind it failing the mount, but the oops seems excessive. I suspect
> > that *somewhere* in that stack trace, we're wanting something like a
> >
> > 	if (!foo_ptr)
> > 		return -EIO;
> >
> > but I admit not being competent enough to decide where that should be.
> >
>
> Hi,
> Could you please try the below patch:
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>
> ---
>  fs/isofs/inode.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -upr linux/fs/isofs/inode.c linux.new/fs/isofs/inode.c
> --- linux/fs/isofs/inode.c	2007-12-18 10:31:12.0 +0800
> +++ linux.new/fs/isofs/inode.c	2007-12-18 10:31:56.0 +0800
> @@ -1414,7 +1414,7 @@ struct inode *isofs_iget(struct super_bl
>  	ret = isofs_read_inode(inode);
>  	if (ret < 0) {
>  		iget_failed(inode);
> -		inode = ERR_PTR(ret);
> +		return NULL;
>  	} else {
>
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Mon, Dec 17, 2007 at 09:07:56PM -0500, [EMAIL PROTECTED] wrote:
> On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:
>
> (Adding Al Viro to the list, he's listed as "file systems" and MAINTAINERS
> doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever
> actually does that code?)
>
> > > I try it again, and it reports it died at the same exact place, but in about
> > > 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't
> > > think that blocks read from /dev/cdrom would get cached, but OK.
> >
> > It'll remain cached if something is holding the device open.
>
> Does it need to be "device open", or are there other things as well? If the
> drop_cache was hosed, that would result in the same symptoms, no?
>
> > Something's holding s_umount for writing I guess. Possibly busted error
> > handling somewhere totally different.
>
> Aha - found what was holding it - an attempt to loopback mount the truncated
> file (before I realized it was truncated) had failed - I had gotten a 'Killed'
> back from the mount, but I didn't realize it had pulled an actual oops:
>
> Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
> Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
> Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
> Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
> Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
> ...
> Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
> Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
> Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
> Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
> Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
> Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
> Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
> Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
> Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
> Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
> Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
> Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
> Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
> Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
> Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
> Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
> Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
> Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
> Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
> Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
> Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
> Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
> Dec 17 15:54:33 turing-police kernel: [14503.403359]
> Dec 17 15:54:33 turing-police kernel: [14503.403366]
> Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75
>
> I don't mind it failing the mount, but the oops seems excessive. I suspect
> that *somewhere* in that stack trace, we're wanting something like a
>
> 	if (!foo_ptr)
> 		return -EIO;
>
> but I admit not being competent enough to decide where that should be.
>

Hi,
Could you please try the patch below:

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 fs/isofs/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -upr linux/fs/isofs/inode.c linux.new/fs/isofs/inode.c
--- linux/fs/isofs/inode.c	2007-12-18 10:31:12.0 +0800
+++ linux.new/fs/isofs/inode.c	2007-12-18 10:31:56.0 +0800
@@ -1414,7 +1414,7 @@ struct inode *isofs_iget(struct super_bl
 	ret = isofs_read_inode(inode);
 	if (ret < 0) {
 		iget_failed(inode);
-		inode = ERR_PTR(ret);
+		return NULL;
 	} else {
 		unlock_new_inode(inode);
 	}
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:

(Adding Al Viro to the list, he's listed as "file systems" and MAINTAINERS
doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever
actually does that code?)

> > I try it again, and it reports it died at the same exact place, but in about
> > 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't
> > think that blocks read from /dev/cdrom would get cached, but OK.
>
> It'll remain cached if something is holding the device open.

Does it need to be "device open", or are there other things as well? If the
drop_cache was hosed, that would result in the same symptoms, no?

> Something's holding s_umount for writing I guess. Possibly busted error
> handling somewhere totally different.

Aha - found what was holding it - an attempt to loopback mount the truncated
file (before I realized it was truncated) had failed - I had gotten a 'Killed'
back from the mount, but I didn't realize it had pulled an actual oops:

Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
...
Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
Dec 17 15:54:33 turing-police kernel: [14503.403359]
Dec 17 15:54:33 turing-police kernel: [14503.403366]
Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75

I don't mind it failing the mount, but the oops seems excessive. I suspect
that *somewhere* in that stack trace, we're wanting something like a

	if (!foo_ptr)
		return -EIO;

but I admit not being competent enough to decide where that should be.
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Mon, 17 Dec 2007 17:44:11 -0500 [EMAIL PROTECTED] wrote:

> On Thu, 13 Dec 2007 02:40:50 PST, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
>
> OK, so I'm trying to 'dd' a CD and the drive on the laptop is having issues
> reading the disk.
>
> I try it once, and get an I/O error about 117M in - dd reports 1.7M/sec.
>
> I try it again, and it reports it died at the same exact place, but in about
> 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't
> think that blocks read from /dev/cdrom would get cached, but OK.

It'll remain cached if something is holding the device open.

> So I try the obviously stupid thing:
>
> # echo 1 >| /proc/sys/vm/drop_caches
>
> Alas, that hangs gloriously - 'echo t > /proc/sysrq-trigger' tells me:
>
> Dec 17 17:30:02 turing-police kernel: [20235.823201] bash D 0001 5288 15123 15085
> Dec 17 17:30:02 turing-police kernel: [20235.823206] 81007ba7de28 0086
> Dec 17 17:30:02 turing-police kernel: [20235.823210] 81007bbd9000 81007d70e000 81007bbd9248 0001019e3e48
> Dec 17 17:30:02 turing-police kernel: [20235.823214] e2f36028 e200012b9978 e2eece48 e20001164188
> Dec 17 17:30:02 turing-police kernel: [20235.823218] Call Trace:
> Dec 17 17:30:02 turing-police kernel: [20235.823224] [80523e20] __down_read+0x87/0xa1
> Dec 17 17:30:02 turing-police kernel: [20235.823229] [8024bc13] down_read+0x9/0xe
> Dec 17 17:30:02 turing-police kernel: [20235.823232] [802abafe] drop_pagecache+0x3a/0x8c
> Dec 17 17:30:02 turing-police kernel: [20235.823235] [802abb72] drop_caches_sysctl_handler+0x22/0x38
> Dec 17 17:30:02 turing-police kernel: [20235.823239] [802d2b70] proc_sys_write+0x7e/0xa6
> Dec 17 17:30:02 turing-police kernel: [20235.823244] [8028e18c] vfs_write+0xc7/0x170
> Dec 17 17:30:02 turing-police kernel: [20235.823248] [8028e772] sys_write+0x47/0x70
> Dec 17 17:30:02 turing-police kernel: [20235.823251] [8020c34c] tracesys+0xdc/0xe1
>

Something's holding s_umount for writing I guess. Possibly busted error
handling somewhere totally different.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > cat /proc/kpagecount on the other hand - with the change in line 710
> > - locks the box. Sysrq works, changing consoles works, but there is
> > no "BUG: soft lockup ..." message. After a while the box becomes
> > totally unresponsive - even caps lock doesn't work, no responses to
> > ping.
>
> Well I'm baffled. There's basically two things in that function that
> do anything interesting: pfn_to_page and put_user. access_ok is
> "return 1" on Sparc64. atomic_read is a simple read.
>
> My usual approach at this point would be to litter it with printks and
> see where it's hanging.

Ok. Maybe this will help. I don't know how to compare this to the results
from yesterday (test with ppage = NULL) - maybe I messed something up. This
time I added a bunch of printks and got these results.

This is from 'cat /proc/kpageflags' (after this the box is locked):

01
pfn:0, src:0, KPMSIZE:8
23458
ppage:0002, pfn:1

and the relevant code:

static ssize_t kpageflags_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 kflags, uflags;

	printk("0");
	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;

	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);
	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
	printk("2");
	if (src & KPMMASK || count & KPMMASK)
		return -EIO;

	printk("3");
	while (count > 0) {
		printk("4");
		ppage = pfn_to_page(pfn++);
		printk("5");
		if (!ppage) {
			printk("6");
			kflags = 0;
			printk("7");
		} else {
			printk("8");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			kflags = ppage->flags; // <-- something bad happens
			printk("9");
		}
		printk("a");

This is from 'cat /proc/kpagecount' (after this the box is locked):

01
pfn:0, src:0, KPMSIZE:8
23567a
ppage:0002, pfn:1

and this is the relevant code:

static ssize_t kpagecount_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 pcount;

	printk("0");
	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;

	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);
	printk("2");
	count = min_t(size_t, count, (max_pfn * KPMSIZE) - src);
	printk("3");
	if (src & KPMMASK || count & KPMMASK) {
		printk("4");
		return -EIO;
	}

	printk("5");
	while (count > 0) {
		printk("6");
		ppage = pfn_to_page(pfn++);
		printk("7");
		if (!ppage) {
			printk("8");
			pcount = 0;
		} else {
			printk("a");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			pcount = atomic_read(&ppage->_count); // <-- something bad happens
			printk("b");
		}

Regards,

Mariusz
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:
>
> > From: Matt Mackall <[EMAIL PROTECTED]>
> > Date: Sun, 16 Dec 2007 20:11:49 -0600
> >
> > > But as the function doesn't actually show up in your stack trace,
> > > something else is probably wrong. So I'd also try commenting out
> > > pieces of that function until it started working.
> >
> > Some piece of state is being indirectly corrupted and this
> > is showing up later in some unrelated operation.
> >
> > Can someone send me this kpageflags patch under separate
> > cover? I'll try figure out why it farts on sparc64.
>
> hm, non trivial. It's the third-from-last patch in:
>
> maps4-add-proportional-set-size-accounting-in-smaps.patch
> maps4-rework-task_size-macros.patch
> maps4-rework-task_size-macros-mips-fix.patch
> maps4-move-is_swap_pte.patch
> maps4-introduce-a-generic-page-walker.patch
> maps4-use-pagewalker-in-clear_refs-and-smaps.patch
> maps4-simplify-interdependence-of-maps-and-smaps.patch
> maps4-move-clear_refs-code-to-task_mmuc.patch
> maps4-regroup-task_mmu-by-interface.patch
> maps4-add-proc-pid-pagemap-interface.patch

Actually, you may only need these two:

> maps4-add-proc-kpagecount-interface.patch
> maps4-add-proc-kpageflags-interface.patch

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1 compile failure: usbhid_lookup_quirk
On Mon, 17 Dec 2007, Andrew Morton wrote:

> > MODPOST 196 modules
> > ERROR: "usbhid_lookup_quirk" [drivers/hid/usbhid/usbmouse.ko] undefined!
> > ERROR: "usbhid_lookup_quirk" [drivers/hid/usbhid/usbkbd.ko] undefined!
> > make[1]: *** [__modpost] Error 1
> > make: *** [modules] Error 2
> > The problem was fixed by defining CONFIG_USB_HID=m - but I think that
> > should happen automatically if it is necessary.
> > .config was created by running make oldconfig against 2.6.23-rc8-mm2:
> Thanks. That's coming out of git-hid.patch.

Thanks a lot for the report, I will fix that up in my tree.

By the way, please be aware that you almost certainly _do not_ want to use
the usbmouse and usbkbd drivers. Please read their Kconfig help text.

-- 
Jiri Kosina
Re: 2.6.24-rc5-mm1 compile failure: usbhid_lookup_quirk
On Sat, 15 Dec 2007 19:50:40 +0100 jurriaan <[EMAIL PROTECTED]> wrote:

> MODPOST 196 modules
> ERROR: "usbhid_lookup_quirk" [drivers/hid/usbhid/usbmouse.ko] undefined!
> ERROR: "usbhid_lookup_quirk" [drivers/hid/usbhid/usbkbd.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
>
> The problem was fixed by defining CONFIG_USB_HID=m - but I think that
> should happen automatically if it is necessary.
>
> .config was created by running make oldconfig against 2.6.23-rc8-mm2:

Thanks. That's coming out of git-hid.patch.
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:
(Adding Al Viro to the list, he's listed as file systems and MAINTAINERS doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever actually does that code?)
I try it again, and it reports it died at the same exact place, but in about 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't think that blocks read from /dev/cdrom would get cached, but OK.
It'll remain cached if something is holding the device open.
Does it need to be device open, or are there other things as well? If the drop_cache was hosed, that would result in the same symptoms, no?
Something's holding s_umount for writing I guess. Possibly busted error handling somewhere totally different.
Aha - found what was holding it - an attempt to loopback mount the truncated file (before I realized it was truncated) had failed - I had gotten a 'Killed' back from the mount, but I didn't realize it had pulled an actual oops:
Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
...
Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
Dec 17 15:54:33 turing-police kernel: [14503.403359]
Dec 17 15:54:33 turing-police kernel: [14503.403366]
Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75
I don't mind it failing the mount, but the oops seems excessive. I suspect that *somewhere* in that stack trace, we're wanting something like a
if (!foo_ptr) return -EIO;
but I admit not being competent enough to decide where that should be.
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Mon, Dec 17, 2007 at 09:07:56PM -0500, [EMAIL PROTECTED] wrote:
On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:
(Adding Al Viro to the list, he's listed as file systems and MAINTAINERS doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever actually does that code?)
I try it again, and it reports it died at the same exact place, but in about 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't think that blocks read from /dev/cdrom would get cached, but OK.
It'll remain cached if something is holding the device open.
Does it need to be device open, or are there other things as well? If the drop_cache was hosed, that would result in the same symptoms, no?
Something's holding s_umount for writing I guess. Possibly busted error handling somewhere totally different.
Aha - found what was holding it - an attempt to loopback mount the truncated file (before I realized it was truncated) had failed - I had gotten a 'Killed' back from the mount, but I didn't realize it had pulled an actual oops:
Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
...
Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
Dec 17 15:54:33 turing-police kernel: [14503.403359]
Dec 17 15:54:33 turing-police kernel: [14503.403366]
Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75
I don't mind it failing the mount, but the oops seems excessive. I suspect that *somewhere* in that stack trace, we're wanting something like a
if (!foo_ptr) return -EIO;
but I admit not being competent enough to decide where that should be.
Hi,
Could you please try the below patch:
Signed-off-by: Dave Young [EMAIL PROTECTED]
---
 fs/isofs/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff -upr linux/fs/isofs/inode.c linux.new/fs/isofs/inode.c
--- linux/fs/isofs/inode.c	2007-12-18 10:31:12.0 +0800
+++ linux.new/fs/isofs/inode.c	2007-12-18 10:31:56.0 +0800
@@ -1414,7 +1414,7 @@ struct inode *isofs_iget(struct super_bl
 	ret = isofs_read_inode(inode);
 	if (ret < 0) {
 		iget_failed(inode);
-		inode = ERR_PTR(ret);
+		return NULL;
 	} else {
 		unlock_new_inode(inode);
Re: 2.6.24-rc5-mm1 - wonky disk cache and CDROM behavior...
On Tue, 18 Dec 2007 10:37:32 +0800 Dave Young [EMAIL PROTECTED] wrote:
On Mon, Dec 17, 2007 at 09:07:56PM -0500, [EMAIL PROTECTED] wrote:
On Mon, 17 Dec 2007 14:56:44 PST, Andrew Morton said:
(Adding Al Viro to the list, he's listed as file systems and MAINTAINERS doesn't list 'isofs' anyplace. Will Al or Andrew please vector to whoever actually does that code?)
I try it again, and it reports it died at the same exact place, but in about 2 seconds flat, and reports 91M/sec transfer. OK, that's *weird*, I didn't think that blocks read from /dev/cdrom would get cached, but OK.
It'll remain cached if something is holding the device open.
Does it need to be device open, or are there other things as well? If the drop_cache was hosed, that would result in the same symptoms, no?
Something's holding s_umount for writing I guess. Possibly busted error handling somewhere totally different.
Aha - found what was holding it - an attempt to loopback mount the truncated file (before I realized it was truncated) had failed - I had gotten a 'Killed' back from the mount, but I didn't realize it had pulled an actual oops:
Dec 17 15:54:33 turing-police kernel: [14503.402385] attempt to access beyond end of device
Dec 17 15:54:33 turing-police kernel: [14503.402391] loop1: rw=0, want=1284500, limit=314240
Dec 17 15:54:33 turing-police kernel: [14503.402395] ISOFS: unable to read i-node block
Dec 17 15:54:33 turing-police kernel: [14503.402428] Unable to handle kernel NULL pointer dereference at 010b RIP:
Dec 17 15:54:33 turing-police kernel: [14503.402440] [802a096b] iput+0x11/0x80
...
Dec 17 15:54:33 turing-police kernel: [14503.403008] Call Trace:
Dec 17 15:54:33 turing-police kernel: [14503.403026] [802ff73e] isofs_fill_super+0x7e9/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403045] [80523d28] __down_write_nested+0x3d/0xa1
Dec 17 15:54:33 turing-police kernel: [14503.403061] [80523d97] __down_write+0xb/0xd
Dec 17 15:54:33 turing-police kernel: [14503.403076] [8028fb63] sget+0x397/0x3a9
Dec 17 15:54:33 turing-police kernel: [14503.403090] [8028f204] set_bdev_super+0x0/0x14
Dec 17 15:54:33 turing-police kernel: [14503.403106] [80290301] get_sb_bdev+0x109/0x157
Dec 17 15:54:33 turing-police kernel: [14503.403120] [802fef55] isofs_fill_super+0x0/0xa6b
Dec 17 15:54:33 turing-police kernel: [14503.403138] [802fe2e9] isofs_get_sb+0x13/0x15
Dec 17 15:54:33 turing-police kernel: [14503.403151] [80290075] vfs_kern_mount+0x90/0x11a
Dec 17 15:54:33 turing-police kernel: [14503.403167] [8029015c] do_kern_mount+0x47/0xe3
Dec 17 15:54:33 turing-police kernel: [14503.403183] [802a5012] do_mount+0x717/0x78a
Dec 17 15:54:33 turing-police kernel: [14503.403199] [805242fc] _read_lock_irq+0x9/0xb
Dec 17 15:54:33 turing-police kernel: [14503.403212] [8026cce0] find_lock_page+0x8c/0x97
Dec 17 15:54:33 turing-police kernel: [14503.403227] [8026ecb6] filemap_fault+0x1fa/0x3c6
Dec 17 15:54:33 turing-police kernel: [14503.403241] [8026cb6b] unlock_page+0x2d/0x31
Dec 17 15:54:33 turing-police kernel: [14503.403254] [8027925c] __do_fault+0x38d/0x3c3
Dec 17 15:54:33 turing-police kernel: [14503.403274] [8027ab68] handle_mm_fault+0x36d/0x6e9
Dec 17 15:54:33 turing-police kernel: [14503.403293] [80271903] __alloc_pages+0x68/0x2f6
Dec 17 15:54:33 turing-police kernel: [14503.403314] [802a510e] sys_mount+0x89/0xcb
Dec 17 15:54:33 turing-police kernel: [14503.403328] [80214f34] syscall_trace_enter+0x97/0x9b
Dec 17 15:54:33 turing-police kernel: [14503.403344] [8020c34c] tracesys+0xdc/0xe1
Dec 17 15:54:33 turing-police kernel: [14503.403359]
Dec 17 15:54:33 turing-police kernel: [14503.403366]
Dec 17 15:54:33 turing-police kernel: [14503.403367] Code: 48 8b 87 10 01 00 00 48 83 bf 38 02 00 00 40 48 8b 40 38 75
I don't mind it failing the mount, but the oops seems excessive. I suspect that *somewhere* in that stack trace, we're wanting something like a
if (!foo_ptr) return -EIO;
but I admit not being competent enough to decide where that should be.
Hi,
Could you please try the below patch:
Signed-off-by: Dave Young [EMAIL PROTECTED]
---
 fs/isofs/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff -upr linux/fs/isofs/inode.c linux.new/fs/isofs/inode.c
--- linux/fs/isofs/inode.c	2007-12-18 10:31:12.0 +0800
+++ linux.new/fs/isofs/inode.c	2007-12-18 10:31:56.0 +0800
@@ -1414,7 +1414,7 @@ struct inode *isofs_iget(struct super_bl
 	ret = isofs_read_inode(inode);
 	if (ret < 0) {
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote: > From: Matt Mackall <[EMAIL PROTECTED]> > Date: Sun, 16 Dec 2007 20:11:49 -0600 > > > But as the function doesn't actually show up in your stack trace, > > something else is probably wrong. So I'd also try commenting out > > pieces of that function until it started working. > > Some piece of state is being indirectly corrupted and this > is showing up later in some unrelated operation. > > Can someone send me this kpageflags patch under separate > cover? I'll try to figure out why it farts on sparc64.
hm, non-trivial. It's the third-from-last patch in:
maps4-add-proportional-set-size-accounting-in-smaps.patch
maps4-rework-task_size-macros.patch
maps4-rework-task_size-macros-mips-fix.patch
maps4-move-is_swap_pte.patch
maps4-introduce-a-generic-page-walker.patch
maps4-use-pagewalker-in-clear_refs-and-smaps.patch
maps4-simplify-interdependence-of-maps-and-smaps.patch
maps4-move-clear_refs-code-to-task_mmuc.patch
maps4-regroup-task_mmu-by-interface.patch
maps4-add-proc-pid-pagemap-interface.patch
maps4-add-proc-kpagecount-interface.patch
maps4-add-proc-kpageflags-interface.patch
maps4-make-page-monitoring-proc-file-optional.patch
maps4-make-page-monitoring-proc-file-optional-fix.patch
from ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/broken-out
That patch series does apply OK to mainline though.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]> Date: Sun, 16 Dec 2007 20:11:49 -0600 > But as the function doesn't actually show up in your stack trace, > something else is probably wrong. So I'd also try commenting out > pieces of that function until it started working. Some piece of state is being indirectly corrupted and this is showing up later in some unrelated operation. Can someone send me this kpageflags patch under separate cover? I'll try to figure out why it farts on sparc64.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 08:10:10PM +0100, Mariusz Kozlowski wrote: > > > Can you change line 710 of fs/proc/proc_misc.c to: > > > > > > ppage = NULL; > > > > Sure. > > > > > ..and see if it still breaks? > > > > Yes it does - the same way as earlier. Box is locked, processes stuck in D > > state > > and after a while "BUG: soft lockup - CPU#0 stuck for 11s!". > > My mistake. I ran cat /proc/kpageflags in the first place - so how > could anything change :) > > cat /proc/kpagecount on the other hand - with the change in line 710 > - locks the box. Sysrq works, changing consoles works, but there is > no "BUG: soft lockup ..." message. After a while the box becomes > totally unresponsive - even caps lock doesn't work, no responses to > ping. Well I'm baffled. There are basically two things in that function that do anything interesting: pfn_to_page and put_user. access_ok is "return 1" on Sparc64. atomic_read is a simple read. My usual approach at this point would be to litter it with printks and see where it's hanging. But as the function doesn't actually show up in your stack trace, something else is probably wrong. So I'd also try commenting out pieces of that function until it started working. -- Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1
On Dec 14, 2007 11:44 PM, Alan Stern <[EMAIL PROTECTED]> wrote: > On Fri, 14 Dec 2007, Dave Young wrote: > > > Hi, > > The behaviour of my mp3 player (which also acts as a usb-storage device) seems > > to have changed from rc5 to rc5-mm1. > > This can't be considered a bug, right? I'm not sure. > It's just that the player > changed from one slightly non-standard behavior to a different slightly > non-standard behavior. > > > > : > > = > > usb 1-7: new high speed USB device using ehci_hcd and address 7 > > usb 1-7: configuration #1 chosen from 1 choice > > scsi4 : SCSI emulation for USB Mass Storage devices > > usb-storage: device found at 7 > > usb-storage: waiting for device to settle before scanning > > scsi 4:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS > > sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 4:0:0:0: [sdb] Write Protect is on > > sd 4:0:0:0: [sdb] Mode Sense: 03 00 80 00 > > sd 4:0:0:0: [sdb] Assuming drive cache: write through > > sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 4:0:0:0: [sdb] Write Protect is on > > sd 4:0:0:0: [sdb] Mode Sense: 03 00 80 00 > > sd 4:0:0:0: [sdb] Assuming drive cache: write through > > sdb: sdb1 > > sd 4:0:0:0: [sdb] Attached SCSI removable disk > > sd 4:0:0:0: Attached scsi generic sg1 type 0 > > usb-storage: device scan complete > > > > == > > try to mount it (or just run blockdev --rereadpt), then write protect becomes off: > > == > > > > sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 4:0:0:0: [sdb] Write Protect is off > > sd 4:0:0:0: [sdb] Mode Sense: 03 00 00 00 > > sd 4:0:0:0: [sdb] Assuming drive cache: write through > > sd 4:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 4:0:0:0: [sdb] Write Protect is off > > sd 4:0:0:0: [sdb] Mode Sense: 03 00 00 00 > > sd 4:0:0:0: [sdb] Assuming drive cache: write through > > sdb: sdb1 > > This output won't appear if you simply mount the device. So how do you > know that mounting turns off write protect?
This can be observed by eye: dmesg -> mount -> dmesg > > > But under rc5-mm1, after the mount command is executed, it is just > > mounted as a read-only partition without write-protect being set to off > > > > I tried "blockdev --rereadpt", and it does set write-protect to off, as with the rc5 > > kernel. > > > > Below is the output of dmesg under rc5-mm1 > > == > > usb 1-8: new high speed USB device using ehci_hcd and address 6 > > usb 1-8: configuration #1 chosen from 1 choice > > scsi3 : SCSI emulation for USB Mass Storage devices > > usb-storage: device found at 6 > > usb-storage: waiting for device to settle before scanning > > scsi 3:0:0:0: Direct-Access Newman mp3 PQ: 0 ANSI: 0 CCS > > sd 3:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 3:0:0:0: [sdb] Write Protect is on > > sd 3:0:0:0: [sdb] Mode Sense: 03 00 80 00 > > sd 3:0:0:0: [sdb] Assuming drive cache: write through > > sd 3:0:0:0: [sdb] 245248 512-byte hardware sectors (126 MB) > > sd 3:0:0:0: [sdb] Write Protect is on > > sd 3:0:0:0: [sdb] Mode Sense: 03 00 80 00 > > sd 3:0:0:0: [sdb] Assuming drive cache: write through > > sdb: sdb1 > > This looks exactly the same as the output above (except for various > port, device, and bus numbers). Yes, but it lacks the "Write Protect is off" part and the other lines. > > If you turn on CONFIG_USB_STORAGE_DEBUG for both kernels and compare > the dmesg output for the mount command, that might highlight the > difference. OK, I will test that once I have time, thanks. Regards dave
Re: 2.6.24-rc5-mm1 -- inconsistent {in-softirq-W} -> {softirq-on-R} usage.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Fri, 14 Dec 2007 15:36:33 -0800 > The networking bug looks to be around sock_i_ino()'s taking of > sk_callback_lock with softirqs enabled. Perhaps this will fix it. One should be suspicious of any case where write_lock is performed on sk->sk_callback_lock in softint context. And that's the only way this can trigger, so this patch is wrong. Generally, sock_orphan() and sock_graft() are the only primary places where sk->sk_callback_lock is acquired as a writer. And these should be invoked only from process context. Perhaps there is some exception to this in some specialized layer such as SUNRPC, which has the only other spots I see potentially doing sk->sk_callback_lock write acquires in softint context, which as stated should not be done. OCFS2 and ISCSI seem to be following the rules in their write lock calls on this lock.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello, > > > > cat /proc/kpageflags on sparc64 causes the box to lock. > > > > I cannot write on any terminal - but I can issue sysrqs and switch > > > > between consoles. > > > > > > > > cat process hangs in read(3, ... > > > > > > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w > > > sshd trace: > > > > > > __down > > > __down_interruptible > > > kobject_get > > > lock_kernel > > > chrdev_open > > > __dentry_open > > > nameidata_to_filp > > > open_pathname > > > do_sys_open > > > sparc32_open > > > linux_sparc_syscall32 > > > > Perhaps this is related to sparsemem. > > > > Can you change line 710 of fs/proc/proc_misc.c to: > > > > ppage = NULL; > > Sure. > > > ..and see if it still breaks? > > Yes it does - the same way as earlier. Box is locked, processes stuck in D > state > and after a while "BUG: soft lockup - CPU#0 stuck for 11s!". My mistake. I ran cat /proc/kpageflags in the first place - so how could anything change :) cat /proc/kpagecount on the other hand - with the change in line 710 - locks the box. Sysrq works, changing consoles works, but there is no "BUG: soft lockup ..." message. After a while the box becomes totally unresponsive - even caps lock doesn't work, no responses to ping.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
> > > cat /proc/kpageflags on sparc64 causes the box to lock. > > > I cannot write on any terminal - but I can issue sysrqs and switch > > > between consoles. > > > > > > cat process hangs in read(3, ... > > > > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w > > sshd trace: > > > > __down > > __down_interruptible > > kobject_get > > lock_kernel > > chrdev_open > > __dentry_open > > nameidata_to_filp > > open_pathname > > do_sys_open > > sparc32_open > > linux_sparc_syscall32 > > Perhaps this is related to sparsemem. > > Can you change line 710 of fs/proc/proc_misc.c to: > > ppage = NULL; Sure. > ..and see if it still breaks? Yes it does - the same way as earlier. Box is locked, processes stuck in D state and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 12:40:53PM +0100, Mariusz Kozlowski wrote: > > cat /proc/kpageflags on sparc64 causes the box to lock. > > I cannot write on any terminal - but I can issue sysrqs and switch > > between consoles. > > > > cat process hangs in read(3, ... > > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w sshd > trace: > > __down > __down_interruptible > kobject_get > lock_kernel > chrdev_open > __dentry_open > nameidata_to_filp > open_pathname > do_sys_open > sparc32_open > linux_sparc_syscall32 Perhaps this is related to sparsemem. Can you change line 710 of fs/proc/proc_misc.c to: ppage = NULL; ..and see if it still breaks? -- Mathematics is the supreme nostalgia of our time.