Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-18 Thread Rafael J. Wysocki
On Saturday, 9 of February 2008, Rafael J. Wysocki wrote:
> On Friday, 8 of February 2008, Jeff Mahoney wrote:
> > Rafael J. Wysocki wrote:
> > > On Friday, 8 of February 2008, Pavel Machek wrote:
> > >> Hi!
> > >>
> > >>> Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
> > >>> time from mainline. I can't reproduce with 2.6.24-final, but I can with
> > >>> a git snapshot from a few days ago. I'm doing a git bisect run now, but
> > >>> it's rather time consuming, so I thought I'd pass this on in the 
> > >>> interim.
> > >>>
> > >>> I can reproduce this just by doing "cat /dev/snapshot".
> > >>>
> > >>> Working output looks like:
> > >>> swsusp: Marking nosave pages: 0009f000 - 0010
> > >>> swsusp: Marking nosave pages: f7ff - 0001
> > >>> swsusp: Basic memory bitmaps created
> > >>> swsusp: Basic memory bitmaps freed
> > >> [EMAIL PROTECTED]:~# cat /dev/snapshot
> > >> cat: /dev/snapshot: No data available
> > >> [EMAIL PROTECTED]:~#
> > >>
> > >> ...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
> > >> ideas what may break?
> > >
> > > No idea and I can't reproduce it.
> > >
> > > Plus the trace looks bogus, as there are no "swsusp: ..." messages in the
> > > mainline any more.
> > 
> > The git version from two days ago did. :)
> > 
> > I just git pulled and built and got the same BUG.
> > 
> > Here are the nosave registration messages:
> > PM: Registered nosave memory: 0009f000 - 000a
> > PM: Registered nosave memory: 000a - 000e
> > PM: Registered nosave memory: 000e - 0010
> > PM: Registered nosave memory: f7ff - f7fff000
> > PM: Registered nosave memory: f7fff000 - f800
> > PM: Registered nosave memory: f800 - ff78
> > PM: Registered nosave memory: ff78 - 0001
> > 
> > And the old swsusp messages match those ranges, just coalesced into two
> > ranges.
> > 
> > Reassembling the zones from /proc/zoneinfo yields:
> > Node 0, zone DMA      start_pfn: 0, spanned 4096
> > (0x0-0x1000)
> > Node 0, zone DMA32    start_pfn: 4096, spanned 1011696
> > (0x1000-0xf7ff0)
> > Node 1, zone Normal   start_pfn: 1048576, spanned 1048576
> > (0x100000-0x200000)
> 
> Ah, NUMA.
> 
> > The pfn it's searching for is 0xf7ff0, which will end up hitting this in
> > memory_bm_find_bit:
> > while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
> > zone_bm = zone_bm->next;
> > BUG_ON(!zone_bm)
> > }
> > 
> > Should that be pfn > zone_bm->end_pfn, or is end_pfn non-inclusive?
> 
> It used to be non-inclusive and I think it is, as 0xf7ff0 seems to be the start
> of a reserved region.
> 
> Well, the assumption is that if the PFN doesn't belong to any zone, then
> pfn_valid() in mark_nosave_pages() should filter it out.  Apparently, it has
> stopped doing this at one point.

Andi, Thomas, Ingo,
the source of the bug is that on a K8 NUMA system there is a PFN for which
pfn_valid() returns 'true' and yet it doesn't belong to any zone.  Is there a
valid scenario in which something like this is possible?  It didn't happen
with 2.6.24.
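
The mismatch can be illustrated with a small userspace sketch. It does not call
the kernel's pfn_valid(); it only checks zone membership for the zones quoted
above, treating end_pfn as exclusive. The names struct pfn_range,
pfn_in_any_zone() and count_zoneless_pfns() are hypothetical and introduced
here purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical half-open PFN range: [start_pfn, end_pfn). */
struct pfn_range {
    unsigned long start_pfn;
    unsigned long end_pfn;   /* exclusive: one past the last PFN */
};

/* Return true if pfn lies inside one of the n ranges. */
bool pfn_in_any_zone(const struct pfn_range *zones, size_t n,
                     unsigned long pfn)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= zones[i].start_pfn && pfn < zones[i].end_pfn)
            return true;
    return false;
}

/* Count the PFNs in [start, end) that belong to no zone at all. */
unsigned long count_zoneless_pfns(const struct pfn_range *zones, size_t n,
                                  unsigned long start, unsigned long end)
{
    unsigned long missing = 0;

    for (unsigned long pfn = start; pfn < end; pfn++)
        if (!pfn_in_any_zone(zones, n, pfn))
            missing++;
    return missing;
}
```

With the three zones from /proc/zoneinfo, every PFN from 0xf7ff0 up to (but not
including) 0x100000 lies in the hole between DMA32 and Normal. Any PFN in that
hole that pfn_valid() accepts would presumably reach memory_bm_find_bit()
without a matching zone.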

[Please see http://bugzilla.kernel.org/show_bug.cgi?id=9966 for the reference
to the entire thread.]

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Rafael J. Wysocki
On Friday, 8 of February 2008, Jeff Mahoney wrote:
> Rafael J. Wysocki wrote:
> > On Friday, 8 of February 2008, Pavel Machek wrote:
> >> Hi!
> >>
> >>> Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
> >>> time from mainline. I can't reproduce with 2.6.24-final, but I can with
> >>> a git snapshot from a few days ago. I'm doing a git bisect run now, but
> >>> it's rather time consuming, so I thought I'd pass this on in the interim.
> >>>
> >>> I can reproduce this just by doing "cat /dev/snapshot".
> >>>
> >>> Working output looks like:
> >>> swsusp: Marking nosave pages: 0009f000 - 0010
> >>> swsusp: Marking nosave pages: f7ff - 0001
> >>> swsusp: Basic memory bitmaps created
> >>> swsusp: Basic memory bitmaps freed
> >> [EMAIL PROTECTED]:~# cat /dev/snapshot
> >> cat: /dev/snapshot: No data available
> >> [EMAIL PROTECTED]:~#
> >>
> >> ...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
> >> ideas what may break?
> >
> > No idea and I can't reproduce it.
> >
> > Plus the trace looks bogus, as there are no "swsusp: ..." messages in the
> > mainline any more.
> 
> The git version from two days ago did. :)
> 
> I just git pulled and built and got the same BUG.
> 
> Here are the nosave registration messages:
> PM: Registered nosave memory: 0009f000 - 000a
> PM: Registered nosave memory: 000a - 000e
> PM: Registered nosave memory: 000e - 0010
> PM: Registered nosave memory: f7ff - f7fff000
> PM: Registered nosave memory: f7fff000 - f800
> PM: Registered nosave memory: f800 - ff78
> PM: Registered nosave memory: ff78 - 0001
> 
> And the old swsusp messages match those ranges, just coalesced into two
> ranges.
> 
> Reassembling the zones from /proc/zoneinfo yields:
> Node 0, zone DMA      start_pfn: 0, spanned 4096
>   (0x0-0x1000)
> Node 0, zone DMA32    start_pfn: 4096, spanned 1011696
>   (0x1000-0xf7ff0)
> Node 1, zone Normal   start_pfn: 1048576, spanned 1048576
>   (0x100000-0x200000)

Ah, NUMA.

> The pfn it's searching for is 0xf7ff0, which will end up hitting this in
> memory_bm_find_bit:
> while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
>   zone_bm = zone_bm->next;
>   BUG_ON(!zone_bm)
> }
> 
> Should that be pfn > zone_bm->end_pfn, or is end_pfn non-inclusive?

It used to be non-inclusive and I think it is, as 0xf7ff0 seems to be the start
of a reserved region.

Well, the assumption is that if the PFN doesn't belong to any zone, then
pfn_valid() in mark_nosave_pages() should filter it out.  Apparently, it has
stopped doing this at one point.

Andrew, have we had any changes to the way in which pfn_valid() works recently?

Rafael


> Here's the updated oops, which doesn't look any different:
> 
> [ cut here ]
> kernel BUG at kernel/power/snapshot.c:464!
> invalid opcode:  [1] SMP
> CPU 1
> Modules linked in: ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs
> autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
> x_tables ipv6 af_packet loop dm_mod sbp2 ohci1394 ieee1394 k8temp
> amd_rng tg3 i2c_amd8111 hwmon i2c_amd756 floppy shpchp rtc_cmos rtc_core
> rtc_lib sr_mod i2c_core cdrom parport_pc parport pci_hotplug serio_raw
> button sg ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan sata_sil
> pata_amd libata scsi_mod thermal processor
> Pid: 3165, comm: cat Not tainted 2.6.24-vanilla #20
> RIP: 0010:[]  []
> memory_bm_find_bit+0x20/0x78
> RSP: 0018:8100379bfd78  EFLAGS: 00010246
> RAX:  RBX: 81003480 RCX: 8100379bfd8c
> RDX: 8100379bfd80 RSI: 000f7ff0 RDI: 81003793e5c0
> RBP: 000f7ff0 R08: 8100379bfd80 R09: 
> R10: 0028 R11: 0001 R12: 81003793e5c0
> R13: 81003783f118 R14: 81003783f118 R15: 8100f603e380
> FS:  7f753cff06f0() GS:8100f767ec40() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 7f753cb47c30 CR3: f61fc000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process cat (pid: 3165, threadinfo 8100379be000, task 8100378f0640)
> Stack:  80254cb5 810037837018 0

Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Jeff Mahoney
Rafael J. Wysocki wrote:
> On Friday, 8 of February 2008, Pavel Machek wrote:
>> Hi!
>>
>>> Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
>>> time from mainline. I can't reproduce with 2.6.24-final, but I can with
>>> a git snapshot from a few days ago. I'm doing a git bisect run now, but
>>> it's rather time consuming, so I thought I'd pass this on in the interim.
>>>
>>> I can reproduce this just by doing "cat /dev/snapshot".
>>>
>>> Working output looks like:
>>> swsusp: Marking nosave pages: 0009f000 - 0010
>>> swsusp: Marking nosave pages: f7ff - 0001
>>> swsusp: Basic memory bitmaps created
>>> swsusp: Basic memory bitmaps freed
>> [EMAIL PROTECTED]:~# cat /dev/snapshot
>> cat: /dev/snapshot: No data available
>> [EMAIL PROTECTED]:~# 
>>
>> ...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
>> ideas what may break?
> 
> No idea and I can't reproduce it.
> 
> Plus the trace looks bogus, as there are no "swsusp: ..." messages in the
> mainline any more.

The git version from two days ago did. :)

I just git pulled and built and got the same BUG.

Here are the nosave registration messages:
PM: Registered nosave memory: 0009f000 - 000a
PM: Registered nosave memory: 000a - 000e
PM: Registered nosave memory: 000e - 0010
PM: Registered nosave memory: f7ff - f7fff000
PM: Registered nosave memory: f7fff000 - f800
PM: Registered nosave memory: f800 - ff78
PM: Registered nosave memory: ff78 - 0001

And the old swsusp messages match those ranges, just coalesced into two
ranges.

Reassembling the zones from /proc/zoneinfo yields:
Node 0, zone DMA      start_pfn: 0, spanned 4096
(0x0-0x1000)
Node 0, zone DMA32    start_pfn: 4096, spanned 1011696
(0x1000-0xf7ff0)
Node 1, zone Normal   start_pfn: 1048576, spanned 1048576
(0x100000-0x200000)
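
The hex ranges follow directly from the zoneinfo fields: assuming the end of a
zone is exclusive, it is start_pfn + spanned. A minimal sketch of that
arithmetic (zone_span_end() is a hypothetical helper name, not a kernel
function):

```c
/*
 * Exclusive end PFN of a zone, computed from the start_pfn and spanned
 * values reported in /proc/zoneinfo. Hypothetical helper for illustration.
 */
unsigned long zone_span_end(unsigned long start_pfn, unsigned long spanned)
{
    return start_pfn + spanned;
}
```

For DMA32 this gives 4096 + 1011696 = 0xf7ff0, and for Node 1 Normal it gives
1048576 + 1048576 = 0x200000.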

The pfn it's searching for is 0xf7ff0, which will end up hitting this in
memory_bm_find_bit:
while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
        zone_bm = zone_bm->next;
        BUG_ON(!zone_bm);
}

Should that be pfn > zone_bm->end_pfn, or is end_pfn non-inclusive?
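
To make the question concrete, here is a minimal userspace sketch of the same
walk, assuming end_pfn is exclusive (one past the last PFN of the zone). The
names struct zone_range and find_zone_range() are hypothetical stand-ins, not
the kernel's data structures, and the sketch returns NULL where the kernel
loop hits BUG_ON().

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-zone bitmap descriptor. */
struct zone_range {
    unsigned long start_pfn;   /* first PFN covered by the zone */
    unsigned long end_pfn;     /* exclusive: one past the last PFN */
    struct zone_range *next;   /* next zone in the list, or NULL */
};

/*
 * Walk the list and return the zone whose half-open range
 * [start_pfn, end_pfn) contains pfn, or NULL if no zone does.
 */
struct zone_range *find_zone_range(struct zone_range *head,
                                   unsigned long pfn)
{
    struct zone_range *z = head;

    while (z && (pfn < z->start_pfn || pfn >= z->end_pfn))
        z = z->next;
    return z;
}
```

With the three zones listed above, PFN 0xf7fef resolves to DMA32, while
0xf7ff0 matches no zone and falls off the end of the list, which is the case
that trips the BUG_ON().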

-Jeff

Here's the updated oops, which doesn't look any different:

------------[ cut here ]------------
kernel BUG at kernel/power/snapshot.c:464!
invalid opcode:  [1] SMP
CPU 1
Modules linked in: ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs
autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
x_tables ipv6 af_packet loop dm_mod sbp2 ohci1394 ieee1394 k8temp
amd_rng tg3 i2c_amd8111 hwmon i2c_amd756 floppy shpchp rtc_cmos rtc_core
rtc_lib sr_mod i2c_core cdrom parport_pc parport pci_hotplug serio_raw
button sg ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan sata_sil
pata_amd libata scsi_mod thermal processor
Pid: 3165, comm: cat Not tainted 2.6.24-vanilla #20
RIP: 0010:[80254c4c]  [80254c4c]
memory_bm_find_bit+0x20/0x78
RSP: 0018:8100379bfd78  EFLAGS: 00010246
RAX:  RBX: 81003480 RCX: 8100379bfd8c
RDX: 8100379bfd80 RSI: 000f7ff0 RDI: 81003793e5c0
RBP: 000f7ff0 R08: 8100379bfd80 R09: 
R10: 0028 R11: 0001 R12: 81003793e5c0
R13: 81003783f118 R14: 81003783f118 R15: 8100f603e380
FS:  7f753cff06f0() GS:8100f767ec40() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f753cb47c30 CR3: f61fc000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process cat (pid: 3165, threadinfo 8100379be000, task 8100378f0640)
Stack:  80254cb5 810037837018 003ff603e380 8025641f
 8100f603e380  81003783f118 80257016
 8100f777fe40 8040dbca 80451aa0 
Call Trace:
 [80254cb5] ? memory_bm_set_bit+0x11/0x20
 [8025641f] ? create_basic_memory_bitmaps+0x134/0x139
 [80257016] ? snapshot_open+0x58/0x13f
 [8040dbca] ? mutex_lock+0xd/0x1e
 [8035fb76] ? misc_open+0x13e/0x1b2
 [80294459] ? chrdev_open+0x150/0x174
 [8029c495] ? open_namei+0x2d0/0x653
 [80294309] ? chrdev_open+0x0/0x174
 [802906dc] ? __dentry_open+0xeb/0x1be
 [80290866] ? do_filp_open+0x2d/0x3d
 [8029055b] ? get_unused_fd_flags+0x7f/0x10e
 [802908bc] ? do_sys_open+0x46/0xc3
 [8020befb] ? system_call_after_swapgs+0x7b/0x80


Code: 00 3d 4f 80 e9 74 8f 1b 00 90 90 48 8b 47 10 49 89 d0 48 3b 70 08
72 06 48 3b 70 10 72 21 48 8b 07 eb 0c 48 8b 00 48 85 c0 75 04 <0f> 0b
eb fe 48 3b 70 08 72 ee 48 3b 70 10 73 e8 48 89 47 10 48
RIP  [80254c4c] memory_bm_find_bit+0x20/0x78

Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Rafael J. Wysocki
On Friday, 8 of February 2008, Pavel Machek wrote:
> Hi!
> 
> > Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
> > time from mainline. I can't reproduce with 2.6.24-final, but I can with
> > a git snapshot from a few days ago. I'm doing a git bisect run now, but
> > it's rather time consuming, so I thought I'd pass this on in the interim.
> > 
> > I can reproduce this just by doing "cat /dev/snapshot".
> > 
> > Working output looks like:
> > swsusp: Marking nosave pages: 0009f000 - 0010
> > swsusp: Marking nosave pages: f7ff - 0001
> > swsusp: Basic memory bitmaps created
> > swsusp: Basic memory bitmaps freed
> 
> [EMAIL PROTECTED]:~# cat /dev/snapshot
> cat: /dev/snapshot: No data available
> [EMAIL PROTECTED]:~# 
> 
> ...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
> ideas what may break?

No idea and I can't reproduce it.

Plus the trace looks bogus, as there are no "swsusp: ..." messages in the
mainline any more.

Thanks,
Rafael


Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Pavel Machek
Hi!

> Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
> time from mainline. I can't reproduce with 2.6.24-final, but I can with
> a git snapshot from a few days ago. I'm doing a git bisect run now, but
> it's rather time consuming, so I thought I'd pass this on in the interim.
> 
> I can reproduce this just by doing "cat /dev/snapshot".
> 
> Working output looks like:
> swsusp: Marking nosave pages: 0009f000 - 0010
> swsusp: Marking nosave pages: f7ff - 0001
> swsusp: Basic memory bitmaps created
> swsusp: Basic memory bitmaps freed

[EMAIL PROTECTED]:~# cat /dev/snapshot
cat: /dev/snapshot: No data available
[EMAIL PROTECTED]:~# 

...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
ideas what may break?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Jeff Mahoney
Hi Pavel -

Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
time from mainline. I can't reproduce with 2.6.24-final, but I can with
a git snapshot from a few days ago. I'm doing a git bisect run now, but
it's rather time consuming, so I thought I'd pass this on in the interim.

I can reproduce this just by doing "cat /dev/snapshot".

Working output looks like:
swsusp: Marking nosave pages: 0009f000 - 0010
swsusp: Marking nosave pages: f7ff - 0001
swsusp: Basic memory bitmaps created
swsusp: Basic memory bitmaps freed

Here's the trace:

swsusp: Marking nosave pages: 0009f000 - 0010
swsusp: Marking nosave pages: f7ff - 0001
------------[ cut here ]------------
kernel BUG at kernel/power/snapshot.c:464!
invalid opcode:  [1] SMP
CPU 1
Modules linked in: ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs
autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
x_tables ipv6 af_packet loop dm_mod sbp2 ohci1394 ieee1394 parport_pc
parport tg3 sr_mod i2c_amd8111 shpchp button pci_hotplug amd_rng
rtc_cmos rtc_core rtc_lib i2c_amd756 i2c_core k8temp serio_raw cdrom
hwmon sg floppy ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan
sata_sil pata_amd libata scsi_mod thermal processor
Pid: 3131, comm: cat Not tainted 2.6.24-vanilla #14
RIP: 0010:[8025a254]  [8025a254]
memory_bm_find_bit+0x20/0x78
RSP: 0018:8100f602bd78  EFLAGS: 00010246
RAX:  RBX: 81003480 RCX: 8100f602bd8c
RDX: 8100f602bd80 RSI: 000f7ff0 RDI: 8100f65eadc0
RBP: 000f7ff0 R08: 8100f602bd80 R09: 0001
R10: 8100f602bae8 R11: 80322454 R12: 8100f65eadc0
R13: 8100f67f6d58 R14: 8100f67f6d58 R15: 8100f6138ec0
FS:  2b24042806f0() GS:8100f7688ac0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 2b2403fe1c30 CR3: f69bd000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process cat (pid: 3131, threadinfo 8100f602a000, task
8100379ff040)
Stack:  8025a2bd 8100379ce018 003ff6138ec0 8025ba68
 8100f6138ec0  8100f67f6d58 8025c65f
 8100f77341c0 80412b6a 80445610 
Call Trace:
 [8025a2bd] memory_bm_set_bit+0x11/0x20
 [8025ba68] create_basic_memory_bitmaps+0x162/0x166
 [8025c65f] snapshot_open+0x58/0x102
 [80412b6a] mutex_lock+0xd/0x1e
 [80361d8a] misc_open+0x13e/0x1b2
 [802990bf] chrdev_open+0x158/0x17c
 [802a12b1] open_namei+0x2d0/0x653

 [80298f67] chrdev_open+0x0/0x17c
 [8029527a] __dentry_open+0xeb/0x1be
 [80295404] do_filp_open+0x2d/0x3d
 [802950f0] get_unused_fd_flags+0x80/0x118
 [8029545a] do_sys_open+0x46/0xc3
 [8020bfde] system_call+0x7e/0x83


Code: 0f 0b eb fe 48 3b 70 08 72 ee 48 3b 70 10 73 e8 48 89 47 10
RIP  [8025a254] memory_bm_find_bit+0x20/0x78
 RSP 8100f602bd78
---[ end trace c6de0b8a8d80da39 ]---


--
Jeff Mahoney
SUSE Labs


Re: kernel BUG at kernel/power/snapshot.c:464!

2008-02-08 Thread Rafael J. Wysocki
On Friday, 8 of February 2008, Jeff Mahoney wrote:
 Rafael J. Wysocki wrote:
  On Friday, 8 of February 2008, Pavel Machek wrote:
  Hi!
 
  Our old friend kernel BUG at kernel/power/snapshot.c:464! is back, this
  time from mainline. I can't reproduce with 2.6.24-final, but I can with
  a git snapshot from a few days ago. I'm doing a git bisect run now, but
  it's rather time consuming, so I thought I'd pass this on in the interim.
 
  I can reproduce this just by doing cat /dev/snapshot.
 
  Working output looks like:
  swsusp: Marking nosave pages: 0009f000 - 0010
  swsusp: Marking nosave pages: f7ff - 0001
  swsusp: Basic memory bitmaps created
  swsusp: Basic memory bitmaps freed
  [EMAIL PROTECTED]:~# cat /dev/snapshot
  cat: /dev/snapshot: No data available
  [EMAIL PROTECTED]:~#
 
  ...on less than two days old 2.6.25-rc0-git. Rafael, do you have any
  ideas what may break?
 
  No idea and I can't reproduce it.
 
  Plus the trace looks bogus, as there are no swsusp: ... messages in the
  mainline any more.
 
 The git version from two days ago did. :)
 
 I just git pulled and built and got the same BUG.
 
 Here are the nosave registration messages:
 PM: Registered nosave memory: 0009f000 - 000a
 PM: Registered nosave memory: 000a - 000e
 PM: Registered nosave memory: 000e - 0010
 PM: Registered nosave memory: f7ff - f7fff000
 PM: Registered nosave memory: f7fff000 - f800
 PM: Registered nosave memory: f800 - ff78
 PM: Registered nosave memory: ff78 - 0001
 
 And the old swsusp messages match those ranges, just coalesced into two
 ranges.
 
 Reassembling the zones from /proc/zoneinfo yields:
 Node 0, zone DMA      start_pfn: 0, spanned 4096
   (0x0-0x1000)
 Node 0, zone DMA32    start_pfn: 4096, spanned 1011696
   (0x1000-0xf7ff0)
 Node 1, zone Normal   start_pfn: 1048576, spanned 1048576
   (0x100000-0x200000)

Ah, NUMA.

 The pfn it's searching for is 0xf7ff0, which will end up hitting this in
 memory_bm_find_bit:
 while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
   zone_bm = zone_bm->next;
   BUG_ON(!zone_bm)
 }
 
 Should that be pfn > zone_bm->end_pfn, or is end_pfn non-inclusive?

It used to be non-inclusive and I think it is, as 0xf7ff0 seems to be the start
of a reserved region.

Well, the assumption is that if the PFN doesn't belong to any zone, then
pfn_valid() in mark_nosave_pages() should filter it out.  Apparently, it has
stopped doing this at one point.

Andrew, have we had any changes to the way in which pfn_valid() works recently?

Rafael


 Here's the updated oops, which doesn't look any different:
 
 [ cut here ]
 kernel BUG at kernel/power/snapshot.c:464!
 invalid opcode:  [1] SMP
 CPU 1
 Modules linked in: ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs
 autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables
 x_tables ipv6 af_packet loop dm_mod sbp2 ohci1394 ieee1394 k8temp
 amd_rng tg3 i2c_amd8111 hwmon i2c_amd756 floppy shpchp rtc_cmos rtc_core
 rtc_lib sr_mod i2c_core cdrom parport_pc parport pci_hotplug serio_raw
 button sg ohci_hcd sd_mod usbcore edd ext3 mbcache jbd fan sata_sil
 pata_amd libata scsi_mod thermal processor
 Pid: 3165, comm: cat Not tainted 2.6.24-vanilla #20
 RIP: 0010:[80254c4c]  [80254c4c]
 memory_bm_find_bit+0x20/0x78
 RSP: 0018:8100379bfd78  EFLAGS: 00010246
 RAX:  RBX: 81003480 RCX: 8100379bfd8c
 RDX: 8100379bfd80 RSI: 000f7ff0 RDI: 81003793e5c0
 RBP: 000f7ff0 R08: 8100379bfd80 R09: 
 R10: 0028 R11: 0001 R12: 81003793e5c0
 R13: 81003783f118 R14: 81003783f118 R15: 8100f603e380
 FS:  7f753cff06f0() GS:8100f767ec40() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 7f753cb47c30 CR3: f61fc000 CR4: 06e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process cat (pid: 3165, threadinfo 8100379be000, task 8100378f0640)
 Stack:  80254cb5 810037837018 003ff603e380 8025641f
  8100f603e380  81003783f118 80257016
  8100f777fe40 8040dbca 80451aa0 
 Call Trace:
  [80254cb5] ? memory_bm_set_bit+0x11/0x20
  [8025641f] ? create_basic_memory_bitmaps+0x134/0x139
  [80257016] ? snapshot_open+0x58/0x13f
  [8040dbca] ? mutex_lock+0xd/0x1e
  [8035fb76] ? misc_open+0x13e/0x1b2
  [80294459] ? chrdev_open+0x150/0x174
  [8029c495] ? open_namei+0x2d0/0x653
  [80294309] ? chrdev_open+0x0/0x174
  [802906dc] ? __dentry_open