Re: mm: gpf in find_vma

2013-09-18 Thread Dan Merillat
Resent due to Thunderbird completely mangling it the first time around:
(Apologies if this is a third copy, gmail told me it didn't send)

On 09/07/2013 05:32 PM, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest, running latest
> -next kernel, I've
> stumbled on the following:
> 

> The disassembly is:
> 
> /* Check the cache first. */
> /* (Cache hit rate is typically around 35%.) */
> vma = ACCESS_ONCE(mm->mmap_cache);
>  1f9:   48 8b 47 10             mov    0x10(%rdi),%rax
> if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
>  1fd:   48 85 c0                test   %rax,%rax
>  200:   74 0b                   je     20d <find_vma+0x1d>
>  202:   48 39 70 08             cmp    %rsi,0x8(%rax)    <--- here
>  206:   76 05                   jbe    20d <find_vma+0x1d>
>  208:   48 3b 30                cmp    (%rax),%rsi
>  20b:   73 4d                   jae    25a <find_vma+0x6a>

I may have hit the same thing earlier this morning:
  191:   48 8b 47 08             mov    0x8(%rdi),%rax
  195:   31 d2                   xor    %edx,%edx
  197:   48 85 c0                test   %rax,%rax
  19a:   74 1c                   je     1b8 <find_vma+0x3f>
  19c:   48 39 70 e8             cmp    %rsi,-0x18(%rax)    <-- here
  1a0:   76 10                   jbe    1b2 <find_vma+0x39>
  1a2:   48 39 70 e0             cmp    %rsi,-0x20(%rax)
  1a6:   48 8d 50 e0             lea    -0x20(%rax),%rdx
  1aa:   76 14                   jbe    1c0 <find_vma+0x47>
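
For readers following the two listings, here is a minimal user-space sketch in C of the lookup they appear to come from: the mmap_cache check first, then a walk of the VMA red-black tree. The struct and function names (rb_node_s, vma_s, mm_s, vma_of, find_vma_sketch) are stand-ins invented for illustration, and the field order is an assumption chosen so that, on a 64-bit build, vm_rb sits 0x20 bytes into the VMA. Under that assumption the -0x18(%rax) and -0x20(%rax) operands in the second listing would be vm_end and vm_start reached from a tree node pointer. This is a reading of the disassembly, not the kernel source itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. The offsets in the
 * comments assume a 64-bit (LP64) build and are an illustration only. */
struct rb_node_s {
    unsigned long parent_color;   /* 0x00 */
    struct rb_node_s *right;      /* 0x08 */
    struct rb_node_s *left;       /* 0x10 */
};

struct vma_s {
    unsigned long vm_start;       /* 0x00 */
    unsigned long vm_end;         /* 0x08 */
    struct vma_s *vm_next;        /* 0x10 */
    struct vma_s *vm_prev;        /* 0x18 */
    struct rb_node_s vm_rb;       /* 0x20 */
};

struct mm_s {
    struct vma_s *mmap;           /* 0x00 */
    struct rb_node_s *rb_root;    /* 0x08 */
    struct vma_s *mmap_cache;     /* 0x10 */
};

/* Recover the enclosing VMA from a pointer to its embedded tree node. */
#define vma_of(node) \
    ((struct vma_s *)((char *)(node) - offsetof(struct vma_s, vm_rb)))

/* Return the first VMA whose vm_end is above addr, or NULL if none. */
struct vma_s *find_vma_sketch(struct mm_s *mm, unsigned long addr)
{
    struct vma_s *vma;
    struct rb_node_s *node;

    assert(mm != NULL);

    /* Check the cache first (the first listing). */
    vma = mm->mmap_cache;
    if (vma && vma->vm_end > addr && vma->vm_start <= addr)
        return vma;

    /* Cache miss: walk the tree (the second listing). */
    vma = NULL;
    node = mm->rb_root;
    while (node) {
        struct vma_s *tmp = vma_of(node);

        if (tmp->vm_end > addr) {
            vma = tmp;                 /* candidate; look for a lower one */
            if (tmp->vm_start <= addr)
                break;                 /* addr lies inside this VMA */
            node = node->left;
        } else {
            node = node->right;
        }
    }
    if (vma)
        mm->mmap_cache = vma;          /* remember the hit for next time */
    return vma;
}
```

With three non-overlapping VMAs in the tree, a lookup inside one returns that VMA and refreshes the cache, a lookup in a gap returns the next VMA above the address, and a lookup above the last VMA returns NULL. A corrupted node pointer in such a tree would be dereferenced at the first vm_end comparison, which is where both reports fault.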

Except I got there via munmap():

Sep 18 04:58:04 kernel: [563331.668961] general protection fault:  [#1] 
PREEMPT SMP
Sep 18 04:58:04 kernel: [563331.669009] Modules linked in: sha1_generic cts 
powernow_k8 nfnetlink_queue nfnetlink_log binfmt_misc rpcsec_gss_krb5 fuse it87 
hwmon_vid loop pl2303 usbserial vhost_net tun vhost kvm_amd kvm hid_generic 
snd_hda_codec_hdmi snd_hda_codec_realtek pcspkr rtc_cmos snd_hda_intel 
snd_hda_codec snd_hwdep snd_pcm snd_seq snd_seq_device wmi snd_timer mperf 
radeon drm_kms_helper snd ttm drm backlight i2c_algo_bit i2c_piix4 k8temp 
soundcore i2c_core snd_page_alloc ohci_pci ohci_hcd ide_pci_generic 
firewire_ohci firewire_core ehci_pci atiixp ide_core pata_acpi ehci_hcd
Sep 18 04:58:04 kernel: [563331.669009] CPU: 0 PID: 3937 Comm: Xorg Not tainted 
3.11.0-rc6-dan #1
Sep 18 04:58:04 kernel: [563331.669009] Hardware name: Gigabyte Technology Co., 
Ltd. GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F6h 12/25/2010
Sep 18 04:58:04 kernel: [563331.669009] task: 88021d8f9700 ti: 
88021d66a000 task.ti: 88021d66a000
Sep 18 04:58:04 kernel: [563331.669009] RIP: 0010:[<810e9305>]  
[<810e9305>] find_vma+0x23/0x50
Sep 18 04:58:04 kernel: [563331.669009] RSP: 0018:88021d66bed0  EFLAGS: 
00010206
Sep 18 04:58:04 kernel: [563331.669009] RAX: 00ff8801e8e00ba0 RBX: 
880212a3f0c0 RCX: 
Sep 18 04:58:04 kernel: [563331.669009] RDX: 8801ae075f18 RSI: 
7feef8258000 RDI: 880212a3f0c0
Sep 18 04:58:04 kernel: [563331.669009] RBP: 88021d66bed0 R08: 
 R09: 00d1
Sep 18 04:58:04 kernel: [563331.669009] R10:  R11: 
0206 R12: 880212a3f0c0
Sep 18 04:58:04 kernel: [563331.669009] R13: 7feef8258000 R14: 
1000 R15: 7feef8258000
Sep 18 04:58:04 kernel: [563331.669009] FS:  7feefe54b880() 
GS:880227c0() knlGS:f2640980
Sep 18 04:58:04 kernel: [563331.669009] CS:  0010 DS:  ES:  CR0: 
80050033
Sep 18 04:58:04 kernel: [563331.669009] CR2: 7feef7486000 CR3: 
0002113d3000 CR4: 07f0
Sep 18 04:58:04 kernel: [563331.669009] Stack:
Sep 18 04:58:04 kernel: [563331.669009]  88021d66bf20 810eace0 
88021e23a420 88021b411600
Sep 18 04:58:04 kernel: [563331.669009]  7feef8258000 880212a3f110 
880212a3f0c0 7feef8258000
Sep 18 04:58:04 kernel: [563331.669009]  1000 002f 
88021d66bf58 810eaf1e
Sep 18 04:58:04 kernel: [563331.669009] Call Trace:
Sep 18 04:58:04 kernel: [563331.669009]  [<810eace0>] 
do_munmap+0xdd/0x2de
Sep 18 04:58:04 kernel: [563331.669009]  [<810eaf1e>] 
vm_munmap+0x3d/0x56
Sep 18 04:58:04 kernel: [563331.669009]  [<810eaf55>] 
SyS_munmap+0x1e/0x24
Sep 18 04:58:04 kernel: [563331.669009]  [<81549e96>] 
system_call_fastpath+0x1a/0x1f
Sep 18 04:58:04 kernel: [563331.669009] Code: 85 c9 74 cb eb e4 5d c3 48 8b 47 
10 55 48 89 e5 48 85 c0 74 0b 48 39 70 08 76 05 48 39 30 76 36 48 8b 47 08 31 
d2 48 85 c0 74 1c <48> 39 70 e8 76 10 48 39 70 e0 48 8d 50 e0 76 14 48 8b 40 10 
eb
Sep 18 04:58:04 kernel: [563331.669009] RIP  [<810e9305>] 
find_vma+0x23/0x50
Sep 18 04:58:04 kernel: [563331.669009]  RSP <88021d66bed0>
Sep 18 04:58:04 kernel: [563331.690510] ---[ end trace 0b78e99bd4849eb8 ]---
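
One detail in the register dump may be worth a closer look: RAX, the pointer being dereferenced at the faulting cmp, reads 00ff8801e8e00ba0. The sketch below checks whether a 64-bit value is a canonical x86-64 address, assuming 48-bit virtual addressing (bits 63 down to 47 must all be equal). Non-canonical accesses raise a general protection fault rather than a page fault, which would be consistent with the report. The helper name is_canonical_48 is hypothetical and nothing here comes from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical under 48-bit virtual addressing, i.e.
 * bits 63..47 are either all zero or all one. The 48-bit width is an
 * assumption about the machine in the report. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits that must agree */

    return top == 0 || top == 0x1FFFF;
}
```

Under these assumptions is_canonical_48(0x00ff8801e8e00ba0ULL) is false. The same value with its top byte set to ff (0xffff8801e8e00ba0ULL, only a guess at what an intact pointer might have looked like) is canonical, so a single corrupted byte in a tree pointer is one possible explanation, not a confirmed diagnosis.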

This is possibly related: same machine, same path, same origin (Xorg,
probably cookie clicker causing lots of allocation churn in both cases),
but on an older kernel:

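Sep 11 13:17:33 kernel: [12808122.743464] general protection fault:  [#3] 
PREEMPT SMP
Sep 11 13:17:33 kernel: [12808122.746610] Modules linked in: uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_core videodev iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
ip_tables x_tables pl2303 usbserial nfnetlink_queue nfnetlink_log ntfs
msdos reiserfs ext4 jbd2 ext3 jbd fuse arc4 ecb md4 sha256_generic
nls_utf8 cifs fscache cdc_acm efivars nls_cp437 vfat fat sg usb_storage
binfmt_misc rpcsec_gss_krb5 it87 hwmon_vid loop hid_generic
snd_hda_codec_hdmi snd_hda_codec_realtek powernow_k8 kvm_amd kvm pcspkr
k8temp snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc
snd_seq snd_seq_device snd_timer i2c_piix4 rtc_cmos radeon snd
drm_kms_helper ehci_pci ttm drm backlight i2c_algo_bit i2c_core wmi
soundcore ide_pci_generic atiixp ide_core firewire_ohci firewire_core
pata_acpi 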

IO stalls on one disk stall entire system

2012-09-01 Thread Dan Merillat
I have a known-broken WD15EADS, which has a hilariously terrible
1000ms IO response time.  Yes, that's the right number of zeros.  I'm
using it as a convenient way to hunt down a general feeling of
unresponsiveness under disk load.

In this case, the failing drive is mounted on /backup, and I'm copying
large random files to it.  Firefox is operating on my normal system
drives, and stalls for up to two minutes:

Aug 26 17:41:13 fileserver kernel: [919921.115258] INFO: task
firefox-bin:17616 blocked for more than 120 seconds.
Aug 26 17:41:13 fileserver kernel: [919921.115261] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 17:41:13 fileserver kernel: [919921.115264] firefox-bin D
88012fd12440 0 17616  17525 0x
Aug 26 17:41:13 fileserver kernel: [919921.115270]  8800ba87dd60
0082 7f5d 8800ba87dfd8
Aug 26 17:41:13 fileserver kernel: [919921.115277]  4000
00012440 88012aeb96a0 8800ba952d40
Aug 26 17:41:13 fileserver kernel: [919921.115283]  700401208b00
14040800 0200 5cdbbb01
Aug 26 17:41:13 fileserver kernel: [919921.115289] Call Trace:
Aug 26 17:41:13 fileserver kernel: [919921.115296]  [<81508453>]
? inet_sendmsg+0x93/0x9c
Aug 26 17:41:13 fileserver kernel: [919921.115301]  [<8159a155>]
schedule+0x5f/0x61
Aug 26 17:41:13 fileserver kernel: [919921.115305]  [<8159ac3b>]
rwsem_down_failed_common+0xdb/0x10d
Aug 26 17:41:13 fileserver kernel: [919921.115310]  [<8159ac94>]
rwsem_down_read_failed+0x12/0x14
Aug 26 17:41:13 fileserver kernel: [919921.115314]  [<81369fc4>]
call_rwsem_down_read_failed+0x14/0x30
Aug 26 17:41:13 fileserver kernel: [919921.115318]  [<815991f0>]
? down_read+0x12/0x14
Aug 26 17:41:13 fileserver kernel: [919921.115326]  [<8159db2f>]
do_page_fault+0x259/0x45d
Aug 26 17:41:13 fileserver kernel: [919921.115332]  [<8110acc2>]
? vfsmount_lock_local_unlock+0x21/0x3c
Aug 26 17:41:13 fileserver kernel: [919921.115337]  [<8110b6dd>]
? mntput_no_expire+0x2a/0x101
Aug 26 17:41:13 fileserver kernel: [919921.115343]  [<81104d19>]
? __d_free+0x4e/0x53
Aug 26 17:41:13 fileserver kernel: [919921.115347]  [<8110b7dc>]
? mntput+0x28/0x2a
Aug 26 17:41:13 fileserver kernel: [919921.115351]  [<8136a0ca>]
? trace_hardirqs_off_thunk+0x3a/0x6c
Aug 26 17:41:13 fileserver kernel: [919921.115356]  [<8159b61f>]
page_fault+0x1f/0x30

Linux fileserver 3.4.0-dan-2-ga84219d #2 SMP PREEMPT Mon May 21
09:36:23 EDT 2012 x86_64 GNU/Linux

The only hardware shared between the bad drive and the rest of the
system is the first AHCI controller:
> 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI 
> SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]

Some other drives are on this one:
> 02:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller 
> (rev 03)


This is obviously an extreme case, but I've felt this IO stalling in
other contexts, like doing a recursive shasum on large bodies of data.
I purposely have /home and /root on separate spindles from the bulk
data, but I still get IO stalls when /largevol is being used.

It's pretty easy to reproduce, just a pain to work on it when I do.
That's a two-minute failure to satisfy a pagefault on an otherwise idle
drive (i.e. not the slow one), so it really bogs down badly when it
fails.  This continues until I ^C the rsync process - and wait for it to
finish flushing the current set of dirtied pages.

This may involve btrfs, as that's the underlying filesystem on the
target drive and on /largevol.  Root is on an LV, physically located on
yet another, separate drive (lots of disks here).

2x250GB SATA MD-raid1 LVM - / (reiserfs), swap
1x40GB IDE - LVM /home/me/.mozilla btrfs
4x2TB MD-raid5 - (no LVM) /largevol, btrfs
1x1.5TB SLOW ESATA - /backup

So I should have tons of spindle independence, but I'm just not seeing it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


IO stalls on one disk stall entire system

2012-09-01 Thread Dan Merillat
I have a known-broken WD15EADS, which has the hilariously terrible
1000ms IO response time.  Yes, that's the right number of zeros.  I'm
using it as a convenient way to hunt down a general feeling of
unresponsiveness under disk load

In this case, the failing drive is mounted to /backup, and I'm copying
large random files to it.  Firefox is operating on my normal system
drives, and takes up to two-minute stalls:

Aug 26 17:41:13 fileserver kernel: [919921.115258] INFO: task
firefox-bin:17616 blocked for more than 120 seconds.
Aug 26 17:41:13 fileserver kernel: [919921.115261] echo 0 
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Aug 26 17:41:13 fileserver kernel: [919921.115264] firefox-bin D
88012fd12440 0 17616  17525 0x
Aug 26 17:41:13 fileserver kernel: [919921.115270]  8800ba87dd60
0082 7f5d 8800ba87dfd8
Aug 26 17:41:13 fileserver kernel: [919921.115277]  4000
00012440 88012aeb96a0 8800ba952d40
Aug 26 17:41:13 fileserver kernel: [919921.115283]  700401208b00
14040800 0200 5cdbbb01
Aug 26 17:41:13 fileserver kernel: [919921.115289] Call Trace:
Aug 26 17:41:13 fileserver kernel: [919921.115296]  [81508453]
? inet_sendmsg+0x93/0x9c
Aug 26 17:41:13 fileserver kernel: [919921.115301]  [8159a155]
schedule+0x5f/0x61
Aug 26 17:41:13 fileserver kernel: [919921.115305]  [8159ac3b]
rwsem_down_failed_common+0xdb/0x10d
Aug 26 17:41:13 fileserver kernel: [919921.115310]  [8159ac94]
rwsem_down_read_failed+0x12/0x14
Aug 26 17:41:13 fileserver kernel: [919921.115314]  [81369fc4]
call_rwsem_down_read_failed+0x14/0x30
Aug 26 17:41:13 fileserver kernel: [919921.115318]  [815991f0]
? down_read+0x12/0x14
Aug 26 17:41:13 fileserver kernel: [919921.115326]  [8159db2f]
do_page_fault+0x259/0x45d
Aug 26 17:41:13 fileserver kernel: [919921.115332]  [8110acc2]
? vfsmount_lock_local_unlock+0x21/0x3c
Aug 26 17:41:13 fileserver kernel: [919921.115337]  [8110b6dd]
? mntput_no_expire+0x2a/0x101
Aug 26 17:41:13 fileserver kernel: [919921.115343]  [81104d19]
? __d_free+0x4e/0x53
Aug 26 17:41:13 fileserver kernel: [919921.115347]  [8110b7dc]
? mntput+0x28/0x2a
Aug 26 17:41:13 fileserver kernel: [919921.115351]  [8136a0ca]
? trace_hardirqs_off_thunk+0x3a/0x6c
Aug 26 17:41:13 fileserver kernel: [919921.115356]  [8159b61f]
page_fault+0x1f/0x30

Linux fileserver 3.4.0-dan-2-ga84219d #2 SMP PREEMPT Mon May 21
09:36:23 EDT 2012 x86_64 GNU/Linux

The only hardware shared between the bad drive and the rest of the
system is the first AHCI controller:
 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI 
 SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]

Some other drives are on this one:
 02:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller 
 (rev 03)


This is obviously an extreme case, but I've felt this IO stalling in
other contexts, like doing a recursive shasum on large bodies of data.
I purposely have /home and /root on separate spindles from the bulk
data, but I still get IO stalls when /largevol is being used.

It's pretty easy to reproduce, just a pain to work on when I do.
That's a two-minute failure to satisfy a page fault on an otherwise idle
drive (i.e. not the slow one), so it really bogs down badly when it
fails.  This continues until I ^C the rsync process and wait for it to
finish flushing the current set of dirtied pages.

This may involve btrfs, as that's the underlying filesystem on the
target drive and on /largevol.  Root is on an LV, physically located on yet
another, separate drive (lots of disks here).

2x250gb SATA MD-raid1 LVM - / (reiserfs), swap
1x40gb IDE - LVM /home/me/.mozilla btrfs,
4x2tb MD-raid5 - (no LVM) /largevol, btrfs
1x1.5TB SLOW ESATA - /backup

So I should have tons of spindle independence, but I'm just not seeing it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [37/50] Fix inet_diag OOPS.

2007-09-24 Thread Dan Merillat
On 9/24/07, Greg KH <[EMAIL PROTECTED]> wrote:
> netlink_run_queue() doesn't handle multiple processes processing the
> queue concurrently. Serialize queue processing in inet_diag to fix
> a oops in netlink_rcv_skb caused by netlink_run_queue passing a
> NULL for the skb.

I just got this one on 2.6.23-rc3 (going by the oops below).  It looks
the same to me, but I'm posting the oops anyway to double-check.

(is it possible to get gmail not to mangle code/patches/oopses without
attaching?)
[1015205.245213] Unable to handle kernel NULL pointer dereference at 
0068 RIP: 
[1015205.245221]  [804eb6a5] netlink_run_queue+0xb2/0x104
[1015205.245233] PGD a449067 PUD 2e803067 PMD 0 
[1015205.245237] Oops:  [1] SMP 
[1015205.245240] CPU 1 
[1015205.245242] Modules linked in: tcp_diag inet_diag radeon drm iscsi_tcp 
libiscsi scsi_transport_iscsi ipv6 fuse eeprom tsdev usbhid ff_memless 
snd_intel8x0 snd_ac97_codec fan ac97_bus snd_pcm thermal i2c_nforce2 k8temp 
ohci1394 ehci_hcd snd_timer ohci_hcd serio_raw pcspkr rtc hwmon button 
processor snd snd_page_alloc usbcore i2c_core ide_generic
[1015205.245267] Pid: 26036, comm: identd Not tainted 2.6.23-rc3 #1
[1015205.245269] RIP: 0010:[804eb6a5]  [804eb6a5] 
netlink_run_queue+0xb2/0x104
[1015205.245274] RSP: 0018:81002a1d5bf8  EFLAGS: 00010202
[1015205.245276] RAX:  RBX: 81003d94f600 RCX: 
81000c9fe3c0
[1015205.245278] RDX: 0009 RSI: 0202 RDI: 
81003d91d0c4
[1015205.245281] RBP:  R08: 8100059813c0 R09: 
81003d91d000
[1015205.245283] R10: 7fff R11: 81000c9fe3c0 R12: 
81002a1d5c44
[1015205.245286] R13: 81003d91d000 R14: 81003d91d0b0 R15: 
8819aa95
[1015205.245289] FS:  41001950(0063) GS:81003f40() 
knlGS:f7415a10
[1015205.245291] CS:  0010 DS:  ES:  CR0: 8005003b
[1015205.245294] CR2: 0068 CR3: 2c973000 CR4: 
06e0
[1015205.245296] DR0:  DR1:  DR2: 

[1015205.245298] DR3:  DR6: 0ff0 DR7: 
0400
[1015205.245301] Process identd (pid: 26036, threadinfo 81002a1d4000, task 
81002a3d50c0)
[1015205.245303] Stack:   81003d91d000 004c 
81001f219400
[1015205.245308]  81003d94f600 81002a1d5d60  
8819a024
[1015205.245311]  0292 0001804cffe4 81003d91d000 
804ebb37
[1015205.245315] Call Trace:
[1015205.245323]  [8819a024] :inet_diag:inet_diag_rcv+0x24/0x2f
[1015205.245328]  [804ebb37] netlink_data_ready+0x12/0x50
[1015205.245331]  [804ea93d] netlink_sendskb+0x23/0x3d
[1015205.245334]  [804ebb12] netlink_sendmsg+0x2a9/0x2bc
[1015205.245342]  [8022841c] __wake_up_common+0x3e/0x68
[1015205.245348]  [804ca69c] sock_aio_write+0x110/0x128
[1015205.245357]  [802654bd] __pagevec_lru_add_active+0xd1/0xe1
[1015205.245363]  [80284388] do_sync_write+0xc9/0x10c
[1015205.245371]  [80243546] autoremove_wake_function+0x0/0x2e
[1015205.245380]  [80284b60] vfs_write+0xe1/0x157
[1015205.245385]  [802850ab] sys_write+0x45/0x6e
[1015205.245390]  [8020b3ce] system_call+0x7e/0x83
[1015205.245396] 
[1015205.245397] 
[1015205.245397] Code: 8b 55 68 83 fa 0f 77 93 48 89 ef e8 ad 50 fe ff 41 ff 0c 
24 
[1015205.245405] RIP  [804eb6a5] netlink_run_queue+0xb2/0x104
[1015205.245409]  RSP 81002a1d5bf8
[1015205.245410] CR2: 0068
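
As a sanity check on the "NULL for the skb" reading, the first three
Code: bytes can be decoded by hand.  What follows is only a minimal sketch
in Python.  It assumes the Code: dump starts at the faulting instruction,
and that the blank RBP in the register dump is an all-zero value whose
digits the archive stripped.  decode_mov_disp8 is a made-up helper that
handles just this one instruction form, not a general disassembler.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(code):
    """Decode 'mov disp8(%base),%reg32': opcode 0x8b, ModRM mod=01.

    Only the simple case is handled: no REX prefix and no SIB byte.
    Returns (destination register, base register, signed displacement).
    """
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a 'mov r32, r/m32' instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, no-SIB form is handled")
    if disp >= 0x80:              # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG64[rm], disp


def fault_address(base_value, disp):
    """Address the load touches, truncated to 64 bits."""
    return (base_value + disp) & 0xFFFFFFFFFFFFFFFF


dest, base, disp = decode_mov_disp8(bytes.fromhex("8b5568"))
rbp = 0                           # assumption: blank RBP means zero
address = fault_address(rbp, disp)
print(f"mov {disp:#x}(%{base}),%{dest} -> load from {address:#x}")
```

If those assumptions hold, the instruction is a load through RBP at
offset 0x68, and a zero RBP gives exactly the 0x68 shown in CR2.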


reset during bootup - solved

2007-08-12 Thread Dan Merillat
On 8/11/07, Dan Merillat <[EMAIL PROTECTED]> wrote:
> This one is going to be fun, since it's a hard reset back to bios, no
> OOPS or anything shown.  It may be about the time RadeonFB kicks in,
> but it's impossible to tell.  I'd guess 15-20 lines into dmesg.
>
> I'm in the process of bisecting, currently 94c18227..d23cf676.
>
> Any guesses of a specific patch to check?

Except it's not d23cf676, that was the version I was running while
building HEAD, whoops.
I'm used to monotonic version numbers, not SHA hashes.  I'll keep
better track next time.

For completeness, the commit that caused boot to fail was:

commit ab144f5ec64c42218a555ec1dbde6b60cf2982d6
Author: Andi Kleen <[EMAIL PROTECTED]>
Date:   Fri Aug 10 22:31:03 2007 +0200

i386: Make patching more robust, fix paravirt issue

Except I'm using x86_64?  So not sure why that one rev kills me.
ACPI? It dies right after the CPU is identified:
CPU: AMD Athlon(tm) 64 Processor 3000+ stepping 00
ACPI: Core revision 20070126

I don't think that second line is printed on the crash.

However, a git-pull at 2am (b8d3f244) fixed it.  Not sure where in
there it started working again, I can bisect backwards to find out if
need be.
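
For what it's worth, bisecting "backwards" for the fix is the same binary
search with the test inverted.  A minimal sketch in Python, assuming a
strictly linear history (real git history is a DAG, which git bisect
handles and this does not).  The commit names c0..c9 and the boots()
function are invented for illustration.

```python
def first_true(commits, predicate):
    """Return the first commit for which predicate holds.

    Assumes predicate is False for every commit before some point and
    True from that point on, and that it is True for the last commit.
    """
    lo, hi = 0, len(commits) - 1
    if not predicate(commits[hi]):
        raise ValueError("predicate must hold for the last commit")
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(commits[mid]):
            hi = mid              # first match is at mid or earlier
        else:
            lo = mid + 1          # first match is after mid
    return commits[lo]


# Hypothetical history: boot breaks at c3 and works again from c7 on.
history = [f"c{i}" for i in range(10)]


def boots(commit):
    index = history.index(commit)
    return index < 3 or index >= 7


# Forward bisect: known-good c0 up to known-bad c6, looking for the breakage.
first_bad = first_true(history[:7], lambda c: not boots(c))

# Backward bisect: known-bad c3 up to known-good c9, looking for the fix.
first_fixed = first_true(history[3:], boots)

print(first_bad, first_fixed)
```

Each round halves the range, so even a few thousand commits need only a
dozen or so test boots.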


reset during bootup - 2.6.23-rc2 (git d23cf676)

2007-08-11 Thread Dan Merillat
This one is going to be fun, since it's a hard reset back to bios, no
OOPS or anything shown.  It may be about the time RadeonFB kicks in,
but it's impossible to tell.  I'd guess 15-20 lines into dmesg.

I'm in the process of bisecting, currently 94c18227..d23cf676.

Any guesses of a specific patch to check?


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-09 Thread Dan Merillat
On 8/1/07, Neil Brown <[EMAIL PROTECTED]> wrote:

> No, this does not use indefinite stack.
>
> loop will schedule each request to be handled by a kernel thread, so
> requests to 'loop' are serialised, never stacked.
>
> In 2.6.22, generic_make_request detects and serialises recursive calls,
> so unlimited recursion is not possible there either.

Is that saying "before 2.6.22, a read/write on a deeply layered device
would use a lot of stack?"
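
To illustrate the difference being asked about, here is a toy model in
Python.  It is a sketch only, not kernel code.  It assumes that
"serialises recursive calls" means a nested submission is put on a pending
list and handled by the outermost caller instead of being called into
directly.  The Layer and Submitter classes and the device names are
invented.

```python
from collections import deque


class Layer:
    """A hypothetical stacked block device that passes I/O to a lower one."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower


class Submitter:
    """Tracks call depth while a request travels down a device stack."""

    def __init__(self, serialise):
        self.serialise = serialise
        self.pending = None       # None: not currently inside submit()
        self.depth = 0
        self.max_depth = 0

    def submit(self, dev):
        if not self.serialise:
            self._handle(dev)     # plain recursion, one frame per layer
            return
        if self.pending is not None:
            # Re-entered from a layer's handler: queue the request and
            # return, so the outermost call picks it up later.
            self.pending.append(dev)
            return
        self.pending = deque([dev])
        while self.pending:
            self._handle(self.pending.popleft())
        self.pending = None

    def _handle(self, dev):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if dev.lower is not None:
            self.submit(dev.lower)    # remap and resubmit to the layer below
        self.depth -= 1


def build_stack(names):
    """Build a chain of layers, first name on top."""
    dev = None
    for name in reversed(names):
        dev = Layer(name, dev)
    return dev


stack = build_stack(["dm3", "dm2", "dm1", "md0", "disk"])

recursive = Submitter(serialise=False)
recursive.submit(stack)

serialised = Submitter(serialise=True)
serialised.submit(stack)

print(recursive.max_depth, serialised.max_depth)
```

In the recursive variant the depth grows by one per layer.  In the
serialised variant it stays at one handler at a time however deep the
stack of devices is.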


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-09 Thread Dan Merillat
On 8/1/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Wed, 1 Aug 2007 15:33:58 +0200
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > Tweaking kernel ptes is prohibitive during clone() because that's
> > kernel memory and it would require a flush tlb all with IPIs that
> > won't scale (IPIs are really the blocker)
>
> Agreed - except when doing debug work then its an acceptable cost. You
> still have to sort the debug side out because you are going to fault the
> kernel stack which will probably then cause a triple fault and reboot on
> the spot.

I was assuming debugging work, yes.  I was also thinking it wouldn't
be done at clone() time, but mapped (on a single CPU) at the time of a
context switch.  It would eliminate the IPI, but would probably make the
rest of the TLB handling much too ugly to contemplate.  As an
alternative, could the TLB flush and associated IPI be deferred until
the process migrates?  The first migration would trigger the flush/IPI,
further migrations would be as now, no?  I'd happily run it with
various dm/md layers underneath.

On 8/1/07, Denis Vlasenko <[EMAIL PROTECTED]> wrote:
> Hmm, neat. Why do you need to _allocate second page_ at all?
> Just mark it "not present"...

Because the kernel mapping covers all physical memory contiguously, so
if the page isn't allocated, it could be used by a kernel data
structure you need to access.  Same reason the kernel stack has to be
contiguous pages.  Well, for non-highmem at least.  Either way, you
don't want to mark an in-use page as inaccessible, you never know
what's under there.


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-08-01 Thread Dan Merillat
On 7/31/07, Eric Sandeen <[EMAIL PROTECTED]> wrote:

> No, what I had did only that, so it was still a matter of probabilities...

How expensive would it be to allocate two pages, then use the MMU to mark
the second page unwritable?  Hardware-wise it should be possible (for
constant 4k page sizes; I have not worked with variable page size MMUs),
and since it's a per-context-switch constant operation, it would be a
special case in the fault handler rather than adding another entry to
the VM for every process.

Using large hardware pages to cover the kernel mapping could be worked
around by leaving the area where the current process stack resides
mapped via 4k pages.  Of course, I haven't touched a modern PC MMU in
ages, so I could be missing something fundamentally difficult.

The other issue is with the layered IO design - no matter what we
configure the stack size to, it is still possible to create a set of
translation layers that will cause it to crash regularly:  XFS on
dm_crypt on loop on XFS on dm_crypt on loop on ad infinitum.

That said, I'm missing something here - why is the stack growing?
Filesystems should be issuing bios with callbacks, so they should be
back off the stack, same with dm, loop, etc.  Am I missing a step where
they use a wrapper function that pretends to be synchronous?


Re: sata_promise disk error 2.6.22-rc5 with hrt1 patch

2007-07-07 Thread Dan Merillat

On 6/24/07, Mikael Pettersson <[EMAIL PROTECTED]> wrote:

> (cc: linux-ide added)
>
> The 300 TX4 model is causing transient errors for several people,
> and we don't yet know why. In your case, port_status 0x1000
> means that "host bus is busy more than 256 clock cycles for every
> ATA I/O transfer" (quoting the docs). Basically it's a timeout.


I'm running vanilla 2.6.22-rc7 here, getting a similar error.

[  741.010863] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
[  741.010869] ata2.00: (port_status 0x2008)
[  741.010875] ata2.00: cmd c8/00:08:bf:01:0c/00:00:00:00:00/ef tag 0
cdb 0x0 data 4096 in
[  741.010877]  res 50/00:00:c6:01:0c/00:00:00:00:00/ef Emask
0x2 (HSM violation)
[  741.313131] ata2: soft resetting port
[  741.463831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  741.587208] ata2: EH complete
[  741.640472] sd 1:0:0:0: [sdb] 1465149168 512-byte hardware sectors
(750156 MB)
[  741.644141] sd 1:0:0:0: [sdb] Write Protect is off
[  741.644144] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[  741.670736] sd 1:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA

Identical card to the one otto posted.  How common is this?  I
wouldn't notice if it wasn't logged - I don't notice the blip in disk
access.
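
A few numbers in that log can be cross-checked mechanically.  The sketch
below is Python and rests on assumptions: the only port_status bit it
knows is the 0x1000 meaning quoted above, and it assumes the libata "cmd"
string is laid out as command/feature:nsect:lbal:lbam:lbah/.../device.
The helper names are made up.

```python
PORT_STATUS_BUS_TIMEOUT = 0x1000    # meaning as quoted from the docs above
SECTOR_BYTES = 512                  # "512-byte hardware sectors" in the log


def has_bus_timeout(port_status):
    """True if the 'host bus busy' bit is set in port_status."""
    return bool(port_status & PORT_STATUS_BUS_TIMEOUT)


def taskfile_sector_count(cmd_field):
    """Sector count from a libata 'cmd' string.

    Assumes the layout command/feature:nsect:lbal:lbam:lbah/.../device.
    """
    first_group = cmd_field.split("/")[1]
    return int(first_group.split(":")[1], 16)


def capacity_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in decimal megabytes, rounded down."""
    return sectors * sector_bytes // 10**6


cmd = "c8/00:08:bf:01:0c/00:00:00:00:00/ef"
transfer_bytes = taskfile_sector_count(cmd) * SECTOR_BYTES

print(has_bus_timeout(0x2008))      # is the 0x1000 bit set in my port_status?
print(transfer_bytes)               # compare with "data 4096 in"
print(capacity_mb(1465149168))      # compare with "(750156 MB)"
```

Under those assumptions, the 0x1000 bit is not set in my 0x2008, so this
may be a different condition from the timeout described above.  The
8-sector count and the reported capacity both agree with the log.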


Re: limits on raid

2007-06-15 Thread Dan Merillat

> For raid5 on an array with more than 3 drives, if you attempt to write
> a single block, it will:
>
>  - read the current value of the block, and the parity block.
>  - "subtract" the old value of the block from the parity, and "add"
>    the new value.
>  - write out the new data and the new parity.
>
> If the parity was wrong before, it will still be wrong.  If you then
> lose a drive, you lose your data.


Wow, that really needs to be put somewhere in 120 point red blinking
text.  A lot of us are used to uninitialized disks calculating the
parity on first write, but if Linux MD is forgoing that,
'dangerous-no-resync' sounds really REALLY bad.  How about at least a
'Warning: unlike other systems this WILL cause corruption if you
forgo reconstruction' on mkraid?
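
The read-modify-write sequence described above can be shown with plain
XOR parity.  This is a minimal sketch in Python, not MD code.  The block
contents, the 4-drive stripe and the helper names (xor_blocks, rmw_write,
rebuild) are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rmw_write(data, parity, index, new_block):
    """Read-modify-write: update one data block and patch the parity.

    'Subtract' the old block and 'add' the new one; with XOR both are
    the same operation.  The other data blocks are never read.
    """
    old_block = data[index]
    new_parity = xor_blocks(parity, old_block, new_block)
    new_data = data[:index] + [new_block] + data[index + 1:]
    return new_data, new_parity


def rebuild(data, parity, lost):
    """Reconstruct the block at 'lost' from the survivors plus parity."""
    survivors = [b for i, b in enumerate(data) if i != lost]
    return xor_blocks(parity, *survivors)


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"\x11\x11", b"\x22\x22", b"\x44\x44"]
good_parity = xor_blocks(*data)     # what a resync would have computed
stale_parity = b"\x00\x00"          # never initialised

new_block = b"\xaa\xaa"
data_ok, parity_ok = rmw_write(data, good_parity, 0, new_block)
data_bad, parity_bad = rmw_write(data, stale_parity, 0, new_block)

# Lose drive 1 and rebuild it from the remaining drives.
recovered_ok = rebuild(data_ok, parity_ok, 1)
recovered_bad = rebuild(data_bad, parity_bad, 1)

print(recovered_ok == data[1], recovered_bad == data[1])
```

Block 1 was never written, yet it comes back wrong in the stale-parity
case, because the update only patches whatever parity was already there.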


2.6.20-rc4 gfs2 bug

2007-01-24 Thread Dan Merillat
Running 2.6.20-rc4 _WITH_ the following patch: (Shouldn't be the issue,
but just in case, I'm listing it here)

Date:   Fri, 29 Dec 2006 21:03:57 +0100
From:   Ingo Molnar <[EMAIL PROTECTED]>
Subject: [patch] remove MAX_ARG_PAGES
Message-ID: <[EMAIL PROTECTED]>

Linux fileserver 2.6.20-rc4MAX_ARGS #4 PREEMPT Fri Jan 12 03:58:25 EST 2007 
x86_64 GNU/Linux

This happened when I started testing gfs2 for the first time.  I
installed userspace from CVS, loaded the gfs2/dlm modules, mkfs.gfs2,
then "mount -t gfs2 -v /dev/vg1/gfs2 /mnt/gfs"

This was the initial mount of the new filesystem.  I can create
directories, but attempting a stress-test with bonnie seems to have
deadlocked something.  (at "Start 'em", immediately.)

To clarify: the two oopses happened at first mount.  After that, I
created files/directories, then attempted to stress it a bit with
bonnie++.  No further oops/dmesg output.

For the GFS2 folks, latest CVS gfs_tool doesn't have lockdump, is there
any way to examine what I'm stuck on?

This machine is specifically for testing new things before I put them
into production, so I can leave it hung like this indefinitely for
debugging.


[845566.571468] GFS2 (built Jan 12 2007 04:02:27) installed
[849416.113382] DLM (built Jan 12 2007 04:01:21) installed
[849416.352219] Lock_DLM (built Jan 12 2007 04:02:46) installed
[850966.368016] GFS2: fsid=: Trying to join cluster "lock_dlm", 
"internal:gfs-test"
[850971.783223] dlm: gfs-test: recover 1
[850971.783242] dlm: gfs-test: add member 1
[850971.783246] dlm: gfs-test: total members 1 error 0
[850971.783248] dlm: gfs-test: dlm_recover_directory
[850971.783260] dlm: gfs-test: dlm_recover_directory 0 entries
[850971.783270] dlm: gfs-test: recover 1 done: 0 ms
[850971.783454] GFS2: fsid=internal:gfs-test.0: Joined cluster. Now mounting 
FS...
[850973.409048] GFS2: fsid=internal:gfs-test.0: jid=0, already locked for use
[850973.409135] GFS2: fsid=internal:gfs-test.0: jid=0: Looking at journal...
[850973.504558] GFS2: fsid=internal:gfs-test.0: jid=0: Done
[850973.504653] GFS2: fsid=internal:gfs-test.0: jid=1: Trying to acquire 
journal lock...
[850973.517086] GFS2: fsid=internal:gfs-test.0: jid=1: Looking at journal...
[850973.691546] GFS2: fsid=internal:gfs-test.0: jid=1: Done
[850973.691635] GFS2: fsid=internal:gfs-test.0: jid=2: Trying to acquire 
journal lock...
[850973.702646] GFS2: fsid=internal:gfs-test.0: jid=2: Looking at journal...
[850973.846397] GFS2: fsid=internal:gfs-test.0: jid=2: Done


[850973.869288] [ cut here ]
[850973.869294] kernel BUG at fs/gfs2/glock.c:738!
[850973.869297] invalid opcode:  [1] PREEMPT 
[850973.869300] CPU 0 
[850973.869302] Modules linked in: lock_dlm dlm gfs2 scsi_tgt bttv video_buf 
firmware_class ir_common compat_ioctl32 btcx_risc tveeprom videodev v4l2_common 
v4l1_compat radeon nbd eth1394 ohci1394 dm_crypt eeprom w83627hf hwmon_vid 
i2c_isa i2c_viapro snd_via82xx snd_mpu401_uart snd_emu10k1 snd_rawmidi 
snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_device 
snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore
[850973.869324] Pid: 31076, comm: gfs2_glockd Not tainted 2.6.20-rc4MAX_ARGS #4
[850973.869327] RIP: 0010:[]  [] 
:gfs2:gfs2_glmutex_unlock+0x2b/0x40
[850973.869355] RSP: 0018:81001849be70  EFLAGS: 00010282
[850973.869359] RAX: 810023ff4ee0 RBX: 810023ff4e68 RCX: 
88185800
[850973.869363] RDX:  RSI: 810023ff4ec0 RDI: 
810023ff4e68
[850973.869366] RBP: 810023ff4f38 R08:  R09: 
6052
[850973.869370] R10:  R11: 8816de60 R12: 
810023ff4e68
[850973.869374] R13: 810023ff4eb0 R14: 81003ffd6850 R15: 
81003ffd6870
[850973.869378] FS:  2aebf51826d0() GS:807fb000() 
knlGS:f72026c0
[850973.869381] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[850973.869384] CR2: 2b9e93097fe0 CR3: 03a79000 CR4: 
06e0
[850973.869388] Process gfs2_glockd (pid: 31076, threadinfo 81001849a000, 
task 81b82890)
[850973.869390] Stack:  810023ff4eb0 8816cc08 81001849beb0 
810024322000
[850973.869397]  8100243223b8 8100074cf968 88163510 
88163528
[850973.869402]   81b82890 8029fe70 
81001849bec8
[850973.869407] Call Trace:
[850973.869421]  [] :gfs2:gfs2_reclaim_glock+0x138/0x180
[850973.869434]  [] :gfs2:gfs2_glockd+0x0/0xf0
[850973.869445]  [] :gfs2:gfs2_glockd+0x18/0xf0
[850973.869453]  [] autoremove_wake_function+0x0/0x30
[850973.869465]  [] :gfs2:gfs2_glockd+0x0/0xf0
[850973.869471]  [] kthread+0xd3/0x110
[850973.869476]  [] schedule_tail+0x37/0xc0
[850973.869481]  [] keventd_create_kthread+0x0/0xa0
[850973.869485]  [] child_rip+0xa/0x12
[850973.869490]  [] keventd_create_kthread+0x0/0xa0
[850973.869497]  [] kthread+0x0/0x110
[850973.869501]  [] child_rip+0x0/0x12
[850973.869504] 
[850973.869505] 
[850973.869506] Code: 0f 0b 

2.6.20-rc4 gfs2 bug

2007-01-24 Thread Dan Merillat
Running 2.6.20-rc4 _WITH_ the following patch: (Shouldn't be the issue,
but just in case, I'm listing it here)

Date:   Fri, 29 Dec 2006 21:03:57 +0100
From:   Ingo Molnar [EMAIL PROTECTED]
Subject: [patch] remove MAX_ARG_PAGES
Message-ID: [EMAIL PROTECTED]

Linux fileserver 2.6.20-rc4MAX_ARGS #4 PREEMPT Fri Jan 12 03:58:25 EST 2007 
x86_64 GNU/Linux

This happened when I started testing gfs2 for the first time.  I
installed userspace from CVS, loaded the gfs2/dlm modules, mkfs.gfs2,
then mount -t gfs2 -v /dev/vg1/gfs2 /mnt/gfs

This was the initial mount of the new filesystem.  I can create
directories, but attempting a stress-test with bonnie seems to have
deadlocked something.  (at Start 'em, immediately.)

To clarify: the two oopses happened at first mount.  After that, I
created files/directories, then attempted to stress it a bit with
bonnie++.  No further oops/dmesg output.

For the GFS2 folks, latest CVS gfs_tool doesn't have lockdump, is there
any way to examine what I'm stuck on?

This machine is specifically for testing new things before I put them
into production, so I can leave it hung like this indefinitely for
debugging.


[845566.571468] GFS2 (built Jan 12 2007 04:02:27) installed
[849416.113382] DLM (built Jan 12 2007 04:01:21) installed
[849416.352219] Lock_DLM (built Jan 12 2007 04:02:46) installed
[850966.368016] GFS2: fsid=: Trying to join cluster lock_dlm, 
internal:gfs-test
[850971.783223] dlm: gfs-test: recover 1
[850971.783242] dlm: gfs-test: add member 1
[850971.783246] dlm: gfs-test: total members 1 error 0
[850971.783248] dlm: gfs-test: dlm_recover_directory
[850971.783260] dlm: gfs-test: dlm_recover_directory 0 entries
[850971.783270] dlm: gfs-test: recover 1 done: 0 ms
[850971.783454] GFS2: fsid=internal:gfs-test.0: Joined cluster. Now mounting 
FS...
[850973.409048] GFS2: fsid=internal:gfs-test.0: jid=0, already locked for use
[850973.409135] GFS2: fsid=internal:gfs-test.0: jid=0: Looking at journal...
[850973.504558] GFS2: fsid=internal:gfs-test.0: jid=0: Done
[850973.504653] GFS2: fsid=internal:gfs-test.0: jid=1: Trying to acquire 
journal lock...
[850973.517086] GFS2: fsid=internal:gfs-test.0: jid=1: Looking at journal...
[850973.691546] GFS2: fsid=internal:gfs-test.0: jid=1: Done
[850973.691635] GFS2: fsid=internal:gfs-test.0: jid=2: Trying to acquire 
journal lock...
[850973.702646] GFS2: fsid=internal:gfs-test.0: jid=2: Looking at journal...
[850973.846397] GFS2: fsid=internal:gfs-test.0: jid=2: Done


[850973.869288] [ cut here ]
[850973.869294] kernel BUG at fs/gfs2/glock.c:738!
[850973.869297] invalid opcode:  [1] PREEMPT 
[850973.869300] CPU 0 
[850973.869302] Modules linked in: lock_dlm dlm gfs2 scsi_tgt bttv video_buf 
firmware_class ir_common compat_ioctl32 btcx_risc tveeprom videodev v4l2_common 
v4l1_compat radeon nbd eth1394 ohci1394 dm_crypt eeprom w83627hf hwmon_vid 
i2c_isa i2c_viapro snd_via82xx snd_mpu401_uart snd_emu10k1 snd_rawmidi 
snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_device 
snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore
[850973.869324] Pid: 31076, comm: gfs2_glockd Not tainted 2.6.20-rc4MAX_ARGS #4
[850973.869327] RIP: 0010:[8816cabb]  [8816cabb] 
:gfs2:gfs2_glmutex_unlock+0x2b/0x40
[850973.869355] RSP: 0018:81001849be70  EFLAGS: 00010282
[850973.869359] RAX: 810023ff4ee0 RBX: 810023ff4e68 RCX: 
88185800
[850973.869363] RDX:  RSI: 810023ff4ec0 RDI: 
810023ff4e68
[850973.869366] RBP: 810023ff4f38 R08:  R09: 
6052
[850973.869370] R10:  R11: 8816de60 R12: 
810023ff4e68
[850973.869374] R13: 810023ff4eb0 R14: 81003ffd6850 R15: 
81003ffd6870
[850973.869378] FS:  2aebf51826d0() GS:807fb000() 
knlGS:f72026c0
[850973.869381] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[850973.869384] CR2: 2b9e93097fe0 CR3: 03a79000 CR4: 
06e0
[850973.869388] Process gfs2_glockd (pid: 31076, threadinfo 81001849a000, 
task 81b82890)
[850973.869390] Stack:  810023ff4eb0 8816cc08 81001849beb0 
810024322000
[850973.869397]  8100243223b8 8100074cf968 88163510 
88163528
[850973.869402]   81b82890 8029fe70 
81001849bec8
[850973.869407] Call Trace:
[850973.869421]  [8816cc08] :gfs2:gfs2_reclaim_glock+0x138/0x180
[850973.869434]  [88163510] :gfs2:gfs2_glockd+0x0/0xf0
[850973.869445]  [88163528] :gfs2:gfs2_glockd+0x18/0xf0
[850973.869453]  [8029fe70] autoremove_wake_function+0x0/0x30
[850973.869465]  [88163510] :gfs2:gfs2_glockd+0x0/0xf0
[850973.869471]  [80234d43] kthread+0xd3/0x110
[850973.869476]  [80229407] schedule_tail+0x37/0xc0
[850973.869481]  [8029fca0] keventd_create_kthread+0x0/0xa0
[850973.869485]  [80264618] child_rip+0xa/0x12
[850973.869490]  

Re: AMI MegaRAID support in 2.4.3-pre4

2001-03-20 Thread Dan Merillat


> (please cc: me any response, I only keep up with linux-kernel via the archives)

Dan Merillat writes:

> Apparently the chip is too new for driver version 1.07b, (not recognized
> at all by the kernel) and 1.14g has the problems I'm going over here.

An update... driver version 1e08 (stupid version number... 1.08e?) works,
but only on a 2.2.x kernel (2.2.18).  1e08 doesn't play nicely with 2.4.x
PCI scanning... it compiles but never gets run.

I believe this is the version sent to RedHat.  Anyway, I can live with this,
since this particular box is single-CPU.  I'll have an SMP configuration on
another machine soon, though.

This box is available for another day or so for experimentation, before I wipe
the drive and do a final install.  If anyone has any ideas let me know now.

I can even give root level access to it for the moment. (again, it's getting
wiped in 48 hours)

--Dan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



AMI MegaRAID support in 2.4.3-pre4

2001-03-19 Thread Dan Merillat

(please cc: me any response, I only keep up with linux-kernel via the archives)

I see pre4 has an updated megaraid.c...  It Just Don't Work(tm)

I thought it might be a part of Alan's merges, but it's not
in the -ac tree.  Linus, who sent you this patch?

Recognizes the controller, accesses my two logical volumes,
cp /dev/sda /dev/null causes the drives to light up
but the data being copied looks to be random bits of memory.
I see syslog in there and lilo messages... all sorts of things
that don't belong.  It appears to be claiming pages used by /dev/hda,
or something silly.  I even see my IDE MBR, and fdisk recognizes the
SCSI disc as being partitioned the same as /dev/hda.

AMI MegaRAID express 500, trying 2.4.2, 2.4.3-pre4.

Apparently the chip is too new for driver version 1.07b, (not recognized
at all by the kernel) and 1.14g has the problems I'm going over here.

I also tried I2O, but it didn't recognize the card either.
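
One way to confirm the symptom described above (the SCSI volume handing back the IDE disk's MBR) would be to compare the first sector of both devices. The following is only a sketch: the function names are made up for illustration, the 512-byte sector size is an assumption, and the demo uses in-memory stand-ins so it runs without touching real disks.

```python
import io

SECTOR_SIZE = 512  # assumed sector size for both disks


def first_sector(stream):
    """Read the first sector from a binary stream positioned at offset 0."""
    return stream.read(SECTOR_SIZE)


def has_boot_signature(sector):
    """True if the sector ends with the 0x55 0xAA MBR boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def same_first_sector(stream_a, stream_b):
    """True if both streams start with the same full sector."""
    a = first_sector(stream_a)
    b = first_sector(stream_b)
    return len(a) == SECTOR_SIZE and a == b


# In-memory stand-ins for the two devices, purely for demonstration.
fake_hda = io.BytesIO(b"\x90" * 510 + b"\x55\xaa" + b"rest of disk")
fake_sda = io.BytesIO(fake_hda.getvalue())
print(same_first_sector(fake_hda, fake_sda))
```

On the real machine the streams would come from open("/dev/hda", "rb") and open("/dev/sda", "rb"), run as root. A match on a freshly created logical volume would point at the driver returning the wrong pages rather than at the disks.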





Re: 2.4.0/2.4.1 crashes in ext2

2001-02-05 Thread Dan Merillat


Alan Cox writes:
> > Ok, here's the crash I'm getting in 2.4.0.  Same thing is happening in 2.4.1,
> > but it's dying harder so getting syslog info out is tougher.
> 
> What I/O subsystem

Adaptec 2940, although it appears to have been spontaneous PCI bus death.

I've never seen a system die like that, so I took a while to rule out
hardware.  CPU didn't overheat, memory didn't go bad, no drives
failed...  even checked the PCI sockets for bad seating.  Nada.  Just...
the motherboard itself died, and only when doing a lot of SCSI I/O.

Rather unusual, it'd been running for a year now without a glitch until
I installed 2.4.  One of those stupid coincidences.

Sorry for the wasted cycles.

--Dan




3 more traces, 2.4.1 this time.

2001-02-04 Thread Dan Merillat


Managed to get 2.4.1 to Oops without locking up entirely:

I have no idea what's causing this... I just mkfs'ed all the drives fresh, so there
shouldn't be any filesystem corruption.

Ideas?

Unable to handle kernel paging request at virtual address d8958100
c0130704
*pde = 
Oops: 
CPU:0
EIP:0010:[__block_prepare_write+224/584]
EFLAGS: 00010283
eax: 04f0   ebx: 0322   ecx:    edx: d89580e8
esi: 0800   edi: ce01a322   ebp: c25c5ef8   esp: c25c5ec8
ds: 0018   es: 0018   ss: 0018
Process innd (pid: 585, stackpage=c25c5000)
Stack: 0322 c13b86f8 c331c360 04f0 0400 0f3b c25c5ef8 ce044980 
   ce01a000 0400 00035221 d89580e8 000fc340 c013e3b0 c0130ea2 c39fc2c0 
   c13b86f8 0322 04f0 c014ccbc c13b86f8 fff4 c014d425 c13b86f8 
Call Trace: [<d89580e8>] [posix_lock_file+1344/1360] [block_prepare_write+34/60]
[ext2_get_block+0/1332] [ext2_prepare_write+25/32] [ext2_get_block+0/1332] 
[generic_file_write+812/1204] 
Code: f6 42 18 10 0f 85 b2 00 00 00 6a 01 52 8b 5c 24 30 53 8b 44 
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; d89580e8 <END_OF_CODE+18675b5c/????>
Code;   Before first symbol
 <_EIP>:
Code;   Before first symbol
   0:   f6 42 18 10               testb  $0x10,0x18(%edx)
Code;  0004 Before first symbol
   4:   0f 85 b2 00 00 00         jne    bc <_EIP+0xbc> 00bc Before first symbol
Code;  000a Before first symbol
   a:   6a 01                     push   $0x1
Code;  000c Before first symbol
   c:   52                        push   %edx
Code;  000d Before first symbol
   d:   8b 5c 24 30               mov    0x30(%esp,1),%ebx
Code;  0011 Before first symbol
  11:   53                        push   %ebx
Code;  0012 Before first symbol
  12:   8b 44 00 00               mov    0x0(%eax,%eax,1),%eax

Unable to handle kernel paging request at virtual address d89580ec
c012fd64
*pde = 
Oops: 
CPU:0
EIP:0010:[getblk+128/296]
EFLAGS: 00010286
eax: c145   ebx: 0002   ecx: 0004c11a   edx: d89580e8
esi: 0008   edi: 0801   ebp: 0026   esp: c1bd1e1c
ds: 0018   es: 0018   ss: 0018
Process innfeed (pid: 588, stackpage=c1bd1000)
Stack: c1bd1f00 c3a64de0 c3a64de0 c3a64e50 0801 1f6e c014d253 0801 
   0004c11a 0400 cfb5cd40 0001 c3a64de0 c33293a0 0004c11a c354a3c0 
   c3320801 c354a3c0 00030002 0010 c1bd1e94 c1bd1e78 c1bd1e7c c198a2a0 
Call Trace: [ext2_getblk+99/204] [ext2_find_entry+177/944] [error_code+52/64] 
[file_read_actor+51/104] [ext2_unlink+39/204] [vfs_unlink+231/300] 
[sys_unlink+165/284] 
Code: 39 4a 04 75 12 31 c0 66 8b 42 08 3b 44 24 24 75 06 66 39 7a 

Code;   Before first symbol
 <_EIP>:
Code;   Before first symbol
   0:   39 4a 04                  cmp    %ecx,0x4(%edx)
Code;  0003 Before first symbol
   3:   75 12                     jne    17 <_EIP+0x17> 0017 Before first symbol
Code;  0005 Before first symbol
   5:   31 c0                     xor    %eax,%eax
Code;  0007 Before first symbol
   7:   66 8b 42 08               mov    0x8(%edx),%ax
Code;  000b Before first symbol
   b:   3b 44 24 24               cmp    0x24(%esp,1),%eax
Code;  000f Before first symbol
   f:   75 06                     jne    17 <_EIP+0x17> 0017 Before first symbol
Code;  0011 Before first symbol
  11:   66 39 7a 00               cmp    %di,0x0(%edx)

Unable to handle kernel paging request at virtual address d89580ec
c012fd64
*pde = 
Oops: 
CPU:0
EIP:0010:[getblk+128/296]
EFLAGS: 00010286
eax: c145   ebx: 0002   ecx: 0004a014   edx: d89580e8
esi: 0008   edi: 0801   ebp: 0025   esp: c15e9f38
ds: 0018   es: 0018   ss: 0018
Process kupdate (pid: 6, stackpage=c15e9000)
Stack: 0004a014 00a0 2280 c1b25c20 0801 1555 c012fffe 0801 
   0004a014 0400 0801 c014e1f6 0801 0004a014 0400 0001 
   c1b25c20 cff68238 cff68200 c7707ab4 0286 0001 cff68200 0801 
Call Trace: [bread+26/112] [ext2_update_inode+302/952] [ext2_write_inode+15/20] 
[sync_inodes+296/388] [sync_old_buffers+14/56] [kupdate+222/232] [kernel_thread+35/48] 
Code: 39 4a 04 75 12 31 c0 66 8b 42 08 3b 44 24 24 75 06 66 39 7a 

Code;   Before first symbol
 <_EIP>:
Code;   Before first symbol
   0:   39 4a 04                  cmp    %ecx,0x4(%edx)
Code;  0003 Before first symbol
   3:   75 12                     jne    17 <_EIP+0x17> 0017 Before first symbol
Code;  0005 Before first symbol
   5:   31 c0                     xor    %eax,%eax
Code;  0007 Before first symbol
   7:   66 8b 42 08               mov    0x8(%edx),%ax
Code;  000b Before first symbol
   b:   3b 44 24 24               cmp    0x24(%esp,1),%eax
Code;  000f Before first symbol
   f:   75 06                     jne    17 <_EIP+0x17> 0017 Before first symbol
Code;  0011 Before first symbol
  11:   66 39 7a 00               cmp    %di,0x0(%edx)
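
A quick cross-check of the three traces above, as a minimal sketch: in each oops the faulting address equals %edx plus the displacement used by the first instruction of the Code: line, so all three appear to dereference the same pointer, d89580e8, from three different processes. The values are copied from the oopses; the table layout and the name explains_fault are illustrative only.

```python
# (label, faulting virtual address, value of %edx, displacement used by the
# first decoded instruction), all taken from the three oopses above.
OOPSES = [
    ("__block_prepare_write (innd)", 0xD8958100, 0xD89580E8, 0x18),  # testb $0x10,0x18(%edx)
    ("getblk (innfeed)",             0xD89580EC, 0xD89580E8, 0x04),  # cmp %ecx,0x4(%edx)
    ("getblk (kupdate)",             0xD89580EC, 0xD89580E8, 0x04),  # cmp %ecx,0x4(%edx)
]


def explains_fault(fault_addr, base, disp):
    """True if base + disp (32-bit wraparound) is the faulting address."""
    return (base + disp) & 0xFFFFFFFF == fault_addr


for label, fault, base, disp in OOPSES:
    ok = explains_fault(fault, base, disp)
    print(f"{label}: edx+{disp:#x} = {(base + disp) & 0xFFFFFFFF:#010x}, matches fault: {ok}")

# The same base pointer shows up in every oops.
print(len({base for _, _, base, _ in OOPSES}) == 1)
```

This only shows that the three faults share one bad pointer value; it does not say where that value came from.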

2.4.0/2.4.1 crashes in ext2

2001-02-04 Thread Dan Merillat



Ok, here's the crash I'm getting in 2.4.0.  Same thing is happening in 2.4.1,
but it's dying harder so getting syslog info out is tougher.

Looks like it's trying to write WAY past the end of a drive (from some messages
that unfortunately did not get logged, but were scrolling on the screen) but I'm
not sure if that's the cause or result of the oops.

The only thing different on this machine from my others running 2.4.1 is that
this one is actually using large files.  Lots of them, all the time.  So that'd
be my first guess.

I made sure to compile 2.4.1 with 2.91.2, no change.

I get about 5-10 minutes of runtime on a full-feed newsserver before it
crashes.   Need to figure this out quickly, please.

--Dan


Unable to handle kernel paging request at virtual address d8958100
c012e253
*pde = 
Oops: 
CPU:0
EIP:0010:[__block_write_full_page+123/372]
EFLAGS: 00010206
eax:    ebx: d89580e8   ecx: 3057   edx: 
esi:    edi: 0c4c   ebp: c3886f60   esp: c15ebf5c
ds: 0018   es: 0018   ss: 0018
Process kupdate (pid: 6, stackpage=c15eb000)
Stack: c10f2388 c4ff6620 c4ff66c4 c4ff6620  c012eda1 c4ff6620 c10f2388 
   c0148e78 c10f2388 c4ff6620 c4ff66c4 c01494ec c4bd2d60 c01494fa c10f2388 
   c0148e78 c011fe6a c10f2388 0004 c4ff6620 cffca638 cffca600 c013db9b 
Call Trace: [block_write_full_page+49/328] [ext2_get_block+0/1168] 
[ext2_writepage+0/20] [ext2_writepage+14/20] [ext2_get_block+0/1168] 
[filemap_fdatasync+90/188] [sync_inodes+239/364] 
Code: 8b 73 18 83 e6 10 75 29 6a 01 53 57 8b 54 24 24 52 8b 54 24 
Using defaults from ksymoops -t elf32-i386 -a i386

Code;   Before first symbol
 <_EIP>:
Code;   Before first symbol
   0:   8b 73 18                  mov    0x18(%ebx),%esi
Code;  0003 Before first symbol
   3:   83 e6 10                  and    $0x10,%esi
Code;  0006 Before first symbol
   6:   75 29                     jne    31 <_EIP+0x31> 0031 Before first symbol
Code;  0008 Before first symbol
   8:   6a 01                     push   $0x1
Code;  000a Before first symbol
   a:   53                        push   %ebx
Code;  000b Before first symbol
   b:   57                        push   %edi
Code;  000c Before first symbol
   c:   8b 54 24 24               mov    0x24(%esp,1),%edx
Code;  0010 Before first symbol
  10:   52                        push   %edx
Code;  0011 Before first symbol
  11:   8b 54 24 00               mov    0x0(%esp,1),%edx





No Subject

2001-01-14 Thread Dan Merillat



Jan 15 00:09:55 news kernel: kernel BUG at inode.c:372!
Jan 15 00:09:55 news kernel: invalid operand: 
Jan 15 00:09:55 news kernel: CPU:0
Jan 15 00:09:55 news kernel: EIP:0010:[clear_inode+51/228]
Jan 15 00:09:55 news kernel: EFLAGS: 00010286
Jan 15 00:09:55 news kernel: eax: 001b   ebx: cb884b80   ecx:    edx: 

Jan 15 00:09:55 news kernel: esi: 0044   edi: c47ed7c0   ebp: cb884b80   esp: 
cacf9eec
Jan 15 00:09:55 news kernel: ds: 0018   es: 0018   ss: 0018
Jan 15 00:09:55 news kernel: Process expire (pid: 4292, stackpage=cacf9000)
Jan 15 00:09:55 news kernel: Stack: c021ff8c c022002c 0174 cffca600 c014bb77 
cb884b80 cb884b80 c026f080 
Jan 15 00:09:55 news kernel:c47ed7c0 ccfe8460 cc0eb400 cc3ecde0 0043 
  c6589de0 
Jan 15 00:09:55 news kernel:c014c2a6 c014c2cc cb884b80 cb884b80 c0140f3f 
cb884b80 c47ed7c0 cb884b80 
Jan 15 00:09:55 news kernel: Call Trace: [tvecs+23744/48412] [tvecs+23904/48412] 
[ext2_free_inode+167/388] [ext2_delete_inode+102/156] [ext2_delete_inode+140/156] 
[iput+167/340] [dput+238/324] 
Jan 15 00:09:55 news kernel:[sys_rename+434/568] [system_call+51/64] 
Jan 15 00:09:55 news kernel: Code: 0f 0b 83 c4 0c 8b 83 ec 00 00 00 a8 10 75 26 68 76 
01 00 00 


I'm gonna hazard a guess that this hasn't changed much between prerelease and
-final, but I'll upgrade and see what happens.

This is using large file support (but I don't think it was a large file being renamed)

--Dan
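
A note on reading the dump above, as a sketch rather than a diagnosis. The Code: line starts with 0f 0b, the x86 ud2 instruction, which is why the report says "invalid operand". The third stack word is shown as 0174; assuming leading zeros were dropped by the archive and that this word is the line-number argument of the BUG message, it decodes to the 372 in "inode.c:372". The Code: line ends with 68 76 01 00 00, a push of a 32-bit immediate, which would be consistent with another check two lines further down. The helper name push_imm32 is made up for this example.

```python
def push_imm32(code: bytes) -> int:
    """Decode an x86 `push imm32`: opcode 0x68 followed by 4 little-endian bytes."""
    if len(code) != 5 or code[0] != 0x68:
        raise ValueError("not a 5-byte push imm32 encoding")
    return int.from_bytes(code[1:], "little")


# Third word of the Stack: dump as it appears in the archived message.
# Assumption: this is the line number passed along with the BUG message.
stack_word = int("0174", 16)

# Last five bytes of the Code: line.
next_push = push_imm32(bytes.fromhex("68 76 01 00 00"))

print(stack_word, next_push)
```

Here stack_word comes out as 372 and next_push as 374 (0x176).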




IDE hang when using v4l (bttv) on all kernels.

2000-12-13 Thread Dan Merillat



I've had this problem for a while now, reported it back in 2.3.x somewhere.
I haven't needed v4l so I ignored it.  Playing with my bttv again and having
a lot of trouble.

After some (random) amount of frame grabs, my system loses.  Badly.
Ethernet quits forwarding, IDE refuses to respond... All I can do is
switch VCs and watch errors scroll by.

This may be a chipset problem, although I've seen reference to others
having problems with this driver and IDE in the past.

Tested with v4l1/2 for all kernels up to 2.4.0-test12


Lockup still happens if IDE DMA is disabled.

Please Cc: me if you want any additional information, I read L-K via
the web archives.


spurious 8259A interrupt: IRQ7.
Failed to read 258048 bytes, got 0: Success
 ^ my v4l program.
ide_dmaproc: chipset supported ide_dma_losirq func only: 13
hda: lost interrupt
ide_dmaproc: chipset supported ide_dma_losirq func only: 13
hda: lost interrupt
<repeats forever>



harik@burned:~$ lspci
00:00.0 Host bridge: Intel Corporation 430FX - 82437FX TSC [Triton I] (rev 02)
00:07.0 ISA bridge: Intel Corporation 82371FB PIIX ISA [Triton I] (rev 02)
00:07.1 IDE interface: Intel Corporation 82371FB PIIX IDE [Triton I] (rev 02)
00:08.0 VGA compatible controller: Matrox Graphics, Inc. MGA 2164W [Millennium II]
00:09.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 21)
00:0a.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 21)
00:0b.0 Multimedia video controller: Brooktree Corporation Bt848 (rev 12)
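
About the "Failed to read 258048 bytes, got 0: Success" line in the log above: the v4l program is not shown, so this is an inference, but that wording is what a program prints when a read returns 0 bytes and it formats errno anyway. A zero-byte return is not an error, so errno was still 0 and its text is "Success". A minimal sketch of a read loop that reports the two cases separately (read_exact is a hypothetical helper, demonstrated on a pipe rather than a video device):

```python
import os


def read_exact(fd, nbytes):
    """Read exactly nbytes from fd, or raise an exception that says why not."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)  # raises OSError on a real I/O error
        if not chunk:
            # Zero bytes means no more data. This is not an error condition,
            # so there is no error code to report; say so explicitly.
            got = nbytes - remaining
            raise EOFError(f"wanted {nbytes} bytes, got {got} before end of data")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Demonstration on a pipe: three bytes are available, then the writer closes.
r, w = os.pipe()
os.write(w, b"abc")
os.close(w)
print(read_exact(r, 3))
try:
    read_exact(r, 1)
except EOFError as exc:
    print("short read:", exc)
os.close(r)
```

For the lockup itself this changes nothing; it only makes the program's own message say "no data" instead of "Success".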

--Dan


