Re: kernel bug in kvm_intel

2009-11-30 Thread Andrew Theurer
On Sun, 2009-11-29 at 16:46 +0200, Avi Kivity wrote:
 On 11/26/2009 03:35 AM, Andrew Theurer wrote:
  I just tried testing tip of kvm.git, but unfortunately I think I might 
  be hitting a different problem, where processes run 100% in kernel 
  mode.  In my case, cpus 9 and 13 were stuck, running qemu processes.  
  Stack backtraces for both cpus are below.  FWIW, kernel.org 
  2.6.32-rc7 does not have this problem, or the original problem.
 
 I just posted a patch fixing this, titled [PATCH tip:x86/entry] core: 
 fix user return notifier on fork().
 
Thank you, Avi.  I am running on this patch and am not seeing this
problem anymore.  I'll be testing for the previous issue next.

-Andrew

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel bug in kvm_intel

2009-11-29 Thread Avi Kivity

On 11/26/2009 03:35 AM, Andrew Theurer wrote:
I just tried testing tip of kvm.git, but unfortunately I think I might 
be hitting a different problem, where processes run 100% in kernel 
mode.  In my case, cpus 9 and 13 were stuck, running qemu processes.  
Stack backtraces for both cpus are below.  FWIW, kernel.org 
2.6.32-rc7 does not have this problem, or the original problem.


I just posted a patch fixing this, titled [PATCH tip:x86/entry] core: 
fix user return notifier on fork().


--
error compiling committee.c: too many arguments to function



Re: kernel bug in kvm_intel

2009-11-26 Thread Avi Kivity

On 11/26/2009 03:35 AM, Andrew Theurer wrote:



NMI backtrace for cpu 9
CPU 9:
Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc 
dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel 
kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata 
ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii 
matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 
matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core 
pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 
rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx 
scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd 
mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
Pid: 5687, comm: qemu-system-x86 Not tainted 
2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  
-[7947AC1]-
RIP: 0010:[810b802b]  [810b802b] 
fire_user_return_notifiers+0x31/0x36
RSP: 0018:88095024df08  EFLAGS: 0246
RAX:  RBX: 0800 RCX: 88095024c000
RDX: 88002834 RSI:  RDI: 88095024df58
RBP: 88095024df18 R08:  R09: 0001
R10: 00caf1fff62d R11: 8805b584de40 R12: 7fffae48e0f0
R13:  R14: 0001 R15: 
FS:  7f45c69d57c0() GS:88002834() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: f9800121056e CR3: 000953d36000 CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Call Trace:
#DB[1] EOE Pid: 5687, comm: qemu-system-x86 Not tainted 
2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1

Call Trace:
NMI  [8100af53] ? show_regs+0x44/0x49
 [812e57b2] nmi_watchdog_tick+0xc2/0x1b9
 [812e4e73] do_nmi+0xb0/0x252
 [812e48a0] nmi+0x20/0x30
 [810b802b] ? fire_user_return_notifiers+0x31/0x36
EOE  [8100b844] do_notify_resume+0x62/0x69
 [8100bf48] ? int_check_syscall_exit_work+0x9/0x3d
 [8100bf8e] int_signal+0x12/0x17




That's a bug with the new user return notifiers.  Is your host kernel 
preemptible?


I think I saw this once but I'm not sure.  I can't reproduce with a host 
kernel build, some silly guest workload, and 'perf top' to generate an 
nmi load.


--
error compiling committee.c: too many arguments to function



Re: kernel bug in kvm_intel

2009-11-25 Thread Andrew Theurer

Tejun Heo wrote:

Hello,

11/01/2009 08:31 PM, Avi Kivity wrote:

Here is the code in question:


   3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
   3ae9:   0f 01 c2                vmlaunch
   3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
   3aee:   0f 01 c3                vmresume
   3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
   

^^^ fault, but not at (%rsp)
 

Can you please post the full oops (including kernel debug messages
during boot) or give me a pointer to the original message?

http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html


Also, does
the faulting address coincide with any symbol?
   

No (at least, not in System.map).


Has there been any progress?  Is kvm + oprofile still broken?



I just tried testing tip of kvm.git, but unfortunately I think I might 
be hitting a different problem, where processes run 100% in kernel mode. 
 In my case, cpus 9 and 13 were stuck, running qemu processes.  Stack 
backtraces for both cpus are below.  FWIW, kernel.org 2.6.32-rc7 does not 
have this problem, or the original problem.




NMI backtrace for cpu 9
CPU 9:
Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror 
dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod 
cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid 
ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 
matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt 
i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core 
bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx 
scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd 
ohci_hcd ehci_hcd usbcore [last unloaded: processor]
Pid: 5687, comm: qemu-system-x86 Not tainted 
2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  -[7947AC1]-
RIP: 0010:[810b802b]  [810b802b] 
fire_user_return_notifiers+0x31/0x36
RSP: 0018:88095024df08  EFLAGS: 0246
RAX:  RBX: 0800 RCX: 88095024c000
RDX: 88002834 RSI:  RDI: 88095024df58
RBP: 88095024df18 R08:  R09: 0001
R10: 00caf1fff62d R11: 8805b584de40 R12: 7fffae48e0f0
R13:  R14: 0001 R15: 
FS:  7f45c69d57c0() GS:88002834() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: f9800121056e CR3: 000953d36000 CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Call Trace:
 #DB[1]  EOE Pid: 5687, comm: qemu-system-x86 Not tainted 
2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
Call Trace:
 NMI  [8100af53] ? show_regs+0x44/0x49
 [812e57b2] nmi_watchdog_tick+0xc2/0x1b9
 [812e4e73] do_nmi+0xb0/0x252
 [812e48a0] nmi+0x20/0x30
 [810b802b] ? fire_user_return_notifiers+0x31/0x36
 EOE  [8100b844] do_notify_resume+0x62/0x69
 [8100bf48] ? int_check_syscall_exit_work+0x9/0x3d
 [8100bf8e] int_signal+0x12/0x17



NMI backtrace for cpu 13
CPU 13:
Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror 
dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod 
cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid 
ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 
matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt 
i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core 
bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx 
scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd 
ohci_hcd ehci_hcd usbcore [last unloaded: processor]
Pid: 5792, comm: qemu-system-x86 Not tainted 
2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  -[7947AC1]-
RIP: 0010:[8100bfb0]  [8100bfb0] int_restore_rest+0x1d/0x3d
RSP: 0018:88124f491f58  EFLAGS: 0292
RAX: 0800 RBX: 7fff9df852e0 RCX: 88124f49
RDX: 88099ff4 RSI:  RDI: fe2e
RBP: 7fff9df85260 R08: 88124f49 R09: 
R10: 0005 R11: 880954971da0 R12: 7fff9df851e0
R13:  R14: 0001 R15: 
FS:  7f73b5b1d7c0() GS:88099ff4() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f8d5a8de9d0 CR3: 000eb34d7000 CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Call Trace:
 #DB[1]  EOE Pid: 5792, comm: qemu-system-x86 Not tainted 

Re: kernel bug in kvm_intel

2009-11-25 Thread Tejun Heo
Hello,

11/26/2009 10:35 AM, Andrew Theurer wrote:
 I just tried testing tip of kvm.git, but unfortunately I think I might
 be hitting a different problem, where processes run 100% in kernel mode.
 In my case, cpus 9 and 13 were stuck, running qemu processes.  Stack
 backtraces for both cpus are below.  FWIW, kernel.org 2.6.32-rc7 does not
 have this problem, or the original problem.

2.6.32-rc7 doesn't have the problem with kvm + oprofile?  If the original
analysis was right, I can't think of anything which could have changed
that between the merge commit and 2.6.32-rc7.

Thanks.

-- 
tejun


Re: kernel bug in kvm_intel

2009-11-18 Thread Tejun Heo
Hello,

11/01/2009 08:31 PM, Avi Kivity wrote:
 Here is the code in question:

 
    3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
    3ae9:   0f 01 c2                vmlaunch
    3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
    3aee:   0f 01 c3                vmresume
    3af1:   48 87 0c 24             xchg   %rcx,(%rsp)

 ^^^ fault, but not at (%rsp)
  
 Can you please post the full oops (including kernel debug messages
 during boot) or give me a pointer to the original message?
 
 http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html
 
 Also, does
 the faulting address coincide with any symbol?

 
 No (at least, not in System.map).

Has there been any progress?  Is kvm + oprofile still broken?

Thanks.

-- 
tejun


Re: kernel bug in kvm_intel

2009-11-01 Thread Tejun Heo
Hello,

Avi Kivity wrote:
 Only, that merge doesn't change virt/kvm or arch/x86/kvm.
 
 Tejun, anything known bad about that merge?  ada3fa15 kills kvm.

Nothing rings a bell at the moment.  How does it kill kvm?  One big
difference caused by that merge is the use of sparse areas near the top of
the vmalloc area.  This caused a vmalloc area shortage on sparc64 and
exposed a paging code bug on ppc64 which caused the cpu to fault
repeatedly on the same address.  Maybe something similar is happening
with kvm?
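Tejun's point about sparse placement can be illustrated with the addressing model the merged series works with: a CPU's copy of a per-cpu variable sits at the chunk base plus that CPU's unit offset (the series adds pcpu_unit_offsets[]) plus the variable's offset inside the unit. The minimal Python sketch below assumes that model; the unit size, group gap, base address and CPU grouping are hypothetical numbers, not values from the reporter's machine.

```python
# Hypothetical model of per-cpu addressing with sparse, group-based units.
# All constants are made up for illustration.

UNIT_SIZE = 0x20000          # hypothetical size of one per-cpu unit
GROUP_GAP = 0x4000_0000      # hypothetical distance between NUMA groups


def unit_offsets(cpus_per_group, ngroups):
    """Return one unit offset per CPU, laid out group by group.

    CPUs inside a group are packed contiguously; each group starts
    GROUP_GAP bytes after the previous one, so the units are sparse.
    """
    offsets = []
    for group in range(ngroups):
        group_base = group * GROUP_GAP
        for i in range(cpus_per_group):
            offsets.append(group_base + i * UNIT_SIZE)
    return offsets


def percpu_addr(chunk_base, offsets, cpu, var_offset):
    """Address of one CPU's copy of a per-cpu variable."""
    return chunk_base + offsets[cpu] + var_offset


if __name__ == "__main__":
    base = 0xFFFF_E800_0000_0000   # hypothetical chunk base address
    offs = unit_offsets(cpus_per_group=8, ngroups=2)
    for cpu in (0, 7, 8, 15):
        print(cpu, hex(percpu_addr(base, offs, cpu, var_offset=0x40)))
```

With two groups of eight CPUs, CPU 8's copy is GROUP_GAP bytes past CPU 0's, so the per-cpu area spans a much larger virtual range than the data it actually holds.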

Thanks.

-- 
tejun


Re: kernel bug in kvm_intel

2009-11-01 Thread Avi Kivity

On 11/01/2009 12:00 PM, Tejun Heo wrote:

Hello,

Avi Kivity wrote:
   

Only, that merge doesn't change virt/kvm or arch/x86/kvm.

Tejun, anything known bad about that merge?  ada3fa15 kills kvm.
 

Nothing rings a bell at the moment.  How does it kill kvm?  One big
difference caused by that merge is the use of sparse areas near the top of
the vmalloc area.  This caused a vmalloc area shortage on sparc64 and
exposed a paging code bug on ppc64 which caused the cpu to fault
repeatedly on the same address.  Maybe something similar is happening
with kvm?

   


We get a page fault immediately (next instruction) after returning from 
the guest when running with oprofile.  The page fault address does not 
match anything the instruction does, so presumably it is one of the 
accesses the processor performs in order to service an NMI (ordinary 
interrupts are masked; and the fact that it happens with oprofile 
strengthens this assumption).


If this is correct, the fault is not in the NMI handler itself, but in 
one of the memory areas the cpu looks in to vector the NMI, which can be:


- the IDT
- the GDT
- the TSS
- the NMI stack

Except for the IDT these are per-cpu structures, though I don't know 
whether they are allocated with the percpu infrastructure.
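Of the structures listed above, the IDT is the first one the CPU reads when vectoring an NMI. A minimal Python sketch of that step, assuming the 64-bit mode layout (16-byte gate descriptors, NMI on vector 2): it computes where the NMI gate lives and decodes a raw gate into handler address, selector, IST index and flags. The IDT base and gate contents used in the test are hypothetical; on a real system they would come from the running kernel.

```python
import struct

NMI_VECTOR = 2    # NMI is vector 2 on x86
GATE_SIZE = 16    # IDT gate descriptors are 16 bytes in 64-bit mode


def gate_address(idt_base, vector):
    """Linear address of the IDT gate for `vector`."""
    return idt_base + vector * GATE_SIZE


def decode_gate(raw):
    """Decode a 16-byte 64-bit interrupt/trap gate.

    Layout (little-endian):
      bytes 0-1   handler offset bits 15..0
      bytes 2-3   code segment selector
      byte  4     IST index in bits 2..0
      byte  5     type (bits 3..0), DPL (bits 6..5), present (bit 7)
      bytes 6-7   handler offset bits 31..16
      bytes 8-11  handler offset bits 63..32
      bytes 12-15 reserved
    """
    if len(raw) != GATE_SIZE:
        raise ValueError("gate must be 16 bytes")
    off_lo, selector, ist, attr, off_mid, off_hi, _ = struct.unpack("<HHBBHII", raw)
    return {
        "handler": (off_hi << 32) | (off_mid << 16) | off_lo,
        "selector": selector,
        "ist": ist & 0x7,
        "type": attr & 0xF,
        "dpl": (attr >> 5) & 0x3,
        "present": bool(attr & 0x80),
    }
```

A corrupted IDT would typically show up here as a non-present gate or a handler address that matches no kernel symbol. Delivery also reads the GDT entry for the selector and, when the IST index is nonzero, the IST pointer in the TSS; this sketch only covers the IDT lookup.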


Here is the code in question:


  3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
  3ae9:   0f 01 c2                vmlaunch
  3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
  3aee:   0f 01 c3                vmresume
  3af1:   48 87 0c 24             xchg   %rcx,(%rsp)


^^^ fault, but not at (%rsp)


  3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
  3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)





--

error compiling committee.c: too many arguments to function



Re: kernel bug in kvm_intel

2009-11-01 Thread Tejun Heo
Hello,

Avi Kivity wrote:
 We get a page fault immediately (next instruction) after returning from
 the guest when running with oprofile.  The page fault address does not
 match anything the instruction does, so presumably it is one of the
 accesses the processor performs in order to service an NMI (ordinary
 interrupts are masked; and the fact that it happens with oprofile
 strengthens this assumption).

Ah... okay, that's tricky but IIRC faults like that can be
distinguished from regular ones via processor state, right?

 If this is correct, the fault is not in the NMI handler itself, but in
 one of the memory areas the cpu looks in to vector the NMI, which can be:
 
 - the IDT
 - the GDT
 - the TSS
 - the NMI stack
 
 Except for the IDT these are per-cpu structures, though I don't know
 whether they are allocated with the percpu infrastructure.

Don't know where the NMI stack is, but all the others are percpu.

 Here is the code in question:
 
    3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
    3ae9:   0f 01 c2                vmlaunch
    3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
    3aee:   0f 01 c3                vmresume
    3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
 
 ^^^ fault, but not at (%rsp)

Can you please post the full oops (including kernel debug messages
during boot) or give me a pointer to the original message?  Also, does
the faulting address coincide with any symbol?

Thanks.

-- 
tejun


Re: kernel bug in kvm_intel

2009-11-01 Thread Avi Kivity

On 11/01/2009 12:45 PM, Tejun Heo wrote:

Hello,

Avi Kivity wrote:
   

We get a page fault immediately (next instruction) after returning from
the guest when running with oprofile.  The page fault address does not
match anything the instruction does, so presumably it is one of the
accesses the processor performs in order to service an NMI (ordinary
interrupts are masked; and the fact that it happens with oprofile
strengthens this assumption).
 

Ah... okay, that's tricky but IIRC faults like that can be
distinguished from regular ones via processor state, right?
   


Not on x86.  But given that the fault address is different from %rsp 
(which is what the instruction accesses) and %rip, there aren't many 
alternatives.
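The elimination argument can be written down as a small check: an instruction like `xchg %rcx,(%rsp)` explicitly touches only its own bytes at %rip and the 8-byte operand at %rsp, so a fault address outside both ranges points at an access the CPU made on its own. A minimal Python sketch; the instruction length (4 bytes) comes from the disassembly, while the register and fault values in the test are hypothetical, since the addresses in this archived thread appear shortened.

```python
def explicit_ranges(rip, insn_len, rsp, operand_size=8):
    """Byte ranges [start, end) that the instruction itself accesses.

    Assumes a single memory operand at (%rsp), as in
    'xchg %rcx,(%rsp)': the instruction fetch at %rip plus the
    operand read/write at %rsp.
    """
    return [(rip, rip + insn_len), (rsp, rsp + operand_size)]


def is_implicit_access(fault_addr, rip, insn_len, rsp, operand_size=8):
    """True if the fault address lies outside every explicit access,
    i.e. it must come from something the CPU did on its own (for
    example descriptor-table or stack accesses while delivering an NMI).
    """
    return not any(
        start <= fault_addr < end
        for start, end in explicit_ranges(rip, insn_len, rsp, operand_size)
    )
```

The check ignores page-crossing accesses; it only formalizes the elimination and cannot say which implicit access faulted.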



Here is the code in question:

 

    3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
    3ae9:   0f 01 c2                vmlaunch
    3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
    3aee:   0f 01 c3                vmresume
    3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
   

^^^ fault, but not at (%rsp)
 

Can you please post the full oops (including kernel debug messages
during boot) or give me a pointer to the original message?


http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html


Also, does
the faulting address coincide with any symbol?
   


No (at least, not in System.map).

--
error compiling committee.c: too many arguments to function



Re: kernel bug in kvm_intel

2009-10-31 Thread Avi Kivity

On 10/30/2009 08:07 PM, Andrew Theurer wrote:


I have finally bisected and isolated this to the following commit:

ada3fa15057205b7d3f727bba5cd26b5912e350f
http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ada3fa15057205b7d3f727bba5cd26b5912e350f
   

Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 
commits)
   powerpc64: convert to dynamic percpu allocator
   sparc64: use embedding percpu first chunk allocator
   percpu: kill lpage first chunk allocator
   x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
   percpu: update embedding first chunk allocator to handle sparse units
   percpu: use group information to allocate vmap areas sparsely
   vmalloc: implement pcpu_get_vm_areas()
   vmalloc: separate out insert_vmalloc_vm()
   percpu: add chunk->base_addr
   percpu: add pcpu_unit_offsets[]
   percpu: introduce pcpu_alloc_info and pcpu_group_info
   percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
   percpu: add @align to pcpu_fc_alloc_fn_t
   percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
   percpu: drop @static_size from first chunk allocators
   percpu: generalize first chunk allocator selection
   percpu: build first chunk allocators selectively
   percpu: rename 4k first chunk allocator to page
   percpu: improve boot messages
   percpu: fix pcpu_reclaim() locking
 

The previous commit (5579fd7e6aed8860ea0c8e3f11897493153b10ad) does not
have this problem.  FYI, this problem only occurs when oprofile is active.

Any idea what in this commit might be the issue?

   


5579 is not the preceding commit, it is the merged branch:

commit ada3fa15057205b7d3f727bba5cd26b5912e350f
Merge: 2f82af0 5579fd7
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Tue Sep 15 09:39:44 2009 -0700

Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu



What happens with 2f82af0?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: kernel bug in kvm_intel

2009-10-31 Thread Andrew Theurer

Avi Kivity wrote:

On 10/30/2009 08:07 PM, Andrew Theurer wrote:


I have finally bisected and isolated this to the following commit:

ada3fa15057205b7d3f727bba5cd26b5912e350f
http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ada3fa15057205b7d3f727bba5cd26b5912e350f 

  

Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu

* 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)

   powerpc64: convert to dynamic percpu allocator
   sparc64: use embedding percpu first chunk allocator
   percpu: kill lpage first chunk allocator
   x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
   percpu: update embedding first chunk allocator to handle sparse units
   percpu: use group information to allocate vmap areas sparsely
   vmalloc: implement pcpu_get_vm_areas()
   vmalloc: separate out insert_vmalloc_vm()
   percpu: add chunk->base_addr
   percpu: add pcpu_unit_offsets[]
   percpu: introduce pcpu_alloc_info and pcpu_group_info
   percpu: move pcpu_lpage_build_unit_map() and 
pcpul_lpage_dump_cfg() upward

   percpu: add @align to pcpu_fc_alloc_fn_t
   percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
   percpu: drop @static_size from first chunk allocators
   percpu: generalize first chunk allocator selection
   percpu: build first chunk allocators selectively
   percpu: rename 4k first chunk allocator to page
   percpu: improve boot messages
   percpu: fix pcpu_reclaim() locking
 

The previous commit (5579fd7e6aed8860ea0c8e3f11897493153b10ad) does not
have this problem.  FYI, this problem only occurs when oprofile is active.

Any idea what in this commit might be the issue?

   


5579 is not the preceding commit, it is the merged branch:

commit ada3fa15057205b7d3f727bba5cd26b5912e350f
Merge: 2f82af0 5579fd7
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Tue Sep 15 09:39:44 2009 -0700

Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu



What happens with 2f82af0?


2f82af0 is:


Nicolas Pitre has a new email address

Due to problems at cam.org, my n...@cam.org email address is no longer
valid.  From now on, n...@fluxnic.net should be used instead.


I have not tested that, but it doesn't seem likely that it would have 
anything to do with the problem.  Or maybe I am misunderstanding the 
impact of this commit?


FWIW, here is the bisect log:

git bisect start
# good: [227423904c709a8e60245c97081bbeb4fb500655] Merge branch 
'x86-pat-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

git bisect good 227423904c709a8e60245c97081bbeb4fb500655
# bad: [0f29f5871c165e346409f62d903f97cfad3894c5] Staging: rtl8192su: 
remove RTL8192SU ifdefs

git bisect bad 0f29f5871c165e346409f62d903f97cfad3894c5
# bad: [ada3fa15057205b7d3f727bba5cd26b5912e350f] Merge branch 
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

git bisect bad ada3fa15057205b7d3f727bba5cd26b5912e350f
# bad: [ada3fa15057205b7d3f727bba5cd26b5912e350f] Merge branch 
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

git bisect bad ada3fa15057205b7d3f727bba5cd26b5912e350f
# good: [decee2e8a9538ae5476e6cb3f4b7714c92a04a2b] V4L/DVB (12485): 
zl10353: correct implementation of FE_READ_UNCORRECTED_BLOCKS

git bisect good decee2e8a9538ae5476e6cb3f4b7714c92a04a2b
# good: [0ee7e4d6d4f58c3b2d9f0ca8ad8f63abda8694b1] V4L/DVB (12694): 
gspca - vc032x: Change the start exchanges of the sensor hv7131r.

git bisect good 0ee7e4d6d4f58c3b2d9f0ca8ad8f63abda8694b1
# good: [f58dc01ba2ca9fe3ab2ba4ca43d9c8a735cf62d8] percpu: generalize 
first chunk allocator selection

git bisect good f58dc01ba2ca9fe3ab2ba4ca43d9c8a735cf62d8
# good: [2f82af08fcc7dc01a7e98a49a5995a77e32a2925] Nicolas Pitre has a 
new email address

git bisect good 2f82af08fcc7dc01a7e98a49a5995a77e32a2925
# good: [cf88c79006bd6a09ad725ba0b34c0e23db20b19e] vmalloc: separate out 
insert_vmalloc_vm()

git bisect good cf88c79006bd6a09ad725ba0b34c0e23db20b19e
# good: [4518e6a0c038b98be4c480e6f4481e8676bd15dd] x86,percpu: use 
embedding for 64bit NUMA and page for 32bit NUMA

git bisect good 4518e6a0c038b98be4c480e6f4481e8676bd15dd
# good: [bcb2107fdbecef3de55d597d23453747af81ba88] sparc64: use 
embedding percpu first chunk allocator

git bisect good bcb2107fdbecef3de55d597d23453747af81ba88
# good: [5579fd7e6aed8860ea0c8e3f11897493153b10ad] Merge branch 
'for-next' into for-linus

git bisect good 5579fd7e6aed8860ea0c8e3f11897493153b10ad


Oh, wait, that commit was tested, in the middle of the log above.
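Whether a commit already has a verdict can be checked mechanically by scanning the `# good:` / `# bad:` comment lines that `git bisect log` writes. A minimal Python sketch, assuming the comment format shown in the log above (`# good: [<sha>] <subject>`); the sample log is an abbreviated excerpt of that log, with the wrapped lines joined.

```python
import re

# Matches the comment lines 'git bisect log' writes, e.g.
# "# good: [5579fd7e...] Merge branch 'for-next' into for-linus"
VERDICT_RE = re.compile(r"^# (good|bad): \[([0-9a-f]{7,40})\]")


def verdict_for(log_text, commit_prefix):
    """Return 'good', 'bad' or None for the first logged commit whose
    hash starts with commit_prefix (abbreviated hashes are allowed)."""
    for line in log_text.splitlines():
        m = VERDICT_RE.match(line.strip())
        if m and m.group(2).startswith(commit_prefix):
            return m.group(1)
    return None


SAMPLE_LOG = """\
git bisect start
# bad: [ada3fa15057205b7d3f727bba5cd26b5912e350f] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect bad ada3fa15057205b7d3f727bba5cd26b5912e350f
# good: [2f82af08fcc7dc01a7e98a49a5995a77e32a2925] Nicolas Pitre has a new email address
git bisect good 2f82af08fcc7dc01a7e98a49a5995a77e32a2925
# good: [5579fd7e6aed8860ea0c8e3f11897493153b10ad] Merge branch 'for-next' into for-linus
git bisect good 5579fd7e6aed8860ea0c8e3f11897493153b10ad
"""
```

Running `verdict_for(SAMPLE_LOG, "2f82af0")` on this excerpt finds the `# good:` entry for the mainline parent.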

-Andrew




Re: kernel bug in kvm_intel

2009-10-31 Thread Avi Kivity

On 10/31/2009 06:25 PM, Andrew Theurer wrote:

5579 is not the preceding commit, it is the merged branch:

commit ada3fa15057205b7d3f727bba5cd26b5912e350f
Merge: 2f82af0 5579fd7
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Tue Sep 15 09:39:44 2009 -0700

Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu



What happens with 2f82af0?



2f82af0 is:


Nicolas Pitre has a new email address

Due to problems at cam.org, my n...@cam.org email address is no longer
valid.  From now on, n...@fluxnic.net should be used instead.


I have not tested that, but it doesn't seem likely that it would have 
anything to do with the problem.  Or maybe I am misunderstanding the 
impact of this commit?


ada3fa15 is known broken.  It is the merge of two branches: 2f82 
(mainline) and 5579.  Testing both branches would indicate the problem 
is in the merge if both test ok.



Oh, wait, that commit was tested, in the middle of the log above.


So the problem is the merge.  Will look.
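The reasoning about a bad merge whose two parents both test good can be stated as a small decision rule. A minimal Python sketch; the function name and the returned strings are hypothetical, and it assumes each individual test result is reliable.

```python
def localize(merge_bad, parent1_good, parent2_good):
    """Classify the result at a two-parent merge commit.

    - If the merge itself is good, there is nothing to localize here.
    - If a parent is already bad, bisection should continue on that side.
    - If both parents are good but the merge is bad, the bug comes from
      the combination of the two sides rather than from a single commit
      on either branch.
    """
    if not merge_bad:
        return "merge is good"
    if not parent1_good and not parent2_good:
        return "both parents bad: bisect further on either side"
    if not parent1_good:
        return "bisect first parent"
    if not parent2_good:
        return "bisect second parent"
    return "interaction introduced by the merge"
```

In this thread ada3fa15 is bad while both 2f82af0 and 5579fd7 appear as good in the bisect log, which is the last case.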

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: kernel bug in kvm_intel

2009-10-31 Thread Avi Kivity

On 10/31/2009 06:32 PM, Avi Kivity wrote:

On 10/31/2009 06:25 PM, Andrew Theurer wrote:

5579 is not the preceding commit, it is the merged branch:

commit ada3fa15057205b7d3f727bba5cd26b5912e350f
Merge: 2f82af0 5579fd7
Author: Linus Torvalds torva...@linux-foundation.org
Date:   Tue Sep 15 09:39:44 2009 -0700

Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu



What happens with 2f82af0?



2f82af0 is:


Nicolas Pitre has a new email address

Due to problems at cam.org, my n...@cam.org email address is no longer
valid.  From now on, n...@fluxnic.net should be used instead.


I have not tested that, but it doesn't seem likely that it would have 
anything to do with the problem.  Or maybe I am misunderstanding the 
impact of this commit?


ada3fa15 is known broken.  It is the merge of two branches: 2f82 
(mainline) and 5579.  Testing both branches would indicate the problem 
is in the merge if both test ok.



Oh, wait, that commit was tested, in the middle of the log above.


So the problem is the merge.  Will look.



Only, that merge doesn't change virt/kvm or arch/x86/kvm.

Tejun, anything known bad about that merge?  ada3fa15 kills kvm.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: kernel bug in kvm_intel

2009-10-30 Thread Andrew Theurer
On Thu, 2009-10-15 at 15:18 -0500, Andrew Theurer wrote:
 On Thu, 2009-10-15 at 02:10 +0900, Avi Kivity wrote:
  On 10/13/2009 11:04 PM, Andrew Theurer wrote:
  
   Look at the address where vmx_vcpu_run starts, add 0x26d, and show the
   surrounding code.
  
   Thinking about it, it probably _is_ what you showed, due to module page
   alignment.  But please verify this; I can't reconcile the fault address
   (9fe9a2b) with %rsp at the time of the fault.

   Here is the start of the function:
  
  
   3884 <vmx_vcpu_run>:
    3884:   55                      push   %rbp
    3885:   48 89 e5                mov    %rsp,%rbp

   and 0x26d later is 0x3af1:
  
  
    3ad2:   4c 8b b1 88 01 00 00    mov    0x188(%rcx),%r14
    3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
    3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
    3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
    3ae9:   0f 01 c2                vmlaunch
    3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
    3aee:   0f 01 c3                vmresume
    3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
    3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
    3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
    3b03:   ff 34 24                pushq  (%rsp)
    3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)

  
  
  Ok.  So it faults on the xchg instruction, rsp is 8806369ffc80 but 
  the fault address is 9fe9a2b4.  So it looks like the IDT is 
  corrupted.
  

I have finally bisected and isolated this to the following commit:

ada3fa15057205b7d3f727bba5cd26b5912e350f
http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ada3fa15057205b7d3f727bba5cd26b5912e350f
 Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu
 
 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 
 commits)
   powerpc64: convert to dynamic percpu allocator
   sparc64: use embedding percpu first chunk allocator
   percpu: kill lpage first chunk allocator
   x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
   percpu: update embedding first chunk allocator to handle sparse units
   percpu: use group information to allocate vmap areas sparsely
   vmalloc: implement pcpu_get_vm_areas()
   vmalloc: separate out insert_vmalloc_vm()
   percpu: add chunk->base_addr
   percpu: add pcpu_unit_offsets[]
   percpu: introduce pcpu_alloc_info and pcpu_group_info
   percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
   percpu: add @align to pcpu_fc_alloc_fn_t
   percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
   percpu: drop @static_size from first chunk allocators
   percpu: generalize first chunk allocator selection
   percpu: build first chunk allocators selectively
   percpu: rename 4k first chunk allocator to page
   percpu: improve boot messages
   percpu: fix pcpu_reclaim() locking

The previous commit (5579fd7e6aed8860ea0c8e3f11897493153b10ad) does not
have this problem.  FYI, this problem only occurs when oprofile is active.

Any idea what in this commit might be the issue?

-Andrew



Re: kernel bug in kvm_intel

2009-10-15 Thread Andrew Theurer
On Thu, 2009-10-15 at 02:10 +0900, Avi Kivity wrote:
 On 10/13/2009 11:04 PM, Andrew Theurer wrote:
 
  Look at the address where vmx_vcpu_run starts, add 0x26d, and show the
  surrounding code.
 
  Thinking about it, it probably _is_ what you showed, due to module page
  alignment.  But please verify this; I can't reconcile the fault address
  (9fe9a2b) with %rsp at the time of the fault.
   
  Here is the start of the function:
 
 
  3884 <vmx_vcpu_run>:
   3884:   55                      push   %rbp
   3885:   48 89 e5                mov    %rsp,%rbp
   
  and 0x26d later is 0x3af1:
 
 
   3ad2:   4c 8b b1 88 01 00 00    mov    0x188(%rcx),%r14
   3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
   3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
   3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
   3ae9:   0f 01 c2                vmlaunch
   3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
   3aee:   0f 01 c3                vmresume
   3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
   3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
   3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
   3b03:   ff 34 24                pushq  (%rsp)
   3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)
   
 
 
 Ok.  So it faults on the xchg instruction, rsp is 8806369ffc80 but 
 the fault address is 9fe9a2b4.  So it looks like the IDT is 
 corrupted.
 
 Can you check what's around 9fe9a2b4 in System.map?

85d85b24 B __bss_stop
85d86000 B __brk_base
85d96000 b .brk.dmi_alloc
85da6000 B __brk_limit
ff60 T vgettimeofday
ff600100 t vread_tsc
ff600130 t vread_hpet
ff600140 D __vsyscall_gtod_data
ff600400 T vtime

-Andrew




Re: kernel bug in kvm_intel

2009-10-14 Thread Avi Kivity

On 10/13/2009 11:04 PM, Andrew Theurer wrote:



Look at the address where vmx_vcpu_run starts, add 0x26d, and show the
surrounding code.

Thinking about it, it probably _is_ what you showed, due to module page
alignment.  But please verify this; I can't reconcile the fault address
(9fe9a2b) with %rsp at the time of the fault.
 

Here is the start of the function:

   

3884 <vmx_vcpu_run>:
 3884:   55                      push   %rbp
 3885:   48 89 e5                mov    %rsp,%rbp
 

and 0x26d later is 0x3af1:

   

 3ad2:   4c 8b b1 88 01 00 00    mov    0x188(%rcx),%r14
 3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
 3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
 3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
 3ae9:   0f 01 c2                vmlaunch
 3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
 3aee:   0f 01 c3                vmresume
 3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
 3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
 3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
 3b03:   ff 34 24                pushq  (%rsp)
 3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)
 




Ok.  So it faults on the xchg instruction, rsp is 8806369ffc80 but 
the fault address is 9fe9a2b4.  So it looks like the IDT is 
corrupted.


Can you check what's around 9fe9a2b4 in System.map?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: kernel bug in kvm_intel

2009-10-13 Thread Avi Kivity

On 10/12/2009 08:42 PM, Andrew Theurer wrote:

On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
   

On 10/09/2009 10:04 PM, Andrew Theurer wrote:
 

This is on latest master branch on kvm.git and qemu-kvm.git, running
12 Windows Server 2008 VMs, and using oprofile.  I ran again without
oprofile and did not get the BUG.  I am wondering if anyone else is
seeing this.

Thanks,

-Andrew

   

Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
paging request at 9fe9a2b4
Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
vmx_vcpu_run+0x26d/0x64f [kvm_intel]
 

Can you run this through objdump or gdb to see what source this
corresponds to?

 

Somewhere here I think (?)

objdump -d
   



Look at the address where vmx_vcpu_run starts, add 0x26d, and show the 
surrounding code.


Thinking about it, it probably _is_ what you showed, due to module page 
alignment.  But please verify this; I can't reconcile the fault address 
(9fe9a2b4) with %rsp at the time of the fault.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: kernel bug in kvm_intel

2009-10-13 Thread Andrew Theurer
On Tue, 2009-10-13 at 08:50 +0200, Avi Kivity wrote:
 On 10/12/2009 08:42 PM, Andrew Theurer wrote:
  On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
 
  On 10/09/2009 10:04 PM, Andrew Theurer wrote:
   
  This is on latest master branch on kvm.git and qemu-kvm.git, running
  12 Windows Server 2008 VMs, and using oprofile.  I ran again without
  oprofile and did not get the BUG.  I am wondering if anyone else is
  seeing this.
 
  Thanks,
 
  -Andrew
 
 
  Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
  paging request at 9fe9a2b4
  Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
  vmx_vcpu_run+0x26d/0x64f [kvm_intel]
   
  Can you run this through objdump or gdb to see what source this
  corresponds to?
 
   
  Somewhere here I think (?)
 
  objdump -d
 
 
 
 Look at the address where vmx_vcpu_run starts, add 0x26d, and show the 
 surrounding code.
 
 Thinking about it, it probably _is_ what you showed, due to module page 
 alignment.  But please verify this; I can't reconcile the fault address 
 (9fe9a2b4) with %rsp at the time of the fault.

Here is the start of the function:

 3884 <vmx_vcpu_run>:
 3884:   55                      push   %rbp
 3885:   48 89 e5                mov    %rsp,%rbp

and 0x26d later is 0x3af1:

 3ad2:   4c 8b b1 88 01 00 00    mov    0x188(%rcx),%r14
 3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
 3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
 3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
 3ae9:   0f 01 c2                vmlaunch
 3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
 3aee:   0f 01 c3                vmresume
 3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
 3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
 3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
 3b03:   ff 34 24                pushq  (%rsp)
 3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)


-Andrew



Re: kernel bug in kvm_intel

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 08:50:07AM +0200, Avi Kivity wrote:
 On 10/12/2009 08:42 PM, Andrew Theurer wrote:
 On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:

 On 10/09/2009 10:04 PM, Andrew Theurer wrote:
  
 This is on latest master branch on kvm.git and qemu-kvm.git, running
 12 Windows Server 2008 VMs, and using oprofile.  I ran again without
 oprofile and did not get the BUG.  I am wondering if anyone else is
 seeing this.

 Thanks,

 -Andrew


 Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
 paging request at 9fe9a2b4
 Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
 vmx_vcpu_run+0x26d/0x64f [kvm_intel]
  
 Can you run this through objdump or gdb to see what source this
 corresponds to?

  
 Somewhere here I think (?)

 objdump -d



 Look at the address where vmx_vcpu_run starts, add 0x26d, and show the  
 surrounding code.

 Thinking about it, it probably _is_ what you showed, due to module page  
 alignment.  But please verify this; I can't reconcile the fault address  
 (9fe9a2b4) with %rsp at the time of the fault.

There are some scary errata in the Xeon X55xx specification update, such as
AAK56: a corrupted RSP pushed on the stack when an event is injected right
after a VM exit, including NMIs, which oprofile uses.

Andrew, you may want to make sure the firmware/BIOS on this machine is up to
date before reproducing.



Re: kernel bug in kvm_intel

2009-10-12 Thread Andrew Theurer
On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
 On 10/09/2009 10:04 PM, Andrew Theurer wrote:
  This is on latest master branch on kvm.git and qemu-kvm.git, running 
  12 Windows Server 2008 VMs, and using oprofile.  I ran again without 
  oprofile and did not get the BUG.  I am wondering if anyone else is 
  seeing this.
 
  Thanks,
 
  -Andrew
 
  Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel 
  paging request at 9fe9a2b4
  Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1] 
  vmx_vcpu_run+0x26d/0x64f [kvm_intel]
 
 Can you run this through objdump or gdb to see what source this 
 corresponds to?
 

Somewhere here I think (?)

objdump -d
 3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
 3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
 3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
 3ae9:   0f 01 c2                vmlaunch
 3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
 3aee:   0f 01 c3                vmresume
 3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
 3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
 3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
 3b03:   ff 34 24                pushq  (%rsp)
 3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)
 3b0c:   48 89 91 28 01 00 00    mov    %rdx,0x128(%rcx)


objdump -S
 /* Enter guest mode */
 "jne .Llaunched \n\t"
 __ex(ASM_VMX_VMLAUNCH) "\n\t"
 "jmp .Lkvm_vmx_return \n\t"
 ".Llaunched: " __ex(ASM_VMX_VMRESUME) "\n\t"
 ".Lkvm_vmx_return: "
 /* Save guest registers, load host registers, keep flags */
 "xchg %0, (%%"R"sp) \n\t"
 "mov %%"R"ax, %c[rax](%0) \n\t"
 "mov %%"R"bx, %c[rbx](%0) \n\t"
 "push"Q" (%%"R"sp); pop"Q" %c[rcx](%0) \n\t"
 "mov %%"R"dx, %c[rdx](%0) \n\t"
 "mov %%"R"si, %c[rsi](%0) \n\t"




Re: kernel bug in kvm_intel

2009-10-10 Thread Avi Kivity

On 10/09/2009 10:04 PM, Andrew Theurer wrote:
This is on latest master branch on kvm.git and qemu-kvm.git, running 
12 Windows Server 2008 VMs, and using oprofile.  I ran again without 
oprofile and did not get the BUG.  I am wondering if anyone else is 
seeing this.


Thanks,

-Andrew

Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel 
paging request at 9fe9a2b4
Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1] 
vmx_vcpu_run+0x26d/0x64f [kvm_intel]


Can you run this through objdump or gdb to see what source this 
corresponds to?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
