Re: EPT: Misconfiguration

2011-03-05 Thread Ruben Kerkhof
On Sun, Feb 27, 2011 at 11:46, Avi Kivity a...@redhat.com wrote:

 Copying netdev: looks like memory corruption in the networking stack.

 Archive link: http://www.spinics.net/lists/kvm/msg50651.html (for the
 attachment).

There's now only a single guest running on this host (Ubuntu Maverick).
I've also upgraded the host kernel to 2.6.38-rc6, and this just
happened (after a day or so):

2011-03-05T19:41:58.328866+01:00 phy005 kernel: [85271.656862] BUG
kmalloc-2048 (Not tainted): Object padding overwritten
2011-03-05T19:41:58.328870+01:00 phy005 kernel: [85271.656864]
-
2011-03-05T19:41:58.328875+01:00 phy005 kernel: [85271.656866]
2011-03-05T19:41:58.328885+01:00 phy005 kernel: [85271.656870] INFO:
0x880c0d52a960-0x880c0d52a967. First byte 0x0 instead of 0x5a
2011-03-05T19:41:58.328890+01:00 phy005 kernel: [85271.656880] INFO:
Allocated in __netdev_alloc_skb+0x1f/0x3b age=16039 cpu=5 pid=0
2011-03-05T19:41:58.328894+01:00 phy005 kernel: [85271.656886] INFO:
Freed in skb_release_data+0xa5/0xaa age=0 cpu=5 pid=1766
2011-03-05T19:41:58.328898+01:00 phy005 kernel: [85271.656890] INFO:
Slab 0xea002a2ea0c0 objects=15 used=13 fp=0x880c0d52a120
flags=0xc040c1
2011-03-05T19:41:58.328902+01:00 phy005 kernel: [85271.656894] INFO:
Object 0x880c0d52a120 @offset=8480 fp=0x880c0d52d2d0
2011-03-05T19:41:58.328905+01:00 phy005 kernel: [85271.656895]
2011-03-05T19:41:58.328909+01:00 phy005 kernel: [85271.656897] Bytes
b4 0x880c0d52a110:  14 89 12 05 01 00 00 00 5a 5a 5a 5a 5a 5a 5a
5a 
2011-03-05T19:41:58.328913+01:00 phy005 kernel: [85271.656909]
Object 0x880c0d52a120:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 
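
For anyone reading the SLUB report above, here is a minimal Python sketch that pulls the overwritten range out of the "First byte ... instead of ..." line and computes its size. The helper name `parse_overwrite_line` is hypothetical, the log line is rejoined by hand because the archive wrapped it, and the sketch assumes the printed range is inclusive at both ends. Addresses are used exactly as they appear in this archive.

```python
import re

# Pattern for the SLUB "INFO: 0xSTART-0xEND. First byte 0xX instead of 0xY" line.
OVERWRITE_RE = re.compile(
    r"0x([0-9a-f]+)-0x([0-9a-f]+)\. First byte 0x([0-9a-f]+) instead of 0x([0-9a-f]+)"
)

def parse_overwrite_line(line):
    """Return (start, end, length, first_byte, expected_byte) for one report line."""
    m = OVERWRITE_RE.search(line)
    if m is None:
        raise ValueError("not a SLUB overwrite line")
    start, end, first, expected = (int(g, 16) for g in m.groups())
    # Assumption: the reported range is inclusive, so length is end - start + 1.
    return start, end, end - start + 1, first, expected

line = "INFO: 0x880c0d52a960-0x880c0d52a967. First byte 0x0 instead of 0x5a"
start, end, length, first, expected = parse_overwrite_line(line)
print(f"{length} bytes overwritten; first byte {first:#x}, expected {expected:#x}")
```

Under that assumption, the report above covers 8 bytes of padding, and the 2011-02-24 report (0x8806230752ca-0x8806230752cf) covers 6.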

We have quite a complex network stack: two interfaces (igb) attached
to bond0, with two bridges on top of that and two vlans on top of those.
The guest is running a vpn and an IPv6 tunnel.

Let me know if more info is needed.

Kind regards,

Ruben
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: EPT: Misconfiguration

2011-02-27 Thread Avi Kivity


Copying netdev: looks like memory corruption in the networking stack.

Archive link: http://www.spinics.net/lists/kvm/msg50651.html (for the 
attachment).


On 02/24/2011 11:15 PM, Ruben Kerkhof wrote:


  On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti mtosa...@redhat.com wrote:

  This and the others reported. So yes, it looks like something is corrupting
  memory. Ruben, you can try to boot with the slub_debug=ZFPU kernel option.

Ok, there are now only 6 vms left on this host, and I've booted it
with the slub_debug=ZFPU option.
After a few hours, I got the following result:

2011-02-24T21:41:30.818496+01:00 phy005 kernel:
=
2011-02-24T21:41:30.818517+01:00 phy005 kernel: BUG kmalloc-2048 (Not
tainted): Object padding overwritten
2011-02-24T21:41:30.818523+01:00 phy005 kernel:
-
2011-02-24T21:41:30.818526+01:00 phy005 kernel:
2011-02-24T21:41:30.818530+01:00 phy005 kernel: INFO:
0x8806230752ca-0x8806230752cf. First byte 0x0 instead of 0x5a
2011-02-24T21:41:30.818534+01:00 phy005 kernel: INFO: Allocated in
__netdev_alloc_skb+0x34/0x51 age=2231 cpu=8 pid=0
2011-02-24T21:41:30.818537+01:00 phy005 kernel: INFO: Freed in
skb_release_data+0xc9/0xce age=2368 cpu=8 pid=2159
2011-02-24T21:41:30.818541+01:00 phy005 kernel: INFO: Slab
0xea00157a9880 objects=15 used=13 fp=0x8806230752d0
flags=0x404083
2011-02-24T21:41:30.818545+01:00 phy005 kernel: INFO: Object
0x880623074a88 @offset=19080 fp=0x8806230752d0

The rest of the output is attached since it's quite large.

Kind regards,

Ruben



--
error compiling committee.c: too many arguments to function



Re: EPT: Misconfiguration

2011-02-15 Thread Marcelo Tosatti
On Sun, Feb 13, 2011 at 03:03:40PM +0200, Avi Kivity wrote:
 On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
 And tonight we had another one of those errors we had a few weeks ago:
 
 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
 
 This GPA indexes into the 511th entry of the spte.  Marcelo, does
 this remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052
 by any chance?

This and the others reported. So yes, it looks like something is corrupting
memory. Ruben, you can try to boot with the slub_debug=ZFPU kernel option.
Is there any reason for not upgrading to FC14?



Re: EPT: Misconfiguration

2011-02-15 Thread Ruben Kerkhof
Hi Marcelo,

On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Sun, Feb 13, 2011 at 03:03:40PM +0200, Avi Kivity wrote:
 On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
 And tonight we had another one of those errors we had a few weeks ago:
 
 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000

 This GPA indexes into the 511th entry of the spte.  Marcelo, does
 this remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052
 by any chance?

 This and the others reported. So yes, it looks like something is corrupting
 memory. Ruben, you can try to boot with the slub_debug=ZFPU kernel option.

Sure, but not for a while. I'm first moving all my customers off this
machine. We've had to reboot it 5 or 6 times in the last couple
of weeks.
As soon as that's done I'm going to test the hell out of it.

Now that we've moved a few of the VMs we don't see any oopses, so
either it only triggers under load, or there's a specific guest
which is triggering it.

 Is there any reason for not upgrading to FC14?

I haven't had a reason to upgrade yet, all our other machines are
running fine, using the same kernel.
Plus I'm still finding lots of issues unrelated to kvm on F14, broken
ssh in combination with openldap, ipmi bugs, selinux policy etc.
Next to that it takes a lot of time to test all our images etc.

I'll probably skip the F14 kernel and go straight to 2.6.38, since that
should bring significant improvements like THP, async pagefaults etc.

Kind regards,

Ruben


Re: EPT: Misconfiguration

2011-02-13 Thread Avi Kivity

On 02/10/2011 05:23 PM, Ruben Kerkhof wrote:


This machine has been running for a week without problems, but then we
started to get the following oopses again:

2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ea71929180e0
2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
[81034880] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-06T19:45:35.03+01:00 phy005 kernel: Oops:  [#1] SMP
2011-02-06T19:45:35.21+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T19:45:35.24+01:00 phy005 kernel: CPU 4
2011-02-06T19:45:35.29+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:45:35.31+01:00 phy005 kernel:
2011-02-06T19:45:35.33+01:00 phy005 kernel: Pid: 3650, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:45:35.36+01:00 phy005 kernel: RIP:
0010:[81034880]  [81034880]
gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.39+01:00 phy005 kernel: RSP:
0018:88060b9bda78  EFLAGS: 00010082
2011-02-06T19:45:35.41+01:00 phy005 kernel: RAX: ea71929180e0
RBX: 3000 RCX: 0005
2011-02-06T19:45:35.43+01:00 phy005 kernel: RDX: 7fe54e40
RSI: 7fe54e3ff000 RDI: 1603a07305004067
2011-02-06T19:45:35.45+01:00 phy005 kernel: RBP: 88060b9bda98
R08: 880b94384560 R09: 88060b9bdb44
2011-02-06T19:45:35.48+01:00 phy005 kernel: R10: 880606b2fff8
R11: ea00 R12: 0205
2011-02-06T19:45:35.51+01:00 phy005 kernel: R13: cfff
R14: 0005 R15: 
2011-02-06T19:45:35.55+01:00 phy005 kernel: FS:
7fe64cb0e700() GS:88065540()
knlGS:
2011-02-06T19:45:35.59+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 80050033
2011-02-06T19:45:35.63+01:00 phy005 kernel: CR2: ea71929180e0
CR3: 000bff06d000 CR4: 26e0
2011-02-06T19:45:35.67+01:00 phy005 kernel: DR0: 
DR1:  DR2: 
2011-02-06T19:45:35.71+01:00 phy005 kernel: DR3: 
DR6: 0ff0 DR7: 0400
2011-02-06T19:45:35.74+01:00 phy005 kernel: Process qemu-kvm (pid:
3650, threadinfo 88060b9bc000, task 880623ed2ee0)
2011-02-06T19:45:35.78+01:00 phy005 kernel: Stack:
2011-02-06T19:45:35.81+01:00 phy005 kernel: 7fe54e40
7fe54e40 7fe54e40 88053a0d2388
2011-02-06T19:45:35.85+01:00 phy005 kernel:0  88060b9bdaf8
81034a15 7fe54e3f 7fe54e3f
2011-02-06T19:45:35.89+01:00 phy005 kernel:0  88060b9bdb44
880b94384560 880bff06eca8 880bff06d7f8
2011-02-06T19:45:35.92+01:00 phy005 kernel: Call Trace:
2011-02-06T19:45:35.96+01:00 phy005 kernel: [81034a15]
gup_pud_range+0x156/0x192
2011-02-06T19:45:35.222300+01:00 phy005 kernel: [81034b15]
get_user_pages_fast+0xc4/0x172
2011-02-06T19:45:35.222304+01:00 phy005 kernel: [81131fbc] ?
bio_add_page+0x36/0x38
2011-02-06T19:45:35.222308+01:00 phy005 kernel: [81134730]
dio_get_page+0x54/0x127
2011-02-06T19:45:35.222312+01:00 phy005 kernel: [81135317]
__blockdev_direct_IO+0x41d/0xa36
2011-02-06T19:45:35.222316+01:00 phy005 kernel: [a0080f69] ?
x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-06T19:45:35.222320+01:00 phy005 kernel: [8113379b]
blkdev_direct_IO+0x4e/0x50
2011-02-06T19:45:35.222324+01:00 phy005 kernel: [81132c49] ?
blkdev_get_blocks+0x0/0x8d
2011-02-06T19:45:35.222328+01:00 phy005 kernel: [810cb516]
generic_file_direct_write+0xed/0x16d
2011-02-06T19:45:35.222331+01:00 phy005 kernel: [810cb72c]
__generic_file_aio_write+0x196/0x281
2011-02-06T19:45:35.222335+01:00 phy005 kernel: [811d5352] ?
file_has_perm+0xa4/0xc6
2011-02-06T19:45:35.222339+01:00 phy005 kernel: [81133043] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222343+01:00 phy005 kernel: [8113306d]
blkdev_aio_write+0x2a/0x69
2011-02-06T19:45:35.222347+01:00 phy005 kernel: [81133043] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222351+01:00 phy005 kernel: [8113d4eb]
aio_rw_vect_retry+0x85/0x18e
2011-02-06T19:45:35.222355+01:00 phy005 kernel: [8113e9b3]
aio_run_iocb+0x77/0x10f
2011-02-06T19:45:35.222359+01:00 phy005 kernel: [8113f508]
do_io_submit+0x558/0x7ce
2011-02-06T19:45:35.222363+01:00 phy005 kernel: [8113f78e]
sys_io_submit+0x10/0x12
2011-02-06T19:45:35.222366+01:00 phy005 kernel: [81009c72]
system_call_fastpath+0x16/0x1b

Re: EPT: Misconfiguration

2011-02-13 Thread Avi Kivity

On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:

And tonight we had another one of those errors we had a few weeks ago:

2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000


This GPA indexes into the 511th entry of the spte.  Marcelo, does this 
remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052 by any 
chance?
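
The index arithmetic behind that remark can be sketched as follows. This is an illustration only, assuming 4 KiB pages and 9 index bits (512 entries) per table level, which is the usual 4-level x86-64 and EPT layout; the function name `ept_index` is hypothetical and not taken from the kernel.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
INDEX_BITS = 9    # assumed 512 entries per table

def ept_index(gpa, level):
    """Table index used at `level` (1 = last-level table) for a guest-physical address."""
    shift = PAGE_SHIFT + INDEX_BITS * (level - 1)
    return (gpa >> shift) & ((1 << INDEX_BITS) - 1)

gpa = 0x2edff000  # GPA from the "EPT: GPA:" line above
for level in (4, 3, 2, 1):
    print(f"level {level}: index {ept_index(gpa, level)}")
```

With those assumptions the level 1 index for 0x2edff000 is 511, the last slot of the table. The GPA from the 2011-01-25 report, 0x2abff038, also gives 511 at level 1.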



2011-02-13T02:56:28.694914+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x25602d007 level 4
2011-02-13T02:56:28.694916+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3
2011-02-13T02:56:28.694919+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2
2011-02-13T02:56:28.694925+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1


Magic 1603a073 pte.


2011-02-13T02:56:28.694928+01:00 phy005 kernel:
ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
2011-02-13T02:56:28.694930+01:00 phy005 kernel: [ cut here
]
2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at
arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU
2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma
dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last
unloaded: scsi_wait_scan]
2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1
2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace:
2011-02-13T02:56:28.695013+01:00 phy005 kernel: [8104d11f]
warn_slowpath_common+0x7c/0x94
2011-02-13T02:56:28.695020+01:00 phy005 kernel: [8104d14b]
warn_slowpath_null+0x14/0x16
2011-02-13T02:56:28.695024+01:00 phy005 kernel: [a00c97fb]
handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
2011-02-13T02:56:28.695028+01:00 phy005 kernel: [a00ca401]
vmx_handle_exit+0x204/0x23a [kvm_intel]
2011-02-13T02:56:28.695033+01:00 phy005 kernel: [a0084998]
kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
2011-02-13T02:56:28.695037+01:00 phy005 kernel: [a00735ba]
kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-02-13T02:56:28.695042+01:00 phy005 kernel: [810feaab] ?
virt_to_head_page+0xe/0x2f
2011-02-13T02:56:28.695046+01:00 phy005 kernel: [810cc6ca] ?
mempool_kfree+0xe/0x10
2011-02-13T02:56:28.695051+01:00 phy005 kernel: [810cc857] ?
mempool_free+0x76/0x7b
2011-02-13T02:56:28.695055+01:00 phy005 kernel: [8111aa2f]
vfs_ioctl+0x32/0xa6
2011-02-13T02:56:28.695060+01:00 phy005 kernel: [8111afa2]
do_vfs_ioctl+0x483/0x4c9
2011-02-13T02:56:28.695065+01:00 phy005 kernel: [8111b03e]
sys_ioctl+0x56/0x79
2011-02-13T02:56:28.695070+01:00 phy005 kernel: [81009c72]
system_call_fastpath+0x16/0x1b
2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace
d95032626ea304ca ]---

Any help would be much appreciated. It seems very strange that I'm the
first one who runs into this.
I've found two bugreports which report the same, the first one at
https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
but that's a duplicate of
https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131 which I'm
not authorized to see...


These don't appear to be related.  Are you running ksm, btw?

--
error compiling committee.c: too many arguments to function



Re: EPT: Misconfiguration

2011-02-13 Thread Ruben Kerkhof
Hi Avi,

On Sun, Feb 13, 2011 at 13:58, Avi Kivity a...@redhat.com wrote:
 On 02/10/2011 05:23 PM, Ruben Kerkhof wrote:

 This machine has been running for a week without problems, but then we
 started to get the following oopses again:

 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
 kernel paging request at ea71929180e0
 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
 [81034880] gup_pte_range+0x94/0xd3
 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
 2011-02-06T19:45:35.03+01:00 phy005 kernel: Oops:  [#1] SMP
 2011-02-06T19:45:35.21+01:00 phy005 kernel: last sysfs file:
 /sys/devices/system/cpu/cpu15/topology/thread_siblings
 2011-02-06T19:45:35.24+01:00 phy005 kernel: CPU 4
 2011-02-06T19:45:35.29+01:00 phy005 kernel: Modules linked in: tun
 ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
 xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
 ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
 iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
 scsi_wait_scan]
 2011-02-06T19:45:35.31+01:00 phy005 kernel:
 2011-02-06T19:45:35.33+01:00 phy005 kernel: Pid: 3650, comm:
 qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
 2011-02-06T19:45:35.36+01:00 phy005 kernel: RIP:
 0010:[81034880]  [81034880]
 gup_pte_range+0x94/0xd3
 2011-02-06T19:45:35.39+01:00 phy005 kernel: RSP:
 0018:88060b9bda78  EFLAGS: 00010082
 2011-02-06T19:45:35.41+01:00 phy005 kernel: RAX: ea71929180e0
 RBX: 3000 RCX: 0005
 2011-02-06T19:45:35.43+01:00 phy005 kernel: RDX: 7fe54e40
 RSI: 7fe54e3ff000 RDI: 1603a07305004067
 2011-02-06T19:45:35.45+01:00 phy005 kernel: RBP: 88060b9bda98
 R08: 880b94384560 R09: 88060b9bdb44
 2011-02-06T19:45:35.48+01:00 phy005 kernel: R10: 880606b2fff8
 R11: ea00 R12: 0205
 2011-02-06T19:45:35.51+01:00 phy005 kernel: R13: cfff
 R14: 0005 R15: 
 2011-02-06T19:45:35.55+01:00 phy005 kernel: FS:
 7fe64cb0e700() GS:88065540()
 knlGS:
 2011-02-06T19:45:35.59+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
 002b CR0: 80050033
 2011-02-06T19:45:35.63+01:00 phy005 kernel: CR2: ea71929180e0
 CR3: 000bff06d000 CR4: 26e0
 2011-02-06T19:45:35.67+01:00 phy005 kernel: DR0: 
 DR1:  DR2: 
 2011-02-06T19:45:35.71+01:00 phy005 kernel: DR3: 
 DR6: 0ff0 DR7: 0400
 2011-02-06T19:45:35.74+01:00 phy005 kernel: Process qemu-kvm (pid:
 3650, threadinfo 88060b9bc000, task 880623ed2ee0)
 2011-02-06T19:45:35.78+01:00 phy005 kernel: Stack:
 2011-02-06T19:45:35.81+01:00 phy005 kernel: 7fe54e40
 7fe54e40 7fe54e40 88053a0d2388
 2011-02-06T19:45:35.85+01:00 phy005 kernel:0  88060b9bdaf8
 81034a15 7fe54e3f 7fe54e3f
 2011-02-06T19:45:35.89+01:00 phy005 kernel:0  88060b9bdb44
 880b94384560 880bff06eca8 880bff06d7f8
 2011-02-06T19:45:35.92+01:00 phy005 kernel: Call Trace:
 2011-02-06T19:45:35.96+01:00 phy005 kernel: [81034a15]
 gup_pud_range+0x156/0x192
 2011-02-06T19:45:35.222300+01:00 phy005 kernel: [81034b15]
 get_user_pages_fast+0xc4/0x172
 2011-02-06T19:45:35.222304+01:00 phy005 kernel: [81131fbc] ?
 bio_add_page+0x36/0x38
 2011-02-06T19:45:35.222308+01:00 phy005 kernel: [81134730]
 dio_get_page+0x54/0x127
 2011-02-06T19:45:35.222312+01:00 phy005 kernel: [81135317]
 __blockdev_direct_IO+0x41d/0xa36
 2011-02-06T19:45:35.222316+01:00 phy005 kernel: [a0080f69] ?
 x86_emulate_insn+0x1ff8/0x2d61 [kvm]
 2011-02-06T19:45:35.222320+01:00 phy005 kernel: [8113379b]
 blkdev_direct_IO+0x4e/0x50
 2011-02-06T19:45:35.222324+01:00 phy005 kernel: [81132c49] ?
 blkdev_get_blocks+0x0/0x8d
 2011-02-06T19:45:35.222328+01:00 phy005 kernel: [810cb516]
 generic_file_direct_write+0xed/0x16d
 2011-02-06T19:45:35.222331+01:00 phy005 kernel: [810cb72c]
 __generic_file_aio_write+0x196/0x281
 2011-02-06T19:45:35.222335+01:00 phy005 kernel: [811d5352] ?
 file_has_perm+0xa4/0xc6
 2011-02-06T19:45:35.222339+01:00 phy005 kernel: [81133043] ?
 blkdev_aio_write+0x0/0x69
 2011-02-06T19:45:35.222343+01:00 phy005 kernel: [8113306d]
 blkdev_aio_write+0x2a/0x69
 2011-02-06T19:45:35.222347+01:00 phy005 kernel: [81133043] ?
 blkdev_aio_write+0x0/0x69
 2011-02-06T19:45:35.222351+01:00 phy005 kernel: [8113d4eb]
 aio_rw_vect_retry+0x85/0x18e
 2011-02-06T19:45:35.222355+01:00 phy005 kernel: [8113e9b3]
 aio_run_iocb+0x77/0x10f
 2011-02-06T19:45:35.222359+01:00 phy005 kernel: [8113f508]
 do_io_submit+0x558/0x7ce
 2011-02-06T19:45:35.222363+01:00 

Re: EPT: Misconfiguration

2011-02-13 Thread Ruben Kerkhof
On Sun, Feb 13, 2011 at 14:03, Avi Kivity a...@redhat.com wrote:
 On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:

 And tonight we had another one of those errors we had a few weeks ago:

 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000

 This GPA indexes into the 511th entry of the spte.  Marcelo, does this
 remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052 by any
 chance?

 2011-02-13T02:56:28.694914+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x25602d007 level 4
 2011-02-13T02:56:28.694916+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3
 2011-02-13T02:56:28.694919+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2
 2011-02-13T02:56:28.694925+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1

 Magic 1603a073 pte.

 2011-02-13T02:56:28.694928+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
 2011-02-13T02:56:28.694930+01:00 phy005 kernel: [ cut here
 ]
 2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at
 arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
 2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU
 2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun
 ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
 xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
 ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma
 dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last
 unloaded: scsi_wait_scan]
 2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm:
 qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1
 2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace:
 2011-02-13T02:56:28.695013+01:00 phy005 kernel: [8104d11f]
 warn_slowpath_common+0x7c/0x94
 2011-02-13T02:56:28.695020+01:00 phy005 kernel: [8104d14b]
 warn_slowpath_null+0x14/0x16
 2011-02-13T02:56:28.695024+01:00 phy005 kernel: [a00c97fb]
 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
 2011-02-13T02:56:28.695028+01:00 phy005 kernel: [a00ca401]
 vmx_handle_exit+0x204/0x23a [kvm_intel]
 2011-02-13T02:56:28.695033+01:00 phy005 kernel: [a0084998]
 kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
 2011-02-13T02:56:28.695037+01:00 phy005 kernel: [a00735ba]
 kvm_vcpu_ioctl+0xfd/0x56e [kvm]
 2011-02-13T02:56:28.695042+01:00 phy005 kernel: [810feaab] ?
 virt_to_head_page+0xe/0x2f
 2011-02-13T02:56:28.695046+01:00 phy005 kernel: [810cc6ca] ?
 mempool_kfree+0xe/0x10
 2011-02-13T02:56:28.695051+01:00 phy005 kernel: [810cc857] ?
 mempool_free+0x76/0x7b
 2011-02-13T02:56:28.695055+01:00 phy005 kernel: [8111aa2f]
 vfs_ioctl+0x32/0xa6
 2011-02-13T02:56:28.695060+01:00 phy005 kernel: [8111afa2]
 do_vfs_ioctl+0x483/0x4c9
 2011-02-13T02:56:28.695065+01:00 phy005 kernel: [8111b03e]
 sys_ioctl+0x56/0x79
 2011-02-13T02:56:28.695070+01:00 phy005 kernel: [81009c72]
 system_call_fastpath+0x16/0x1b
 2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace
 d95032626ea304ca ]---

 Any help would be much appreciated. It seems very strange that I'm the
 first one who runs into this.
 I've found two bugreports which report the same, the first one at

 https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
 but that's a duplicate of
 https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131 which I'm
 not authorized to see...

 These don't appear to be related.  Are you running ksm, btw?

No.

Kind regards,

Ruben


Re: EPT: Misconfiguration

2011-02-12 Thread Ruben Kerkhof
On Thu, Feb 10, 2011 at 16:23, Ruben Kerkhof ru...@rubenkerkhof.com wrote:
 On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof ru...@rubenkerkhof.com wrote:
 On Wed, Jan 26, 2011 at 10:52, Avi Kivity a...@redhat.com wrote:
 On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:

   When you say suddenly, this was with no changes to software and
  hardware?

 The host software and hardware haven't changed in the two months since
 the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.

 We host customer vms on it though, so virtual machines come and go.
 Various operating systems, a mixture of Linux, FreeBSD and Windows
 2008 R2. We have other machines with the same config without these
 problems though.

 Are those other machines running a similar workload?

 Yes, similar, or they're more heavily loaded.

 On this machine, about half of the 48GB memory was used for virtual machines.

 The traces look awfully like bad hardware, though that can also be explained
 by random memory corruption due to a bug.

 Yeah, that's what I'm expecting. We already replaced the memory, next
 step is to move the disks over to another server to make sure it's not
 the board or cpu's.

 This time I have a few different messages though:

 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
  [#1] SMP

 RSI:  RDI: 1603a07305001568

 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
 c0 74 10 85 f6 75 07 e8 63 fe ff ff eb

 lock decl 0x8(%rdi)

 %rdi is completely crap, looks like corruption again.  Strangely, it is
 similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
 upper 48 bits are identical, the lower 16 bits are different.:

 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
 page table at address 7f37b37ff000
 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
 94e538067 PMD 61e5bf067 PTE 1603a0730500e067

 Here are those magic 48 bits again, in the PTE entry.

 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5db595007 level 3
 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1

 Again.

 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
 process qemu-kvm  pte:1603a0730500d067 pmd:61059f067

 Again.

 However, these all came from a single boot, yes?

 Correct.

 If so they can be the same
 corruption.  Please collect more traces, with reboots in between.

 This machine has been running for a week without problems, but then we
 started to get the following oopses again:

 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
 kernel paging request at ea71929180e0
 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
 [81034880] gup_pte_range+0x94/0xd3
 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
 2011-02-06T19:45:35.03+01:00 phy005 kernel: Oops:  [#1] SMP
 2011-02-06T19:45:35.21+01:00 phy005 kernel: last sysfs file:
 /sys/devices/system/cpu/cpu15/topology/thread_siblings
 2011-02-06T19:45:35.24+01:00 phy005 kernel: CPU 4
 2011-02-06T19:45:35.29+01:00 phy005 kernel: Modules linked in: tun
 ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
 xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
 ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
 iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
 scsi_wait_scan]
 2011-02-06T19:45:35.31+01:00 phy005 kernel:
 2011-02-06T19:45:35.33+01:00 phy005 kernel: Pid: 3650, comm:
 qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
 2011-02-06T19:45:35.36+01:00 phy005 kernel: RIP:
 0010:[81034880]  [81034880]
 gup_pte_range+0x94/0xd3
 2011-02-06T19:45:35.39+01:00 phy005 kernel: RSP:
 0018:88060b9bda78  EFLAGS: 00010082
 2011-02-06T19:45:35.41+01:00 phy005 kernel: RAX: ea71929180e0
 RBX: 3000 RCX: 0005
 2011-02-06T19:45:35.43+01:00 phy005 kernel: RDX: 7fe54e40
 RSI: 7fe54e3ff000 RDI: 1603a07305004067
 2011-02-06T19:45:35.45+01:00 phy005 kernel: RBP: 88060b9bda98
 R08: 880b94384560 R09: 88060b9bdb44
 2011-02-06T19:45:35.48+01:00 phy005 kernel: R10: 880606b2fff8
 R11: ea00 R12: 0205
 2011-02-06T19:45:35.51+01:00 phy005 kernel: R13: cfff
 R14: 0005 R15: 
 

Re: EPT: Misconfiguration

2011-02-10 Thread Ruben Kerkhof
On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof ru...@rubenkerkhof.com wrote:
 On Wed, Jan 26, 2011 at 10:52, Avi Kivity a...@redhat.com wrote:
 On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:

   When you say suddenly, this was with no changes to software and
  hardware?

 The host software and hardware haven't changed in the two months since
 the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.

 We host customer vms on it though, so virtual machines come and go.
 Various operating systems, a mixture of Linux, FreeBSD and Windows
 2008 R2. We have other machines with the same config without these
 problems though.

 Are those other machines running a similar workload?

 Yes, similar, or they're more heavily loaded.

 On this machine, about half of the 48GB memory was used for virtual machines.

 The traces look awfully like bad hardware, though that can also be explained
 by random memory corruption due to a bug.

 Yeah, that's what I'm expecting. We already replaced the memory, next
 step is to move the disks over to another server to make sure it's not
 the board or cpu's.

 This time I have a few different messages though:

 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
  [#1] SMP

 RSI:  RDI: 1603a07305001568

 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
 c0 74 10 85 f6 75 07 e8 63 fe ff ff eb

 lock decl 0x8(%rdi)

 %rdi is completely crap, looks like corruption again.  Strangely, it is
 similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
 upper 48 bits are identical, the lower 16 bits are different.:

 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
 page table at address 7f37b37ff000
 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
 94e538067 PMD 61e5bf067 PTE 1603a0730500e067

 Here are those magic 48 bits again, in the PTE entry.

 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5db595007 level 3
 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1

 Again.

 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
 process qemu-kvm  pte:1603a0730500d067 pmd:61059f067

 Again.

 However, these all came from a single boot, yes?

 Correct.

 If so they can be the same
 corruption.  Please collect more traces, with reboots in between.

This machine has been running for a week without problems, but then we
started to get the following oopses again:

2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ea71929180e0
2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
[81034880] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-06T19:45:35.03+01:00 phy005 kernel: Oops:  [#1] SMP
2011-02-06T19:45:35.21+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T19:45:35.24+01:00 phy005 kernel: CPU 4
2011-02-06T19:45:35.29+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:45:35.31+01:00 phy005 kernel:
2011-02-06T19:45:35.33+01:00 phy005 kernel: Pid: 3650, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:45:35.36+01:00 phy005 kernel: RIP:
0010:[81034880]  [81034880]
gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.39+01:00 phy005 kernel: RSP:
0018:88060b9bda78  EFLAGS: 00010082
2011-02-06T19:45:35.41+01:00 phy005 kernel: RAX: ea71929180e0
RBX: 3000 RCX: 0005
2011-02-06T19:45:35.43+01:00 phy005 kernel: RDX: 7fe54e40
RSI: 7fe54e3ff000 RDI: 1603a07305004067
2011-02-06T19:45:35.45+01:00 phy005 kernel: RBP: 88060b9bda98
R08: 880b94384560 R09: 88060b9bdb44
2011-02-06T19:45:35.48+01:00 phy005 kernel: R10: 880606b2fff8
R11: ea00 R12: 0205
2011-02-06T19:45:35.51+01:00 phy005 kernel: R13: cfff
R14: 0005 R15: 
2011-02-06T19:45:35.55+01:00 phy005 kernel: FS:
7fe64cb0e700() GS:88065540()
knlGS:

Re: EPT: Misconfiguration

2011-01-26 Thread Avi Kivity

On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:

  When you say suddenly, this was with no changes to software and hardware?

The host software and hardware haven't changed in the two months since
the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.

We host customer vms on it though, so virtual machines come and go.
Various operating systems, a mixture of Linux, FreeBSD and Windows
2008 R2. We have other machines with the same config without these
problems though.


Are those other machines running a similar workload?

The traces look awfully like bad hardware, though that can also be 
explained by random memory corruption due to a bug.



This time I have a few different messages though:

2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:  
[#1] SMP

RSI:  RDI: 1603a07305001568

2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
c0 74 10 85 f6 75 07 e8 63 fe ff ff eb


lock decl 0x8(%rdi)

%rdi is completely crap, looks like corruption again.  Strangely, it is 
similar to the bad spte from the previous trace: 0x1603a0730500d277.  
The upper 48 bits are identical, the lower 16 bits are different:

2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
page table at address 7f37b37ff000
2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
94e538067 PMD 61e5bf067 PTE 1603a0730500e067


Here are those magic 48 bits again, in the PTE entry.

2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
2011-01-25T12:38:49.417526+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
2011-01-25T12:38:49.417532+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5db595007 level 3
2011-01-25T12:38:49.417553+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
2011-01-25T12:38:49.417558+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1


Again.


2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
process qemu-kvm  pte:1603a0730500d067 pmd:61059f067


Again.

However, these all came from a single boot, yes?  If so they can be the 
same corruption.  Please collect more traces, with reboots in between.
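
The pattern noted in this message (the same upper 48 bits in every corrupted value, with only the lower 16 bits varying) can be checked mechanically. The following is a minimal Python sketch, not part of the original exchange. The values are copied from the traces quoted in this thread; the labels and the helper name split_48_16 are made up for illustration.

```python
# Corrupted values copied from the traces quoted in this thread.
# The labels are informal descriptions, not kernel terminology.
VALUES = {
    "spte (2011-01-20 trace)": 0x1603a0730500d277,
    "RDI (general protection fault)": 0x1603a07305001568,
    "PTE (corrupted page table)": 0x1603a0730500e067,
    "spte (2011-01-25 trace)": 0x1603a07305006277,
    "pte (bad page map)": 0x1603a0730500d067,
}

def split_48_16(value):
    """Split a 64-bit value into its upper 48 bits and lower 16 bits."""
    return value >> 16, value & 0xFFFF

# Collect the upper and lower parts of every value.
uppers = {split_48_16(v)[0] for v in VALUES.values()}
lowers = [split_48_16(v)[1] for v in VALUES.values()]

for name, v in VALUES.items():
    upper, lower = split_48_16(v)
    print(f"{name:32s} upper={upper:#014x} lower={lower:#06x}")

# A single distinct upper part means all values share the same 48 bits.
print("distinct upper parts:", len(uppers))
```

If every value shares one upper part while the lower parts differ, that matches the observation above that the same bogus 48 bits keep turning up in unrelated places.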


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: EPT: Misconfiguration

2011-01-26 Thread Ruben Kerkhof
On Wed, Jan 26, 2011 at 10:52, Avi Kivity a...@redhat.com wrote:
 On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:

   When you say suddenly, this was with no changes to software and
  hardware?

 The host software and hardware haven't changed in the two months since
 the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.

 We host customer vms on it though, so virtual machines come and go.
 Various operating systems, a mixture of Linux, FreeBSD and Windows
 2008 R2. We have other machines with the same config without these
 problems though.

 Are those other machines running a similar workload?

Yes, similar, or they're more heavily loaded.

On this machine, about half of the 48GB memory was used for virtual machines.

 The traces look awfully like bad hardware, though that can also be explained
 by random memory corruption due to a bug.

Yeah, that's what I'm expecting. We already replaced the memory, next
step is to move the disks over to another server to make sure it's not
the board or CPUs.

 This time I have a few different messages though:

 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
  [#1] SMP

 RSI:  RDI: 1603a07305001568

 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
 c0 74 10 85 f6 75 07 e8 63 fe ff ff eb

 lock decl 0x8(%rdi)

 %rdi is completely crap, looks like corruption again.  Strangely, it is
 similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
 upper 48 bits are identical, the lower 16 bits are different:

 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
 page table at address 7f37b37ff000
 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
 94e538067 PMD 61e5bf067 PTE 1603a0730500e067

 Here are those magic 48 bits again, in the PTE entry.

 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5db595007 level 3
 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1

 Again.

 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
 process qemu-kvm  pte:1603a0730500d067 pmd:61059f067

 Again.

 However, these all came from a single boot, yes?

Correct.

 If so they can be the same
 corruption.  Please collect more traces, with reboots in between.

Ok, thanks, will do.

Kind regards,

Ruben


Re: EPT: Misconfiguration

2011-01-25 Thread Ruben Kerkhof
Hi Marcelo,

On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
 I'm suddenly getting lots of the following errors on a server running
 2.36.7, but I have no idea what it means:

 2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
 2011-01-20T12:41:18.358624+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x50743e007 level 4
 2011-01-20T12:41:18.358627+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x523de2007 level 3
 2011-01-20T12:41:18.358629+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x62336f007 level 2
 2011-01-20T12:41:18.360109+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
 2011-01-20T12:41:18.360137+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
 2011-01-20T12:41:18.360151+01:00 phy005 kernel: [ cut here
 ]

 A shadow pagetable entry in memory has bits 45-49 set, which is not
 allowed. It's probably bad memory if these errors were not present before
 with the same workload and host software. Would be useful to see what
 memtest86 says.

I did 2 memtest86+ passes, but no errors were found.

Just to be safe, we replaced all memory. The machine has been running
stable over the weekend, but now gives exactly the same error.

Is there anything else which could cause this?

Kind regards,

Ruben


Re: EPT: Misconfiguration

2011-01-25 Thread Avi Kivity

On 01/25/2011 04:44 PM, Ruben Kerkhof wrote:

Hi Marcelo,

On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti mtosa...@redhat.com wrote:
  On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
  I'm suddenly getting lots of the following errors on a server running
  2.36.7, but I have no idea what it means:

  2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
  2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
  2011-01-20T12:41:18.358624+01:00 phy005 kernel:
  ept_misconfig_inspect_spte: spte 0x50743e007 level 4
  2011-01-20T12:41:18.358627+01:00 phy005 kernel:
  ept_misconfig_inspect_spte: spte 0x523de2007 level 3
  2011-01-20T12:41:18.358629+01:00 phy005 kernel:
  ept_misconfig_inspect_spte: spte 0x62336f007 level 2
  2011-01-20T12:41:18.360109+01:00 phy005 kernel:
  ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
  2011-01-20T12:41:18.360137+01:00 phy005 kernel:
  ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
  2011-01-20T12:41:18.360151+01:00 phy005 kernel: [ cut here
  ]

  A shadow pagetable entry in memory has bits 45-49 set, which is not
  allowed. It's probably bad memory if these errors were not present before
  with the same workload and host software. Would be useful to see what
  memtest86 says.

I did 2 memtest86+ passes, but no errors were found.

Just to be safe, we replaced all memory. The machine has been running
stable over the weekend, but now gives exactly the same error.

Is there anything else which could cause this?


Try updating the BIOS.

When you say suddenly, this was with no changes to software and hardware?

Is cooling adequate?

How much memory is on that machine?  Even outside the reserved bits the 
address looks way too large.


--
error compiling committee.c: too many arguments to function



Re: EPT: Misconfiguration

2011-01-25 Thread Ruben Kerkhof
Hi Avi,

On Tue, Jan 25, 2011 at 18:39, Avi Kivity a...@redhat.com wrote:
 On 01/25/2011 04:44 PM, Ruben Kerkhof wrote:

 Hi Marcelo,

 On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti mtosa...@redhat.com
  wrote:
   On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
   I'm suddenly getting lots of the following errors on a server running
   2.36.7, but I have no idea what it means:
 
   2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
   2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
   2011-01-20T12:41:18.358624+01:00 phy005 kernel:
   ept_misconfig_inspect_spte: spte 0x50743e007 level 4
   2011-01-20T12:41:18.358627+01:00 phy005 kernel:
   ept_misconfig_inspect_spte: spte 0x523de2007 level 3
   2011-01-20T12:41:18.358629+01:00 phy005 kernel:
   ept_misconfig_inspect_spte: spte 0x62336f007 level 2
   2011-01-20T12:41:18.360109+01:00 phy005 kernel:
   ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
   2011-01-20T12:41:18.360137+01:00 phy005 kernel:
   ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
   2011-01-20T12:41:18.360151+01:00 phy005 kernel: [ cut here
   ]
 
   A shadow pagetable entry in memory has bits 45-49 set, which is not
   allowed. It's probably bad memory if these errors were not present before
   with the same workload and host software. Would be useful to see what
   memtest86 says.

 I did 2 memtest86+ passes, but no errors were found.

 Just to be safe, we replaced all memory. The machine has been running
 stable over the weekend, but now gives exactly the same error.

 Is there anything else which could cause this?

 Try updating the BIOS.

That's the first thing we did. It's a Supermicro with an X8DTU-F
board, updated to BIOS version 2.0b (which includes the latest
microcode). The procs are Intel 5620s.

 When you say suddenly, this was with no changes to software and hardware?

The host software and hardware haven't changed in the two months since
the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.

We host customer vms on it though, so virtual machines come and go.
Various operating systems, a mixture of Linux, FreeBSD and Windows
2008 R2. We have other machines with the same config without these
problems though.

 Is cooling adequate?

Yes.

 How much memory is on that machine?  Even outside the reserved bits the
 address looks way too large.

48GB.

This time I have a few different messages though:

2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection
fault:  [#1] SMP
2011-01-25T11:58:50.001310+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-25T11:58:50.001316+01:00 phy005 kernel: CPU 12
2011-01-25T11:58:50.001323+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T11:58:50.001327+01:00 phy005 kernel:
2011-01-25T11:58:50.001331+01:00 phy005 kernel: Pid: 1849, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-01-25T11:58:50.001336+01:00 phy005 kernel: RIP:
0010:[810d0216]  [810d0216] __free_pages+0x9/0x26
2011-01-25T11:58:50.001339+01:00 phy005 kernel: RSP:
0018:8802fbe45ab8  EFLAGS: 00010216
2011-01-25T11:58:50.001343+01:00 phy005 kernel: RAX: 88061ef8c000
RBX: 8803131ec100 RCX: 
2011-01-25T11:58:50.001348+01:00 phy005 kernel: RDX: 00ff
RSI:  RDI: 1603a07305001568
2011-01-25T11:58:50.001352+01:00 phy005 kernel: RBP: 8802fbe45ab8
R08: ea000a83b7f0 R09: 0004
2011-01-25T11:58:50.001356+01:00 phy005 kernel: R10: 
R11: 8802fbe45b38 R12: 0100
2011-01-25T11:58:50.001359+01:00 phy005 kernel: R13: 0001
R14: 8802e934c010 R15: 8802e934c010
2011-01-25T11:58:50.001363+01:00 phy005 kernel: FS:
7f1f14844700() GS:88065548()
knlGS:
2011-01-25T11:58:50.001366+01:00 phy005 kernel: CS:  0010 DS:  ES:
 CR0: 8005003b
2011-01-25T11:58:50.001370+01:00 phy005 kernel: CR2: b72f6cb0
CR3: 000ba561c000 CR4: 26e0
2011-01-25T11:58:50.001374+01:00 phy005 kernel: DR0: 
DR1:  DR2: 
2011-01-25T11:58:50.001378+01:00 phy005 kernel: DR3: 
DR6: 0ff0 DR7: 0400
2011-01-25T11:58:50.001382+01:00 phy005 kernel: Process qemu-kvm (pid:
1849, threadinfo 8802fbe44000, task 8802ea11aee0)
2011-01-25T11:58:50.001385+01:00 phy005 kernel: Stack:
2011-01-25T11:58:50.001389+01:00 phy005 kernel: 8802fbe45af8
810ee455 0206 c9001e2d4000
2011-01-25T11:58:50.001392+01:00 phy005 kernel: 0 8802e934c010

Re: EPT: Misconfiguration

2011-01-21 Thread Marcelo Tosatti
On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
 I'm suddenly getting lots of the following errors on a server running
 2.36.7, but I have no idea what it means:
 
 2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
 2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
 2011-01-20T12:41:18.358624+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x50743e007 level 4
 2011-01-20T12:41:18.358627+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x523de2007 level 3
 2011-01-20T12:41:18.358629+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x62336f007 level 2
 2011-01-20T12:41:18.360109+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
 2011-01-20T12:41:18.360137+01:00 phy005 kernel:
 ept_misconfig_inspect_spte: rsvd_bits = 0x3a000
 2011-01-20T12:41:18.360151+01:00 phy005 kernel: [ cut here
 ]

A shadow pagetable entry in memory has bits 45-49 set, which is not
allowed. It's probably bad memory if these errors were not present before
with the same workload and host software. Would be useful to see what
memtest86 says.
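
As a rough illustration of the check described above, the following Python sketch (not part of the original message) tests which bits in the 45-49 range are set in the level 1 spte from the log. The mask construction and the helper names bit_range_mask and set_bits are assumptions made for this example. They are not the kernel's actual reserved-bit logic, which depends on the CPU's physical address width.

```python
# Level 1 spte value copied from the log quoted above.
SPTE = 0x1603a0730500d277

def bit_range_mask(low, high):
    """Return a mask with bits low..high (inclusive) set."""
    return ((1 << (high + 1)) - 1) & ~((1 << low) - 1)

def set_bits(value, mask):
    """List the bit positions that are set in both value and mask."""
    return [b for b in range(64) if ((value & mask) >> b) & 1]

# Bits 45-49 are the range named in the message above.
mask = bit_range_mask(45, 49)
offending = SPTE & mask

print("mask:     ", hex(mask))
print("offending:", hex(offending))
print("set bits: ", set_bits(SPTE, mask))
```

Any non-zero result for the masked value means the entry has at least one bit set in the range the message calls not allowed.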



Re: EPT: Misconfiguration

2011-01-20 Thread Ruben Kerkhof
On Thu, Jan 20, 2011 at 12:48, Ruben Kerkhof ru...@rubenkerkhof.com wrote:
 I'm suddenly getting lots of the following errors on a server running
 2.36.7, but I have no idea what it means:

Sorry, that should be 2.34.7.

Kind regards,

Ruben