Re: general protection fault in vmx_vcpu_run (2)

2021-02-25 Thread Sean Christopherson
On Thu, Feb 25, 2021, Dmitry Vyukov wrote:
> On Wed, Feb 24, 2021 at 7:08 PM 'Sean Christopherson' via
> syzkaller-bugs  wrote:
> >
> > On Wed, Feb 24, 2021, Borislav Petkov wrote:
> > > Hi Dmitry,
> > >
> > > On Wed, Feb 24, 2021 at 06:12:57PM +0100, Dmitry Vyukov wrote:
> > > > Looking at the bisection log, the bisection was distracted by something 
> > > > else.
> > >
> > > Meaning the bisection result:
> > >
> > > 167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")
> > >
> > > is bogus?
> >
> > Ya, looks 100% bogus.
> >
> > > > You can always find the original reported issue over the dashboard link:
> > > > https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
> > > > or on lore:
> > > > https://lore.kernel.org/lkml/7ff56205ba985...@google.com/
> > >
> > > Ok, so it looks like this is trying to run kvm ioctls *in* a guest,
> > > i.e., nested. Right?
> >
> > Yep.  I tried to run the reproducer yesterday, but the kernel config wouldn't
> > boot my VM.  I haven't had time to dig in.  Anyways, I think you can safely
> > assume this is a KVM issue unless more data comes along that says otherwise.
> 
> Interesting. What happens? Does the kernel crash? Userspace crash?
> Is the rootfs not mounted? Or something else?

Not sure, it ended up in the EFI shell instead of the kernel (running with QEMU's
-kernel).  My QEMU+KVM setup does a variety of shenanigans; I'm guessing it's an
incompatibility in my setup.


Re: general protection fault in vmx_vcpu_run (2)

2021-02-25 Thread Dmitry Vyukov
On Wed, Feb 24, 2021 at 7:08 PM 'Sean Christopherson' via
syzkaller-bugs  wrote:
>
> On Wed, Feb 24, 2021, Borislav Petkov wrote:
> > Hi Dmitry,
> >
> > On Wed, Feb 24, 2021 at 06:12:57PM +0100, Dmitry Vyukov wrote:
> > > Looking at the bisection log, the bisection was distracted by something 
> > > else.
> >
> > Meaning the bisection result:
> >
> > 167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")
> >
> > is bogus?
>
> Ya, looks 100% bogus.
>
> > > You can always find the original reported issue over the dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
> > > or on lore:
> > > https://lore.kernel.org/lkml/7ff56205ba985...@google.com/
> >
> > Ok, so it looks like this is trying to run kvm ioctls *in* a guest,
> > i.e., nested. Right?
>
> Yep.  I tried to run the reproducer yesterday, but the kernel config wouldn't
> boot my VM.  I haven't had time to dig in.  Anyways, I think you can safely
> assume this is a KVM issue unless more data comes along that says otherwise.

Interesting. What happens? Does the kernel crash? Userspace crash?
Is the rootfs not mounted? Or something else?


Re: general protection fault in vmx_vcpu_run (2)

2021-02-25 Thread Dmitry Vyukov
On Wed, Feb 24, 2021 at 6:49 PM Borislav Petkov  wrote:
>
> Hi Dmitry,
>
> On Wed, Feb 24, 2021 at 06:12:57PM +0100, Dmitry Vyukov wrote:
> > Looking at the bisection log, the bisection was distracted by something 
> > else.
>
> Meaning the bisection result:
>
> 167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")
>
> is bogus?
>
> > You can always find the original reported issue over the dashboard link:
> > https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
> > or on lore:
> > https://lore.kernel.org/lkml/7ff56205ba985...@google.com/
>
> Ok, so it looks like this is trying to run kvm ioctls *in* a guest,
> i.e., nested. Right?

Yes, testing happens in a VM, but the kernel that crashes is the one
that receives the ioctls.


Re: general protection fault in vmx_vcpu_run (2)

2021-02-24 Thread Sean Christopherson
On Wed, Feb 24, 2021, Borislav Petkov wrote:
> Hi Dmitry,
> 
> On Wed, Feb 24, 2021 at 06:12:57PM +0100, Dmitry Vyukov wrote:
> > Looking at the bisection log, the bisection was distracted by something 
> > else.
> 
> Meaning the bisection result:
> 
> 167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")
> 
> is bogus?

Ya, looks 100% bogus.

> > You can always find the original reported issue over the dashboard link:
> > https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
> > or on lore:
> > https://lore.kernel.org/lkml/7ff56205ba985...@google.com/
> 
> Ok, so it looks like this is trying to run kvm ioctls *in* a guest,
> i.e., nested. Right?

Yep.  I tried to run the reproducer yesterday, but the kernel config wouldn't
boot my VM.  I haven't had time to dig in.  Anyways, I think you can safely
assume this is a KVM issue unless more data comes along that says otherwise.


Re: general protection fault in vmx_vcpu_run (2)

2021-02-24 Thread Borislav Petkov
Hi Dmitry,

On Wed, Feb 24, 2021 at 06:12:57PM +0100, Dmitry Vyukov wrote:
> Looking at the bisection log, the bisection was distracted by something else.

Meaning the bisection result:

167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")

is bogus?

> You can always find the original reported issue over the dashboard link:
> https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
> or on lore:
> https://lore.kernel.org/lkml/7ff56205ba985...@google.com/

Ok, so it looks like this is trying to run kvm ioctls *in* a guest,
i.e., nested. Right?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: general protection fault in vmx_vcpu_run (2)

2021-02-24 Thread Dmitry Vyukov
On Wed, Feb 24, 2021 at 1:27 PM Borislav Petkov  wrote:
>
> On Tue, Feb 23, 2021 at 03:17:07PM -0800, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit 167dcfc08b0b1f964ea95d410aa496fd78adf475
> > Author: Lorenzo Stoakes 
> > Date:   Tue Dec 15 20:56:41 2020 +0000
> >
> > x86/mm: Increase pgt_buf size for 5-level page tables
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13fe3ea8d0
> > start commit:   a99163e9 Merge tag 'devicetree-for-5.12' of git://git.kern..
> > git tree:   upstream
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=10013ea8d0
>
> No oops here.
>
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17fe3ea8d0
>
> Nothing special here either.
>
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=49116074dd53b631
>
> Tried this on two boxes: the Intel one doesn't even boot with that
> config - and it is a pretty standard one - and on the AMD one the
> reproducer doesn't trigger anything. It probably won't, because the GP
> is in vmx_vcpu_run(), but since the ioctls were doing something with
> IRQCHIP, I thought it was probably vendor-agnostic.
>
> So, all in all, I could use some more info on how you're reproducing and
> maybe you could show the oops too.

Hi Boris,

Looking at the bisection log, the bisection was distracted by something else.
You can always find the original reported issue over the dashboard link:
https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
or on lore:
https://lore.kernel.org/lkml/7ff56205ba985...@google.com/


Re: general protection fault in vmx_vcpu_run (2)

2021-02-24 Thread Borislav Petkov
On Tue, Feb 23, 2021 at 03:17:07PM -0800, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 167dcfc08b0b1f964ea95d410aa496fd78adf475
> Author: Lorenzo Stoakes 
> Date:   Tue Dec 15 20:56:41 2020 +0000
> 
> x86/mm: Increase pgt_buf size for 5-level page tables
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13fe3ea8d0
> start commit:   a99163e9 Merge tag 'devicetree-for-5.12' of git://git.kern..
> git tree:   upstream
> final oops: https://syzkaller.appspot.com/x/report.txt?x=10013ea8d0

No oops here.

> console output: https://syzkaller.appspot.com/x/log.txt?x=17fe3ea8d0

Nothing special here either.

> kernel config:  https://syzkaller.appspot.com/x/.config?x=49116074dd53b631

Tried this on two boxes: the Intel one doesn't even boot with that
config - and it is a pretty standard one - and on the AMD one the
reproducer doesn't trigger anything. It probably won't, because the GP
is in vmx_vcpu_run(), but since the ioctls were doing something with
IRQCHIP, I thought it was probably vendor-agnostic.

So, all in all, I could use some more info on how you're reproducing and
maybe you could show the oops too.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: general protection fault in vmx_vcpu_run (2)

2021-02-23 Thread syzbot
syzbot has bisected this issue to:

commit 167dcfc08b0b1f964ea95d410aa496fd78adf475
Author: Lorenzo Stoakes 
Date:   Tue Dec 15 20:56:41 2020 +0000

x86/mm: Increase pgt_buf size for 5-level page tables

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13fe3ea8d0
start commit:   a99163e9 Merge tag 'devicetree-for-5.12' of git://git.kern..
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=10013ea8d0
console output: https://syzkaller.appspot.com/x/log.txt?x=17fe3ea8d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=49116074dd53b631
dashboard link: https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=141f3f04d0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17de4f12d0

Reported-by: syzbot+42a71c84ef04577f1...@syzkaller.appspotmail.com
Fixes: 167dcfc08b0b ("x86/mm: Increase pgt_buf size for 5-level page tables")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Re: general protection fault in vmx_vcpu_run (2)

2021-02-23 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:    a99163e9 Merge tag 'devicetree-for-5.12' of git://git.kern..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15cd357f50
kernel config:  https://syzkaller.appspot.com/x/.config?x=49116074dd53b631
dashboard link: https://syzkaller.appspot.com/bug?extid=42a71c84ef04577f1aef
compiler:   Debian clang version 11.0.1-2
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12c7f8a8d0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=137fc232d0

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+42a71c84ef04577f1...@syzkaller.appspotmail.com

RBP: 00402ed0 R08: 00400488 R09: 00400488
R10: 00400488 R11: 0246 R12: 00402f60
R13:  R14: 004ac018 R15: 00400488
==================================================================
BUG: KASAN: global-out-of-bounds in atomic_switch_perf_msrs arch/x86/kvm/vmx/vmx.c:6604 [inline]
BUG: KASAN: global-out-of-bounds in vmx_vcpu_run+0x4f1/0x13f0 arch/x86/kvm/vmx/vmx.c:6771
Read of size 8 at addr ffffffff89a000e9 by task syz-executor198/8346

CPU: 0 PID: 8346 Comm: syz-executor198 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x125/0x19e lib/dump_stack.c:120
 print_address_description+0x5f/0x3a0 mm/kasan/report.c:230
 __kasan_report mm/kasan/report.c:396 [inline]
 kasan_report+0x15e/0x200 mm/kasan/report.c:413
 atomic_switch_perf_msrs arch/x86/kvm/vmx/vmx.c:6604 [inline]
 vmx_vcpu_run+0x4f1/0x13f0 arch/x86/kvm/vmx/vmx.c:6771
 vcpu_enter_guest+0x2ed9/0x8f10 arch/x86/kvm/x86.c:9074
 vcpu_run+0x316/0xb70 arch/x86/kvm/x86.c:9225
 kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9453
 kvm_vcpu_ioctl+0x62a/0xa30 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3295
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x43eee9
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffe7ad00d38 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 00400488 RCX: 0043eee9
RDX:  RSI: ae80 RDI: 0005
RBP: 00402ed0 R08: 00400488 R09: 00400488
R10: 00400488 R11: 0246 R12: 00402f60
R13:  R14: 004ac018 R15: 00400488

The buggy address belongs to the variable:
 str__initcall__trace_system_name+0x9/0x40

Memory state around the buggy address:
 ffffffff899fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffff89a00000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffff89a00080: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 f9 f9
                                                          ^
 ffffffff89a00100: f9 f9 f9 f9 07 f9 f9 f9 f9 f9 f9 f9 00 03 f9 f9
 ffffffff89a00180: f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9 00 00 00 00
==================================================================
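
For context on where that read lands: the flagged access is in
atomic_switch_perf_msrs(), inlined into vmx_vcpu_run(). A sketch of the
function as it looked around 5.11 (paraphrased from memory, not a
verbatim quote of arch/x86/kvm/vmx/vmx.c):

	static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
	{
		int i, nr_msrs;
		struct perf_guest_switch_msr *msrs;

		/* ask perf which MSRs need switching across entry/exit */
		msrs = perf_guest_get_msrs(&nr_msrs);
		if (!msrs)
			return;

		/* KASAN flags an 8-byte read in this inlined path */
		for (i = 0; i < nr_msrs; i++)
			if (msrs[i].host == msrs[i].guest)
				clear_atomic_switch_msr(vmx, msrs[i].msr);
			else
				add_atomic_switch_msr(vmx, msrs[i].msr,
						      msrs[i].guest,
						      msrs[i].host, false);
	}

KASAN blaming an unrelated global (str__initcall__trace_system_name)
rather than a neighboring array suggests the pointer or count handed
back by perf_guest_get_msrs() was bogus, not a simple off-by-one.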



Re: general protection fault in vmx_vcpu_run

2018-07-04 Thread Dmitry Vyukov
On Wed, Jul 4, 2018 at 9:31 PM, Raslan, KarimAllah  wrote:
> Dmitry,
>
> Can you share the host kernel version?
>
> I cannot reproduce any of these crash signatures, and I think it's
> really a nested virtualization bug. So I will need the exact host
> kernel version as well.
>
> I am currently getting all sorts of:
>
> "KVM: entry failed, hardware error 0x7"
>
> ... instead of the crash signatures that you are posting.


Hi Raslan,

The tested kernel runs as a GCE VM.
Jim, how can we describe the host kernel for GCE? Potentially only we
can debug this.


> On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
>> Looking also at the other crash [0]:
>>
>> msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>> ffffffff811f65b7: e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
>> ffffffff811f65bc: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
>> ffffffff811f65c1: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>> ffffffff811f65c8: fc ff df
>> ffffffff811f65cb: 48 c1 ea 03             shr    $0x3,%rdx
>> ffffffff811f65cf: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)    <- fault here.
>> ffffffff811f65d3: 0f 85 36 19 00 00       jne    ffffffff811f7f0f
>>
>> %rdx should contain a pointer to loaded_vmcs. It is directly loaded
>> from the stack [0x8(%rsp)]. This same stack location was just used
>> before the inlined assembly for VMRESUME/VMLAUNCH here:
>>
>> vmx->__launched = vmx->loaded_vmcs->launched;
>> 811f639f:   e8 5c cd 57 00  callq  81773100
>> <__sanitizer_cov_trace_pc>
>> 811f63a4:   48 8b 54 24 08  mov0x8(%rsp),%rdx
>> 811f63a9:   48 b8 00 00 00 00 00movabs
>> $0xdc00,%rax
>> 811f63b0:   fc ff df
>> 811f63b3:   48 c1 ea 03 shr$0x3,%rdx
>> 811f63b7:   80 3c 02
>> 00 cmpb   $0x0,(%rdx,%rax,1)<- used here.
>>
>> ... and this stack location was never touched by anything in between!
>> So something must have corrupted the stack itself, not the kvm_vcpu
>> struct.
>>
>> Obviously the inlined assembly block is using the stack as well, but I
>> can not see anything that would cause this corruption there.
>>
>> That being said, looking at the %rsp and %rbp values that are dumped
>> in the stack trace:
>>
>> RSP: 8801b7d7f380
>> RBP: 8801b8260140
>>
>> ... they are almost 4.8 MiB apart! Should not these two registers be a
>> bit closer to each other? :)
>>
>> So 2 possibilities here:
>>
>> 1- %rsp is wrong
>>
>> That would explain why the loaded_vmcs was NULL. However, it is a bit
>> harder to understand how it became wrong! It should have been restored
>> during the VMEXIT from the HOST_RSP value in the VMCS!
>>
>> Is this a nested setup?
>>
>> 2- %rbp is wrong
>>
>> That would also explain why the loaded_vmcs was NULL. Whatever
>> corrupted the stack that caused loaded_vmcs to be NULL could have also
>> corrupted the %rbp saved in the stack. That would mean that it happened
>> during a function call. All function calls that happened between the
>> point when the stack was sane (just before the "asm" block for
>> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
>> can not see where the stack would get corrupted though! Obviously
>> another source of corruption can be a completely unrelated thread
>> directly corrupting this thread's memory.
>>
>> Maybe it would be easier to just try to repro it first and see which
>> one is true (if at all).
>>
>> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>>
>>
>> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>> >
>> >   22: 0f 01 c3             vmresume
>> >   25: 48 89 4c 24 08       mov    %rcx,0x8(%rsp)
>> >   2a: 59                   pop    %rcx
>> >
>> > :
>> >   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
>> >   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
>> >   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
>> >
>> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
>> > canonical: 110035842e78.
>> >


Re: general protection fault in vmx_vcpu_run

2018-07-04 Thread Raslan, KarimAllah
Dmitry,

Can you share the host kernel version?

I cannot reproduce any of these crash signatures, and I think it's
really a nested virtualization bug. So I will need the exact host
kernel version as well.

I am currently getting all sorts of:

"KVM: entry failed, hardware error 0x7"

... instead of the crash signatures that you are posting.

Regards.

On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
> Looking also at the other crash [0]:
> 
>         msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> ffffffff811f65b7: e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f65bc: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f65c1: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> ffffffff811f65c8: fc ff df
> ffffffff811f65cb: 48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f65cf: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)    <- fault here.
> ffffffff811f65d3: 0f 85 36 19 00 00       jne    ffffffff811f7f0f
> 
> 
> %rdx should contain a pointer to loaded_vmcs. It is directly loaded 
> from the stack [0x8(%rsp)]. This same stack location was just used 
> before the inlined assembly for VMRESUME/VMLAUNCH here:
> 
>         vmx->__launched = vmx->loaded_vmcs->launched;
> ffffffff811f639f: e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f63a4: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f63a9: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> ffffffff811f63b0: fc ff df
> ffffffff811f63b3: 48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f63b7: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)    <- used here.
> 
> ... and this stack location was never touched by anything in between!
> So something must have corrupted the stack itself, not the kvm_vcpu
> struct.
> 
> Obviously the inlined assembly block is using the stack as well, but I 
> can not see anything that would cause this corruption there.
> 
> That being said, looking at the %rsp and %rbp values that are dumped
> in the stack trace:
> 
> RSP: 8801b7d7f380
> RBP: 8801b8260140
> 
> ... they are almost 4.8 MiB apart! Should not these two registers be a
> bit closer to each other? :)
> 
> So 2 possibilities here:
> 
> 1- %rsp is wrong
> 
> That would explain why the loaded_vmcs was NULL. However, it is a bit 
> harder to understand how it became wrong! It should have been restored 
> during the VMEXIT from the HOST_RSP value in the VMCS!
> 
> Is this a nested setup?
> 
> 2- %rbp is wrong
> 
> That would also explain why the loaded_vmcs was NULL. Whatever
> corrupted the stack that caused loaded_vmcs to be NULL could have also
> corrupted the %rbp saved in the stack. That would mean that it happened
> during a function call. All function calls that happened between the
> point when the stack was sane (just before the "asm" block for
> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
> can not see where the stack would get corrupted though! Obviously
> another source of corruption can be a completely unrelated thread
> directly corrupting this thread's memory.
> 
> Maybe it would be easier to just try to repro it first and see which 
> one is true (if at all).
> 
> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
> 
> 
> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
> > 
> >   22: 0f 01 c3             vmresume
> >   25: 48 89 4c 24 08       mov    %rcx,0x8(%rsp)
> >   2a: 59                   pop    %rcx
> > 
> > :
> >   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
> >   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
> >   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
> > 
> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
> > canonical: 110035842e78.
> > 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: general protection fault in vmx_vcpu_run

2018-06-30 Thread Raslan, KarimAllah
Looking also at the other crash [0]:

        msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
ffffffff811f65b7: e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f65bc: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f65c1: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f65c8: fc ff df
ffffffff811f65cb: 48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f65cf: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)    <- fault here.
ffffffff811f65d3: 0f 85 36 19 00 00       jne    ffffffff811f7f0f


%rdx should contain a pointer to loaded_vmcs. It is directly loaded 
from the stack [0x8(%rsp)]. This same stack location was just used 
before the inlined assembly for VMRESUME/VMLAUNCH here:

        vmx->__launched = vmx->loaded_vmcs->launched;
ffffffff811f639f: e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f63a4: 48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f63a9: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f63b0: fc ff df
ffffffff811f63b3: 48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f63b7: 80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)    <- used here.

... and this stack location was never touched by anything in between!
So something must have corrupted the stack itself, not the kvm_vcpu
struct.
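
(Aside, for readers not fluent in KASAN's inline instrumentation: the
faulting sequence above is just the compiler-emitted shadow check. In C
terms it is roughly the sketch below, with the x86-64 shadow offset
taken from the movabs above; kasan_report_load8() is a hypothetical
stand-in for the slow path at the jne target.)

	/* sketch of KASAN's inline check for an 8-byte load from p */
	extern void kasan_report_load8(const void *p);	/* hypothetical */

	static inline void kasan_check_load8(const void *p)
	{
		/* shadow byte covering p: (p >> 3) + KASAN_SHADOW_OFFSET */
		signed char *shadow = (signed char *)
			(((unsigned long)p >> 3) + 0xdffffc0000000000UL);

		if (*shadow)			/* cmpb $0x0,(%rdx,%rax,1) */
			kasan_report_load8(p);	/* the jne target */
	}

The cmpb dereferences the *shadow* of %rdx, so if the stack slot
feeding %rdx holds garbage, the computed shadow address is garbage too
and the check itself faults -- which is why a corrupted stack shows up
as a GP right here.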

Obviously the inlined assembly block is using the stack as well, but I 
can not see anything that would cause this corruption there.

That being said, looking at the %rsp and %rbp values that are dumped
in the stack trace:

RSP: 8801b7d7f380
RBP: 8801b8260140

... they are almost 4.8 MiB apart! Should not these two registers be a
bit closer to each other? :)

So 2 possibilities here:

1- %rsp is wrong

That would explain why the loaded_vmcs was NULL. However, it is a bit 
harder to understand how it became wrong! It should have been restored 
during the VMEXIT from the HOST_RSP value in the VMCS!

Is this a nested setup?

2- %rbp is wrong

That would also explain why the loaded_vmcs was NULL. Whatever
corrupted the stack that caused loaded_vmcs to be NULL could have also
corrupted the %rbp saved in the stack. That would mean that it happened
during a function call. All function calls that happened between the
point when the stack was sane (just before the "asm" block for
VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
can not see where the stack would get corrupted though! Obviously
another source of corruption can be a completely unrelated thread
directly corrupting this thread's memory.

Maybe it would be easier to just try to repro it first and see which 
one is true (if at all).

[0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550


On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>   22: 0f 01 c3             vmresume
>   25: 48 89 4c 24 08       mov    %rcx,0x8(%rsp)
>   2a: 59                   pop    %rcx
> 
> :
>   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
>   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
>   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
> 
> %rcx should be pointing to the vcpu_vmx structure, but it's not even
> canonical: 110035842e78.
> 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: general protection fault in vmx_vcpu_run

2018-06-28 Thread Jim Mattson
  22: 0f 01 c3             vmresume
  25: 48 89 4c 24 08       mov    %rcx,0x8(%rsp)
  2a: 59                   pop    %rcx

:
  2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
  32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
  39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)

%rcx should be pointing to the vcpu_vmx structure, but it's not even
canonical: 110035842e78.
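
For context, this is the tail of the hand-rolled VM-entry asm in the
4.16-era vmx_vcpu_run(); paraphrased (from memory, not a verbatim quote
of the inline asm):

	vmresume		  /* enter guest; VM-exit resumes below  */
				  /* %rsp is now HOST_RSP from the VMCS, */
				  /* with &vmx at the top of the stack   */
				  /* (pushed before entry)               */
	mov   %rcx, 0x8(%rsp)	  /* stash guest %rcx in the spare slot  */
	pop   %rcx		  /* reload &vmx from the stack          */
	setbe fail(%rcx)	  /* vmx->fail = CF|ZF from vmresume     */

So if HOST_RSP is stale, or something scribbled on the host stack while
the guest ran, the pop loads garbage into %rcx and the setbe faults on
it -- consistent with the non-canonical value above.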


Re: general protection fault in vmx_vcpu_run

2018-06-27 Thread Dmitry Vyukov
On Sat, Apr 14, 2018 at 3:07 AM, syzbot
 wrote:
> syzbot has found a reproducer for the following crash on upstream commit
> 1bad9ce155a7c010a9a5f3261ad12a6a8eccfb2c (Fri Apr 13 19:27:11 2018 +0000)
> Merge tag 'sh-for-4.17' of git://git.libc.org/linux-sh
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>
> So far this crash happened 4 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6257386297753600
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=4808329293463552
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=4943675322793984
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+cc483201a3c6436d3...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed.

#syz dup: BUG: unable to handle kernel paging request in vmx_vcpu_run


> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> 8021q: adding VLAN 0 to HW filter on device team0
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 6472 Comm: syzkaller667776 Not tainted 4.16.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:vmx_vcpu_run+0x95f/0x25f0 arch/x86/kvm/vmx.c:9746
> RSP: 0018:8801c95bf368 EFLAGS: 00010002
> RAX: 8801b44df6e8 RBX: 8801ada0ec40 RCX: 1100392b7e78
> RDX:  RSI: 81467b15 RDI: 8801ada0ec50
> RBP: 8801b44df790 R08: 8801c4efe780 R09: fbfff1141218
> R10: fbfff1141218 R11: 88a090c3 R12: 8801b186aa90
> R13: 8801ae61e000 R14: dc00 R15: 8801ae61e3e0
> FS:  7fa147982700() GS:8801db00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 0001d780d000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
> Code: 8b a9 68 03 00 00 4c 8b b1 70 03 00 00 4c 8b b9 78 03 00 00 48 8b 89
> 08 03 00 00 75 05 0f 01 c2 eb 03 0f 01 c3 48 89 4c 24 08 59 <0f> 96 81 88 56
> 00 00 48 89 81 00 03 00 00 48 89 99 18 03 00 00
> RIP: vmx_vcpu_run+0x95f/0x25f0 arch/x86/kvm/vmx.c:9746 RSP: 8801c95bf368
> ---[ end trace ffd91ebc3bb06b01 ]---
> Kernel panic - not syncing: Fatal exception
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..


Re: general protection fault in vmx_vcpu_run

2018-04-13 Thread syzbot

syzbot has found a reproducer for the following crash on upstream commit
1bad9ce155a7c010a9a5f3261ad12a6a8eccfb2c (Fri Apr 13 19:27:11 2018 +0000)
Merge tag 'sh-for-4.17' of git://git.libc.org/linux-sh
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550


So far this crash happened 4 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6257386297753600
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4808329293463552
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4943675322793984
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cc483201a3c6436d3...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6472 Comm: syzkaller667776 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:vmx_vcpu_run+0x95f/0x25f0 arch/x86/kvm/vmx.c:9746
RSP: 0018:8801c95bf368 EFLAGS: 00010002
RAX: 8801b44df6e8 RBX: 8801ada0ec40 RCX: 1100392b7e78
RDX:  RSI: 81467b15 RDI: 8801ada0ec50
RBP: 8801b44df790 R08: 8801c4efe780 R09: fbfff1141218
R10: fbfff1141218 R11: 88a090c3 R12: 8801b186aa90
R13: 8801ae61e000 R14: dc00 R15: 8801ae61e3e0
FS:  7fa147982700() GS:8801db00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 0001d780d000 CR4: 001426f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
Code: 8b a9 68 03 00 00 4c 8b b1 70 03 00 00 4c 8b b9 78 03 00 00 48 8b 89  
08 03 00 00 75 05 0f 01 c2 eb 03 0f 01 c3 48 89 4c 24 08 59 <0f> 96 81 88  
56 00 00 48 89 81 00 03 00 00 48 89 99 18 03 00 00

RIP: vmx_vcpu_run+0x95f/0x25f0 arch/x86/kvm/vmx.c:9746 RSP: 8801c95bf368
---[ end trace ffd91ebc3bb06b01 ]---
Kernel panic - not syncing: Fatal exception
Shutting down cpus with NMI
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..