Re: [PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime

2024-06-16 Thread Robert Henry
I do not think I will have the time or focus to work on improving this
patch this summer, as I will retire in 2 weeks and need to make a clean
break to focus on other things (health, for one) for a while.

If anyone wants to put into place Richard's ideas, I will not be offended!

I do not see any of this email thread's discussion reflected on the bug report
https://gitlab.com/qemu-project/qemu/-/issues/249

Robert Henry

On Sat, Jun 15, 2024 at 4:25 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 6/11/24 09:20, Robert R. Henry wrote:
> > This fixes a bug wherein i386/tcg assumed an interrupt return using
> > the IRET instruction was always returning from kernel mode to either
> > kernel mode or user mode. This assumption is violated when IRET is used
> > as a clever way to restore thread state, as for example in the dotnet
> > runtime. There, IRET returns from user mode to user mode.
> >
> > This bug manifested itself as a page fault in the guest Linux kernel.
> >
> > This bug appears to have been in QEMU since the beginning.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
> > Signed-off-by: Robert R. Henry 
> > ---
> >   target/i386/tcg/seg_helper.c | 78 ++--
> >   1 file changed, 47 insertions(+), 31 deletions(-)
> >
> > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
> > index 715db1f232..815d26e61d 100644
> > --- a/target/i386/tcg/seg_helper.c
> > +++ b/target/i386/tcg/seg_helper.c
> > @@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State
> > *env, int intno, int is_int,
> >
> >   #ifdef TARGET_X86_64
> >
> > -#define PUSHQ_RA(sp, val, ra)   \
> > -{   \
> > -sp -= 8;\
> > -cpu_stq_kernel_ra(env, sp, (val), ra);  \
> > -}
> > -
> > -#define POPQ_RA(sp, val, ra)\
> > -{   \
> > -val = cpu_ldq_kernel_ra(env, sp, ra);   \
> > -sp += 8;\
> > -}
> > +#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
> > +  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
> > +
> > +static inline void FUNC_PUSHQ_RA(
> > +CPUX86State *env, target_ulong *sp,
> > +target_ulong val, target_ulong ra, int cpl, int dpl) {
> > +  *sp -= 8;
> > +  if (dpl == 0) {
> > +cpu_stq_kernel_ra(env, *sp, val, ra);
> > +  } else {
> > +cpu_stq_data_ra(env, *sp, val, ra);
> > +  }
> > +}
>
> This doesn't seem quite right.
>
> I would be much happier if we were to resolve the proper mmu index
> earlier, once, rather
> than within each call to cpu_{ld,st}*_{kernel,data}_ra.  With the mmu
> index in hand, use
> cpu_{ld,st}*_mmuidx_ra instead.
>
> I believe you will want to factor out a subroutine of x86_cpu_mmu_index
> which passes in
> the pl, rather than reading cpl from env->hflags.  This will also allow
> cpu_mmu_index_kernel to be eliminated or simplified, which is written to
> assume pl=0.
>
>
> r~
>
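The shape Richard describes would look roughly like this. A sketch only:
cpu_ldq_mmuidx_ra/cpu_stq_mmuidx_ra are the real accessors, but pushq/popq
and the pl-taking x86_mmu_index_pl are assumed names for the refactor, not
committed QEMU code.

    /* Resolve the mmu index once, from the explicit target privilege
     * level, then reuse it for every stack access of the IRET/RETF. */
    static void pushq(CPUX86State *env, target_ulong *sp, target_ulong val,
                      int mmu_index, uintptr_t ra)
    {
        *sp -= 8;
        cpu_stq_mmuidx_ra(env, *sp, val, mmu_index, ra);
    }

    static target_ulong popq(CPUX86State *env, target_ulong *sp,
                             int mmu_index, uintptr_t ra)
    {
        target_ulong val = cpu_ldq_mmuidx_ra(env, *sp, mmu_index, ra);
        *sp += 8;
        return val;
    }

    /* in helper_ret_protected(), once the target privilege level is known: */
    int mmu_index = x86_mmu_index_pl(env, dpl);   /* assumed helper */
    new_eip = popq(env, &sp, mmu_index, retaddr);
    new_cs = popq(env, &sp, mmu_index, retaddr) & 0xffff;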


Re: [EXTERNAL] Plugins Not Reporting AArch64 SVE Memory Operations

2022-03-24 Thread Robert Henry
I have not done anything on this front, alas.

From: Aaron Lindsay 
Sent: Thursday, March 24, 2022 1:17 PM
To: qemu-devel@nongnu.org ; qemu-...@nongnu.org 

Cc: Alex Bennée ; richard.hender...@linaro.org 
; Robert Henry 
Subject: [EXTERNAL] Plugins Not Reporting AArch64 SVE Memory Operations

Hi folks,

I see there has been some previous discussion [1] about 1.5 years ago
around the fact that AArch64 SVE instructions do not emit any memory
operations via the plugin interface, as one might expect them to.

I am interested in being able to more accurately trace the memory
operations of SVE instructions using the plugin interface - has there
been any further discussion or work on this topic off-list (or that
escaped my searching)?

In the previous discussion [1], Richard raised some interesting
questions:

> The plugin interface needs extension for this.  How should I signal that 256
> consecutive byte loads have occurred?  How should I signal that the
> controlling predicate was not all true, so only 250 of those 256 were
> actually active?  How should I signal 59 non-consecutive (gather) loads have
> occurred?
>
> If the answer is simply that you want 256 or 250 or 59 plugin callbacks
> respectively, then we might be able to force the memory operations into the
> slow path, and hook the operation there.  As if it were an i/o operation.

My initial reaction is that simply sending individual callbacks for each
access (only the ones which were active, in the case of predication)
seems to fit reasonably well with the existing plugin interface. For
instance, I think we already receive two callbacks for each AArch64
`LDP` instruction, right?
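
For scale: the counting side of such a plugin already needs nothing beyond
the existing API; only QEMU's emission side would change. A minimal sketch
(real plugin API; thread-safety and error handling omitted):

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <qemu-plugin.h>

    QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

    static uint64_t mem_cb_count;

    /* one call per memory access: an all-active 256-byte SVE load would
       land here 256 times, a predicated one once per active element,
       just as the two halves of an LDP already arrive separately */
    static void vcpu_mem(unsigned int vcpu_index, qemu_plugin_meminfo_t info,
                         uint64_t vaddr, void *udata)
    {
        mem_cb_count++;
    }

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        for (size_t i = 0; i < qemu_plugin_tb_n_insns(tb); i++) {
            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
            qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
                                             QEMU_PLUGIN_CB_NO_REGS,
                                             QEMU_PLUGIN_MEM_RW, NULL);
        }
    }

    static void plugin_exit(qemu_plugin_id_t id, void *p)
    {
        fprintf(stderr, "memory callbacks: %" PRIu64 "\n", mem_cb_count);
    }

    QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                               const qemu_info_t *info,
                                               int argc, char **argv)
    {
        qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
        qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
        return 0;
    }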

If this is an agreeable solution that wouldn't take too much effort to
implement (and no one else is doing it), would someone mind pointing me
in the right direction to get started?

Thanks!

-Aaron

[1] https://lists.nongnu.org/archive/html/qemu-discuss/2020-12/msg00015.html


Re: [EXTERNAL] Re: Range of vcpu_index to plugin callbacks

2021-09-20 Thread Robert Henry
Yes, the value of the cpu_index seems to track the number of clone() syscalls.  
(I did strace the process, but forgot to look for clone() syscalls... I still 
have vfork()/execve() on the brain.)

From: Qemu-discuss  on 
behalf of Philippe Mathieu-Daudé 
Sent: Sunday, September 19, 2021 10:54 AM
To: rrh.henry ; qemu-disc...@nongnu.org 

Cc: Alex Bennée ; qemu-devel 
Subject: [EXTERNAL] Re: Range of vcpu_index to plugin callbacks

(Cc'ing qemu-devel@ mailing list since this is a development question).

On 9/19/21 19:44, Robert Henry wrote:
> What is the range of the values for vcpu_index given to callbacks, such as:
>
> typedef void (*qemu_plugin_vcpu_udata_cb_t)(unsigned int vcpu_index,
> void *userdata);
>
> Empirically, when QEMU is in system mode, the maximum vcpu_index is 1
> less than the -smp cpus=$(NCPUS) value.
>
> Empirically, when QEMU is in user mode, the values for vcpu_index slowly
> increase without an apparent upper bound known statically (or when the
> plugin is loaded?).

Isn't it related to clone() calls? I'd expect new threads to use
a new vCPU, incrementing vcpu_index. But that is just a guess
without having looked at the code to corroborate...

Regards,

Phil.
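
One way to check the clone() guess empirically is a tiny plugin that records
the largest vcpu_index it ever sees (a sketch; the qemu_plugin_* calls are
the real API, the bookkeeping is mine):

    #include <stdio.h>
    #include <qemu-plugin.h>

    QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

    static unsigned int max_vcpu_index;

    static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
    {
        if (vcpu_index > max_vcpu_index) {
            max_vcpu_index = vcpu_index;  /* user mode: grows with clone() */
        }
    }

    static void plugin_exit(qemu_plugin_id_t id, void *p)
    {
        /* system mode: expect -smp cpus minus 1; user mode: threads minus 1 */
        fprintf(stderr, "max vcpu_index seen: %u\n", max_vcpu_index);
    }

    QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                               const qemu_info_t *info,
                                               int argc, char **argv)
    {
        qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
        qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
        return 0;
    }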



Re: [EXTERNAL] Re: ia-32/ia-64 fxsave64 instruction behavior when saving mmx

2020-07-31 Thread Robert Henry
On 7/31/20 1:34 PM, Eduardo Habkost wrote:
> On Mon, Jun 01, 2020 at 08:19:51AM +0200, Philippe Mathieu-Daudé wrote:
>> Hi Robert.
>>
>> Top-posting is difficult to read on technical lists,
>> it's better to reply inline.
>>
>> Cc'ing the X86 FPU maintainers:
>>
>> ./scripts/get_maintainer.pl -f target/i386/fpu_helper.c
>> Paolo Bonzini  (maintainer:X86 TCG CPUs)
>> Richard Henderson  (maintainer:X86 TCG CPUs)
>> Eduardo Habkost  (maintainer:X86 TCG CPUs)
>>
>> On 6/1/20 1:22 AM, Robert Henry wrote:
>>> Here's additional information.
>>>
>>> All of the remill tests of the legacy MMX instructions fail. These
>>> instructions work on 64-bit registers aliased with the lower 64-bits of
>>> the x87 fp80 registers. The tests fail because remill expects the
>>> fxsave64 instruction to deliver 16 bits of 1's (infinity or NaN prefix)
>>> in the fp80 exponent, i.e. bits 79:64. Metal does this, but QEMU does not.
>> Metal is what matters, QEMU should emulate it when possible.
>>
>>> Reading of Intel Software development manual, table 3.44
>>> (https://www.felixcloutier.com/x86/FXSAVE.html#tbl-3-44) says these 16
>>> bits are reserved, but another version of the manual
>>> (http://math-atlas.sourceforge.net/devel/arch/ia32_arch.pdf) section
>>> 9.6.2 "Transitions between x87 fpu and mmx code" says a write to an MMX
>>> register sets those 16 bits to all 1s.
>> You are [1] here answering [2] you asked below.
>>
>>> In digging through the code for the implementation of the SSE/mmx
>>> instruction pavgb I see a nice clean implementation in the SSE_HELPER_B
>>> macro which takes a MMXREG which is an MMREG_UNION which does not
>>> provide, to the extent that I can figure this out, a handle to bits
>>> 79:64 of the aliased-with x87 register.
>>>
>>> I find it hard to believe that an apparent bug like this has been here
>>> "forever". Am I missing something?
>> Likely the developer who implemented this code didn't have all the
>> information you found, nor the test-suite, and eventually not even the
>> hardware to compare.
>>
>> Since you have a good understanding of Intel FPU and hardware to
>> compare, do you mind sending a patch to have QEMU emulate the correct
>> hardware behavior?
>>
>> If possible add a test case to tests/tcg/i386/test-i386.c (see
>> test_fxsave there).
> Was this issue addressed, or does it remain unfixed?  I remember
> seeing x86 FPU patches merged recently, but I don't know if they
> were related to this.
>
I haven't done anything to address this issue.
>>> Robert Henry
>>> 
>>> *From:* Robert Henry
>>> *Sent:* Friday, May 29, 2020 10:38 AM
>>> *To:* qemu-devel@nongnu.org 
>>> *Subject:* ia-32/ia-64 fxsave64 instruction behavior when saving mmx
>>>
>>> Background: The ia-32/ia-64 fxsave64 instruction saves fp80 or legacy
>>> SSE mmx registers. The mmx registers are saved as if they were fp80
>>> values. The lower 64 bits of the constructed fp80 value are the mmx
>>> register.  The upper 16 bits of the constructed fp80 value are reserved;
>>> see the last row of table 3-44
>>> of https://www.felixcloutier.com/x86/fxsave#tbl-3-44
>>>
>>> The Intel core i9-9980XE Skylake metal I have puts 0xffff into these
>>> reserved 16 bits when saving MMX.
>>>
>>> QEMU appears to put 0's there.
>>>
>>> Does anybody have insight as to what "reserved" really means, or must
>>> be, in this case?
>> You self-answered to this [2] in [1] earlier.
>>
>>> I take the verb "reserved" to mean something other
>>> than "undefined".
>>>
>>> I came across this issue when running the remill instruction test
>>> engine.Ã,  See my
>>> issueÃ, 
>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Flifting-bits%2Fremill%2Fissues%2F423%25C3%2583data=02%7C01%7Crobhenry%40microsoft.com%7C56b85c84b7234f16f07c08d8359125bd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637318245390249551sdata=BjW7gZplqoUlKdpOT7dRnCvlzTrC4Vgpy%2BFf8bNpT0k%3Dreserved=0,Â
>>>  For better or
>>> worse, remill assumes that those bits are 0x, not 0x
>>>
>> Regards,
>>
>> Phil.
>>



Failure of test 'basic gdbstub support'

2020-06-10 Thread Robert Henry
The newish test 'basic gdbstub support' fails for me on an out-of-the-box
build on an x86_64 host.  (See below for the config.log head.)

Is this failure expected?  If so, where can I see that in the various CI
engines you have running?

In digging through the test driver python code in
tests/tcg/multiarch/gdbstub/sha1.py I see that the test assumes that a
breakpoint on the function SHA1Init lands on the first assignment
statement; that the first 'next' executes the first assignment statement; etc.

This is a very fragile assumption.  It depends on the compiler used to compile 
sha1.c; it depends on the optimization level; it depends on the accuracy of the 
pc mapping in the debug info; it depends on gdb.

Better would be to change SHA1Init to do its work, and then call another 
non-inlined function taking a context pointer, and then examine 
context->state[0] and context->state[1].
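
Concretely, the restructuring I have in mind would look like this (a sketch;
check_context is a hypothetical name, and the constants are the standard
SHA-1 initialization vector that sha1.c uses):

    /* A breakpoint on a non-inlined callee is robust against the compiler
     * reordering or collapsing SHA1Init's own assignments. */
    void __attribute__((noinline)) check_context(SHA1_CTX *context)
    {
        /* gdb: break check_context; then examine context->state[0], [1] */
        (void)context;
    }

    void SHA1Init(SHA1_CTX *context)
    {
        context->state[0] = 0x67452301;
        context->state[1] = 0xEFCDAB89;
        context->state[2] = 0x98BADCFE;
        context->state[3] = 0x10325476;
        context->state[4] = 0xC3D2E1F0;
        context->count[0] = context->count[1] = 0;
        check_context(context);
    }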

Thanks in advance

  TEST    basic gdbstub support
make[2]: *** 
[/mnt/robhenry/qemu_robhenry_amd64/qemu/tests/tcg/multiarch/Makefile.target:51: 
run-gdbstub-sha1] Error 2


 QEMU configure log Tue 09 Jun 2020 02:45:06 PM PDT
# Configured with: '../configure' '--disable-sdl' '--enable-gtk' 
'--extra-ldflags=-L/usr/lib' '--enable-plugins' '--target-list=x86_64-softmmu 
x86_64-linux-user'



Re: ia-32/ia-64 fxsave64 instruction behavior when saving mmx

2020-05-31 Thread Robert Henry
Here's additional information.

All of the remill tests of the legacy MMX instructions fail. These instructions 
work on 64-bit registers aliased with the lower 64-bits of the x87 fp80 
registers.  The tests fail because remill expects the fxsave64 instruction to 
deliver 16 bits of 1's (infinity or NaN prefix) in the fp80 exponent, i.e. bits 
79:64.  Metal does this, but QEMU does not.

Reading the Intel Software Development Manual, table 3.44 
(https://www.felixcloutier.com/x86/FXSAVE.html#tbl-3-44) says these 16 bits are 
reserved, but another version of the manual 
(http://math-atlas.sourceforge.net/devel/arch/ia32_arch.pdf) section 9.6.2 
"Transitions between x87 fpu and mmx code" says a write to an MMX register sets 
those 16 bits to all 1s.

In digging through the code for the implementation of the SSE/mmx instruction 
pavgb I see a nice clean implementation in the SSE_HELPER_B macro which takes a 
MMXREG which is an MMREG_UNION which does not provide, to the extent that I can 
figure this out, a handle to bits 79:64 of the aliased-with x87 register.

I find it hard to believe that an apparent bug like this has been here 
"forever". Am I missing something?

Robert Henry
________
From: Robert Henry
Sent: Friday, May 29, 2020 10:38 AM
To: qemu-devel@nongnu.org 
Subject: ia-32/ia-64 fxsave64 instruction behavior when saving mmx

Background: The ia-32/ia-64 fxsave64 instruction saves fp80 or legacy SSE mmx 
registers. The mmx registers are saved as if they were fp80 values. The lower 
64 bits of the constructed fp80 value are the mmx register.  The upper 16 bits 
of the constructed fp80 value are reserved; see the last row of table 3-44 of 
https://www.felixcloutier.com/x86/fxsave#tbl-3-44

The Intel core i9-9980XE Skylake metal I have puts 0xffff into these reserved 
16 bits when saving MMX.

QEMU appears to put 0's there.

Does anybody have insight as to what "reserved" really means, or must be, in 
this case?  I take the verb "reserved" to mean something other than "undefined".

I came across this issue when running the remill instruction test engine.  See 
my issue https://github.com/lifting-bits/remill/issues/423. For better or worse, 
remill assumes that those bits are 0xffff, not 0x0000.
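
A minimal sketch of the experiment (my own harness, not remill's; assumes gcc
on x86_64, and the FXSAVE layout in which the MM0/ST0 image starts at byte 32
of the save area, with bits 79:64 at bytes 8-9 of that image):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* FXSAVE64 requires a 512-byte, 16-byte-aligned save area */
        static uint8_t area[512] __attribute__((aligned(16)));

        __asm__ volatile("pxor %%mm0, %%mm0\n\t" /* write an MMX register */
                         "fxsave64 %0\n\t"       /* save FPU/MMX state */
                         "emms"
                         : "=m" (area) : : "memory");

        /* bits 79:64 of the ST0/MM0 image: two bytes at offset 32 + 8 */
        uint16_t top16;
        __builtin_memcpy(&top16, area + 32 + 8, sizeof(top16));
        printf("bits 79:64 after an MMX write: 0x%04x\n", top16);
        return 0;   /* 0xffff on Skylake metal; 0x0000 under TCG QEMU */
    }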



ia-32/ia-64 fxsave64 instruction behavior when saving mmx

2020-05-29 Thread Robert Henry
Background: The ia-32/ia-64 fxsave64 instruction saves fp80 or legacy SSE mmx 
registers. The mmx registers are saved as if they were fp80 values. The lower 
64 bits of the constructed fp80 value are the mmx register.  The upper 16 bits 
of the constructed fp80 value are reserved; see the last row of table 3-44 of 
https://www.felixcloutier.com/x86/fxsave#tbl-3-44

The Intel core i9-9980XE Skylake metal I have puts 0xffff into these reserved 
16 bits when saving MMX.

QEMU appears to put 0's there.

Does anybody have insight as to what "reserved" really means, or must be, in 
this case?  I take the verb "reserved" to mean something other than "undefined".

I came across this issue when running the remill instruction test engine.  See 
my issue https://github.com/lifting-bits/remill/issues/423. For better or worse, 
remill assumes that those bits are 0xffff, not 0x0000.



qemu plugin exposure of register addresses

2020-04-02 Thread Robert Henry
There is now a qemu plugin interface function qemu_plugin_register_vcpu_mem_cb 
which registers a plugin-side callback. This callback is later invoked for each 
memory operation an emulated instruction performs, and it receives information 
about memory addresses and read/write indicators.

I'm wondering how hard it is to add a similar callback to expose register 
addresses and read/write indicators.  For example, executing `add r3, r1, $1` 
would generate two callbacks, one {write r3} and the other {read r1}. I'd like 
this for all kinds of registers such as simd regs, and, gulp, flags registers.
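
The shape I imagine is something like the following; this is purely
hypothetical, and no such interface exists in QEMU's plugin API today:

    enum qemu_plugin_reg_rw {
        QEMU_PLUGIN_REG_R = 1,
        QEMU_PLUGIN_REG_W = 2,
    };

    /* hypothetical: one callback per register operand of each insn */
    typedef void (*qemu_plugin_vcpu_reg_cb_t)(unsigned int vcpu_index,
                                              unsigned int reg_id,
                                              enum qemu_plugin_reg_rw rw,
                                              void *userdata);

    void qemu_plugin_register_vcpu_reg_cb(struct qemu_plugin_insn *insn,
                                          qemu_plugin_vcpu_reg_cb_t cb,
                                          enum qemu_plugin_cb_flags flags,
                                          enum qemu_plugin_reg_rw rw,
                                          void *userdata);

    /* `add r3, r1, $1` would then fire cb(vcpu, R3, QEMU_PLUGIN_REG_W, ud)
       and cb(vcpu, R1, QEMU_PLUGIN_REG_R, ud). */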

With this information ISA simulators could examine the data flow graph and 
register dependencies.

I'm not asking for register contents; we don't get memory contents either!

I gather there is some concern about exposing too much functionality to the 
plugin API, as a plugin might then be used to subvert some aspects of the GPL.  
I don't understand the details of this concern, nor know where the "line in the 
sand" is.

Robert Henry


[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-25 Thread Robert Henry
The change should only be dynamically visible when doing an iretq from
and to the same protection level, AFAICT. The code clearly[sic] works
now for the interrupt return that is used by the linux kernel,
presumably {from=kernel, to=kernel} or {from=kernel, to=user}.  I would
claim that to make this code correct it needs to work for all 4*4 state
changes, as windows (notoriously) uses all 4 protection levels.  I'm
still digging, but your suggestion is certainly on the path forward.

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-24 Thread Robert Henry
Peter: I think your intuition is right.  The POPQ_RA (pop quad, passing
through a return address handle) is only called from helper_ret_protected,
and it suspiciously calls cpu_ldq_kernel_ra, which calls
cpu_mmu_index_kernel, which is only prepared for a kernel-space iretq (as
the _kernel substring in the function name suggests).

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-24 Thread Robert Henry
I've stepped/nexted from helper_iret_protected, going deep into the
bowels of the TLB, MMU and page-table engine, none of which I
understand. helper_ret_protected faults in the first POPQ_RA.  I'll
investigate the value of sp at the time of the POPQ_RA.

Here's the POPQ_RA in i386/seg_helper.c:2140

sp = env->regs[R_ESP];
ssp = env->segs[R_SS].base;
new_eflags = 0; /* avoid warning */
#ifdef TARGET_X86_64
if (shift == 2) {
POPQ_RA(sp, new_eip, retaddr);
POPQ_RA(sp, new_cs, retaddr);
new_cs &= 0x;
if (is_iret) {
POPQ_RA(sp, new_eflags, retaddr);
}

and here's the stack.  Note some of the logical intermediate frames are
optimized out due to -O3 and inlining. (The value of env->error_code is 1.)

#0  0x55a370c0 in raise_interrupt2
(env=env@entry=0x566ef200, intno=14, is_int=is_int@entry=0, 
error_code=1, next_eip_addend=next_eip_addend@entry=0, 
retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#1  0x55a377f9 in raise_exception_err_ra
(env=env@entry=0x566ef200, exception_index=, 
error_code=, retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:127
#2  0x55a37d69 in x86_cpu_tlb_fill
(cs=0x566e69a0, addr=140727872411616, size=, 
access_type=MMU_DATA_LOAD, mmu_idx=0, probe=, 
retaddr=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:697
#3  0x55952295 in tlb_fill
(cpu=0x566e69a0, addr=140727872411616, size=8, 
access_type=MMU_DATA_LOAD, mmu_idx=0, retaddr=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1017
#4  0x55956320 in load_helper
(full_load=0x55956140 , code_read=false, op=MO_64, 
retaddr=93825010692608, oi=48, addr=140727872411616, env=0x566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#5  0x55956320 in helper_le_ldq_mmu
(env=env@entry=0x566ef200, addr=addr@entry=140727872411616, 
oi=oi@entry=48, retaddr=retaddr@entry=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1688
#6  0x55956dc0 in cpu_load_helper
(full_load=0x55956140 , op=MO_64, 
retaddr=140736367565663, mmu_idx=, addr=140727872411616, 
env=0x566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1752
#7  0x55956dc0 in cpu_ldq_mmuidx_ra
(env=env@entry=0x566ef200, addr=addr@entry=140727872411616, 
mmu_idx=, ra=ra@entry=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1799
#8  0x55a4ff09 in helper_ret_protected
(env=env@entry=0x566ef200, shift=shift@entry=2, 
is_iret=is_iret@entry=1, addend=addend@entry=0, retaddr=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2140
#9  0x55a50ff5 in helper_iret_protected (env=0x566ef200, shift=2, 
next_eip=-999377888)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2363
#10 0x7fffbd321b5f in code_gen_buffer ()

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Robert Henry
Yes, it is intentional.  I don't yet understand why, but am talking to
those who do.
https://github.com/dotnet/runtime/blob/1b02665be501b695b9c22c1ebd69148c07a225f6/src/coreclr/src/pal/src/arch/amd64/context2.S#L183

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Robert Henry
I have confirmed that the dotnet guest application is executing a
"iretq" instruction when this guest kernel bug is hit. A first round of
analysis shows nothing unreasonable at the point the iretq is executed.
The $rsp points into the middle of a mapped in page, the returned-to
$rip looks reasonable, etc. We continue our analysis of qemu and the
dotnet runtime.

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1824344] Re: x86: retf or iret pagefault sets wrong error code

2020-03-16 Thread Robert Henry
This appears to be similar to
https://bugs.launchpad.net/qemu/+bug/1866892 (and much simpler)

-- 
https://bugs.launchpad.net/bugs/1824344

Title:
  x86: retf or iret pagefault sets wrong error code

Status in QEMU:
  New

Bug description:
  With an x86_64 or i386 guest, non-KVM, when trying to execute an
  "iret/iretq/retf" instruction in userspace with an invalid stack pointer
  (under a protected mode OS, like Linux), wrong bits are set in the
  pushed error code; bit 2 is not set, indicating the error comes from
  kernel space.

  If the guest OS is using this flag to decide whether this was a kernel
  or user page fault, it will mistakenly decide a kernel has irrecoverably
  faulted, possibly causing guest OS panic.

  
  How to reproduce the problem in a guest (non-KVM) Linux:
  Note, on recent Linux kernel version, this needs a CPU with SMAP support
  (eg. -cpu max)

  $ cat tst.c
  int main()
  {
  __asm__ volatile (
  "mov $0,%esp\n"   /* point the stack pointer at the unmapped null page */
  "retf"            /* retf's stack read then faults, still in user mode */
  );
  return 0;
  }

  $ gcc tst.c
  $ ./a.out
  Killed

  
  "dmesg" shows the kernel has in fact triggered a "BUG: unable to handle
  kernel NULL pointer dereference...", but it has "recovered" by killing
  the faulting process (see attached screenshot).

  
  Using self-compiled qemu from git:
  commit 532cc6da74ec25b5ba6893b5757c977d54582949 (HEAD -> master, tag: 
v4.0.0-rc3, origin/master, origin/HEAD)
  Author: Peter Maydell 
  Date:   Wed Apr 10 15:38:59 2019 +0100

  Update version for v4.0.0-rc3 release
  
  Signed-off-by: Peter Maydell 




[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-16 Thread Robert Henry
A simpler case seems to produce the same error.  See
https://bugs.launchpad.net/qemu/+bug/1824344

-- 
https://bugs.launchpad.net/bugs/1866892

[Bug 1866892] [NEW] guest OS catches a page fault bug when running dotnet

2020-03-10 Thread Robert Henry
Public bug reported:

The linux guest OS catches a page fault bug when running the dotnet
application.

host = metal = x86_64
host OS = ubuntu 19.10
qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; built 
from head/master
guest emulation = x86_64
guest OS = ubuntu 19.10
guest app = dotnet, running any program

qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
2020)

qemu invocation is:

qemu/build/x86_64-softmmu/qemu-system-x86_64 \
  -m size=4096 \
  -smp cpus=1 \
  -machine type=pc-i440fx-5.0,accel=tcg \
  -cpu Skylake-Server-v1 \
  -nographic \
  -bios OVMF-pure-efi.fd \
  -drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
  -device virtio-blk,drive=hd0 \
  -drive if=none,id=cloud,file=linux_cloud_config.img \
  -device virtio-blk,drive=cloud \
  -netdev user,id=user0,hostfwd=tcp::2223-:22 \
  -device virtio-net,netdev=user0


Here's the guest kernel console output:


[ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
[ 2834.009895] #PF: supervisor read access in user mode
[ 2834.013872] #PF: error_code(0x0001) - permissions violation
[ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
[ 2834.022242] LDTR: NULL
[ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
[ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
[ 2834.038672] Oops: 0001 [#4] SMP PTI
[ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
[ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
[ 2834.054785] RIP: 0033:0x147eaeda
[ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
[ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
[ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 0001
[ 2834.080832] RDX:  RSI:  RDI: 7fffcfb0
[ 2834.085010] RBP: 7fffd730 R08:  R09: 7fffd1b0
[ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 0002
[ 2834.093350] R13: 0001 R14: 0001 R15: 1554b401d388
[ 2834.097309] FS:  14fa5740 GS:  
[ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc parport 
sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
[ 2834.122539] CR2: 7fffc2c0
[ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
[ 2834.131239] RIP: 0033:0x14d793262eda
[ 2834.135715] Code: Bad RIP value.
[ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
[ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 0002
[ 2834.148943] RDX: 01cd6950 RSI:  RDI: 7ffddb4e3670
[ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 7ffddb4e3870
[ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 0002
[ 2834.162132] R13: 0001 R14: 0001 R15: 14d6f402d040
[ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
[ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
[ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 007406f0
[ 2834.178892] PKRU: 5554

I run the application from a shell with `ulimit -s unlimited` (unlimited
stack to size).

The application creates a number of threads, and those threads make a
lot of calls to sigaltstack() and mprotect(); see the relevant source
for dotnet here
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

using strace -f on the app shows that no alt stacks come anywhere near
the failing address; all alt stacks are in the heap, as expected.  None
of the mmap/mprotect/munmap syscalls were given arguments in the high
memory 0x7fff and up.

gdb (with default signal stop/print/pass semantics) does not report any
signals prior to the kernel bug being tripped, so I doubt the alternate
signal stack is actually used.

When I run the same dotnet binary on the host (eg, on "bare metal"), the
host kernel seems happy and dotnet runs as expected.

I have not tried different qemu or guest or host O/S.

** Affects: qemu
 Importance: Undecided
 Status: New


[Bug 1860610] Re: cap_disas_plugin leaks memory

2020-01-31 Thread Robert Henry
I ran git blame in the capstone repository; cs_free has been around
for at least 4 years in the capstone ABI. I cannot tell whether the need to
call cs_free is a (new) requirement. Capstone's documentation is a little
informal...
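
For reference, the pattern from capstone's own examples (real capstone API;
code, code_size and address stand in for the caller's buffer):

    #include <capstone/capstone.h>

    csh handle;
    cs_insn *insn;
    size_t count;

    cs_open(CS_ARCH_X86, CS_MODE_64, &handle);
    count = cs_disasm(handle, code, code_size, address, 0, &insn);
    if (count > 0) {
        /* ... use insn[0 .. count-1] ... */
        cs_free(insn, count);     /* must come before cs_close() */
    }
    cs_close(&handle);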

-- 
https://bugs.launchpad.net/bugs/1860610

Title:
  cap_disas_plugin leaks memory

Status in QEMU:
  New

Bug description:
  Looking at origin/master head, the function cap_disas_plugin leaks
  memory.

  Per capstone's examples using their ABI, cs_free(insn, count) needs
  to be called just before cs_close.

  I discovered this running qemu under valgrind.




Re: [EXTERNAL] Re: QEMU for aarch64 with plugins seems to fail basic consistency checks

2020-01-27 Thread Robert Henry
This proposed text sounds good. Better English is to say "concepts" rather than 
"conceptions".

My plugin currently allocates its own unique user data structure on every call 
to the instrumentation-time callback.  This piece of user data captures the 
transient data presented from qemu.   What's missing from the interface is a 
callback when qemu can guarantee that the run-time callback will never be 
called again with this piece of user data.  At that point I would free my piece 
of user data.

I'm not too worried about this memory loss, yet.  I ran ubuntu for 3 days on 
qemu+plugins and only observed a tolerable growth in qemu's memory consumption.

From: Alex Bennée 
Sent: Friday, January 24, 2020 11:44 AM
To: Robert Henry 
Cc: Laurent Desnogues ; qemu-devel@nongnu.org 

Subject: Re: [EXTERNAL] Re: QEMU for aarch64 with plugins seems to fail basic 
consistency checks


Robert Henry  writes:

> I found at least one problem with my plugin.
>
> I was assuming that the insn data from
>   struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> could be passed into qemu_plugin_register_vcpu_insn_exec_cb both as the 1st 
> argument AND as the user data last argument.  This assumed that insn would 
> persist and be unique from when qemu_plugin_register_vcpu_insn_exec_cb was 
> called to when the execution-time callback (vcpu_insn_exec_before) was called.
>
> This assumption is not true.
>
> I now capture pieces of *insn into my own persistent data structure, and pass 
> that in as void *udata, my problems went away.
>
> I think this lack of persistence of insn should be documented, or
> treated as a bug to be fixed.

I thought I had done this but it turns out I only mentioned it for
hwaddr:

  /*
   * qemu_plugin_get_hwaddr():
   * @vaddr: the virtual address of the memory operation
   *
   * For system emulation returns a qemu_plugin_hwaddr handle to query
   * details about the actual physical address backing the virtual
   * address. For linux-user guests it just returns NULL.
   *
   * This handle is *only* valid for the duration of the callback. Any
   * information about the handle should be recovered before the
   * callback returns.
   */

But the concept of handle lifetime is true for all the handles. I
propose something like this for the docs:


--8<---cut here---start->8---
docs/devel: document query handle lifetimes

I forgot to document the lifetime of handles in the developer
documentation. Do so now.

Signed-off-by: Alex Bennée 

1 file changed, 11 insertions(+), 2 deletions(-)
docs/devel/tcg-plugins.rst | 13 +++--

modified   docs/devel/tcg-plugins.rst
@@ -51,8 +51,17 @@ about how QEMU's translation works to the plugins. While 
there are
 conceptions such as translation time and translation blocks the
 details are opaque to plugins. The plugin is able to query select
 details of instructions and system configuration only through the
-exported *qemu_plugin* functions. The types used to describe
-instructions and events are opaque to the plugins themselves.
+exported *qemu_plugin* functions.
+
+Query Handle Lifetime
+-
+
+Each callback provides an opaque anonymous information handle which
+can usually be further queried to find out information about a
+translation, instruction or operation. The handles themselves are only
+valid during the lifetime of the callback so it is important that any
+information that is needed is extracted during the callback and saved
+by the plugin.

 Usage
 =

--8<---cut here---end--->8---

--
Alex Bennée


Re: [EXTERNAL] Re: QEMU for aarch64 with plugins seems to fail basic consistency checks

2020-01-24 Thread Robert Henry
I found at least one problem with my plugin.

I was assuming that the insn data from
  struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
could be passed into qemu_plugin_register_vcpu_insn_exec_cb both as the 1st 
argument AND as the user data last argument.  This assumed that insn would 
persist and be unique from when qemu_plugin_register_vcpu_insn_exec_cb was 
called to when the execution-time callback (vcpu_insn_exec_before) was called.

This assumption is not true.

I now capture pieces of *insn into my own persistent data structure, and pass 
that in as void *udata, my problems went away.

I think this lack of persistence of insn should be documented, or treated as a 
bug to be fixed.
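
The workaround, sketched: the qemu_plugin_* calls are the real API, the
InsnRecord struct is mine, and the records are never freed, which is exactly
the memory growth discussed elsewhere in this thread.

    #include <glib.h>
    #include <qemu-plugin.h>

    typedef struct {
        uint64_t vaddr;
        char *disas;   /* owned copy; qemu_plugin_insn_disas() allocates it */
    } InsnRecord;

    static void vcpu_insn_exec_before(unsigned int vcpu_index, void *udata)
    {
        InsnRecord *rec = udata;  /* valid here; the insn handle would not be */
        /* ... trace rec->vaddr, rec->disas ... */
    }

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        size_t n = qemu_plugin_tb_n_insns(tb);
        for (size_t i = 0; i < n; i++) {
            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
            InsnRecord *rec = g_new0(InsnRecord, 1);
            rec->vaddr = qemu_plugin_insn_vaddr(insn);
            rec->disas = qemu_plugin_insn_disas(insn);
            /* pass rec, not insn, as user data: rec persists past translation,
               but is never freed, since no "translation discarded" callback
               exists to free it on */
            qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec_before,
                                                   QEMU_PLUGIN_CB_NO_REGS, rec);
        }
    }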

From: Alex Bennée 
Sent: Friday, January 24, 2020 8:36 AM
To: Robert Henry 
Cc: qemu-devel@nongnu.org 
Subject: [EXTERNAL] Re: QEMU for aarch64 with plugins seems to fail basic 
consistency checks


Robert Henry  writes:

> I wrote a QEMU plugin for aarch64 where the insn and mem callbacks
> print out the specifics of the guest instructions as they are
> "executed".  I expect this trace stream to be well behaved but it is
> not.

Can you post your plugin? It's hard to diagnose what might be wrong
without the actual code.

> By well-behaved, I mean that memory insns print some memory details,
> non-memory insns print nothing extra, and the PC advances sequentially
> except after a control-flow instruction.

Exactly how are you tracking the PC? You should have the correct PC as
you instrument each instruction. Are you saying qemu_plugin_insn_vaddr()
doesn't report a different PC for each instrumented instruction in a block?

> Roughly 2% of the time the trace violates these expectations.


>
>
>   1.  I'm using qemu at tag v4.2.0 (or master head; it doesn't matter), 
> running on an x86_64 host.
>   2.  I build qemu using   ./configure --disable-sdl --enable-gtk 
> --enable-plugins --enable-debug --target-list=aarch64-softmmu 
> aarch64-linux-user
>   3.  I execute qemu from its build area 
> build/aarch64-linux-user/qemu-aarch64, with flags --cpu cortex-a72 and the 
> appropriate args to --plugin ... -d plugin -D .
>   4.  I'm emulating a simple C program in linux emulation mode.
>   5.  The resulting qemu execution is valgrind clean (e.g., I run qemu under 
> valgrind) for my little program, save for the memory leaks I reported a few 
> days ago.
>
> Below is an example of my trace output (the first int printed is the 
> cpu_index, checked to be always 0). Note that the ldr instruction at 0x41a608 
> sometimes reports a memop, but most of the time it doesn't.  Note that, per 
> the trace, 0x41a608 runs back to back. Note (bottom of trace) that the movz 
> instruction reports a memop.  (The executed code comes from glibc 
> _dl_aux_init, executed before main() is called.)
>
> How should this problem be tackled? I can't figure out how to make each tcg 
> block contain exactly one guest (aarch64) insn, which is where I'd start.
>
> 0 0x0041a784 0x0041a784 0xf1000c3f cmp x1, #3
> 0 0x0041a788 0x0041a788 0x54fff401 b.ne #0xfe80
> 0 0x0041a78c 0x0041a78c 0x52800033 movz w19, #0x1
> 0 0x0041a790 0x0041a790 0xf9400416 ldr x22, [x0, #8] 0 mem {3 0 0 0} 0x004000800618
> 0 0x0041a794 0x0041a794 0x179d b #0xfe74
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 mem {3 0 0 0} 0x004000800620
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 mem {3 0 0 0} 0x004000800630
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
> 0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
> 0 0x0041a608 0x

QEMU for aarch64 with plugins seems to fail basic consistency checks

2020-01-23 Thread Robert Henry
I wrote a QEMU plugin for aarch64 where the insn and mem callbacks print out 
the specifics of the guest instructions as they are "executed".  I expect this 
trace stream to be well behaved, but it is not. By well-behaved, I mean that 
memory insns print some memory details, non-memory insns print nothing extra, 
and the PC advances sequentially except after a control-flow instruction. 
Roughly 2% of the time the trace violates these expectations.


  1.  I'm using qemu at tag v4.2.0 (or master head; it doesn't matter), running 
on an x86_64 host.
  2.  I build qemu using   ./configure --disable-sdl --enable-gtk 
--enable-plugins --enable-debug --target-list=aarch64-softmmu aarch64-linux-user
  3.  I execute qemu from its build area build/aarch64-linux-user/qemu-aarch64, 
with flags --cpu cortex-a72 and the appropriate args to --plugin ... -d plugin 
-D .
  4.  I'm emulating a simple C program in linux emulation mode.
  5.  The resulting qemu execution is valgrind clean (e.g., I run qemu under 
valgrind) for my little program, save for the memory leaks I reported a few days ago.

Below is an example of my trace output (the first int printed is the cpu_index, 
checked to be always 0). Note that the ldr instruction at 0x41a608 sometimes 
reports a memop, but most of the time it doesn't.  Note that, per the trace, 
0x41a608 runs back to back. Note (bottom of trace) that the movz instruction 
reports a memop.  (The executed code comes from glibc _dl_aux_init, executed 
before main() is called.)

How should this problem be tackled? I can't figure out how to make each tcg 
block contain exactly one guest (aarch64) insn, which is where I'd start.
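
(QEMU's -singlestep option makes TCG put exactly one guest instruction in each
translation block; recent QEMU releases spell the same switch -one-insn-per-tb.
That may be a useful first step for isolating the misbehaving instructions.)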

0 0x0041a784 0x0041a784 0xf1000c3f cmp x1, #3
0 0x0041a788 0x0041a788 0x54fff401 b.ne #0xfe80
0 0x0041a78c 0x0041a78c 0x52800033 movz w19, #0x1
0 0x0041a790 0x0041a790 0xf9400416 ldr x22, [x0, #8] 0 mem {3 0 0 0} 0x004000800618
0 0x0041a794 0x0041a794 0x179d b #0xfe74
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 mem {3 0 0 0} 0x004000800620
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!  0 mem {3 0 0 0} 0x004000800630
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a608 0x0041a608 0xf8410c01 ldr x1, [x0, #0x10]!
0 0x0041a60c 0x0041a60c 0xb4000221 cbz x1, #0x44
0 0x0041a7d8 0x0041a7d8 0x52800035 movz w21, #0x1
0 0x0041a7dc 0x0041a7dc 0xf9400418 ldr x24, [x0, #8] 0 mem {3 0 0 0} 0x004000800638
0 0x0041a7e0 0x0041a7e0 0x178a b #0xfe28
0 0x0041a7d8 0x0041a7d8 0x52800035 movz w21, #0x1 0 mem {3 0 0 0} 0x004000800640


[Bug 1860610] [NEW] cap_disas_plugin leaks memory

2020-01-22 Thread Robert Henry
Public bug reported:

Looking at origin/master head, the function cap_disas_plugin leaks
memory.

Per capstone's examples using their API, cs_free(insn, count) needs to be
called just before cs_close.

I discovered this running qemu under valgrind.
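
For reference, the teardown pattern from capstone's own examples looks like
this (a sketch of the pattern only, not the actual QEMU code):

  #include <capstone/capstone.h>

  static int disas_example(const uint8_t *code, size_t size, uint64_t addr)
  {
      csh handle;
      cs_insn *insn;
      size_t count;

      if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK) {
          return -1;
      }
      count = cs_disasm(handle, code, size, addr, 0, &insn);
      if (count > 0) {
          /* ... use insn[0 .. count-1] ... */
          cs_free(insn, count);  /* the call reported missing above */
      }
      cs_close(&handle);
      return 0;
  }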

** Affects: qemu
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1860610




plugin interface function qemu_plugin_mem_size_shift

2020-01-21 Thread Robert Henry
I don't understand what
  unsigned int qemu_plugin_mem_size_shift(qemu_plugin_meminfo_t info);
does.  The documentation in qemu-plugin.h is silent on this matter.  It 
appears to expose more of the internals of qemu than I am yet familiar with.
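
My best guess from the example plugins is that it returns the log2 of the
access size in bytes, so a mem callback would recover the byte count with a
shift. A sketch based on that reading, not on any documentation:

  static void vcpu_mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                          uint64_t vaddr, void *udata)
  {
      /* shift 0 => 1-byte access, 1 => 2 bytes, 2 => 4 bytes, 3 => 8 bytes */
      unsigned int bytes = 1u << qemu_plugin_mem_size_shift(info);
      /* ... record or print the access size ... */
  }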


plugin order of registration and order of callback

2020-01-06 Thread Robert Henry
The documentation on the new plugin capabilities of qemu is silent about the 
order in which callback registration should be done, and also about the order 
in which callbacks are fired.

Case in point: the callback registered by qemu_plugin_register_vcpu_mem_cb is 
called after the callback registered by 
qemu_plugin_register_vcpu_insn_exec_cb, regardless of the order of registration.

However, I'd like to have the insn_exec_cb called after the mem_cb so that I 
can save the mem information to be consumed by the insn callback.
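
If that ordering holds (exec callback first, then the mem callbacks for the
same instruction), one workaround is to invert the attribution: let each exec
callback flush whatever the mem callback recorded since the previous exec
callback, so an instruction's record is emitted when the next instruction
begins. A sketch, assuming a single vcpu, no locking, and a hypothetical
emit_record helper:

  #include <stdbool.h>
  #include <stdint.h>
  #include <qemu-plugin.h>

  typedef struct {
      uint64_t insn_vaddr;  /* instruction currently being attributed */
      uint64_t mem_vaddr;   /* memory address its mem callback reported */
      bool valid;
      bool has_mem;
  } PendingInsn;

  static PendingInsn pending;  /* index by cpu_index for multiple vcpus */

  static void vcpu_mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                          uint64_t vaddr, void *udata)
  {
      pending.mem_vaddr = vaddr;  /* fires after the exec cb for this insn */
      pending.has_mem = true;
  }

  static void vcpu_insn_exec_cb(unsigned int cpu_index, void *udata)
  {
      if (pending.valid) {
          emit_record(&pending);  /* hypothetical: log the previous insn */
      }
      pending.insn_vaddr = (uintptr_t)udata;  /* vaddr stashed at translation */
      pending.valid = true;
      pending.has_mem = false;
  }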