Re: spin loop 100x faster in user mode (CPL=3) than superuser (CPL=0)?

Garrick Toubassi Fri, 12 Nov 2021 10:16:37 -0800

Thanks Alex!

Thanks for the pointer to gdbstub.  I just single-stepped through the loop
and saw it behaving "as expected", which leads me to believe the
performance issue doesn't show up in the execution of the client code.  But
it sounds like you are saying you see evidence of it executing at 0x9fffc?
Can you elaborate?


Here's what I did; let me know if I'm misunderstanding.  I ran

% qemu-system-x86_64 -s -S ./kernel.img

Then

% gdb ./kernel.img
(gdb) target remote localhost:1234

Then I set a breakpoint at the spin() function (see source
<https://github.com/gtoubassi/qemu-spinrepro/blob/master/start.c#L5>):
(gdb) b *0x7e00
(gdb) cont

At that point I stepped through the loop several times and it behaved as
expected.  I even "let it rip" with:

(gdb) while 1
 > x/i $rip
 > stepi
 > end

And it stayed well behaved operating on the "client" code as I'd expect.
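
For a longer run, the same check could be automated rather than eyeballed.
A minimal sketch, assuming the $rip values from the stepping loop have been
logged into a list; the function name, the end address of the loop (0x7e20)
and the sample trace are made up for illustration, and only 0x7e00 and
0x9fffc come from this thread:

```python
from typing import Iterable, Optional, Tuple


def first_escape(pcs: Iterable[int], lo: int, hi: int) -> Optional[Tuple[int, int]]:
    """Return (step_index, pc) for the first pc outside [lo, hi), or None."""
    for step, pc in enumerate(pcs):
        if not (lo <= pc < hi):
            return step, pc
    return None


# 0x7e00 is the breakpoint address used above. SPIN_END and the trace are
# hypothetical placeholders, not values taken from the real kernel image.
SPIN_START, SPIN_END = 0x7E00, 0x7E20
trace = [0x7E00, 0x7E04, 0x7E08, 0x7E00, 0x7E04, 0x9FFFC]

escape = first_escape(trace, SPIN_START, SPIN_END)
if escape is None:
    print("trace stayed inside the expected range")
else:
    step, pc = escape
    print(f"step {step}: pc left the expected range at {pc:#x}")
```

If the result is None for a long trace, that would support the reading that
the guest code itself never wanders off to an address like 0x9fffc.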

My next step would be to step through the emulator itself, but it sounds
like you are seeing something that would short-circuit that labor-intensive
exercise.  Pointers appreciated!

gt

On Fri, Nov 12, 2021 at 3:06 AM Alex Bennée <alex.ben...@linaro.org> wrote:

>
> Alex Bennée <alex.ben...@linaro.org> writes:
>
> > Garrick Toubassi <gtouba...@gmail.com> writes:
> >
> >> I went ahead and created a short repro case which can be found at
> >> https://github.com/gtoubassi/qemu-spinrepro.  Would appreciate
> >> thoughts from anyone or guidance on how to debug.
> >
> > Well something weird is going on that is chewing through the code
> > generation logic. If you run with:
> >
> >  ./qemu-system-x86_64 -serial mon:stdio -kernel ~/Downloads/kernel.img
> >
> > And then C-a c to bring up the monitor you can type "info jit" and see:
> >
> >   (qemu) info jit
> >   Translation buffer state:
> >   gen code size       1063758051/1073736704
> >   TB count            1
> >   TB avg target size  1 max=1 bytes
> >   TB avg host size    64 bytes (expansion ratio: 64.0)
> >   cross page TB count 0 (0%)
> >   direct jump count   0 (0%) (2 jumps=0 0%)
> >   TB hash buckets     1/8192 (0.01% head buckets used)
> >   TB hash occupancy   0.00% avg chain occ. Histogram: [0.0,2.5)%|█   ▁|[22.5,25.0]%
> >   TB hash avg chain   1.000 buckets. Histogram: 1|█|1
> >
> <snip>
>
> Hmm ok that's just a result of the code disappearing down a hole:
>
>   0x0009fffc:  00 00                    addb     %al, (%bx, %si)
>   0x0009fffe:  00 00                    addb     %al, (%bx, %si)
>   0x000a0000:  ff                       .byte    0xff
>   0x000a0001:  ff                       .byte    0xff
>
> and as that code is being executed out of a place without a phys_pc we
> don't cache the TB (by design). Usually this isn't a massive problem but
> obviously something has gone wrong in the code to be executing these
> junk instructions.
>
> Have you traced the execution of your code via gdbstub?
>
> --
> Alex Bennée
>
