I'm still digesting some of your points, but in general, something I
noticed in "newer" kernels is a "hard-coded" assumption about some of these
(especially security-related) features (take SMAP as an example). So, to my
understanding, if our CPUID simply says "I don't know", in some cases the
kernel interprets that as a yes rather than a no! So, again to my limited
knowledge, I think it'd be best to respond negatively until we have support
for these features.
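
As a rough illustration of what "responding negatively" means for function
0_7, the sketch below decodes a few EBX feature bits of that leaf. The bit
positions (FSGSBASE 0, SMEP 7, INVPCID 10, SMAP 20) are the architectural
ones for CPUID.(EAX=7,ECX=0):EBX; the helper name and the sample "stale"
value are made up for the example and are not gem5 code.

```python
# Subset of feature flags in CPUID.(EAX=7, ECX=0):EBX.
LEAF7_EBX_FEATURES = {
    0: "FSGSBASE",
    7: "SMEP",
    10: "INVPCID",
    20: "SMAP",
}


def advertised_features(ebx):
    """Return the names of the listed features whose bit is set in ebx."""
    return [name for bit, name in sorted(LEAF7_EBX_FEATURES.items())
            if ebx & (1 << bit)]


# An all-zero answer advertises none of these features, so the kernel has
# no reason to use invpcid or to set the matching CR4 bits.
print(advertised_features(0x00000000))

# Hypothetical stale register contents left behind by earlier code: the
# kernel would read these bits as real feature flags.
print(advertised_features(0x00100401))
```

With a zero EBX the list is empty, which is the "no" answer argued for above.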

Regarding xsave, if you recall the discussions we had about change 19892
<https://gem5-review.googlesource.com/c/public/gem5/+/19892>, our CPUID
returns 0x04000209 for 0_1. The most significant set bit we have is bit 26,
which tells the kernel we do have support for xsave, and then the kernel
tries to set bit 18 in CR4. Correct me if I'm wrong, but my understanding was
that we have "some" support for xsave in gem5. Looking at my kernel logs,
though, the kernel seems to disable it after some tests during the SMP boot
process (probably our support is not enough for the kernel, so it masks it
off).
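
To double-check the bit arithmetic, here is a small sketch that decodes
0x04000209, assuming it is the ECX output of function 0_1 (that is where the
XSAVE flag lives). The constant and helper names are mine, chosen for the
example only.

```python
CPUID1_ECX_XSAVE_BIT = 26   # CPUID.1:ECX.XSAVE
CR4_OSXSAVE_BIT = 18        # CR4.OSXSAVE, set by the OS once it sees XSAVE


def set_bits(value):
    """Return the positions of the set bits in a 32-bit value, low to high."""
    return [bit for bit in range(32) if value & (1 << bit)]


ecx = 0x04000209
print(set_bits(ecx))                            # positions of the set bits
print(bool(ecx & (1 << CPUID1_ECX_XSAVE_BIT)))  # is XSAVE advertised?

# Clearing bit 26 would stop advertising XSAVE to the kernel.
ecx_without_xsave = ecx & ~(1 << CPUID1_ECX_XSAVE_BIT)
print(hex(ecx_without_xsave))

# Mask the kernel ORs into CR4 when it enables XSAVE.
print(hex(1 << CR4_OSXSAVE_BIT))
```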

Best,

On Wed, Aug 14, 2019 at 3:28 PM Gabe Black <gabebl...@google.com> wrote:

> Actually it looks like somebody added a new function while skipping over
> the ones below. That's how the unimplemented functions slipped through. I'm
> not going to try to implement those for now, but I don't want to discourage
> anyone that wants to do something with them.
>
> I'm also looking into why the kernel thinks we support xsave (which seems
> to be fairly complicated) when we do not. I think there's just an extra bit
> set in CPUID I need to turn off.
>
> Gabe
>
> On Wed, Aug 14, 2019 at 3:01 PM Gabe Black <gabebl...@google.com> wrote:
>
>> I was actually just looking at this since I noticed that one of the x86
>> kernels I have lying around was crashing with an undefined opcode
>> exception. I see that the doCpuid function will just bail out for some of
>> the functions which are below the largest it supports (so it can support
>> the extended functions). The CPUID instruction will just leave EAX, EBX,
>> ECX and EDX unmodified in this case since it isn't supposed to raise any
>> type of fault. The kernel will try to interpret those fields as an actual
>> answer since we told it those functions were supported and, depending on
>> what executed before it, do something arbitrary. We should definitely stop
>> doing that for starters. I think this is something I partially implemented
>> since it was blocking boot a long time ago, and then never went back and
>> filled out. For some of these functions we may not have good answers, for
>> instance when reporting cache sizes. I'm not sure what to do in that case.
>> We may need to look at those fields one by one and try to come up with
>> safe, fairly inert answers. If we can return something that says "I don't
>> know", that would be best.
>>
>> The specific case I'm looking at is function 0xd though, which we would
>> have told the kernel we don't support. That's also passing the old register
>> values through unmodified, which also gives bad answers.
>>
>> I'll put up some CLs which fill out function constants we don't yet have,
>> return 0 when we don't get an answer from doCpuid, and start looking at
>> what the unimplemented functions should return. We can build on that to add
>> in functions that are missing so the kernel at least stops tripping over
>> itself when it gets nonsensical answers from CPUID.
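
A minimal sketch of the "return 0 when we don't get an answer" behaviour
described above, written in Python rather than as gem5 code. The function
names, the register dict, and the placeholder value for function 0 are all
assumptions made for the example; they do not reflect the real doCpuid
signature or what gem5 returns.

```python
def do_cpuid(function, index):
    """Hypothetical stand-in for a CPUID lookup.

    Returns (eax, ebx, ecx, edx) for functions it knows, None otherwise.
    The values are placeholders, not what gem5 returns.
    """
    known = {
        0x0: (0x7, 0x0, 0x0, 0x0),
    }
    return known.get(function)


def cpuid(regs):
    """Execute CPUID on a register dict, zero-filling unknown functions."""
    result = do_cpuid(regs["eax"], regs["ecx"])
    if result is None:
        # No answer: report zeros instead of leaving stale values in place.
        result = (0, 0, 0, 0)
    regs["eax"], regs["ebx"], regs["ecx"], regs["edx"] = result
    return regs


# Registers as they might look when the kernel asks for function 7 with
# leftover values in EBX and EDX from whatever ran before.
regs = cpuid({"eax": 0x7, "ebx": 0xDEADBEEF, "ecx": 0x0, "edx": 0x1234})
print(regs)
```

The point of the design is only that the fallback path overwrites all four
registers, so nothing that ran earlier can leak into the answer.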
>>
>> Gabe
>>
>> On Wed, Aug 14, 2019 at 2:01 PM Pouya Fotouhi <pfoto...@ucdavis.edu>
>> wrote:
>>
>>> Hi All,
>>>
>>> During kernel boot up with the timing/atomic/O3 CPU modes I get the
>>> following kernel oops at native_flush_tlb_global. Looking closer at the
>>> issue, Exec traces show:
>>>
>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96    : mov
>>> eax, 0x2
>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96.0  :
>>> MOV_R_I : limm   eax, 0x2 : IntAlu :  D=0x0000000000000002
>>>  flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101    : ud2
>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101.0  :   UD2 :
>>> fault   Invalid-Opcode : No_OpClass :
>>> flags=(IsMicroop|IsLastMicroop|IsFirstMicroop)
>>> 2014094500: system.cpu A0 T0 : @native_flush_tlb_global+101.32768 :
>>> Microcode_ROM : slli   t4, t1, 0x4 : IntAlu :  D=0x0000000000000060
>>>  flags=(IsInteger|IsMicroop|IsDelayedCommit)
>>>
>>> Looking at the decode of the "undefined" instruction raising the fault:
>>> 2014094250: system.cpu: Decode: Decoded fault instruction:
>>> {
>>>         leg = 0x10,
>>>         rex = 0,
>>>         vex/xop = 0,
>>>         op = {
>>>                 type = three byte 0f38,
>>>                 op = 0x82,
>>>                 },
>>>         modRM = 0,
>>>         sib = 0,
>>>         immediate = 0,
>>>         displacement = 0
>>>         dispSize = 0}
>>>
>>> Which apparently is invpcid, and a dump of native_flush_tlb_global
>>> confirms:
>>>
>>>    0xffffffff81033a68 <+96>:    mov    $0x2,%eax
>>>    0xffffffff81033a6d <+101>:   invpcid (%rcx),%rax
>>>    0xffffffff81033a72 <+106>:   add    $0x18,%rsp
>>>
>>> We do not implement this instruction, and it seems like this
>>> functionality is reported in function 0_7 of CPUID (which we do not
>>> implement).
>>>
>>> I also have a different, yet related, issue with the SMAP and FSGSBASE
>>> bits (bits 21 and 16 in CR4), where the kernel tries to set those,
>>> resulting in a fault which our CPUs can't handle, and the kernel panics.
>>> These functionalities are also reported by function 0_7 in CPUID, which
>>> we do not implement.
>>>
>>> I was wondering if it would be safe to simply return 0s for function
>>> 0_7? I checked, and I couldn't find anything violating the functionalities
>>> we support in gem5. However, I would appreciate it if someone more
>>> familiar with our support for x86 could double-check
>>> https://www.sandpile.org/x86/cpuid.htm#level_0000_0007h and verify that
>>> returning 0s would be fine here.
>>>
>>> For the corner case my kernel was hitting, I tested and returning 0s
>>> would get me past both these issues. Upon confirmation from someone in the
>>> community, I can proceed and submit the change.
>>>
>>> Best,
>>> --
>>> Pouya Fotouhi
>>> PhD Candidate
>>> Department of Electrical and Computer Engineering
>>> University of California, Davis
>>>
>>

-- 
Pouya Fotouhi
PhD Candidate
Department of Electrical and Computer Engineering
University of California, Davis
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
