craig.topper added a comment.

In D155145#4553922 <https://reviews.llvm.org/D155145#4553922>, @anna wrote:

> In D155145#4551621 <https://reviews.llvm.org/D155145#4551621>, @craig.topper 
> wrote:
>
>> In D155145#4551526 <https://reviews.llvm.org/D155145#4551526>, @anna wrote:
>>
>>> In D155145#4544068 <https://reviews.llvm.org/D155145#4544068>, @pengfei 
>>> wrote:
>>>
>>>> In D155145#4543326 <https://reviews.llvm.org/D155145#4543326>, @anna wrote:
>>>>
>>>>> We see a crash bisected to this patch about using an illegal instruction. 
>>>>> Here's the CPUInfo for the machine:
>>>>>
>>>>>   CPU info:
>>>>>   current cpu id: 22
>>>>>   total 32(physical cores 16) (assigned logical cores 32) (assigned 
>>>>> physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads 
>>>>> per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, 
>>>>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, 
>>>>> aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
>>>>>   AvgLoads: 0.30, 0.10, 0.18
>>>>>   CPU Model and flags from /proc/cpuinfo:
>>>>>   model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>>>>   flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
>>>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
>>>>> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
>>>>> nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
>>>>> ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic 
>>>>> popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb 
>>>>> stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln 
>>>>> pts md_clear flush_l1d
>>>>>   Online cpus: 0-31
>>>>>   Offline cpus:
>>>>>   BIOS frequency limitation: <Not Available>
>>>>>   Frequency switch latency (ns): 20000
>>>>>   Available cpu frequencies: <Not Available>
>>>>>   Current governor: schedutil
>>>>>   Core performance/turbo boost: <Not Available>
>>>>>
>>>>> I don't see `avxvnniint16` in the flags list nor avx2. So, this 
>>>>> (relatively new) instruction shouldn't be generated for this machine. Any 
>>>>> ideas on why this might be happening?
>>>>
>>>> As far as I can see from the patch, the only way to generate avxvnniint16 
>>>> instructions is to call its specific intrinsics explicitly. And we will 
>>>> check compiling options in FE before allowing to call the intrinsics. We 
>>>> do have an optimization to generate vnni instructions without intrinsics, 
>>>> but we haven't extend it to avxvnniint16 so far.
>>>> So I don't know what's wrong in your case, could you provide a reproducer 
>>>> for your problem?
>>>
>>> I've investigated what is going on. With this patch, we are now passing in 
>>> `+avxvnniint16` into machine attributes. With that attribute, we now 
>>> generate an instruction which is illegal on sandybridge machine:
>>>
>>>    0x3013f2af:      jmpq   0x3013f09b
>>>      0x3013f2b4:    mov    %rax,%rdi
>>>      0x3013f2b7:    and    $0xfffffffffffffff0,%rdi
>>>   => 0x3013f2bb:    vpbroadcastd %xmm0,%ymm2
>>>      0x3013f2c0:    vpbroadcastd %xmm1,%ymm3
>>>
>>> The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: 
>>> https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has 
>>> only AVX flag.
>>>
>>> This is the complete mattr generated:
>>>
>>>   !3 = 
>>> !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}
>>>
>>> I've confirmed if we changed to `-avxvnniint16` we do not generate 
>>> `vpbroadcastd`.
>>>
>>> W.r.t. how we get the machine attributes generated through our front-end:
>>>
>>>   if (!sys::getHostCPUFeatures(Features))
>>>         return std::move(mattr);
>>>     
>>>       // Fill mattr with default values.
>>>       mattr.reserve(Features.getNumItems());
>>>       for (auto &I : Features) {
>>>         std::string attr(I.first());
>>>         mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
>>>       }
>>>
>>> So, the problem is in getHostCPUFeatures, possibly this line from the patch 
>>> : 
>>> `Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && 
>>> HasAVXSave;`.
>>
>> Does this patch help
>>
>>   diff --git a/llvm/lib/TargetParser/Host.cpp 
>> b/llvm/lib/TargetParser/Host.cpp
>>   index 1141df09307c..11a6879fb76a 100644
>>   --- a/llvm/lib/TargetParser/Host.cpp
>>   +++ b/llvm/lib/TargetParser/Host.cpp
>>   @@ -1769,7 +1769,7 @@ bool sys::getHostCPUFeatures(StringMap<bool> 
>> &Features) {
>>      Features["amx-tile"]   = HasLeaf7 && ((EDX >> 24) & 1) && HasAMXSave;
>>      Features["amx-int8"]   = HasLeaf7 && ((EDX >> 25) & 1) && HasAMXSave;
>>      bool HasLeaf7Subleaf1 =
>>   -      MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, 
>> &EDX);
>>   +      HasLeaf7 && EAX >= 1 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, 
>> &ECX, &EDX);
>>      Features["sha512"]     = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
>>      Features["sm3"]        = HasLeaf7Subleaf1 && ((EAX >> 1) & 1);
>>      Features["sm4"]        = HasLeaf7Subleaf1 && ((EAX >> 2) & 1);
>
> Yes, @craig.topper  that works! Thanks! Could you pls land the patch, if 
> possible?

Can you try to help us understand what is happening?

Sandy Bridge doesn't use leaf 7 at all, but has leaves after it. I thought it 
should always return 0 for all EAX, EBX, ECX, EDX. The EAX value for Leaf 7 
contains how many subleaves of leaf 7 exist.

The documentation says that invalid subleaves of leaf 7 should return all 0s. 
So we thought it was safe to check the bits of sub leaf 1 even if eax from 
subleaf 0 doesn't say subleaf 1 is supported.

Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to 
`getX86CpuIDAndInfoEx` that have 0x7 as the first argument? Maybe there's a bug 
in CPUID on Sandy Bridge.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155145/new/

https://reviews.llvm.org/D155145

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to