On 05/06/2015 10:41 PM, H. Peter Anvin wrote:
> On 05/06/2015 12:09 PM, Denys Vlasenko wrote:
>>>
>>> How on Earth does it make 44 bytes?  Is this due to paravirt_fail?
>>
>> No, just this construct
>>
>>         unsigned int eax, ebx, ecx, edx;
>>         cpuid(op, &eax, &ebx, &ecx, &edx);
>>
>> is not really that cheap to set up. You need to allocate
>> variables on stack and take address of each:
>>
>> ffffffff81063668 <cpuid_eax>:
>> ffffffff81063668:       55                      push   %rbp
>> ffffffff81063669:       48 89 e5                mov    %rsp,%rbp
>> ffffffff8106366c:       48 83 ec 10             sub    $0x10,%rsp
>> ffffffff81063670:       48 8d 4d fc             lea    -0x4(%rbp),%rcx
>> ffffffff81063674:       89 7d f0                mov    %edi,-0x10(%rbp)
>> ffffffff81063677:       48 8d 55 f8             lea    -0x8(%rbp),%rdx
>> ffffffff8106367b:       48 8d 75 f4             lea    -0xc(%rbp),%rsi
>> ffffffff8106367f:       48 8d 7d f0             lea    -0x10(%rbp),%rdi
>> ffffffff81063683:       c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
>> ffffffff8106368a:       e8 3c ff ff ff          callq  ffffffff810635cb <__cpuid>
>> ffffffff8106368f:       8b 45 f0                mov    -0x10(%rbp),%eax
>> ffffffff81063692:       c9                      leaveq
>> ffffffff81063693:       c3                      retq
>>
> 
> That almost certainly is due to paravirt_fail, because otherwise cpuid
> would be inline, and gcc actually knows how to optimize around the cpuid
> instruction to the point of eliminating the temporaries.

Yes, with HYPERVISOR_GUEST off cpuid_eax() is smaller:

ffffffff81055a66 <cpuid_eax>:
ffffffff81055a66:       55                      push   %rbp
ffffffff81055a67:       89 f8                   mov    %edi,%eax
ffffffff81055a69:       31 c9                   xor    %ecx,%ecx
ffffffff81055a6b:       48 89 e5                mov    %rsp,%rbp
ffffffff81055a6e:       53                      push   %rbx
ffffffff81055a6f:       0f a2                   cpuid
ffffffff81055a71:       5b                      pop    %rbx
ffffffff81055a72:       5d                      pop    %rbp
ffffffff81055a73:       c3                      retq

However, even this version is not small enough for the patch to make
vmlinux grow; text still shrinks by 21 bytes:

    text     data      bss       dec     hex filename
81746530 13978160 20066304 115790994 6e6d492 vmlinux.before
81746509 13978160 20066304 115790973 6e6d47d vmlinux

To recap, with this patch:
- Code is smaller with and without HYPERVISOR_GUEST.
- Slowdown per cpuid_REG() call is at worst 4%.
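
For anyone who wants to reproduce the "temporaries get eliminated" effect
outside the kernel, here is a minimal userspace sketch of the inlined case.
It assumes x86 and gcc or clang. The names native_cpuid_sketch() and
cpuid_eax_sketch() are made up for illustration, and this only approximates
the kernel helpers, it is not a copy of them:

```c
/*
 * Userspace approximation of an inlined cpuid wrapper (x86, gcc/clang).
 * Function names are hypothetical; this is not the kernel source.
 */
static inline void native_cpuid_sketch(unsigned int *eax, unsigned int *ebx,
                                       unsigned int *ecx, unsigned int *edx)
{
        /* eax selects the leaf and ecx the subleaf; all four are outputs */
        __asm__ volatile("cpuid"
                         : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                         : "0" (*eax), "2" (*ecx)
                         : "memory");
}

static inline unsigned int cpuid_eax_sketch(unsigned int op)
{
        /* Same shape as the construct quoted above: four locals, by address */
        unsigned int eax = op, ebx, ecx = 0, edx;

        native_cpuid_sketch(&eax, &ebx, &ecx, &edx);
        return eax;
}
```

Because the helper is visible to the compiler, building a caller with -O2
should keep all four values in registers, much like the second disassembly
above. Once the cpuid step becomes an out-of-line or indirect call, the
locals have to live on the stack and their addresses get passed, as in the
first disassembly.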


