>>>On Mon, Dec 04, 2017 at 08:16:47PM +0000, Bhanuprakash Bodireddy
>wrote:
>>>> Processors support prefetch instruction in anticipation of write but
>>>> compilers(gcc) won't use them unless explicitly asked to do so even
>>>> with '-march=native' specified.
>>>>
>>>> [Problem]
>>>>   Case A:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>>>        __builtin_prefetch(addr, 1, 3)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchw  (%rax)
>>>>
>>>>   Case B:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>>        __builtin_prefetch(addr, 1, 1)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchw  (%rax)             <***problem***>
>>>>
>>>>   In spite of specifying -march=native and using Low Temporal
>>>Write(OPCH_LTW),
>>>>   the compiler generates 'prefetchw' instruction instead of 'prefetchwt1'
>>>>   instruction available on processor.
>>>>
>>>> [Solution]
>>>>   Include -mprefetchwt1
>>>>
>>>>   Case B:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>>        __builtin_prefetch(addr, 1, 1)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchwt1  (%rax)
>>>>
>>>> [Testing]
>>>>   $ ./boot.sh
>>>>   $ ./configure
>>>>      checking target hint for cgcc... x86_64
>>>>      checking whether gcc accepts -mprefetchwt1... yes
>>>>   $ make -j
>>>>
>>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at
>>>> intel.com>
>>>
>>>Does this have any effect if the architecture or CPU configured for
>>>use does not support prefetchwt1?
>>
>> That's a good question and I spent reasonable time today to figure this out.
>> I have Haswell, Broadwell and Skylake CPUs and they all support this
>instruction.
>
>Hmm. I have 2 different Broadwell machines (Xeon E5 v4 and i7-6800K) and
>neither of them has the prefetchwt1 instruction according to cpuid:
>
>       PREFETCHWT1                              = false

Xeon E5-26XX v4 is the Broadwell workstation/server part, but i7-6800K is a
Skylake desktop variant, whereas E3-12XX v5 is the equivalent Skylake
workstation/server variant. AFAIK, prefetchwt1 should be available on the
above processors; I am not sure why cpuid reports otherwise.
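
The cpuid result quoted above can be cross-checked from C. What follows is a minimal sketch, assuming GCC or Clang on x86 with a <cpuid.h> that provides __get_cpuid_count() (older compilers may lack it). The helper name cpu_has_prefetchwt1() is invented for illustration and is not part of OVS. PREFETCHWT1 is reported in CPUID leaf 7, sub-leaf 0, ECX bit 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper, not part of OVS.  Returns 1 if CPUID reports
 * PREFETCHWT1 (leaf 7, sub-leaf 0, ECX bit 0) and 0 otherwise,
 * including on non-x86 builds or when leaf 7 is unavailable. */
static int
cpu_has_prefetchwt1(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid_count() returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return (ecx & 1u) ? 1 : 0;
#else
    return 0;
#endif
}
```

Printing the return value from a small main() on each machine should agree with the PREFETCHWT1 line shown by the cpuid tool.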

pmd_thread_main()
-------------------------------------------------------------------------------------------
With OPCH_HTW, we see the prefetchw instruction.

OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
            process_packets =
                dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                           poll_list[i].port_no);
            cycles_count_intermediate(pmd, poll_list[i].rxq,


Address    Source Line   Assembly
0x6e29ef   4,086         movl  0x823ecb(%rip), %edi
0x6e29f5   4,085         movq  0x50(%rsp), %rax
0x6e29fa   4,086         test %edi, %edi
0x6e29fc   4,085         prefetchwz  (%rax)
----------------------------------------------------------------------------------------
With OPCH_LTW, we can see the prefetchwt1b instruction being used (change made
to show this).

OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_LTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
            ..........

Address    Source Line   Assembly
0x6e29ef   4,086         movl  0x823ecb(%rip), %edi
0x6e29f5   4,085         movq  0x50(%rsp), %rax
0x6e29fa   4,086         test %edi, %edi
0x6e29fc   4,085         prefetchwt1b  (%rax)
-----------------------------------------------------------------------------------------
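
For reference, the two hints used in the listings above reduce to the two __builtin_prefetch() calls given in the patch description: OPCH_HTW maps to (addr, 1, 3) and OPCH_LTW maps to (addr, 1, 1). The sketch below only illustrates that mapping. The enum layout and the name prefetch_for_write_sketch() are assumptions and not the actual OVS_PREFETCH_CACHE definition.

```c
/* Illustrative subset of the hint names used in this thread; the real
 * OVS enum may contain more values in a different order. */
enum prefetch_hint_sketch {
    OPCH_HTW,   /* High temporal locality, prefetch for write. */
    OPCH_LTW    /* Low temporal locality, prefetch for write. */
};

/* Hypothetical stand-in for OVS_PREFETCH_CACHE().  The second and third
 * arguments of __builtin_prefetch() must be compile-time constants, so
 * each case passes literals. */
static inline void
prefetch_for_write_sketch(const void *addr, enum prefetch_hint_sketch hint)
{
    switch (hint) {
    case OPCH_HTW:
        __builtin_prefetch(addr, 1, 3);
        break;
    case OPCH_LTW:
        __builtin_prefetch(addr, 1, 1);
        break;
    }
}
```

Which instruction the locality-1 call becomes depends on the compiler flags in effect, which is the point under discussion in this thread.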

>
>This means that introducing this change will break binary compatibility even
>between CPUs of the same generation, i.e. I will not be able to run on my
>system binaries compiled on yours.
>
>If it's true I prefer to not have this change.
>
>Anyway, adding this change will make compiling a generic binary for
>different platforms impossible if your build server supports prefetchwt1.
>There should be a way to disable this arch-specific compiler flag even if it
>is supported on my current platform.

I see your point: the build server can be newer and support the prefetchwt1
instruction. When I copy the precompiled binaries to a server that does not
support it and run them, how do they behave?

I am not sure about this. Maybe Red Hat/Canonical developers can comment on
how they handle this kind of case.

I will try to check this on my side.
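
One thing that can be checked from inside a binary is whether it was built with the flag at all. This is a rough sketch, assuming the compiler defines the __PREFETCHWT1__ macro when -mprefetchwt1 is in effect. That assumption should be verified against the compiler in use, for example with 'gcc -mprefetchwt1 -dM -E - </dev/null'. The helper name is invented for illustration.

```c
/* Hypothetical helper, not part of OVS.  Returns 1 if this translation
 * unit was compiled with PREFETCHWT1 code generation enabled, else 0.
 * Assumes the compiler defines __PREFETCHWT1__ under -mprefetchwt1. */
static int
built_with_prefetchwt1(void)
{
#ifdef __PREFETCHWT1__
    return 1;
#else
    return 0;
#endif
}
```

Logging this value at startup would make a mismatch between build host and target host visible. It does not answer how the instruction behaves on a CPU that lacks it.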

- Bhanuprakash.

>
>Best regards, Ilya Maximets.
>
>> But I found that this instruction isn't enabled by default even with
>-march=native, so it needs to be enabled explicitly.
>>
>> Coming to your question, there won't be side effects on using OPCH_LTW.
>> On processors that *don't* support PREFETCHW and PREFETCHWT1 the
>compiler generates a 'prefetcht1' instruction.
>> On processors that support PREFETCHW the compiler generates 'prefetchw'
>instruction.
>> On processors that support PREFETCHW & PREFETCHWT1, the compiler
>generates 'prefetchwt1' instruction with -mprefetchwt1 explicitly enabled.
>>
>>>If it could lead to that situation, then this does not seem like the
>>>right thing to do, and we might want to fall back to recommending use
>>>of the option when the person building knows that the software will
>>>run on a machine with prefetchwt1.
>>
>> According to the above, on processors that don't support this instruction,
>a 'prefetcht1' instruction would be generated, with no side effects.
>> I verified this using https://gcc.godbolt.org/  and carefully checking the
>instructions generated for different compiler versions and march flags.
>>
>> - Bhanuprakash.
