>>>On Mon, Dec 04, 2017 at 08:16:47PM +0000, Bhanuprakash Bodireddy wrote:
>>>> Processors support prefetch instruction in anticipation of write but
>>>> compilers(gcc) won't use them unless explicitly asked to do so even
>>>> with '-march=native' specified.
>>>>
>>>> [Problem]
>>>>   Case A:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>>>        __builtin_prefetch(addr, 1, 3)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchw  (%rax)
>>>>
>>>>   Case B:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>>        __builtin_prefetch(addr, 1, 1)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchw  (%rax)             <***problem***>
>>>>
>>>>   In spite of specifying -march=native and using Low Temporal Write(OPCH_LTW),
>>>>   the compiler generates 'prefetchw' instruction instead of 'prefetchwt1'
>>>>   instruction available on processor.
>>>>
>>>> [Solution]
>>>>    Include -mprefetchwt1
>>>>
>>>>   Case B:
>>>>     OVS_PREFETCH_CACHE(addr, OPCH_LTW)
>>>>        __builtin_prefetch(addr, 1, 1)
>>>>          leaq    -112(%rbp), %rax        [Assembly]
>>>>          prefetchwt1  (%rax)
>>>>
>>>> [Testing]
>>>>   $ ./boot.sh
>>>>   $ ./configure
>>>>      checking target hint for cgcc... x86_64
>>>>      checking whether gcc accepts -mprefetchwt1... yes
>>>>   $ make -j
>>>>
>>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>>>
>>>Does this have any effect if the architecture or CPU configured for
>>>use does not support prefetchwt1?
>>
>> That's a good question and I spent a reasonable amount of time today to figure this out.
>> I have Haswell, Broadwell and Skylake CPUs and they all support this instruction.
>
>Hmm. I have 2 different Broadwell machines (Xeon E5 v4 and i7-6800K) and
>neither of them has the prefetchwt1 instruction according to cpuid:
>
>    PREFETCHWT1                              = false
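For anyone who wants to reproduce the cpuid result quoted above on their own machine, here is a minimal sketch in C. It assumes GCC or Clang on x86 and uses the compiler's <cpuid.h> helpers; the PREFETCHWT1 flag is reported in CPUID leaf 7, subleaf 0, ECX bit 0. The function name cpu_has_prefetchwt1() is made up for illustration and is not part of OVS.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helpers for the CPUID instruction. */
#endif

/* Returns true when CPUID.(EAX=07H,ECX=0):ECX bit 0 (PREFETCHWT1) is set.
 * cpu_has_prefetchwt1() is an illustrative name, not an OVS function. */
static bool
cpu_has_prefetchwt1(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 must be supported before it can be queried. */
    if (__get_cpuid_max(0, NULL) < 7) {
        return false;
    }
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ecx & (1u << 0)) != 0;
#else
    return false;   /* Not an x86 build, so the flag does not apply. */
#endif
}
```

If this returns false on a given machine, a binary built with -mprefetchwt1 contains an instruction that the CPU does not advertise, which is exactly the compatibility question raised in this thread.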
Xeon E5-26XX v4 is Broadwell workstation/server but i7-6800k is Skylake Desktop variant, whereas E3-12XX v5 is the equivalent Skylake workstation/server variant.
AFAIK, prefetchwt1 should be available on the above processors; I am not sure why cpuid reports otherwise.

pmd_thread_main()
-------------------------------------------------------------------------------------------
With OPCH_HTW, we see the prefetchw instruction.

    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
            process_packets =
                dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                           poll_list[i].port_no);
            cycles_count_intermediate(pmd, poll_list[i].rxq,

Address     Source Line     Assembly
0x6e29ef    4,086           movl 0x823ecb(%rip), %edi
0x6e29f5    4,085           movq 0x50(%rsp), %rax
0x6e29fa    4,086           test %edi, %edi
0x6e29fc    4,085           prefetchw (%rax)
----------------------------------------------------------------------------------------
With OPCH_LTW, we can see the prefetchwt1 instruction being used (change made to show this).

    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_LTW);
    cycles_count_start(pmd);
    for (;;) {
        for (i = 0; i < poll_cnt; i++) {
    ..........

Address     Source Line     Assembly
0x6e29ef    4,086           movl 0x823ecb(%rip), %edi
0x6e29f5    4,085           movq 0x50(%rsp), %rax
0x6e29fa    4,086           test %edi, %edi
0x6e29fc    4,085           prefetchwt1 (%rax)
-----------------------------------------------------------------------------------------

>
>This means that introducing this change will break binary compatibility even
>between CPUs of the same generation, i.e. I will not be able to run on my
>system binaries compiled on yours.
>
>If that's true, I would prefer not to have this change.
>
>Anyway, adding this change will make compiling a generic binary for
>different platforms impossible if your build server supports prefetchwt1.
>There should be a way to disable this arch-specific compiler flag even if it
>is supported on my current platform.
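For reference, the two cases discussed in this thread differ only in the locality argument passed to __builtin_prefetch(). The sketch below shows that mapping in isolation. The enum and function names are hypothetical stand-ins modelled on OPCH_HTW and OPCH_LTW, not the actual OVS definitions, and which instruction the compiler emits for each case depends on the target flags, as described in the patch.

```c
#include <stddef.h>

/* Hypothetical hint names modelled on the OPCH_HTW / OPCH_LTW cases
 * quoted in this thread; the real OVS definitions may differ. */
enum prefetch_hint {
    HINT_HTW,   /* High temporal locality, prefetch for write. */
    HINT_LTW    /* Low temporal locality, prefetch for write. */
};

/* __builtin_prefetch() needs compile-time constants for its rw and
 * locality arguments, hence the switch instead of passing variables. */
static inline void
prefetch_for_write(const void *addr, enum prefetch_hint hint)
{
    switch (hint) {
    case HINT_HTW:
        __builtin_prefetch(addr, 1, 3);   /* Case A in the patch. */
        break;
    case HINT_LTW:
        __builtin_prefetch(addr, 1, 1);   /* Case B in the patch. */
        break;
    }
}

/* Example use: hint that buf is about to be written, then write it.
 * The prefetch is only a hint and never changes the result. */
static int
fill_buffer(int *buf, size_t n, int value)
{
    int sum = 0;

    prefetch_for_write(buf, HINT_LTW);
    for (size_t i = 0; i < n; i++) {
        buf[i] = value;
        sum += buf[i];
    }
    return sum;
}
```

Compiling a file like this with and without -mprefetchwt1 and inspecting the assembly (for example with gcc -S) is one way to compare the generated prefetch instructions for Case B.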
I see your point: a build server can be more advanced and support the prefetchwt1 instruction, and when I copy the precompiled binaries to a server that does not support it and run them, how do they behave? I am not sure about this.
Maybe Red Hat/Canonical developers can comment on how they handle this kind of case. I will try to check this on my side.

- Bhanuprakash.

>
>Best regards, Ilya Maximets.
>
>> But I found that this instruction isn't enabled by default even with
>> march=native and so need to explicitly enable this.
>>
>> Coming to your question, there won't be side effects on using OPCH_LTW.
>> On processors that *don't* support PREFETCHW and PREFETCHWT1, the
>> compiler generates a 'prefetcht1' instruction.
>> On processors that support PREFETCHW, the compiler generates a 'prefetchw'
>> instruction.
>> On processors that support PREFETCHW & PREFETCHWT1, the compiler
>> generates a 'prefetchwt1' instruction with -mprefetchwt1 explicitly enabled.
>>
>>>If it could lead to that situation, then this does not seem like the
>>>right thing to do, and we might want to fall back to recommending use
>>>of the option when the person building knows that the software will
>>>run on a machine with prefetchwt1.
>>
>> According to the above, on processors that don't support this instruction, a
>> 'prefetcht1' instruction would be generated, and it doesn't have side effects.
>> I verified this using https://gcc.godbolt.org/ and carefully checking the
>> instructions generated for different compiler versions and march flags.
>>
>> - Bhanuprakash.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev