Re: volk questions
Hi Fons,

sorry for the abbreviation: PR == Pull Request. In this case, that is a request to merge your contribution into the VOLK repository. This includes a forum to discuss the specifics and possible improvements before we merge your code.

Did your Boost issue get solved? Boost was only required for old VOLK and environment versions that did not support `std::filesystem` yet. Your kernel versions imply you use a more recent system that supports this C++17 feature.
For everyone else, libvolk is still a C library. We only use C++ for tests etc.

For reference, these are the CPUs:

Intel Core i5-3470
https://www.intel.com/content/www/us/en/products/sku/68316/intel-core-i53470-processor-6m-cache-up-to-3-60-ghz/specifications.html
With SSE and AVX

Intel Core i5-4300U
https://www.intel.com/content/www/us/en/products/sku/76308/intel-core-i54300u-processor-3m-cache-up-to-2-90-ghz/specifications.html
With SSE, AVX, and AVX2

I can't tell from a distance why VOLK would not select AVX kernels on an AVX-capable CPU.

For your benchmarking needs:
https://github.com/google/benchmark
This might be a very important section of the docs:
https://github.com/google/benchmark/blob/main/docs/user_guide.md#preventing-optimization
Especially `DoNotOptimize` should be interesting. It could very well happen that your code was optimized out.

Cheers
Johannes

On 14.10.23 11:02, Fons Adriaensen wrote:
> Hi Johannes,
>
> Thanks for your response!
>
>> first off, we'd need to know a bit more about your setup. Could you share
>> the versions of VOLK and your host system, e.g. OS, version, etc.
>> Furthermore, do you use a VM, a container, or smth like this?
>
> VOLK was 2.5.0, now upgraded to 3.0.0, same results.
> No VM, container, etc used.
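To illustrate the point about code being optimized out, here is a minimal timing sketch. It does not use Google Benchmark itself: `do_not_optimize` is a hand-rolled stand-in for the idea behind `benchmark::DoNotOptimize` and relies on GCC/Clang inline assembly, and `time_sinf_us` is a hypothetical helper name, not part of VOLK or of the test program discussed in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hand-rolled stand-in for the idea behind benchmark::DoNotOptimize
// (GCC/Clang only). The empty asm statement takes `value` as an input and
// clobbers memory, so the compiler has to treat the value and the memory it
// may point to as observed.
template <typename T>
inline void do_not_optimize(T const& value)
{
    asm volatile("" : : "r,m"(value) : "memory");
}

// Hypothetical helper: average time per element, in microseconds, of
// computing the sine of every element of `in` into `out`.
double time_sinf_us(const std::vector<float>& in, std::vector<float>& out, int repeats)
{
    assert(repeats > 0 && !in.empty() && in.size() == out.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = std::sin(in[i]);
        }
        // Without this, a compiler may drop or hoist work whose result is
        // never read, and the measured time becomes meaningless.
        do_not_optimize(out.data());
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(repeats) * in.size());
}
```

A suspiciously small per-call time is a hint to check whether the measured loop still exists in the generated code; for real measurements the Google Benchmark library linked above is the better tool.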
Re: volk questions
Hi Johannes,

Thanks for your response!

> first off, we'd need to know a bit more about your setup. Could you share
> the versions of VOLK and your host system, e.g. OS, version, etc.
> Furthermore, do you use a VM, a container, or smth like this?

VOLK was 2.5.0, now upgraded to 3.0.0, same results.
No VM, container, etc used.

Machine info:

zita1 (desktop)

fons@zita1:~> lscpu
Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Address sizes:        36 bits physical, 48 bits virtual
Byte Order:           Little Endian
CPU(s):               4
On-line CPU(s) list:  0-3
Vendor ID:            GenuineIntel
Model name:           Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
CPU family:           6
Model:                58
Thread(s) per core:   1
Core(s) per socket:   4
Socket(s):            1
Stepping:             9
CPU(s) scaling MHz:   45%
CPU max MHz:          3600.
CPU min MHz:          1600.
BogoMIPS:             6387.26
Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
                      clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
                      constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                      cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                      ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer
                      aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow
                      flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln
                      pts vnmi
Virtualization features:
  Virtualization:     VT-x
Caches (sum of all):
  L1d:                128 KiB (4 instances)
  L1i:                128 KiB (4 instances)
  L2:                 1 MiB (4 instances)
  L3:                 6 MiB (1 instance)
NUMA:
  NUMA node(s):       1
  NUMA node0 CPU(s):  0-3
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Not affected

fons@zita1:~> uname -a
Linux zita1 6.5.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Sep 2023 22:55:13 + x86_64 GNU/Linux

zita4 (laptop)

Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Address sizes:        39 bits physical, 48 bits virtual
Byte Order:           Little Endian
CPU(s):               4
On-line CPU(s) list:  0-3
Vendor ID:            GenuineIntel
Model name:           Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
CPU family:           6
Model:                69
Thread(s) per core:   2
Core(s) per socket:   2
Socket(s):            1
Stepping:             1
CPU(s) scaling MHz:   46%
CPU max MHz:          2900.
CPU min MHz:          800.
BogoMIPS:             4990.47
Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
                      clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
                      lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                      cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                      ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes
                      xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti
                      tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
                      avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm ida arat pln pts
Virtualization features:
  Virtualization:     VT-x
Caches (sum of all):
  L1d:                64 KiB (2 instances)
  L1i:                64 KiB (2 instances)
  L2:                 512 KiB (2 instances)
  L3:                 3 MiB (1 instance)
NUMA:
  NUMA node(s):       1
  NUMA node0 CPU(s):  0-3
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:
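The two Flags lines above can be compared mechanically. The sketch below checks a flags string for a whole-token match, which matters because `avx` is a prefix of `avx2`. `has_cpu_flag` is a hypothetical helper written for this illustration; it only reflects what the kernel reports and says nothing about which kernels VOLK ends up selecting.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole
// whitespace-separated token in `flags` (e.g. the Flags line from lscpu).
// A plain substring search would wrongly report "avx" for a CPU that only
// listed "avx2", so tokens are compared for exact equality.
bool has_cpu_flag(const std::string& flags, const std::string& flag)
{
    assert(!flag.empty());
    std::istringstream tokens(flags);
    std::string token;
    while (tokens >> token) {
        if (token == flag) {
            return true;
        }
    }
    return false;
}
```

Applied to the output above, zita1 (i5-3470) lists `sse4_1`, `sse4_2` and `avx` but not `avx2`, while zita4 (i5-4300U) lists `avx`, `avx2` and `fma`.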
Re: volk questions
Hi Fons,

first off, we'd need to know a bit more about your setup. Could you share the versions of VOLK and your host system, e.g. OS, version, etc.
Furthermore, do you use a VM, a container, or smth like this?

Regarding your question whether these functions may be useful to VOLK / GNU Radio: I'd say yes. We'd have to figure out how this may work in practice though. I'd suggest starting with a PR.

Cheers
Johannes

On 13.10.23 16:08, Fons Adriaensen wrote:
> Hello all,
>
> I've recently installed volk on two machines, both Intel i5 but
> different types.
>
> 1. On one of the two machines 'make' failed while 'cmake ..' ran
> without errors. Installing boost fixed this. Could it be that the
> cmake step doesn't check for boost?
>
> 2. On one of the two machines, 'volk_profile -R volk_32f_sin_32f'
> only tests the SSE4 versions even if AVX is supported (as confirmed
> by profiling e.g. volk_32f_asin_32f), and an AVX implementation
> is available (it is the selected one on the other machine).
> Why is this?
>
> 3. The reason I installed volk was to compare it against some
> SSE4 code I developed for sin() and cos(). These calls use cycles
> instead of rads as the argument as I find that generally more
> convenient. They are not to full float32 precision, but more than
> good enough for e.g. music synthesis.
>
> Testing them against volk on the two machines produces:
>
> fons@zita1:~/library/ssemath> testsin
> sinf():              0.005975 us
> volk_32f_sin_32f():  0.003983 us (SSE4.1)
> sse_fastsin_f32():   0.000802 us
>
> fons@zita4:~/library/ssemath> testsin
> sinf():              0.005324 us
> volk_32f_sin_32f():  0.002579 us (AVX2)
> sse_fastsin_f32():   0.001088 us
>
> The results for volk are consistent with the volk_profile
> as above.
>
> Harmonic distortion is:
>
> Harm   Level (dB)
>   3     -162.0
>   5     -137.8
>   7     -128.6
>   9     -147.3
>  11     -141.1
>  13     -138.1
>  15     -138.1
>  17     -138.6
>  19     -139.4
>  21     -140.5
>  23     -141.6
>  25     -142.8
>  27     -143.6
>  29     -144.6
>  31     -145.7
>
> THD: -125.7
>
> Would such routines be useful in volk / gnuradio?
>
> Ciao,
volk questions
Hello all,

I've recently installed volk on two machines, both Intel i5 but
different types.

1. On one of the two machines 'make' failed while 'cmake ..' ran
without errors. Installing boost fixed this. Could it be that the
cmake step doesn't check for boost?

2. On one of the two machines, 'volk_profile -R volk_32f_sin_32f'
only tests the SSE4 versions even if AVX is supported (as confirmed
by profiling e.g. volk_32f_asin_32f), and an AVX implementation
is available (it is the selected one on the other machine).
Why is this?

3. The reason I installed volk was to compare it against some
SSE4 code I developed for sin() and cos(). These calls use cycles
instead of rads as the argument as I find that generally more
convenient. They are not to full float32 precision, but more than
good enough for e.g. music synthesis.

Testing them against volk on the two machines produces:

fons@zita1:~/library/ssemath> testsin
sinf():              0.005975 us
volk_32f_sin_32f():  0.003983 us (SSE4.1)
sse_fastsin_f32():   0.000802 us

fons@zita4:~/library/ssemath> testsin
sinf():              0.005324 us
volk_32f_sin_32f():  0.002579 us (AVX2)
sse_fastsin_f32():   0.001088 us

The results for volk are consistent with the volk_profile
as above.

Harmonic distortion is:

Harm   Level (dB)
  3     -162.0
  5     -137.8
  7     -128.6
  9     -147.3
 11     -141.1
 13     -138.1
 15     -138.1
 17     -138.6
 19     -139.4
 21     -140.5
 23     -141.6
 25     -142.8
 27     -143.6
 29     -144.6
 31     -145.7

THD: -125.7

Would such routines be useful in volk / gnuradio?

Ciao,

--
FA
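For readers unfamiliar with the "cycles instead of rads" convention mentioned in point 3, a scalar reference sketch follows. It is not the SSE4 code described in the post (which is not shown in this thread); `sin_cycles` is a hypothetical name, and the function simply wraps the standard library sine to show what the argument means.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar reference: sine of an angle given in cycles, where
// 1.0 is one full turn (2*pi radians). Illustrates the argument convention
// only; it makes no attempt to be fast.
float sin_cycles(float x)
{
    assert(std::isfinite(x));
    const float two_pi = 6.28318530717958647692f;
    // With a cycle argument, range reduction is just removing the nearest
    // integer: the remainder lies in [-0.5, 0.5] cycles.
    const float frac = x - std::round(x);
    return std::sin(two_pi * frac);
}
```

With this convention a quarter cycle gives `sin_cycles(0.25f)` close to 1, and adding whole cycles (for example 1.25f) gives the same value.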