Re: volk questions

2023-10-17 Thread Johannes Demel

Hi Fons,

Sorry for the abbreviation: PR means Pull Request, in this case a request 
to merge your contribution into the VOLK repository. A PR also comes with 
a discussion thread where we can go over the specifics and possible 
improvements before we merge your code.


Did your Boost issue get solved? Boost was only required by older VOLK 
versions and on older environments that did not support `std::filesystem` 
yet. Your kernel versions suggest you are on a more recent system that 
supports this C++17 feature.
For everyone else: libvolk is still a C library. We only use C++ for 
tests etc.


For reference, these are the CPUs
Intel Core i5-3470
https://www.intel.com/content/www/us/en/products/sku/68316/intel-core-i53470-processor-6m-cache-up-to-3-60-ghz/specifications.html
With SSE and AVX

Intel Core i5-4300U
https://www.intel.com/content/www/us/en/products/sku/76308/intel-core-i54300u-processor-3m-cache-up-to-2-90-ghz/specifications.html
With SSE, AVX, and AVX2

From a distance, I can't tell why VOLK would not select AVX kernels on 
an AVX-capable CPU.
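
As a quick cross-check of the flag lists in this thread, here is a minimal 
sketch that tests whether a flag occurs as a whole token in an lscpu 
"Flags:" string. The helper name `has_cpu_flag` is hypothetical and this is 
not how VOLK itself detects CPU features. It only makes sure that "avx" is 
not mistaken for a prefix of "avx2".

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of VOLK): returns true if `name` appears as
// a whole, whitespace-separated token in `flags`, e.g. the "Flags:" field
// printed by lscpu.
bool has_cpu_flag(const std::string& flags, const std::string& name)
{
    assert(!name.empty());
    std::istringstream tokens(flags);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true; // exact match only, so "avx" does not match "avx2"
        }
    }
    return false;
}
```

Called with the i5-3470 flags, `has_cpu_flag(flags, "avx")` is true and 
`has_cpu_flag(flags, "avx2")` is false, which matches the Intel pages above.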


For your benchmarking needs:
https://github.com/google/benchmark

This might be a very important section of the docs:
https://github.com/google/benchmark/blob/main/docs/user_guide.md#preventing-optimization
`DoNotOptimize` in particular should be interesting.
It may well be that your code was optimized out.
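
To illustrate the effect described in that section, here is a minimal 
hand-rolled sketch, assuming GCC or Clang. The helper `escape` is a 
hypothetical stand-in for what `benchmark::DoNotOptimize` achieves, and 
`time_sin_ns_per_sample` is an illustrative name as well. Without the 
`escape` call the compiler would be free to drop the stores to `out`, and 
with them the whole loop, because nothing ever reads the results.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for benchmark::DoNotOptimize (GCC/Clang only):
// the empty asm statement takes the pointer as an input and clobbers
// memory, so the compiler must assume the buffer contents are observed.
inline void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}

// Times `iterations` passes of std::sin over `in` and returns the mean
// time per sample in nanoseconds.
double time_sin_ns_per_sample(const std::vector<float>& in, std::size_t iterations)
{
    assert(!in.empty() && iterations > 0);
    std::vector<float> out(in.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = std::sin(in[i]);
        }
        escape(out.data()); // keep the computed results alive
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() /
           (static_cast<double>(iterations) * static_cast<double>(in.size()));
}
```

With Google Benchmark itself you would not need the helper. Calling 
`benchmark::DoNotOptimize` on the result inside the benchmark loop serves 
the same purpose.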

Cheers
Johannes




Re: volk questions

2023-10-14 Thread Fons Adriaensen
Hi Johannes,

Thanks for your response !

> first off, we'd need to know a bit more about your setup. Could you share
> the versions of VOLK and your host system, e.g. OS, version, etc.
> Furthermore, do you use a VM, a container, or smth like this?

VOLK was 2.5.0, now upgraded to 3.0.0, same results.
No VM, container, etc used.

Machine info:

zita1 (desktop)

fons@zita1:~> lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         36 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
    CPU family:          6
    Model:               58
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            9
    CPU(s) scaling MHz:  45%
    CPU max MHz:         3600.
    CPU min MHz:         1600.
    BogoMIPS:            6387.26
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                         pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                         sse sse2 ss ht tm pbe syscall nx rdtscp lm
                         constant_tsc arch_perfmon pebs bts rep_good nopl
                         xtopology nonstop_tsc cpuid aperfmperf pni
                         pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                         ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
                         popcnt tsc_deadline_timer aes xsave avx f16c
                         rdrand lahf_lm cpuid_fault epb pti tpr_shadow
                         flexpriority ept vpid fsgsbase smep erms xsaveopt
                         dtherm ida arat pln pts vnmi
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    6 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache
                         flushes, SMT disabled
  Mds:                   Vulnerable: Clear CPU buffers attempted, no
                         microcode; SMT disabled
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user
                         pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB
                         filling, PBRSB-eIBRS Not affected
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Not affected

fons@zita1:~> uname -a
Linux zita1 6.5.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Sep 2023 22:55:13 
+ x86_64 GNU/Linux


zita4 (laptop)

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
    CPU family:          6
    Model:               69
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            1
    CPU(s) scaling MHz:  46%
    CPU max MHz:         2900.
    CPU min MHz:         800.
    BogoMIPS:            4990.47
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                         pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                         sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
                         constant_tsc arch_perfmon pebs bts rep_good nopl
                         xtopology nonstop_tsc cpuid aperfmperf pni
                         pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
                         ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
                         x2apic movbe popcnt aes xsave avx f16c rdrand
                         lahf_lm abm cpuid_fault epb invpcid_single pti
                         tpr_shadow vnmi flexpriority ept vpid ept_ad
                         fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
                         invpcid rtm xsaveopt dtherm ida arat pln pts
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    3 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache
                         flushes, SMT vulnerable
  Mds:

Re: volk questions

2023-10-14 Thread Johannes Demel

Hi Fons,

First off, we'd need to know a bit more about your setup. Could you 
share the versions of VOLK and your host system, e.g. OS, version, etc.? 
Furthermore, do you use a VM, a container, or something like this?


Regarding your question whether these functions may be useful to VOLK / 
GNU Radio: I'd say yes. We'd have to figure out how this may work in 
practice, though. I'd suggest starting with a PR.


Cheers
Johannes


volk questions

2023-10-13 Thread Fons Adriaensen
Hello all,

I've recently installed volk on two machines, both Intel i5
but different types.

1. On one of the two machines 'make' failed while 'cmake ..'
ran without errors. Installing boost fixed this. Could it be
that the cmake step doesn't check for boost ?

2. On one of the two machines, 'volk_profile -R volk_32f_sin_32f'
only tests the SSE4 versions even if AVX is supported (as
confirmed by profiling e.g. volk_32f_asin_32f), and an AVX
implementation is available (it is the selected one on the
other machine). Why is this?

3. The reason I installed volk was to compare it against
some SSE4 code I developed for sin() and cos(). These calls
use cycles instead of rads as the argument as I find that
generally more convenient. They are not to full float32
precision, but more than good enough for e.g. music 
synthesis. Testing them against volk on the two machines
produces:

fons@zita1:~/library/ssemath> testsin
sinf():               0.005975 us
volk_32f_sin_32f():   0.003983 us (SSE4.1)
sse_fastsin_f32():    0.000802 us

fons@zita4:~/library/ssemath> testsin
sinf():               0.005324 us
volk_32f_sin_32f():   0.002579 us (AVX2)
sse_fastsin_f32():    0.001088 us

The results for volk are consistent with the volk_profile
as above.
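
For readers unfamiliar with the cycles convention mentioned in point 3, 
here is a minimal scalar sketch of it. This is not the SSE4 code that was 
measured above. The name `sin_cycles` is hypothetical, and the sketch simply 
falls back on `std::sin` after range reduction. It shows why cycles are 
convenient: dropping whole periods becomes a subtraction of the rounded 
argument instead of a reduction modulo 2*pi.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference: sine with the argument given in cycles,
// i.e. sin_cycles(x) == sin(2 * pi * x).
float sin_cycles(float x)
{
    assert(std::isfinite(x));
    const float two_pi = 6.28318530717958647692f;
    // Range reduction: remove whole cycles, leaving frac in [-0.5, 0.5].
    const float frac = x - std::round(x);
    return std::sin(two_pi * frac);
}
```

A quarter cycle gives the peak, so `sin_cycles(0.25f)` and 
`sin_cycles(1.25f)` both evaluate to approximately 1.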

Harmonic distortion is:

Harm  Level (dB)
   3  -162.0
   5  -137.8
   7  -128.6
   9  -147.3
  11  -141.1
  13  -138.1
  15  -138.1
  17  -138.6
  19  -139.4
  21  -140.5
  23  -141.6
  25  -142.8
  27  -143.6
  29  -144.6
  31  -145.7
THD:  -125.7
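
As a sketch of how a THD figure relates to a table like this one: assuming 
the levels are in dB relative to the fundamental and THD is taken as the 
power sum of the harmonics (both are assumptions, the measurement method is 
not stated here), the helper below, with the hypothetical name `thd_db`, 
adds up the listed components.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: combines harmonic levels (dB relative to the
// fundamental) into a single THD value in dB by summing their powers.
double thd_db(const std::vector<double>& harmonic_levels_db)
{
    assert(!harmonic_levels_db.empty());
    double power_sum = 0.0;
    for (double level_db : harmonic_levels_db) {
        power_sum += std::pow(10.0, level_db / 10.0); // dB -> power ratio
    }
    return 10.0 * std::log10(power_sum);              // power ratio -> dB
}
```

Fed with the fifteen rows above, this comes out at roughly -126 dB, close to 
but not exactly the quoted -125.7. The table alone does not explain the 
remaining difference.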

Would such routines be useful in volk / gnuradio ?

Ciao,

-- 
FA