> > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > Implement SSE2 lrint() and lrintf() on amd64.
> > >
> > > Does it make the resampling noticably better/faster?
> >
> > Playing with the benchmark mentioned in
> > https://github.com/libsndfile/libsamplerate/issues/187
> > suggests that it's going to be *hugely* faster with clang
>
> what would be a good way to test the actual performance before and after?

OK, here's a naive example: using SRC_LINEAR,
convert 4 hours of silence from 48000 to 44100.

Before:
    0m17.23s real     0m11.63s user     0m03.52s system
    0m17.03s real     0m11.35s user     0m03.58s system
    0m17.55s real     0m11.68s user     0m03.72s system

After:
    0m17.98s real     0m12.40s user     0m03.41s system
    0m17.98s real     0m12.13s user     0m03.78s system
    0m18.10s real     0m12.44s user     0m03.57s system

Same thing with four hours worth of a sine wave:

Before:
    0m29.87s real     0m24.28s user     0m03.42s system
    0m29.85s real     0m23.79s user     0m03.77s system
    0m29.75s real     0m24.28s user     0m03.21s system

After:
    0m30.54s real     0m24.91s user     0m03.55s system
    0m30.51s real     0m24.70s user     0m03.64s system
    0m30.65s real     0m24.91s user     0m03.44s system

This is an amd64 machine using clang 16.0.6.

Is my test naive? Am I missing something?
(How much lrintf gets used inside this?)

	Jan


OpenBSD 7.5-current (GENERIC.MP) #34: Sat Apr 27 21:19:57 MDT 2024
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8285454336 (7901MB)
avail mem = 8013254656 (7642MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf0100 (36 entries)
bios0: vendor Award Software International, Inc. version "F3" date 03/31/2011
bios0: Gigabyte Technology Co., Ltd. H67MA-USB3-B3
acpi0 at bios0: ACPI 1.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET MCFG ASPT SSPT EUDS MATS TAMG APIC SSDT
acpi0: wakeup devices PCI0(S5) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5) PEX4(S5) PEX5(S5) PEX6(S5) PEX7(S5) HUB0(S5) UAR1(S3) USBE(S3) USE2(S3) AZAL(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xf4000000, bus 0-63
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.09 MHz, 06-2a-07, patch 0000002f
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.12 MHz, 06-2a-07, patch 0000002f
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.19 MHz, 06-2a-07, patch 0000002f
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.25 MHz, 06-2a-07, patch 0000002f
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.21 MHz, 06-2a-07, patch 0000002f
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu4: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu4: smt 1, core 0, package 0
cpu5 at mainbus0: apid 3 (application processor)
cpu5: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.33 MHz, 06-2a-07, patch 0000002f
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu5: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu5: smt 1, core 1, package 0
cpu6 at mainbus0: apid 5 (application processor)
cpu6: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.25 MHz, 06-2a-07, patch 0000002f
cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu6: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu6: smt 1, core 2, package 0
cpu7 at mainbus0: apid 7 (application processor)
cpu7: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.37 MHz, 06-2a-07, patch 0000002f
cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu7: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu7: smt 1, core 3, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins, remapped
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus 2 (PEX0)
acpiprt4 at acpi0: bus -1 (PEX1)
acpiprt5 at acpi0: bus -1 (PEX2)
acpiprt6 at acpi0: bus -1 (PEX3)
acpiprt7 at acpi0: bus 3 (PEX4)
acpiprt8 at acpi0: bus 4 (PEX5)
acpiprt9 at acpi0: bus 5 (PEX6)
acpiprt10 at acpi0: bus 6 (PEX7)
acpibtn0 at acpi0: PWRB
acpipci0 at acpi0 PCI0
acpicmos0 at acpi0
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu4 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu5 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu6 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu7 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
cpu0: using VERW MDS workaround (except on vmm entry)
cpu0: Enhanced SpeedStep 3492 MHz: speeds: 3701, 3700, 3600, 3500, 3400, 3300, 3200, 3100, 3000, 2900, 2800, 2700, 2600, 2500, 2400, 2300, 2200, 2100, 1600 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
ppb0 at pci0 dev 1 function 0 "Intel Core 2G PCIE" rev 0x09: apic 2 int 16
pci1 at ppb0 bus 1
nvme0 at pci1 dev 0 function 0 vendor "Kingston", unknown product 0x5017 rev 0x03: apic 2 int 16, NVMe 1.4
nvme0: KINGSTON SNV2S2000G, firmware SBM02103, serial 50026B76861D2EE0
scsibus1 at nvme0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <NVMe, KINGSTON SNV2S20, SBM0>
sd0: 1907729MB, 512 bytes/sector, 3907029168 sectors
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 2000" rev 0x09
drm0 at inteldrm0
inteldrm0: apic 2 int 16, SANDYBRIDGE, gen 6
"Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x05: apic 2 int 18
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 6 Series HD Audio" rev 0x05: apic 2 int 22
azalia0: codecs: Realtek ALC889, Intel/0x2805, using Realtek ALC889
audio0 at azalia0
ppb1 at pci0 dev 28 function 0 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 16
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 4 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 16
pci3 at ppb2 bus 3
nvme1 at pci3 dev 0 function 0 vendor "Kingston", unknown product 0x5017 rev 0x03: apic 2 int 16, NVMe 1.4
nvme1: KINGSTON SNV2S2000G, firmware SBM02103, serial 50026B768680E52A
scsibus2 at nvme1: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <NVMe, KINGSTON SNV2S20, SBM0>
sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors
ppb3 at pci0 dev 28 function 5 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 17
pci4 at ppb3 bus 4
nvme2 at pci4 dev 0 function 0 "Intel NVMe" rev 0x03: apic 2 int 17, NVMe 1.3
nvme2: INTEL SSDPEKNW512G8, firmware 002C, serial PHNH9323049T512A
scsibus3 at nvme2: 2 targets, initiator 0
sd2 at scsibus3 targ 1 lun 0: <NVMe, INTEL SSDPEKNW51, 002C>
sd2: 488386MB, 512 bytes/sector, 1000215216 sectors
ppb4 at pci0 dev 28 function 6 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 18
pci5 at ppb4 bus 5
re0 at pci5 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E-VL (0x2c80), apic 2 int 18, address 50:e5:49:36:ec:0d
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
ppb5 at pci0 dev 28 function 7 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 19
pci6 at ppb5 bus 6
xhci0 at pci6 dev 0 function 0 "Etron EJ168 xHCI" rev 0x01: apic 2 int 19, xHCI 1.0
usb1 at xhci0: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "Etron xHCI root hub" rev 3.00/1.00 addr 1
ehci1 at pci0 dev 29 function 0 "Intel 6 Series USB" rev 0x05: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb6 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xa5
pci7 at ppb6 bus 7
pcib0 at pci0 dev 31 function 0 "Intel H67 LPC" rev 0x05
ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x05: apic 2 int 19, AHCI 1.3
ahci0: port 0: 6.0Gb/s
ahci0: port 2: 3.0Gb/s
ahci0: port 4: 3.0Gb/s
scsibus4 at ahci0: 32 targets
sd3 at scsibus4 targ 0 lun 0: <ATA, SanDisk SD7SN3Q0, X217> naa.5001b44c52e5d2a1
sd3: 61057MB, 512 bytes/sector, 125045424 sectors, thin
sd4 at scsibus4 targ 2 lun 0: <ATA, ST2000DM008-2FR1, 0001> naa.5000c500c128f41c
sd4: 1907729MB, 512 bytes/sector, 3907029168 sectors, thin
sd5 at scsibus4 targ 4 lun 0: <ATA, KINGSTON SA400S3, S340> naa.50026b7381124414
sd5: 915715MB, 512 bytes/sector, 1875385008 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x05: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
it0 at isa0 port 0x2e/2: IT8728F rev 1, EC port 0x290
vmm0 at mainbus0: VMX/EPT
uhub3 at uhub0 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub3 port 5 configuration 1 interface 0 "Logitech USB Keyboard" rev 1.10/64.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard
uhidev1 at uhub3 port 5 configuration 1 interface 1 "Logitech USB Keyboard" rev 1.10/64.00 addr 3
uhidev1: iclass 3/0, 3 report ids
ucc0 at uhidev1 reportid 1: 2 usages, 3 keys, enum
wskbd1 at ucc0 mux 1
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc1 at uhidev1 reportid 3: 21 usages, 14 keys, enum
wskbd2 at ucc1 mux 1
uhidev2 at uhub3 port 6 configuration 1 interface 0 "Genius Optical Mouse" rev 1.10/1.00 addr 4
uhidev2: iclass 3/1
ums0 at uhidev2: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus5 at vscsi0: 256 targets
softraid0 at root
scsibus6 at softraid0: 256 targets
root on sd3a (761a7a05237a5a1d.a) swap on sd3b dump on sd3b
inteldrm0: 1360x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wskbd1: connecting to wsdisplay0
wskbd2: connecting to wsdisplay0
wsdisplay0: screen 1-5 added (std, vt100 emulation)

#include <stdlib.h>
#include <math.h>
#include <err.h>
#include <samplerate.h>

int
main(void)
{
	float irate = 48000;
	float orate = 44100;
	float *signal = NULL;
	long time = 60 * 60 * 4;	/* four hours, in seconds */
	long len = irate * time;	/* number of input frames */
	SRC_DATA data;
	long s;
	int error;

	if ((signal = calloc(len, sizeof(float))) == NULL)
		err(1, NULL);

	/*
	 * Derive the phase from the frame index.  A float t accumulated
	 * in steps of 1/irate loses precision and stops advancing well
	 * before four hours of samples have been generated.
	 */
	for (s = 0; s < len; s++)
		signal[s] = sin((double)s / irate);
		/* signal[s] = 0; */	/* use this for the silence test */

	data.data_in = signal;
	if ((data.data_out = calloc(len, sizeof(float))) == NULL)
		err(1, NULL);
	data.src_ratio = (1.0 * orate) / irate;
	data.input_frames = len;
	data.output_frames = len;

	/* src_strerror() turns the libsamplerate error value into text. */
	if ((error = src_simple(&data, SRC_LINEAR, 1)))
		warnx("rate conversion error: %s", src_strerror(error));

	if (data.input_frames_used < len)
		warnx("Only %ld < %ld used", data.input_frames_used, len);

	if (data.output_frames_gen < (orate * time))
		warnx("Only %ld < %ld generated",
		    data.output_frames_gen, (long) orate * time);

	free(data.data_out);
	free(signal);
	return 0;
}