> > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > Does it make the resampling noticeably better/faster?
> > 
> > Playing with the benchmark mentioned in
> > https://github.com/libsndfile/libsamplerate/issues/187
> > suggests that it's going to be *hugely* faster with clang
> 
> what would be a good way to test the actual performance before and after?

OK, here's a naive example: using SRC_LINEAR,
convert 4 hours of silence from 48000 to 44100.

Before:

0m17.23s real     0m11.63s user     0m03.52s system
0m17.03s real     0m11.35s user     0m03.58s system
0m17.55s real     0m11.68s user     0m03.72s system

After:

0m17.98s real     0m12.40s user     0m03.41s system
0m17.98s real     0m12.13s user     0m03.78s system
0m18.10s real     0m12.44s user     0m03.57s system

Same thing with four hours worth of a sine wave:

Before:

0m29.87s real     0m24.28s user     0m03.42s system
0m29.85s real     0m23.79s user     0m03.77s system
0m29.75s real     0m24.28s user     0m03.21s system

After:

0m30.54s real     0m24.91s user     0m03.55s system
0m30.51s real     0m24.70s user     0m03.64s system
0m30.65s real     0m24.91s user     0m03.44s system

This is an amd64 machine using clang 16.0.6.
Is my test naive? Am I missing something?
(How much lrintf gets used inside this?)

        Jan



OpenBSD 7.5-current (GENERIC.MP) #34: Sat Apr 27 21:19:57 MDT 2024
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8285454336 (7901MB)
avail mem = 8013254656 (7642MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf0100 (36 entries)
bios0: vendor Award Software International, Inc. version "F3" date 03/31/2011
bios0: Gigabyte Technology Co., Ltd. H67MA-USB3-B3
acpi0 at bios0: ACPI 1.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET MCFG ASPT SSPT EUDS MATS TAMG APIC SSDT
acpi0: wakeup devices PCI0(S5) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5) PEX4(S5) PEX5(S5) PEX6(S5) PEX7(S5) HUB0(S5) UAR1(S3) USBE(S3) USE2(S3) AZAL(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xf4000000, bus 0-63
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.09 MHz, 06-2a-07, patch 0000002f
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.12 MHz, 06-2a-07, patch 0000002f
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.19 MHz, 06-2a-07, patch 0000002f
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.25 MHz, 06-2a-07, patch 0000002f
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.21 MHz, 06-2a-07, patch 0000002f
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu4: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu4: smt 1, core 0, package 0
cpu5 at mainbus0: apid 3 (application processor)
cpu5: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.33 MHz, 06-2a-07, patch 0000002f
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu5: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu5: smt 1, core 1, package 0
cpu6 at mainbus0: apid 5 (application processor)
cpu6: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.25 MHz, 06-2a-07, patch 0000002f
cpu6: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu6: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu6: smt 1, core 2, package 0
cpu7 at mainbus0: apid 7 (application processor)
cpu7: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.37 MHz, 06-2a-07, patch 0000002f
cpu7: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu7: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu7: smt 1, core 3, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins, remapped
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus 2 (PEX0)
acpiprt4 at acpi0: bus -1 (PEX1)
acpiprt5 at acpi0: bus -1 (PEX2)
acpiprt6 at acpi0: bus -1 (PEX3)
acpiprt7 at acpi0: bus 3 (PEX4)
acpiprt8 at acpi0: bus 4 (PEX5)
acpiprt9 at acpi0: bus 5 (PEX6)
acpiprt10 at acpi0: bus 6 (PEX7)
acpibtn0 at acpi0: PWRB
acpipci0 at acpi0 PCI0
acpicmos0 at acpi0
com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu4 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu5 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu6 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
acpicpu7 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
cpu0: using VERW MDS workaround (except on vmm entry)
cpu0: Enhanced SpeedStep 3492 MHz: speeds: 3701, 3700, 3600, 3500, 3400, 3300, 3200, 3100, 3000, 2900, 2800, 2700, 2600, 2500, 2400, 2300, 2200, 2100, 1600 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 2G Host" rev 0x09
ppb0 at pci0 dev 1 function 0 "Intel Core 2G PCIE" rev 0x09: apic 2 int 16
pci1 at ppb0 bus 1
nvme0 at pci1 dev 0 function 0 vendor "Kingston", unknown product 0x5017 rev 0x03: apic 2 int 16, NVMe 1.4
nvme0: KINGSTON SNV2S2000G, firmware SBM02103, serial 50026B76861D2EE0
scsibus1 at nvme0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <NVMe, KINGSTON SNV2S20, SBM0>
sd0: 1907729MB, 512 bytes/sector, 3907029168 sectors
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 2000" rev 0x09
drm0 at inteldrm0
inteldrm0: apic 2 int 16, SANDYBRIDGE, gen 6
"Intel 6 Series MEI" rev 0x04 at pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0 "Intel 6 Series USB" rev 0x05: apic 2 int 18
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
azalia0 at pci0 dev 27 function 0 "Intel 6 Series HD Audio" rev 0x05: apic 2 int 22
azalia0: codecs: Realtek ALC889, Intel/0x2805, using Realtek ALC889
audio0 at azalia0
ppb1 at pci0 dev 28 function 0 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 16
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 4 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 16
pci3 at ppb2 bus 3
nvme1 at pci3 dev 0 function 0 vendor "Kingston", unknown product 0x5017 rev 0x03: apic 2 int 16, NVMe 1.4
nvme1: KINGSTON SNV2S2000G, firmware SBM02103, serial 50026B768680E52A
scsibus2 at nvme1: 2 targets, initiator 0
sd1 at scsibus2 targ 1 lun 0: <NVMe, KINGSTON SNV2S20, SBM0>
sd1: 1907729MB, 512 bytes/sector, 3907029168 sectors
ppb3 at pci0 dev 28 function 5 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 17
pci4 at ppb3 bus 4
nvme2 at pci4 dev 0 function 0 "Intel NVMe" rev 0x03: apic 2 int 17, NVMe 1.3
nvme2: INTEL SSDPEKNW512G8, firmware 002C, serial PHNH9323049T512A
scsibus3 at nvme2: 2 targets, initiator 0
sd2 at scsibus3 targ 1 lun 0: <NVMe, INTEL SSDPEKNW51, 002C>
sd2: 488386MB, 512 bytes/sector, 1000215216 sectors
ppb4 at pci0 dev 28 function 6 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 18
pci5 at ppb4 bus 5
re0 at pci5 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E-VL (0x2c80), apic 2 int 18, address 50:e5:49:36:ec:0d
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 5
ppb5 at pci0 dev 28 function 7 "Intel 6 Series PCIE" rev 0xb5: apic 2 int 19
pci6 at ppb5 bus 6
xhci0 at pci6 dev 0 function 0 "Etron EJ168 xHCI" rev 0x01: apic 2 int 19, xHCI 1.0
usb1 at xhci0: USB revision 3.0
uhub1 at usb1 configuration 1 interface 0 "Etron xHCI root hub" rev 3.00/1.00 addr 1
ehci1 at pci0 dev 29 function 0 "Intel 6 Series USB" rev 0x05: apic 2 int 23
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb6 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xa5
pci7 at ppb6 bus 7
pcib0 at pci0 dev 31 function 0 "Intel H67 LPC" rev 0x05
ahci0 at pci0 dev 31 function 2 "Intel 6 Series AHCI" rev 0x05: apic 2 int 19, AHCI 1.3
ahci0: port 0: 6.0Gb/s
ahci0: port 2: 3.0Gb/s
ahci0: port 4: 3.0Gb/s
scsibus4 at ahci0: 32 targets
sd3 at scsibus4 targ 0 lun 0: <ATA, SanDisk SD7SN3Q0, X217> naa.5001b44c52e5d2a1
sd3: 61057MB, 512 bytes/sector, 125045424 sectors, thin
sd4 at scsibus4 targ 2 lun 0: <ATA, ST2000DM008-2FR1, 0001> naa.5000c500c128f41c
sd4: 1907729MB, 512 bytes/sector, 3907029168 sectors, thin
sd5 at scsibus4 targ 4 lun 0: <ATA, KINGSTON SA400S3, S340> naa.50026b7381124414
sd5: 915715MB, 512 bytes/sector, 1875385008 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 6 Series SMBus" rev 0x05: apic 2 int 18
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 4GB DDR3 SDRAM PC3-12800
spdmem1 at iic0 addr 0x52: 4GB DDR3 SDRAM PC3-12800
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
it0 at isa0 port 0x2e/2: IT8728F rev 1, EC port 0x290
vmm0 at mainbus0: VMX/EPT
uhub3 at uhub0 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
uhidev0 at uhub3 port 5 configuration 1 interface 0 "Logitech USB Keyboard" rev 1.10/64.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard
uhidev1 at uhub3 port 5 configuration 1 interface 1 "Logitech USB Keyboard" rev 1.10/64.00 addr 3
uhidev1: iclass 3/0, 3 report ids
ucc0 at uhidev1 reportid 1: 2 usages, 3 keys, enum
wskbd1 at ucc0 mux 1
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc1 at uhidev1 reportid 3: 21 usages, 14 keys, enum
wskbd2 at ucc1 mux 1
uhidev2 at uhub3 port 6 configuration 1 interface 0 "Genius Optical Mouse" rev 1.10/1.00 addr 4
uhidev2: iclass 3/1
ums0 at uhidev2: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhub4 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.00 addr 2
vscsi0 at root
scsibus5 at vscsi0: 256 targets
softraid0 at root
scsibus6 at softraid0: 256 targets
root on sd3a (761a7a05237a5a1d.a) swap on sd3b dump on sd3b
inteldrm0: 1360x768, 32bpp
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
wskbd1: connecting to wsdisplay0
wskbd2: connecting to wsdisplay0
wsdisplay0: screen 1-5 added (std, vt100 emulation)
#include <stdlib.h>
#include <math.h>
#include <err.h>

#include <samplerate.h>

int
main(void)
{
        float irate = 48000;
        float orate = 44100;
        float *signal = NULL;

        long time = 60 * 60 * 4;
        long len = irate * time;

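        SRC_DATA data = { 0 };
        long s;

        signal = calloc(len, sizeof(float));
        if (signal == NULL)
                err(1, "calloc");
        /*
         * Take the phase from the sample index in double precision:
         * a float accumulator stops advancing once 1/irate falls
         * below its resolution, long before four hours are reached.
         */
        for (s = 0; s < len; s++)
                signal[s] = sin(s / (double)irate);
                /*signal[s] = 0;*/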

        data.data_in = signal;
        data.data_out = calloc(len, sizeof(float));
        if (data.data_out == NULL)
                err(1, "calloc");
        data.src_ratio = (1.0 * orate) / irate;
        data.input_frames = len;
        data.output_frames = len;

        /* src_simple() returns 0 on success, an error code otherwise */
        int error = src_simple(&data, SRC_LINEAR, 1);
        if (error != 0)
                warnx("rate conversion error: %s", src_strerror(error));

        if (data.input_frames_used < len)
                warnx("Only %ld < %ld used", data.input_frames_used, len);
        if (data.output_frames_gen < (orate * time))
                warnx("Only %ld < %ld generated",
                        data.output_frames_gen, (long) orate * time);

        return 0;
}
