Fri, 25 Apr 2025, 22:06 Andrew Randrianasulu <[email protected]>:
>
> Thu, 24 Apr 2025, 19:54 Andrew Randrianasulu <[email protected]>:
>
>> Thu, 24 Apr 2025, 18:39 Andrew Randrianasulu <[email protected]>:
>>
>>> Note that OpenCL is different from OpenGL; it is mostly about more
>>> accurate computations.
>>>
>>> On AMD FX4300, 32-bit userspace, but LLVM probably uses AVX?
>>>
>>> guest@slax:/dev/shm/mesa/BUILD$ RUSTICL_ENABLE=llvmpipe clpeak
>>>
>>> Platform: rusticl
>>>   Device: llvmpipe (LLVM 20.1.3, 256 bits)
>>>     Driver version  : 25.2.0-devel (git-845611bb43) (Linux x86)
>>>     Compute units   : 8
>>>     Clock frequency : 300 MHz
>>>
>>>     Global memory bandwidth (GBPS)
>>>       float   : 3.72
>>>       float2  : 4.08
>>>       float4  : 3.59
>>>       float8  : 2.81
>>>       float16 : 2.09
>>>
>>>     Single-precision compute (GFLOPS)
>>>       float   : 14.67
>>>       float2  : 17.86
>>>       float4  : 15.99
>>>       float8  : 14.72
>>>       float16 : 14.63
>>>
>>>     No half precision support! Skipped
>>>
>>>     No double precision support! Skipped
>>>
>>>     Integer compute (GIOPS)
>>>       int   : 13.89
>>>       int2  : 13.25
>>>       int4  : 12.85
>>>       int8  : 13.04
>>>       int16 : 11.51
>>>
>>>     Integer compute Fast 24bit (GIOPS)
>>>       int   : 13.65
>>>       int2  : 13.29
>>>       int4  : 13.23
>>>       int8  : 12.90
>>>       int16 : 11.08
>>>
>>>     Transfer bandwidth (GBPS)
>>>       enqueueWriteBuffer              : 2.82
>>>       enqueueReadBuffer               : 1.08
>>>       enqueueWriteBuffer non-blocking : 2.89
>>>       enqueueReadBuffer non-blocking  : 1.02
>>>       enqueueMapBuffer(for read)      : 1.15
>>>         memcpy from mapped ptr        : 3.02
>>>       enqueueUnmap(after write)       : 2.22
>>>         memcpy to mapped ptr          : 3.01
>>>
>>>     Kernel launch latency : 21.55 us
>>>
>>> guest@slax:/dev/shm/mesa/BUILD$
>>>
>>> Command to build a somewhat minimal mesa (llvmpipe + amd):
>>>
>>> meson ../ --prefix=/usr/X11R7 --libdir=lib --strip --buildtype
>>> debugoptimized -Degl=enabled -Dosmesa=true -Dplatforms=x11
>>> -Dgallium-drivers=r600,radeonsi,llvmpipe -Dvulkan-drivers=amd,swrast
>>> -Dgallium-nine=true -Dgallium-va=enabled -Dgallium-xa=disabled
>>> -Dgallium-rusticl=true -Dllvm=enabled -Drust_std=2021
>>> -Dvideo-codecs="all"
>>>
>>> Of course you can set your own prefix (I have X installed into a
>>> non-default location).
>>>
>>> The biggest obstacle for me was that mesa git requires a fairly new
>>> LLVM, and SPIRV-Tools-2024.4, which was released just two days ago!
>>>
>>> And the GitHub "release" is of course broken, in the sense that you
>>> need to manually fetch headers at a specific commit.
>>>
>>> Of course a "real GPU" will get something like >200 GFLOPS, even my
>>> puny GF710 was that fast, but the possibility of a lock-up makes this
>>> option less attractive ;)
>>
>> But of course a real ffmpeg command fails mysteriously:
>>
>> RUSTICL_ENABLE=llvmpipe ffmpeg -init_hw_device opencl=ocl
>> -filter_hw_device ocl -i
>> ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -s 512:384 -r
>> 10 -vf
>> "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12"
>> -c:a copy -c:s copy -c:v libx264 -f mp4 /dev/null -debug verbose
>>
>> ffmpeg: ../src/compiler/nir/nir_metadata.c:172:
>> nir_metadata_check_validation_flag: Assertion `!(impl->valid_metadata &
>> nir_metadata_not_properly_reset)' failed.
>> Aborted
>
> Real hardware OpenCL on the RX550 with ffmpeg 7.1.1 works, just at ~5 fps ;)
>
> ./ffmpeg -hwaccel vaapi -init_hw_device opencl=ocl -filter_hw_device ocl
> -i ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -vf
> "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=p010,hwdownload,format=p010"
> -c:a copy -c:s copy -c:v rawvideo -f avi /dev/null -loglevel verbose
>
> [out#0/avi @ 0xbe7b000] Starting thread...
> frame=   1 fps=0.7 q=-0.0 size=     10KiB time=00:00:00.01 bitrate=4750.2kbits/s speed=0.0111x
> frame=   3 fps=1.5 q=-0.0 size=  32256KiB time=00:00:00.05 bitrate=5279543.5kbits/s speed=0.025x
> frame=   5 fps=2.0 q=-0.0 size=  80896KiB time=00:00:00.08 bitrate=7944424.2kbits/s speed=0.0334x
> frame=   8 fps=2.7 q=-0.0 size= 113408KiB time=00:00:00.13 bitrate=6960809.3kbits/s speed=0.0445x
> frame=  10 fps=2.9 q=-0.0 size= 161792KiB time=00:00:00.18 bitrate=7222219.5kbits/s speed=0.0524x
> frame=  13 fps=3.2 q=-0.0 size= 194304KiB time=00:00:00.23 bitrate=6814911.2kbits/s speed=0.0584x
> frame=  15 fps=3.3 q=-0.0 size= 242944KiB time=00:00:00.26 bitrate=7455793.2kbits/s speed=0.0593x
> frame=  18 fps=3.6 q=-0.0 size= 275456KiB time=00:00:00.31 bitrate=7118790.4kbits/s speed=0.0634x
> frame=  20 fps=3.6 q=-0.0 size= 323840KiB time=00:00:00.35 bitrate=7572134.4kbits/s speed=0.0637x
> frame=  23 fps=3.8 q=-0.0 size= 372480KiB time=00:00:00.40 bitrate=7620769.6kbits/s speed=0.0667x
> frame=  26 fps=4.0 q=-0.0 size= 404992KiB time=00:00:00.45 bitrate=7365289.1kbits/s speed=0.0693x
> frame=  28 fps=4.0 q=-0.0 size= 453632KiB time=00:00:00.48 bitrate=7680906.9kbits/s speed=0.0691x
> frame=  31 fps=4.1 q=-0.0 size= 485888KiB time=00:00:00.53 bitrate=7455779.2kbits/s speed=0.0711x
> frame=  33 fps=4.1 q=-0.0 size= 534528KiB time=00:00:00.56 bitrate=7719673.2kbits/s speed=0.0709x
> frame=  36 fps=4.2 q=-0.0 size= 567040KiB time=00:00:00.61 bitrate=7525222.1kbits/s speed=0.0726x
> frame=  38 fps=4.2 q=-0.0 size= 615680KiB time=00:00:00.65 bitrate=7751710.7kbits/s speed=0.0723x
> frame=  41 fps=4.3 q=-0.0 size= 664320KiB time=00:00:00.70 bitrate=7766675.4kbits/s speed=0.0737x
> frame=  43 fps=4.3 q=-0.0 size= 680448KiB time=00:00:00.73 bitrate=7593625.7kbits/s speed=0.0734x
> frame=  46 fps=4.4 q=-0.0 size= 745216KiB time=00:00:00.78 bitrate=7785584.9kbits/s speed=0.0746x
> frame=  49 fps=4.5 q=-0.0 size= 777728KiB time=00:00:00.83 bitrate=7637736.5kbits/s speed=0.0758x
> frame=  51 fps=4.4 q=-0.0 size= 826368KiB time=00:00:00.86 bitrate=7803284.3kbits/s speed=0.0754x
> frame=  54 fps=4.5 q=-0.0 size= 858624KiB time=00:00:00.91 bitrate=7665625.7kbits/s speed=0.0764x
> frame=  56 fps=4.5 q=-0.0 size= 891136KiB time=00:00:00.95 bitrate=7676729.7kbits/s speed=0.076x
> frame=  59 fps=4.5 q=-0.0 size= 955904KiB time=00:00:01.00 bitrate=7822942.6kbits/s speed=0.077x
> frame=  61 fps=4.5 q=-0.0 size= 972032KiB time=00:00:01.03 bitrate=7698318.0kbits/s speed=0.0766x
> frame=  64 fps=4.6 q=-0.0 size=1036800KiB time=00:00:01.08 bitrate=7832287.4kbits/s speed=0.0774x
> frame=  67 fps=4.6 q=-0.0 size=1069345KiB time=00:00:01.13 bitrate=7721752.5kbits/s speed=0.0782x
> frame=  69 fps=4.6 q=-0.0 size=1117985KiB time=00:00:01.16 bitrate=7842330.5kbits/s speed=0.0778x
> frame=  72 fps=4.6 q=-0.0 size=1150241KiB time=00:00:01.21 bitrate=7737010.4kbits/s speed=0.0785x
> frame=  74 fps=4.6 q=-0.0 size=1198881KiB time=00:00:01.25 bitrate=7849136.7kbits/s speed=0.0782x
> frame=  77 fps=4.7 q=-0.0 size=1231393KiB time=00:00:01.30 bitrate=7751917.8kbits/s speed=0.0788x
> frame=  79 fps=4.6 q=-0.0 size=1263649KiB time=00:00:01.33 bitrate=7756100.8kbits/s speed=0.0785x
> frame=  82 fps=4.7 q=-0.0 size=1328673KiB time=00:00:01.38 bitrate=7860442.5kbits/s speed=0.0791x
> frame=  85 fps=4.7 q=-0.0 size=1360929KiB time=00:00:01.43 bitrate=7770411.2kbits/s speed=0.0797x
> frame=  87 fps=4.7 q=-0.0 size=1409569KiB time=00:00:01.46 bitrate=7865219.6kbits/s speed=0.0793x
> frame=  90 fps=4.7 q=-0.0 size=1442081KiB time=00:00:01.51 bitrate=7781358.9kbits/s speed=0.0799x
> frame=  92 fps=4.7 q=-0.0 size=1490465KiB time=00:00:01.55 bitrate=7869477.9kbits/s speed=0.0795x
> frame=  95 fps=4.7 q=-0.0 size=1522977KiB time=00:00:01.60 bitrate=7789851.9kbits/s speed=0.08x
> frame=  97 fps=4.7 q=-0.0 size=1555489KiB time=00:00:01.63 bitrate=7793775.1kbits/s speed=0.0797x
> frame= 100 fps=4.8 q=-0.0 size=1620257KiB time=00:00:01.68 bitrate=7877157.6kbits/s speed=0.0802x
> frame= 103 fps=4.8 q=-0.0 size=1652513KiB time=00:00:01.73 bitrate=7802226.5kbits/s speed=0.0807x
> frame= 105 fps=4.8 q=-0.0 size=1701153KiB time=00:00:01.76 bitrate=7880335.1kbits/s speed=0.0803x
> frame= 108 fps=4.8 q=-0.0 size=1733665KiB time=00:00:01.81 bitrate=7809906.9kbits/s speed=0.0808x
> frame= 110 fps=4.8 q=-0.0 size=1782305KiB time=00:00:01.85 bitrate=7884354.4kbits/s speed=0.0805x
> frame= 113 fps=4.8 q=-0.0 size=1830945KiB time=00:00:01.90 bitrate=7886377.1kbits/s speed=0.0809x
> frame= 115 fps=4.8 q=-0.0 size=1863201KiB time=00:00:01.93 bitrate=7886943.6kbits/s speed=0.0806x
> frame= 118 fps=4.8 q=-0.0 size=1911841KiB time=00:00:01.98 bitrate=7888816.1kbits/s speed=0.081x
> frame= 120 fps=4.8 q=-0.0 size=1927969KiB time=00:00:02.01 bitrate=7823873.9kbits/s speed=0.0807x
> frame= 123 fps=4.8 q=-0.0 size=1992993KiB time=00:00:02.06 bitrate=7892075.9kbits/s speed=0.0811x
> frame= 126 fps=4.8 q=-0.0 size=2025249KiB time=00:00:02.11 bitrate=7830362.5kbits/s speed=0.0814x
> frame= 128 fps=4.8 q=-0.0 size=2073889KiB time=00:00:02.15 bitrate=7894104.9kbits/s speed=0.0812x
> frame= 131 fps=4.8 q=-0.0 size=2106401KiB time=00:00:02.20 bitrate=7835635.4kbits/s speed=0.0815x
> frame= 133 fps=4.8 q=-0.0 size=2154809KiB time=00:00:02.23 bitrate=7896071.0kbits/s speed=0.0813x
> frame= 136 fps=4.9 q=-0.0 size=2187321KiB time=00:00:02.28 bitrate=7839692.4kbits/s speed=0.0816x
> frame= 138 fps=4.8 q=-0.0 size=2235961KiB time=00:00:02.31 bitrate=7898718.1kbits/s speed=0.0813x
> frame= 141 fps=4.9 q=-0.0 size=2284601KiB time=00:00:02.36 bitrate=7900038.5kbits/s speed=0.0816x
> frame= 144 fps=4.9 q=-0.0 size=2316857KiB time=00:00:02.41 bitrate=7845821.4kbits/s speed=0.082x
> frame= 146 fps=4.9 q=-0.0 size=2365497KiB time=00:00:02.45 bitrate=7901548.2kbits/s speed=0.0817x
> frame= 149 fps=4.9 q=-0.0 size=2398009KiB time=00:00:02.50 bitrate=7849946.2kbits/s speed=0.082x
> frame= 151 fps=4.9 q=-0.0 size=2446649KiB time=00:00:02.53 bitrate=7903785.6kbits/s speed=0.0818x
> frame= 154 fps=4.9 q=-0.0 size=2478905KiB time=00:00:02.58 bitrate=7852993.8kbits/s speed=0.0821x
> frame= 156 fps=4.9 q=-0.0 size=2527545KiB time=00:00:02.61 bitrate=7905082.9kbits/s speed=0.0818x
> frame= 159 fps=4.9 q=-0.0 size=2576185KiB time=00:00:02.66 bitrate=7906135.4kbits/s speed=0.0821x
> frame= 161 fps=4.9 q=-0.0 size=2608697KiB time=00:00:02.70 bitrate=7907073.1kbits/s speed=0.0819x
> frame= 164 fps=4.9 q=-0.0 size=2657081KiB time=00:00:02.75 bitrate=7907295.6kbits/s speed=0.0821x
> frame= 166 fps=4.9 q=-0.0 size=2673465KiB time=00:00:02.78 bitrate=7860770.3kbits/s speed=0.0819x
> frame= 169 fps=4.9 q=-0.0 size=2738233KiB time=00:00:02.83 bitrate=7909127.1kbits/s speed=0.0822x
> frame= 172 fps=4.9 q=-0.0 size=2770745KiB time=00:00:02.88 bitrate=7864254.0kbits/s speed=0.0824x
>
> rawvideo is used here only to test OpenCL and the decoder alone, without
> x264 overhead (I can't make OpenCL operate fully on the GPU).
>
> Not exactly stellar results, but maybe the GB/s in the PCIe bandwidth
> line reported in dmesg means gigaBITS, not gigabytes? Then it should give
> just 4 gigabytes per second: this card can only do x8 and the motherboard
> is only PCIe 2.0.

And this line decodes on the GPU, pulls the frames to the host, scales them
to FHD size, uploads them to the GPU for OpenCL tonemapping, then downloads
them back to the host for x264 encoding:

./ffmpeg -hwaccel vaapi -init_hw_device opencl=ocl -filter_hw_device ocl
-i ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -vf
"scale=1920:1080,format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12"
-c:a copy -c:s copy -c:v libx264 -f mp4 /dev/shm/fhd-sdr-ocl.mp4

It speeds up to nearly 7.0 fps! Without hardware decoding it goes slower, at
maybe 5 fps. But maybe with a 64-bit userspace software decoding will be
faster!
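
For the PCIe question, here is a rough back-of-the-envelope sketch in Python. It assumes the source is 3840x2160 (the file name only says "4K"), that every copy between host and GPU moves a full 4:2:0 frame (p010 at 2 bytes per sample, nv12 at 1 byte per sample), and that the link is PCIe 2.0 x8 with 8b/10b encoding. The helper names are made up for illustration, and this only counts the data moved per frame; it does not measure anything.

```python
# Rough data-volume estimate for the decode -> scale -> tonemap -> encode line.
# All frame sizes are assumptions derived from the filter chain, not measurements.

def frame_bytes(width, height, bytes_per_sample):
    """One 4:2:0 frame: full-resolution luma plus half as many chroma samples."""
    return width * height * bytes_per_sample * 3 // 2

def pcie_gbytes_per_s(gt_per_s, lanes, encoding_efficiency):
    """Usable one-direction link bandwidth in gigabytes per second."""
    return gt_per_s * encoding_efficiency * lanes / 8  # 8 bits per byte

uhd_p010 = frame_bytes(3840, 2160, 2)  # decoded frame pulled to the host (assumed 4K p010)
fhd_p010 = frame_bytes(1920, 1080, 2)  # scaled frame uploaded for tonemap_opencl
fhd_nv12 = frame_bytes(1920, 1080, 1)  # tonemapped frame downloaded for x264

# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (80% efficient), 8 lanes.
link = pcie_gbytes_per_s(5.0, 8, 0.8)

fps = 7.0
traffic = (uhd_p010 + fhd_p010 + fhd_nv12) * fps / 1e9  # GB/s, both directions summed

print(f"4K p010 frame : {uhd_p010 / 1e6:.1f} MB")
print(f"FHD p010 frame: {fhd_p010 / 1e6:.1f} MB")
print(f"FHD nv12 frame: {fhd_nv12 / 1e6:.1f} MB")
print(f"PCIe 2.0 x8   : {link:.1f} GB/s per direction")
print(f"traffic @ {fps:g} fps: {traffic:.2f} GB/s")
```

Under these assumptions the copies add up to roughly 0.24 GB/s at 7 fps, well below the 4 GB/s the link should give, so the raw bus bandwidth alone would not explain the low frame rate.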
--
Cin mailing list
[email protected]
https://lists.cinelerra-gg.org/mailman/listinfo/cin

