Hi Martin!
> On 10 Feb 2021, at 22:53, Martin Storsjö wrote:
>
> +.macro idct_16x16 bitdepth
> +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
> +//r0 - coeffs
> +mov x15, lr
> +
Binutils doesn't recognize "lr" as alias for x30
>>> It didn’t
Hi Reimar,
On Sat, 16 Jan 2021, Martin Storsjö wrote:
+.macro idct_16x16 bitdepth
+function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
+//r0 - coeffs
+mov x15, lr
+
Binutils doesn't recognize "lr" as alias for x30
It didn’t have an issue in the Debian unstable VM?
Th
On Sat, 16 Jan 2021, Reimar Döffinger wrote:
On 15 Jan 2021, at 23:55, Martin Storsjö wrote:
On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote:
create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
This patch fails checkasm
> On 15 Jan 2021, at 23:55, Martin Storsjö wrote:
>
> On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote:
>
>> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
>> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
>
> This patch fails checkasm
Fixed, one mis-translated co
From: Reimar Döffinger
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the
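
As a quick sanity check on the figures quoted above, the sketch below recomputes the improvement from the two timings given in the commit message (19.4s and 16.4s). The variable names are illustrative only and do not come from the patch. It shows both the fraction of runtime saved, which is presumably what the "approximately 15%" refers to, and the corresponding throughput ratio.

```python
# Timings quoted in the commit message (seconds of total decode time).
before_s = 19.4
after_s = 16.4

# Fraction of the original runtime that was saved.
time_saved = (before_s - after_s) / before_s

# Throughput ratio: how many times as fast the decode runs afterwards.
speedup_factor = before_s / after_s

print(f"runtime reduced by {time_saved:.1%}")
print(f"decode runs {speedup_factor:.2f}x as fast")
```

The two numbers differ because one is measured against the old runtime and the other against the new one. Both describe the same measurement.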
On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote:
From: Reimar Döffinger
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s
On Tue, 12 Jan 2021, Reimar Döffinger wrote:
On 12 Jan 2021, at 13:24, Josh Dekker wrote:
Hi,
AS libavcodec/aarch64/hevcdsp_idct_neon.o
libavcodec/aarch64/hevcdsp_idct_neon.S: Assembler messages:
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
> On 12 Jan 2021, at 13:24, Josh Dekker wrote:
>
> Hi,
>
> On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote:
>> From: Reimar Döffinger
>> Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
>> available on aarch64.
>> For a UHD HDR (10 bit) sample video these were consuming the
Hi,
On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote:
From: Reimar Döffinger
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19