On Tue, Feb 17, 2015 at 07:33:04AM +0000, Tomperi Seppo wrote:
> 
> > On 16 Feb 2015, at 19:54, Michael Niedermayer <mich...@niedermayer.cc> 
> > wrote:
> > 
> > On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
> >> More NEON optimizations for testing. fate-hevc passes on Tegra K1, but 
> >> these haven't been tested for NEON clobbering.
> >> 
> >> -Seppo
> >> 
> >> ________________________________________
> >> From: Tomperi Seppo
> >> Sent: Monday, February 16, 2015 1:30 PM
> >> To: Michael Niedermayer
> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; 
> >> Mickaël Raulet
> >> Subject: RE: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >> 
> >> Hi Michael,
> >> 
> >> Here is a shot-in-the-dark attempt at fixing the NEON register clobbering 
> >> in deblocking. Could you test it with qemu and check whether it works?
> >> 
> >> 
> >> -Seppo
> >> 
> >> ________________________________________
> >> From: Michael Niedermayer [mich...@niedermayer.cc]
> >> Sent: Monday, February 16, 2015 3:28 AM
> >> To: Tomperi Seppo
> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; 
> >> Mickaël Raulet
> >> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >> 
> >> Hi
> >> 
> >> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
> >>> Hi!
> >>> 
> >>> The cause is chroma deblocking, which uses q4 without pushing it to the 
> >>> stack. :/
> >>> Unfortunately I am in Geneva this week and don't have an ARM Linux board 
> >>> with me, so it is not easy to test.
> >>> 
> >>> Mickael Raulet: maybe guys at INSA could run tests this week if I make a 
> >>> fix? Could you ask?
> >> 
> >> If they can't, then I can probably test it too, if it's a patch which
> >> applies cleanly to ffmpeg and testing fate-hevc with
> >> --enable-neon-clobber-test under qemu is what is needed.
> >> I could test on an ARM board too if needed.
> >> 
> >> 
> >>> 
> >>> I also have SAO, qpel and epel NEON patches for latest FFmpeg. They pass 
> >>> fate-hevc on Jetson TK1, but should still be checked on iOS and for clobbering.
> >>> 
> >>> -Seppo
> >>> 
> >>> 
> >>> ________________________________________
> >>> From: Michael Niedermayer [michae...@gmx.at]
> >>> Sent: Friday, February 13, 2015 5:38 PM
> >>> To: FFmpeg development discussions and patches
> >>> Cc: Tomperi Seppo; Mickaël Raulet
> >>> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >>> 
> >>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
> >>>> Michael,
> >>>> 
> >>>> Please find some commits that can be cherry picked from
> >>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
> >>>> 
> >>> 
> >>>> Optimized deblocking filter (8bits only)
> >>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
> >>> 
> >>> this breaks the neon clobber test see:
> >>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
> >>> 
> >>> [...]
> >>> --
> >>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>> 
> >>> The worst form of inequality is to try to make unequal things equal.
> >>> -- Aristotle
> >>> 
> >> 
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> 
> >> Opposition brings concord. Out of discord comes the fairest harmony.
> >> -- Heraclitus
> > 
> >> Makefile            |    3 
> >> hevcdsp_init_neon.c |  159 ++++++++
> >> hevcdsp_qpel_neon.S |  999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c  0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
> >> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
> >> From: Seppo Tomperi <seppo.tomp...@vtt.fi>
> >> Date: Wed, 11 Feb 2015 10:20:26 +0000
> >> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
> >> 
> >> ---
> >> libavcodec/arm/Makefile            |   3 +-
> >> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
> >> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
> > 
> > 
> > seems to fail building:
> > 
> >        libavformat/utils.o
> > CC      libavcodec/arm/hevcdsp_init_neon.o
> > AS      libavcodec/arm/hevcdsp_qpel_neon.o
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
> > make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
> > make: *** Waiting for unfinished jobs....
> > 
> > 
> 
> These macros compiled for me with the Jetson TK1 toolchain and with the latest GAS 
> preprocessor, so I thought they were finally OK.
> But it looks like passing register lists to macros is not handled well by all 
> preprocessors.

I am using plain "arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3"
here, without gas-preprocessor.


> 
> These are quite simple functions that copy variable-width blocks of pixels using 
> NEON. I could either write out the macros (lots of almost identical 
> functions) or leave the optimisation out totally for now. Or do you have any 
> other ideas?

The following seems to fix it, but I do not know why these two
lines failed while the others do not seem to fail.
Adding commas to all of the invocations works as well.

diff --git a/libavcodec/arm/hevcdsp_qpel_neon.S b/libavcodec/arm/hevcdsp_qpel_neon.S
index 14116a6..7b0df2e 100644
--- a/libavcodec/arm/hevcdsp_qpel_neon.S
+++ b/libavcodec/arm/hevcdsp_qpel_neon.S
@@ -989,9 +989,9 @@ function ff_hevc_put_qpel_uw_pixels_w\width\()_neon_8, export=1
 endfunc
 .endm

-put_qpel_uw_pixels    4 d0[0] d0[1] d1[0] d1[1]
+put_qpel_uw_pixels    4 d0[0], d0[1], d1[0], d1[1]
 put_qpel_uw_pixels    8 d0 d1 d2 d3
-put_qpel_uw_pixels_m 12 d0 d1[0] d2 d3[0]
+put_qpel_uw_pixels_m 12 d0, d1[0], d2, d3[0]
 put_qpel_uw_pixels   16 q0 q1 q2 q3
 put_qpel_uw_pixels   24 d0-d2 d3-d5 d16-d18 d19-d21
 put_qpel_uw_pixels   32 q0-q1 q2-q3 q8-q9 q10-q11

[...]
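
The assembler messages for line 992 hint at what happens on this toolchain: the first expansion contains all four lane references run together (`{d0[0]d0[1]d1[0]d1[1]}`), and the following three expansions get an empty list (`{}`). In other words, the space-separated arguments were apparently not split into four macro parameters. A minimal sketch of the comma-separated form is below. The macro `copy4`, its parameter names and the label are hypothetical and are not taken from the patch, whose macro bodies are not shown in this mail; only the pointer and stride registers are taken from the error log.

```asm
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text

@ Hypothetical helper (not the macro from the patch): copy four rows of
@ 32 bits each, one NEON lane per row.
@ r2/r3 = source pointer/stride, r0/r1 = destination pointer/stride,
@ matching the operands that appear in the error log.
.macro copy4 reg1, reg2, reg3, reg4
        vld1.32         {\reg1}, [r2], r3
        vld1.32         {\reg2}, [r2], r3
        vld1.32         {\reg3}, [r2], r3
        vld1.32         {\reg4}, [r2], r3
        vst1.32         {\reg1}, [r0], r1
        vst1.32         {\reg2}, [r0], r1
        vst1.32         {\reg3}, [r0], r1
        vst1.32         {\reg4}, [r0], r1
.endm

copy4_example:
        @ Commas make the argument boundaries explicit, so each lane
        @ reference ends up in its own macro parameter.
        copy4   d0[0], d0[1], d1[0], d1[1]
        bx      lr
```

Using commas on every invocation, including those that already assemble, keeps the macro calls uniform and avoids depending on how a given assembler splits space-separated arguments.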

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is
on a Turing machine, in general, impossible (Turing's halting problem).
On any real computer, it is always possible, as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
