[FFmpeg-devel] [PATCH] avcodec/libtwolame: fix mono default bitrate
As of libtwolame 0.4.0, 384 kbps is not accepted as a valid bitrate for
encoding mono audio and the maximum bitrate is now halved to 192 kbps to
comply with the MP2 standard.

Example error:
  twolame_init_params(): 384kbps is an invalid bitrate for mono encoding.

Adjust the default bitrate calculation to take this into account.

Signed-off-by: James Cowgill
---
 libavcodec/libtwolame.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libtwolame.c b/libavcodec/libtwolame.c
index 030f88868f..5ceb3d9f3f 100644
--- a/libavcodec/libtwolame.c
+++ b/libavcodec/libtwolame.c
@@ -78,8 +78,12 @@ static av_cold int twolame_encode_init(AVCodecContext *avctx)
     twolame_set_in_samplerate(s->glopts, avctx->sample_rate);
     twolame_set_out_samplerate(s->glopts, avctx->sample_rate);
 
-    if (!avctx->bit_rate)
-        avctx->bit_rate = avctx->sample_rate < 28000 ? 160000 : 384000;
+    if (!avctx->bit_rate) {
+        if ((s->mode == TWOLAME_AUTO_MODE && avctx->channels == 1) || s->mode == TWOLAME_MONO)
+            avctx->bit_rate = avctx->sample_rate < 28000 ? 80000 : 192000;
+        else
+            avctx->bit_rate = avctx->sample_rate < 28000 ? 160000 : 384000;
+    }
 
     if (avctx->flags & AV_CODEC_FLAG_QSCALE || !avctx->bit_rate) {
         twolame_set_VBR(s->glopts, TRUE);
-- 
2.24.0.rc1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
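The default selection described in this patch can be sketched as a small standalone function. This is only an illustration: the enum and function names are hypothetical stand-ins (the real mode constants come from twolame.h), and the low-sample-rate defaults of 160000 and 80000 bit/s are an assumption based on the mono limit being half the stereo one.

```c
#include <stdint.h>

/* Hypothetical stand-ins for TWOLAME_AUTO_MODE / TWOLAME_MONO; the real
 * enum in twolame.h has more members. */
enum sketch_mode { SKETCH_AUTO_MODE, SKETCH_STEREO, SKETCH_MONO };

/* Pick a default bitrate in bit/s when the caller supplied none.
 * Mono output gets half the stereo default (assumed values). */
static int64_t sketch_default_bitrate(int sample_rate, int channels,
                                      enum sketch_mode mode)
{
    int mono = mode == SKETCH_MONO ||
               (mode == SKETCH_AUTO_MODE && channels == 1);
    int low_rate = sample_rate < 28000;

    if (mono)
        return low_rate ? 80000 : 192000;
    return low_rate ? 160000 : 384000;
}
```

With this sketch, a 44.1 kHz single-channel stream in auto mode would default to 192000 rather than the 384000 that libtwolame 0.4.0 rejects.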
[FFmpeg-devel] [PATCH v2] avcodec/arm/sbcenc: avoid callee preserved vfp registers
When compiling FFmpeg with GCC-9, some very random segfaults were observed
in code which had previously called down into the SBC encoder NEON assembly
routines. This was caused by these functions clobbering some of the vfp
callee saved registers (d8 - d15 aka q4 - q7). GCC was using these
registers to save local variables, but after these functions returned, they
would contain garbage.

Fix by reallocating the registers in the two affected functions in the
following way:
 ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
 ff_sbc_analyze_8_neon: q2-q9 => q8-q15

The reason for using these replacements is to keep closely related sets of
registers consecutively numbered, which hopefully makes the code easier to
follow. Since this commit only reallocates registers, it should have no
performance impact.

Signed-off-by: James Cowgill
---
On 29/07/2019 19:59, Reimar Döffinger wrote:
> Seems sensible to me, though extra points if you or someone has numbers on
> performance impact.
> To know whether it would be worthwhile to check if it can be optimized...

Sorry for the long delay - been on various holidays.

I did a few tests on my original patch and overall it was about 2% slower
than before. In any case I think this new patch is a better solution
(although the diff is a lot larger). We don't actually need that many
registers in either of these functions, so instead of pushing the clobbered
callee saved registers, we can reallocate all the registers to avoid them
in the first place. This way there is no performance impact.

I couldn't find any tests for this encoder, but I have tested a few audio
samples with it and verified the output is identical to what it was before
(and with what I get on x86).

libavcodec/arm/sbcdsp_neon.S | 220 +-- 1 file changed, 110 insertions(+), 110 deletions(-) diff --git a/libavcodec/arm/sbcdsp_neon.S b/libavcodec/arm/sbcdsp_neon.S index d83d21d202..914abfb6cc 100644 --- a/libavcodec/arm/sbcdsp_neon.S +++ b/libavcodec/arm/sbcdsp_neon.S @@ -38,49 +38,49 @@ function ff_sbc_analyze_4_neon, export=1 /* TODO: merge even and odd cases (or even merge all four calls to this * function) in order to have only aligned reads from 'in' array * and reduce number of load instructions */ -vld1.16 {d4, d5}, [r0, :64]! -vld1.16 {d8, d9}, [r2, :128]! +vld1.16 {d16, d17}, [r0, :64]! +vld1.16 {d20, d21}, [r2, :128]! -vmull.s16 q0, d4, d8 -vld1.16 {d6, d7}, [r0, :64]! -vmull.s16 q1, d5, d9 -vld1.16 {d10, d11}, [r2, :128]! +vmull.s16 q0, d16, d20 +vld1.16 {d18, d19}, [r0, :64]! +vmull.s16 q1, d17, d21 +vld1.16 {d22, d23}, [r2, :128]! -vmlal.s16 q0, d6, d10 -vld1.16 {d4, d5}, [r0, :64]! -vmlal.s16 q1, d7, d11 -vld1.16 {d8, d9}, [r2, :128]! +vmlal.s16 q0, d18, d22 +vld1.16 {d16, d17}, [r0, :64]! +vmlal.s16 q1, d19, d23 +vld1.16 {d20, d21}, [r2, :128]! -vmlal.s16 q0, d4, d8 -vld1.16 {d6, d7}, [r0, :64]! -vmlal.s16 q1, d5, d9 -vld1.16 {d10, d11}, [r2, :128]! +vmlal.s16 q0, d16, d20 +vld1.16 {d18, d19}, [r0, :64]! +vmlal.s16 q1, d17, d21 +vld1.16 {d22, d23}, [r2, :128]! -vmlal.s16 q0, d6, d10 -vld1.16 {d4, d5}, [r0, :64]! -vmlal.s16 q1, d7, d11 -vld1.16 {d8, d9}, [r2, :128]! +vmlal.s16 q0, d18, d22 +vld1.16 {d16, d17}, [r0, :64]! +vmlal.s16 q1, d19, d23 +vld1.16 {d20, d21}, [r2, :128]! -vmlal.s16 q0, d4, d8 -vmlal.s16 q1, d5, d9 +vmlal.s16 q0, d16, d20 +vmlal.s16 q1, d17, d21 vpadd.s32 d0, d0, d1 vpadd.s32 d1, d2, d3 vrshrn.s32 d0, q0, SBC_PROTO_FIXED_SCALE -vld1.16 {d2, d3, d4, d5}, [r2, :128]! +vld1.16 {d16, d17, d18, d19}, [r2, :128]! 
vdup.i32d1, d0[1] /* TODO: can be eliminated */ vdup.i32d0, d0[0] /* TODO: can be eliminated */ -vmull.s16 q3, d2, d0 -vmull.s16 q4, d3, d0 -vmlal.s16 q3, d4, d1 -vmlal.s16 q4, d5, d1 +vmull.s16 q10, d16, d0 +vmull.s16 q11, d17, d0 +vmlal.s16 q10, d18, d1 +vmlal.s16 q11, d19, d1 -vpadd.s32 d0, d6, d7 /* TODO: can be eliminated */ -vpadd.s32 d1, d8, d9 /* TODO: can be eliminated */ +vpadd.s32 d0, d20, d21 /* TODO: can be eliminated */ +vpadd.s32 d1, d22, d23 /* TODO: can be eliminated */
[FFmpeg-devel] [PATCH] avcodec/arm/sbcenc: save callee preserved vfp registers
When compiling FFmpeg with GCC-9, some very random segfaults were observed
in code which had previously called down into the SBC encoder NEON assembly
routines. This was caused by these functions clobbering some of the vfp
callee saved registers (d8 - d15 aka q4 - q7). GCC was using these
registers to save local variables, but after these functions returned, they
would contain garbage.

Fix by saving the relevant registers on the stack in the affected
functions.

Signed-off-by: James Cowgill
---
 libavcodec/arm/sbcdsp_neon.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/libavcodec/arm/sbcdsp_neon.S b/libavcodec/arm/sbcdsp_neon.S
index d83d21d202..aa03800096 100644
--- a/libavcodec/arm/sbcdsp_neon.S
+++ b/libavcodec/arm/sbcdsp_neon.S
@@ -38,6 +38,8 @@ function ff_sbc_analyze_4_neon, export=1
         /* TODO: merge even and odd cases (or even merge all four calls to this
          * function) in order to have only aligned reads from 'in' array
          * and reduce number of load instructions */
+        vpush           {d8-d11}
+
         vld1.16         {d4, d5}, [r0, :64]!
         vld1.16         {d8, d9}, [r2, :128]!
 
@@ -84,6 +86,7 @@ function ff_sbc_analyze_4_neon, export=1
 
         vst1.32         {d0, d1}, [r1, :128]
 
+        vpop            {d8-d11}
         bx              lr
 endfunc
 
@@ -91,6 +94,8 @@ function ff_sbc_analyze_8_neon, export=1
         /* TODO: merge even and odd cases (or even merge all four calls to this
          * function) in order to have only aligned reads from 'in' array
          * and reduce number of load instructions */
+        vpush           {d8-d15}
+
         vld1.16         {d4, d5}, [r0, :64]!
         vld1.16         {d8, d9}, [r2, :128]!
 
@@ -188,6 +193,7 @@ function ff_sbc_analyze_8_neon, export=1
 
         vst1.32         {d0, d1, d2, d3}, [r1, :128]
 
+        vpop            {d8-d15}
         bx              lr
 endfunc
 
-- 
2.22.0
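The "d8 - d15 aka q4 - q7" remark relies on how the 32-bit ARM NEON register file is aliased: each quad register qN overlays the double registers d(2N) and d(2N+1). A minimal sketch of that mapping, with hypothetical helper names, shows why q2-q9 touch callee saved registers while q8-q15 do not:

```c
#include <stdbool.h>

/* qN aliases d(2N) and d(2N+1) on AArch32 NEON. */
static int sketch_q_first_d(int q)
{
    return 2 * q;
}

/* Under the 32-bit ARM procedure call standard, d8-d15 must be preserved
 * by the callee, so a quad register is callee saved when both of its
 * halves fall inside that range. */
static bool sketch_q_is_callee_saved(int q)
{
    int lo = sketch_q_first_d(q);
    int hi = lo + 1;
    return lo >= 8 && hi <= 15;
}
```

Checking q4 through q7 with this helper returns true, and q8 and above return false, which matches the choice between saving d8-d15 on the stack or moving the working set to q8-q15.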
Re: [FFmpeg-devel] fate/hap : add test for hap encoding
On 23/04/18 10:11, Carl Eugen Hoyos wrote:
> 2018-03-14 7:31 GMT+01:00, Martin Vignali :
>
>> In that case we can let the test using "none"
>> compression (bypass the snappy part)
>
> These tests are also broken, please fix or
> remove them:
> https://buildd.debian.org/status/fetch.php?pkg=ffmpeg=i386=7%3A4.0-1=152218=0
> ("Error 1")

I've had a brief look at this error (and a similar error on s390x) and it
looks like a float rounding issue in some of the functions in
libavcodec/texturedspenc.c. The output from the hap encoder differs by only
a few bits. i386 fails because it promotes floats to long double when
evaluating expressions (at least on Debian, which has SSE disabled), and
s390x fails because it promotes floats to doubles. I think they're the only
architectures which promote floats (although some have not built yet).

I'll probably just ignore these tests for now since I'm not sure what the
best solution is.

James
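Whether a given build promotes float expressions can be checked with the standard FLT_EVAL_METHOD macro from <float.h>. The sketch below is not taken from texturedspenc.c; the function names are hypothetical and it only illustrates how intermediate precision can be inspected and how storing to a float pins an intermediate result.

```c
#include <float.h>

/* Describe how this compiler and target evaluate float expressions. */
static const char *sketch_float_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "float expressions are evaluated as float";
    case 1:  return "float expressions are promoted to double";
    case 2:  return "float expressions are promoted to long double";
    default: return "evaluation method is indeterminate or implementation-defined";
    }
}

/* Multiply three floats, forcing the first product to be rounded to float
 * before it is reused. The volatile store prevents the compiler from
 * keeping the intermediate in a wider register. */
static float sketch_product_rounded(float a, float b, float c)
{
    volatile float t = a * b;
    return t * c;
}
```

On a build where the macro is 1 or 2, an expression written without such a store may carry extra precision between operations, which is one way output can differ by a few bits from a build where it is 0.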
Re: [FFmpeg-devel] [PATCH] avformat/libssh: check the user provided a password before trying to use it
Hi,

On 11/06/17 18:47, jamrial at gmail.com (James Almer) wrote:
> Fixes ticket #6413
>
> Signed-off-by: James Almer
> ---
> The public key authentication also tries to use the password variable. I
> don't know if NULL is valid in that case or not.
> Perhaps for that one it would be better to replace the current usage of
> legacy API instead.
>
>  libavformat/libssh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Could this patch please be applied to the stable branches? Someone using
Debian stable (3.2.9) reported it:
https://bugs.debian.org/886912

Commit 8ddb6820bd52df6ed616abc3d8be200b126aa8c1 applied to 3.4.

Thanks,
James

> diff --git a/libavformat/libssh.c b/libavformat/libssh.c
> index 49e92e7516..9e3d4da45e 100644
> --- a/libavformat/libssh.c
> +++ b/libavformat/libssh.c
> @@ -103,7 +103,7 @@ static av_cold int libssh_authentication(LIBSSHContext *libssh, const char *user
>          }
>      }
>
> -    if (!authorized && (auth_methods & SSH_AUTH_METHOD_PASSWORD)) {
> +    if (!authorized && password && (auth_methods & SSH_AUTH_METHOD_PASSWORD)) {
>          if (ssh_userauth_password(libssh->session, NULL, password) == SSH_AUTH_SUCCESS) {
>              av_log(libssh, AV_LOG_DEBUG, "Authentication successful with password.\n");
>              authorized = 1;
>
[FFmpeg-devel] [PATCH] avformat/dashenc: fix min_seg_duration option size
In the DASHContext structure, min_seg_duration is declared as an int, but
the AVOption list claimed it was an INT64. Change the option list to use
the correct size, which should fix some initialization errors seen on
big-endian platforms.

Signed-off-by: James Cowgill <jcowg...@debian.org>
---
 libavformat/dashenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/dashenc.c b/libavformat/dashenc.c
index d5554d1df0..ddad3351fd 100644
--- a/libavformat/dashenc.c
+++ b/libavformat/dashenc.c
@@ -1181,7 +1181,7 @@ static const AVOption options[] = {
     { "adaptation_sets", "Adaptation sets. Syntax: id=0,streams=0,1,2 id=1,streams=3,4 and so on", OFFSET(adaptation_sets), AV_OPT_TYPE_STRING, { 0 }, 0, 0, AV_OPT_FLAG_ENCODING_PARAM },
     { "window_size", "number of segments kept in the manifest", OFFSET(window_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, E },
     { "extra_window_size", "number of segments kept outside of the manifest before removing from disk", OFFSET(extra_window_size), AV_OPT_TYPE_INT, { .i64 = 5 }, 0, INT_MAX, E },
-    { "min_seg_duration", "minimum segment duration (in microseconds)", OFFSET(min_seg_duration), AV_OPT_TYPE_INT64, { .i64 = 5000000 }, 0, INT_MAX, E },
+    { "min_seg_duration", "minimum segment duration (in microseconds)", OFFSET(min_seg_duration), AV_OPT_TYPE_INT, { .i64 = 5000000 }, 0, INT_MAX, E },
     { "remove_at_exit", "remove all segments when finished", OFFSET(remove_at_exit), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, E },
     { "use_template", "Use SegmentTemplate instead of SegmentList", OFFSET(use_template), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, E },
     { "use_timeline", "Use SegmentTimeline in SegmentTemplate", OFFSET(use_timeline), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, E },
-- 
2.15.0
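Why the size mismatch only shows up on big-endian platforms can be seen in a small standalone model. The struct and function below are hypothetical and do not use the AVOption machinery; they only mimic storing an 8-byte default at the offset of a 4-byte field, and the default of 5000000 is used purely as an example value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context: a 32-bit option field followed by another member. */
struct sketch_ctx {
    int32_t min_seg_duration;
    int32_t neighbour;
};

/* Store a value the way a 64-bit option type would: 8 bytes written at the
 * field's offset, regardless of the field's real size. */
static void sketch_set_as_int64(struct sketch_ctx *ctx, int64_t value)
{
    unsigned char *base = (unsigned char *)ctx;
    memcpy(base + offsetof(struct sketch_ctx, min_seg_duration),
           &value, sizeof(value));
}
```

On a little-endian machine the low half of the value lands in min_seg_duration, so the option appears to work while the neighbouring member is silently overwritten. On a big-endian machine the low half lands in the neighbour and min_seg_duration receives the high half, which is zero for small values.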
Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode
Hi,

On 09/11/17 14:02, Hendrik Leppkes wrote:
> On Thu, Nov 9, 2017 at 1:21 PM, James Cowgill <jcowg...@debian.org> wrote:
>> In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
>> deprecated avcodec_decode_* APIs were reworked so that they called into the
>> new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
>> of prohibiting sending new packets containing data after a drain
>> packet, but in previous versions of FFmpeg this "worked" and some
>> applications relied on it.
>>
>> To restore some compatibility, reset the codec if we receive a new non-drain
>> packet using the old API after draining has completed. While this does
>> not give the same behaviour as the old API did, in the majority of cases
>> it works and it does not require changes to any other part of the decoding
>> code.
>>
>> Fixes ticket #6775
>> Signed-off-by: James Cowgill <jcowg...@debian.org>
>> ---
>>  libavcodec/decode.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
>> index 86fe5aef52..2f1932fa85 100644
>> --- a/libavcodec/decode.c
>> +++ b/libavcodec/decode.c
>> @@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
>>
>>      av_assert0(avci->compat_decode_consumed == 0);
>>
>> +    if (avci->draining_done && pkt && pkt->size != 0) {
>> +        av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
>> +        avcodec_flush_buffers(avctx);
>> +    }
>> +
>
> I don't think this is a good idea. Draining and not flushing
> afterwards is a bug in the calling code, and even before recent
> changes it would result in inconsistent behavior and even crashes
> (with select decoders).

I am fully aware that this will only trigger if the calling code is buggy.
I am trying to avoid silent breakage of those applications doing this when
upgrading to ffmpeg 3.4.

I was looking at the documentation of avcodec_decode_* recently because of
this and I had some trouble deciding if using the API this way was
incorrect. I expect the downstreams affected thought that what they were
doing was fine and then got angry when ffmpeg suddenly "broke" their code.
This patch at least allows some sort of "transitional period" until
downstreams update.

From the perspective of Debian, I could either apply this patch to ffmpeg,
or I would have to go through over 100 reverse dependencies to see if they
abuse the API and then fix them. I currently know of two (gst-libav1.0 and
kodi), but there could be more - especially within less used packages.

Thanks,
James
[FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode
In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
deprecated avcodec_decode_* APIs were reworked so that they called into the
new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
of prohibiting sending new packets containing data after a drain
packet, but in previous versions of FFmpeg this "worked" and some
applications relied on it.

To restore some compatibility, reset the codec if we receive a new non-drain
packet using the old API after draining has completed. While this does
not give the same behaviour as the old API did, in the majority of cases
it works and it does not require changes to any other part of the decoding
code.

Fixes ticket #6775
Signed-off-by: James Cowgill <jcowg...@debian.org>
---
 libavcodec/decode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 86fe5aef52..2f1932fa85 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
 
     av_assert0(avci->compat_decode_consumed == 0);
 
+    if (avci->draining_done && pkt && pkt->size != 0) {
+        av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
+        avcodec_flush_buffers(avctx);
+    }
+
     *got_frame = 0;
     avci->compat_decode = 1;
 
-- 
2.15.0
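The behaviour change and the proposed workaround can be modelled with a toy state machine. Nothing below is libavcodec API: every name is hypothetical, and draining is simplified to complete as soon as the empty packet is sent.

```c
#include <stddef.h>

#define TOY_EOF (-1)

/* Toy decoder state; not a libavcodec structure. */
struct toy_decoder {
    int draining_done; /* set once an empty (drain) packet was handled */
    int flush_count;   /* implicit resets performed by the compat path */
};

/* New-style behaviour: data after EOF is rejected until a flush. */
static int toy_send_packet(struct toy_decoder *d, size_t pkt_size)
{
    if (d->draining_done)
        return TOY_EOF;
    if (pkt_size == 0)
        d->draining_done = 1;
    return 0;
}

/* Reset the decoder so that it accepts data again. */
static void toy_flush(struct toy_decoder *d)
{
    d->draining_done = 0;
}

/* Compat wrapper in the spirit of the patch: if data arrives after
 * draining has completed, reset first instead of failing. */
static int toy_compat_decode(struct toy_decoder *d, size_t pkt_size)
{
    if (d->draining_done && pkt_size != 0) {
        toy_flush(d);
        d->flush_count++;
    }
    return toy_send_packet(d, pkt_size);
}
```

In this model a caller that sends data, then a drain packet, then more data gets TOY_EOF from toy_send_packet but succeeds through toy_compat_decode, at the cost of one implicit reset. A well-behaved caller would call the flush function itself and never trigger that path.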
[FFmpeg-devel] [PATCH v2] avcodec/arm: Fix SIGBUS on ARM when compiled with binutils 2.29
In binutils 2.29, the behavior of the ADR instruction changed so that 1 is added to the address of a Thumb function (previously nothing was added). This allows the loaded address to be passed to a BLX instruction and the correct mode change will occur. So that the behavior matches in binutils 2.29 and pre-2.29, use .eqv to pre-calculate the function address without the automatic +1 fixup. Then use these new symbols as the function addresses to be loaded. Fixes ticket 6571. Related binutils bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21458 Signed-off-by: James Cowgill <jcowg...@debian.org> --- v2: Forgot to include the "avcodec/arm" commit message prefix. libavcodec/arm/h264idct_neon.S | 28 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/libavcodec/arm/h264idct_neon.S b/libavcodec/arm/h264idct_neon.S index 4f68bdb9f5..04b1ea583b 100644 --- a/libavcodec/arm/h264idct_neon.S +++ b/libavcodec/arm/h264idct_neon.S @@ -20,6 +20,18 @@ #include "libavutil/arm/asm.S" +# In binutils 2.29, the behavior of the ADR instruction changed so that 1 is +# added to the address of a Thumb function (previously nothing was added). +# +# These .eqv are used to pre-calculate the correct address with +CONFIG_THUMB so +# that ADR will work with both old and new versions binutils. 
+# +# See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458 +.eqv eqv_ff_h264_idct_add_neon, X(ff_h264_idct_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct_dc_add_neon, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct8_add_neon,X(ff_h264_idct8_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct8_dc_add_neon, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB + function ff_h264_idct_add_neon, export=1 vld1.64 {d0-d3}, [r1,:128] vmov.i16q15, #0 @@ -113,8 +125,8 @@ function ff_h264_idct_add16_neon, export=1 movne lr, #0 cmp lr, #0 ite ne -adrne lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB -adreq lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_dc_add_neon +adreq lr, eqv_ff_h264_idct_add_neon blx lr 2: subsip, ip, #1 add r1, r1, #32 @@ -138,8 +150,8 @@ function ff_h264_idct_add16intra_neon, export=1 cmp r8, #0 ldrsh r8, [r1] iteet ne -adrne lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB -adreq lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_add_neon +adreq lr, eqv_ff_h264_idct_dc_add_neon cmpeq r8, #0 blxne lr subsip, ip, #1 @@ -166,8 +178,8 @@ function ff_h264_idct_add8_neon, export=1 cmp r8, #0 ldrsh r8, [r1] iteet ne -adrne lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB -adreq lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_add_neon +adreq lr, eqv_ff_h264_idct_dc_add_neon cmpeq r8, #0 blxne lr add r12, r12, #1 @@ -388,8 +400,8 @@ function ff_h264_idct8_add4_neon, export=1 movne lr, #0 cmp lr, #0 ite ne -adrne lr, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB -adreq lr, X(ff_h264_idct8_add_neon)+ CONFIG_THUMB +adrne lr, eqv_ff_h264_idct8_dc_add_neon +adreq lr, eqv_ff_h264_idct8_add_neon blx lr 2: subsr12, r12, #4 add r1, r1, #128 -- 2.14.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH] Fix SIGBUS on ARM when compiled with binutils 2.29
In binutils 2.29, the behavior of the ADR instruction changed so that 1 is added to the address of a Thumb function (previously nothing was added). This allows the loaded address to be passed to a BLX instruction and the correct mode change will occur. So that the behavior matches in binutils 2.29 and pre-2.29, use .eqv to pre-calculate the function address without the automatic +1 fixup. Then use these new symbols as the function addresses to be loaded. See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458 Fixes ticket 6571. Signed-off-by: James Cowgill <jcowg...@debian.org> --- libavcodec/arm/h264idct_neon.S | 28 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/libavcodec/arm/h264idct_neon.S b/libavcodec/arm/h264idct_neon.S index 4f68bdb9f5..04b1ea583b 100644 --- a/libavcodec/arm/h264idct_neon.S +++ b/libavcodec/arm/h264idct_neon.S @@ -20,6 +20,18 @@ #include "libavutil/arm/asm.S" +# In binutils 2.29, the behavior of the ADR instruction changed so that 1 is +# added to the address of a Thumb function (previously nothing was added). +# +# These .eqv are used to pre-calculate the correct address with +CONFIG_THUMB so +# that ADR will work with both old and new versions binutils. 
+# +# See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458 +.eqv eqv_ff_h264_idct_add_neon, X(ff_h264_idct_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct_dc_add_neon, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct8_add_neon,X(ff_h264_idct8_add_neon) + CONFIG_THUMB +.eqv eqv_ff_h264_idct8_dc_add_neon, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB + function ff_h264_idct_add_neon, export=1 vld1.64 {d0-d3}, [r1,:128] vmov.i16q15, #0 @@ -113,8 +125,8 @@ function ff_h264_idct_add16_neon, export=1 movne lr, #0 cmp lr, #0 ite ne -adrne lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB -adreq lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_dc_add_neon +adreq lr, eqv_ff_h264_idct_add_neon blx lr 2: subsip, ip, #1 add r1, r1, #32 @@ -138,8 +150,8 @@ function ff_h264_idct_add16intra_neon, export=1 cmp r8, #0 ldrsh r8, [r1] iteet ne -adrne lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB -adreq lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_add_neon +adreq lr, eqv_ff_h264_idct_dc_add_neon cmpeq r8, #0 blxne lr subsip, ip, #1 @@ -166,8 +178,8 @@ function ff_h264_idct_add8_neon, export=1 cmp r8, #0 ldrsh r8, [r1] iteet ne -adrne lr, X(ff_h264_idct_add_neon)+ CONFIG_THUMB -adreq lr, X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB +adrne lr, eqv_ff_h264_idct_add_neon +adreq lr, eqv_ff_h264_idct_dc_add_neon cmpeq r8, #0 blxne lr add r12, r12, #1 @@ -388,8 +400,8 @@ function ff_h264_idct8_add4_neon, export=1 movne lr, #0 cmp lr, #0 ite ne -adrne lr, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB -adreq lr, X(ff_h264_idct8_add_neon)+ CONFIG_THUMB +adrne lr, eqv_ff_h264_idct8_dc_add_neon +adreq lr, eqv_ff_h264_idct8_add_neon blx lr 2: subsr12, r12, #4 add r1, r1, #128 -- 2.14.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH v2] swscale: fix gbrap16 alpha channel issues
Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).

Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.

Fixes ticket 6555.

Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
---
v2
 Move declaration of A inside the loop and don't bother initializing it
 since the initial value would never be read.

 libswscale/output.c                 | 16 ++++++++--------
 tests/ref/fate/filter-pixfmts-scale |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 9774e9f327..f30bce8dd3 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -2026,24 +2026,24 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
                     const int16_t **lumSrcx, int lumFilterSize,
                     const int16_t *chrFilter, const int16_t **chrUSrcx,
                     const int16_t **chrVSrcx, int chrFilterSize,
-                    const int16_t **alpSrc, uint8_t **dest,
+                    const int16_t **alpSrcx, uint8_t **dest,
                     int dstW, int y)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
     int i;
-    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
+    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
     uint16_t **dest16 = (uint16_t**)dest;
     const int32_t **lumSrc  = (const int32_t**)lumSrcx;
     const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
     const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
-    int A = 0; // init to silence warning
+    const int32_t **alpSrc  = (const int32_t**)alpSrcx;
 
     for (i = 0; i < dstW; i++) {
         int j;
         int Y = -0x40000000;
         int U = -(128 << 23);
         int V = -(128 << 23);
-        int R, G, B;
+        int R, G, B, A;
 
         for (j = 0; j < lumFilterSize; j++)
             Y += lumSrc[j][i] * (unsigned)lumFilter[j];
@@ -2059,13 +2059,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
         V >>= 14;
 
         if (hasAlpha) {
-            A = 1 << 18;
+            A = -0x40000000;
 
             for (j = 0; j < lumFilterSize; j++)
                 A += alpSrc[j][i] * lumFilter[j];
 
-            if (A & 0xF8000000)
-                A =  av_clip_uintp2(A, 27);
+            A >>= 1;
+            A += 0x20002000;
         }
 
         Y -= c->yuv2rgb_y_offset;
@@ -2083,7 +2083,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
         dest16[1][i] = B >> 14;
         dest16[2][i] = R >> 14;
         if (hasAlpha)
-            dest16[3][i] = A >> 11;
+            dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
     }
     if ((!isBE(c->dstFormat)) != (!HAVE_BIGENDIAN)) {
         for (i = 0; i < dstW; i++) {
diff --git a/tests/ref/fate/filter-pixfmts-scale b/tests/ref/fate/filter-pixfmts-scale
index 9b601b71da..dcc34bd4d1 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -23,8 +23,8 @@ gbrap10be           6d89abb9248006c3e9017545e9474654
 gbrap10le           cf974e23f485a10740f5de74a5c8c3df
 gbrap12be           1d9b57766ba9c2192403f43967cb9af0
 gbrap12le           bb1ba1c157717db3dd612a76d38a018e
-gbrap16be           81542b96575d1fe3b239d23899f5ece3
-gbrap16le           6feb8b9da131917abe867e0eaaf07b90
+gbrap16be           c72b935a6e57a8e1c37bff08c2db55b1
+gbrap16le           13eb0e62b1ac9c1c86c81521eaefab5f
 gbrp                dc3387f925f972c61aae7eb23cdc19f0
 gbrp10be            0277d4c3a8498d75e2783fb81379e481
 gbrp10le            f3d70f8ab845c3c9b8f7452e4a6e285a
-- 
2.13.3
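The last step of the patched alpha path, clipping the accumulator to 30 bits and then shifting right by 14, can be looked at in isolation. The clip helper below is a standalone reimplementation with the same contract as libavutil's av_clip_uintp2() (clip a signed value to the range 0 to 2^p - 1); the function names are hypothetical and the accumulation that produces A is not reproduced here.

```c
#include <stdint.h>

/* Clip a signed value to the unsigned range [0, 2^p - 1]. */
static unsigned sketch_clip_uintp2(int a, int p)
{
    if (a < 0)
        return 0;
    if (a > (1 << p) - 1)
        return (1u << p) - 1;
    return (unsigned)a;
}

/* Clip the accumulator to 30 bits, then drop the 14 low bits, which leaves
 * a value that always fits in 16 bits. */
static uint16_t sketch_alpha_to_u16(int A)
{
    return (uint16_t)(sketch_clip_uintp2(A, 30) >> 14);
}
```

Clipping at the point of storage means out-of-range accumulator values saturate to 0 or 0xFFFF in the output instead of wrapping, which is the arrangement the commit message says was copied from the rgba64 case.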
Re: [FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues
Hi,

On 02/08/17 23:21, Michael Niedermayer wrote:
> On Wed, Aug 02, 2017 at 03:32:04PM +0100, James Cowgill wrote:
>> Hi,
>>
>> On 02/08/17 14:18, Michael Niedermayer wrote:
>>> On Tue, Aug 01, 2017 at 02:46:22PM +0100, James Cowgill wrote:
>>>> Fixes filter-pixfmts-scale test failing on big-endian systems due to
>>>> alpSrc not being cast to (const int32_t**).
>>>>
>>>> Also fixes distortions in the output alpha channel values by copying the
>>>> alpha channel code from the rgba64 case found elsewhere in output.c.
>>>>
>>>> Fixes ticket 6555.
>>>>
>>>> Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
>>>> ---
>>>>  libswscale/output.c                 | 15 ++++++++-------
>>>>  tests/ref/fate/filter-pixfmts-scale |  4 ++--
>>>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/libswscale/output.c b/libswscale/output.c
>>>> index 9774e9f327..8e5ec0a256 100644
>>>> --- a/libswscale/output.c
>>>> +++ b/libswscale/output.c
>>>> @@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
>>>>                      const int16_t **lumSrcx, int lumFilterSize,
>>>>                      const int16_t *chrFilter, const int16_t **chrUSrcx,
>>>>                      const int16_t **chrVSrcx, int chrFilterSize,
>>>> -                    const int16_t **alpSrc, uint8_t **dest,
>>>> +                    const int16_t **alpSrcx, uint8_t **dest,
>>>>                      int dstW, int y)
>>>>  {
>>>>      const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
>>>>      int i;
>>>> -    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
>>>> +    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
>>>>      uint16_t **dest16 = (uint16_t**)dest;
>>>>      const int32_t **lumSrc  = (const int32_t**)lumSrcx;
>>>>      const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
>>>>      const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
>>>> -    int A = 0; // init to silence warning
>>>> +    const int32_t **alpSrc  = (const int32_t**)alpSrcx;
>>>
>>>> +    int A = 0xffff << 14;
>>>
>>> unused value
>>
>> The initial value of A is unused in the old code, but not in the new code.
>
> IIRC all uses are under hasAlpha and it is writen to in that case first

Sorry, you're right. I think I was looking at the code from yuv2rgba64.
I'll send a v2.

Thanks,
James
Re: [FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues
Hi,

On 02/08/17 14:18, Michael Niedermayer wrote:
> On Tue, Aug 01, 2017 at 02:46:22PM +0100, James Cowgill wrote:
>> Fixes filter-pixfmts-scale test failing on big-endian systems due to
>> alpSrc not being cast to (const int32_t**).
>>
>> Also fixes distortions in the output alpha channel values by copying the
>> alpha channel code from the rgba64 case found elsewhere in output.c.
>>
>> Fixes ticket 6555.
>>
>> Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
>> ---
>>  libswscale/output.c                 | 15 ++++++++-------
>>  tests/ref/fate/filter-pixfmts-scale |  4 ++--
>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/libswscale/output.c b/libswscale/output.c
>> index 9774e9f327..8e5ec0a256 100644
>> --- a/libswscale/output.c
>> +++ b/libswscale/output.c
>> @@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
>>                      const int16_t **lumSrcx, int lumFilterSize,
>>                      const int16_t *chrFilter, const int16_t **chrUSrcx,
>>                      const int16_t **chrVSrcx, int chrFilterSize,
>> -                    const int16_t **alpSrc, uint8_t **dest,
>> +                    const int16_t **alpSrcx, uint8_t **dest,
>>                      int dstW, int y)
>>  {
>>      const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
>>      int i;
>> -    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
>> +    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
>>      uint16_t **dest16 = (uint16_t**)dest;
>>      const int32_t **lumSrc  = (const int32_t**)lumSrcx;
>>      const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
>>      const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
>> -    int A = 0; // init to silence warning
>> +    const int32_t **alpSrc  = (const int32_t**)alpSrcx;
>
>> +    int A = 0xffff << 14;
>
> unused value

The initial value of A is unused in the old code, but not in the new code.

>>
>>      for (i = 0; i < dstW; i++) {
>>          int j;
>> @@ -2059,13 +2060,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
>>          V >>= 14;
>>
>>          if (hasAlpha) {
>> -            A = 1 << 18;
>> +            A = -0x40000000;
>
> where does this value come from ?
> it looks copy and pasted from luma, but alpha does not have a black
> level offset as its not luminance

I confess I only know the basics of how these functions work. On the basis
that yuv2gbrp_full_X_c looks like it copies yuv2rgb_X_c_template, and I
would have thought the rgb and gbr cases should be similar, I copied a
number of things from yuv2rgba64_full_X_c_template into this function. That
value and all of the modifications inside the for loop come from there.

>>
>>              for (j = 0; j < lumFilterSize; j++)
>>                  A += alpSrc[j][i] * lumFilter[j];
>>
>> -            if (A & 0xF8000000)
>> -                A =  av_clip_uintp2(A, 27);
>> +            A >>= 1;
>> +            A += 0x20002000;
>>          }
>>
>>          Y -= c->yuv2rgb_y_offset;
>> @@ -2083,7 +2084,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
>>          dest16[1][i] = B >> 14;
>>          dest16[2][i] = R >> 14;
>>          if (hasAlpha)
>> -            dest16[3][i] = A >> 11;
>> +            dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
>
> why do you move the cliping code here, this seems unneeded
> outside the removed if()

This is where the clipping code in yuv2rgba64_full_X_c_template is, and in
that function, the value of A is not clipped - only the value stored in
dest.

Thanks,
James
[FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues
Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).

Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.

Fixes ticket 6555.

Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
---
 libswscale/output.c                 | 15 ++++++++-------
 tests/ref/fate/filter-pixfmts-scale |  4 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 9774e9f327..8e5ec0a256 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
                     const int16_t **lumSrcx, int lumFilterSize,
                     const int16_t *chrFilter, const int16_t **chrUSrcx,
                     const int16_t **chrVSrcx, int chrFilterSize,
-                    const int16_t **alpSrc, uint8_t **dest,
+                    const int16_t **alpSrcx, uint8_t **dest,
                     int dstW, int y)
 {
     const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
     int i;
-    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
+    int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
     uint16_t **dest16 = (uint16_t**)dest;
     const int32_t **lumSrc  = (const int32_t**)lumSrcx;
     const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
     const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
-    int A = 0; // init to silence warning
+    const int32_t **alpSrc  = (const int32_t**)alpSrcx;
+    int A = 0xffff << 14;
 
     for (i = 0; i < dstW; i++) {
         int j;
@@ -2059,13 +2060,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
         V >>= 14;
 
         if (hasAlpha) {
-            A = 1 << 18;
+            A = -0x40000000;
 
             for (j = 0; j < lumFilterSize; j++)
                 A += alpSrc[j][i] * lumFilter[j];
 
-            if (A & 0xF8000000)
-                A =  av_clip_uintp2(A, 27);
+            A >>= 1;
+            A += 0x20002000;
         }
 
         Y -= c->yuv2rgb_y_offset;
@@ -2083,7 +2084,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
         dest16[1][i] = B >> 14;
         dest16[2][i] = R >> 14;
         if (hasAlpha)
-            dest16[3][i] = A >> 11;
+            dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
     }
     if ((!isBE(c->dstFormat)) != (!HAVE_BIGENDIAN)) {
         for (i = 0; i < dstW; i++) {
diff --git a/tests/ref/fate/filter-pixfmts-scale b/tests/ref/fate/filter-pixfmts-scale
index 9b601b71da..dcc34bd4d1 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -23,8 +23,8 @@ gbrap10be           6d89abb9248006c3e9017545e9474654
 gbrap10le           cf974e23f485a10740f5de74a5c8c3df
 gbrap12be           1d9b57766ba9c2192403f43967cb9af0
 gbrap12le           bb1ba1c157717db3dd612a76d38a018e
-gbrap16be           81542b96575d1fe3b239d23899f5ece3
-gbrap16le           6feb8b9da131917abe867e0eaaf07b90
+gbrap16be           c72b935a6e57a8e1c37bff08c2db55b1
+gbrap16le           13eb0e62b1ac9c1c86c81521eaefab5f
 gbrp                dc3387f925f972c61aae7eb23cdc19f0
 gbrp10be            0277d4c3a8498d75e2783fb81379e481
 gbrp10le            f3d70f8ab845c3c9b8f7452e4a6e285a
-- 
2.13.3
[FFmpeg-devel] [PATCH] mips/float_dsp: fix vector_fmul_window_mips on mips64
Commit dfa920807494 (mips/float_dsp: fix a bug in vector_fmul_window_mips) fixed vector_fmul_window_mips by unrolling the loop only 4 times, but also removed the outer C loop and replaced it with assembly branches and pointer arithmetic. When submitting my 64-bit porting patch I missed this new assembly which also needed porting. This patch fixes a bus error in the fate-float-dsp test when run on 64-bit mips. Signed-off-by: James Cowgill james...@cowgill.org.uk Cc: Nedeljko Babic nedeljko.ba...@imgtec.com --- libavutil/mips/float_dsp_mips.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavutil/mips/float_dsp_mips.c b/libavutil/mips/float_dsp_mips.c index a455687..b3a812c 100644 --- a/libavutil/mips/float_dsp_mips.c +++ b/libavutil/mips/float_dsp_mips.c @@ -188,10 +188,10 @@ static void vector_fmul_window_mips(float *dst, const float *src0, lwc1%[wj3], -12(%[win_j])\n\t lwc1%[s0], 8(%[src0_i])\n\t lwc1%[s01],12(%[src0_i]) \n\t -addiu %[src1_j],-16 \n\t -addiu %[win_i], 16 \n\t -addiu %[win_j], -16 \n\t -addiu %[src0_i], 16 \n\t +PTR_ADDIU %[src1_j],-16\n\t +PTR_ADDIU %[win_i],16 \n\t +PTR_ADDIU %[win_j],-16 \n\t +PTR_ADDIU %[src0_i],16 \n\t swc1%[temp], 0(%[dst_i]) \n\t /* dst[i] = s0*wj - s1*wi; */ swc1%[temp1], 0(%[dst_j]) \n\t /* dst[j] = s0*wi + s1*wj; */ swc1%[temp2], 4(%[dst_i]) \n\t /* dst[i+1] = s01*wj1 - s11*wi1; */ @@ -208,8 +208,8 @@ static void vector_fmul_window_mips(float *dst, const float *src0, swc1%[temp1], -8(%[dst_j]) \n\t /* dst[j-2] = s0*wi2 + s1*wj2; */ swc1%[temp2], 12(%[dst_i])\n\t /* dst[i+2] = s01*wj3 - s11*wi3; */ swc1%[temp3], -12(%[dst_j])\n\t /* dst[j-3] = s01*wi3 + s11*wj3; */ -addiu %[dst_i], 16 \n\t -addiu %[dst_j], -16 \n\t +PTR_ADDIU %[dst_i],16 \n\t +PTR_ADDIU %[dst_j],-16 \n\t bne %[win_i], %[lp_end], 1b\n\t : [temp]=f(temp), [temp1]=f(temp1), [temp2]=f(temp2), [temp3]=f(temp3), [src0_i]+r(src0_i), [win_i]+r(win_i), -- 2.1.4 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org 
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
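As a rough illustration of why the leftover 'addiu' instructions caused trouble on 64-bit mips, the sketch below models the two instructions in plain C. The function names are made up for this example, and it assumes PTR_ADDIU expands to 'daddiu' under the n64 ABI. The 'addiu' model is simplified: architecturally the result is unpredictable when the input register does not hold a sign-extended 32-bit value, so this only shows the typical outcome.

```c
#include <stdint.h>

/* Simplified model of `addiu` on a 64-bit MIPS core: a 32-bit add whose
 * result is sign-extended to 64 bits. The upper half of the input is lost.
 * (The uint32_t -> int32_t conversion assumes the usual two's complement
 * behaviour.) */
uint64_t model_addiu(uint64_t reg, int16_t imm)
{
    uint32_t low = (uint32_t)reg + (uint32_t)(int32_t)imm;
    return (uint64_t)(int64_t)(int32_t)low;
}

/* Model of `daddiu`: a full 64-bit add, so the whole pointer survives. */
uint64_t model_daddiu(uint64_t reg, int16_t imm)
{
    return reg + (uint64_t)(int64_t)imm;
}
```

For a pointer that fits in 32 bits both models agree, which is presumably why the problem only shows up once pointers live above the 32-bit range.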
[FFmpeg-devel] [PATCH v3] mips/asmdefs: use _ABI64 as defined by gcc
Unfortunately, Android before API 21 (Lollipop) doesn't have the sgidefs.h header. The easiest way around this is to just use the preprocessor definitions from GCC / Clang.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
Hi,

Sorry I forgot about this a little. I think that doing it this way is better than messing around with different headers which may not exist. I know it works on GCC and Clang.

Thanks,
James

 libavutil/mips/asmdefs.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index a3a5ee3..fdf82a0 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,7 @@
 #ifndef AVUTIL_MIPS_ASMDEFS_H
 #define AVUTIL_MIPS_ASMDEFS_H
 
-#include <sgidefs.h>
-
-#if _MIPS_SIM == _ABI64
+#if defined(_ABI64) && _MIPS_SIM == _ABI64
 # define PTRSIZE 8
 # define PTRLOG 3
 # define PTR_ADDU daddu
-- 
2.1.4
Re: [FFmpeg-devel] [PATCH] mips/asmdefs: use asm/sgidefs.h header on linux
On Sat, 2015-03-07 at 18:06 +0100, wm4 wrote:
> On Sat, 7 Mar 2015 10:13:23 + James Cowgill james...@cowgill.org.uk wrote:
> > Unfortunately, Android before API 21 (Lollipop) doesn't have the sgidefs.h header, but the Linux kernel does in asm/sgidefs.h, so use that header if we can. Change _ABI64 to _MIPS_SIM_ABI64, which is defined in both headers.
>
> What does this header contain? Requiring kernel headers for anything but Linux specific syscalls or for building kernel modules is incredibly broken.

Yes, the correct header on mips is just 'sgidefs.h', and while glibc has provided it for years, Android bionic only added it for Lollipop. This is the kernel header:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/mips/include/uapi/asm/sgidefs.h

The one provided by glibc has a little more stuff but we don't need it.

_MIPS_SIM is defined by GCC (and some older mips compilers) to be equal to one of the _MIPS_SIM_* constants depending on which ABI is selected. GCC and Clang also define _ABI* themselves (as well as being defined in the glibc version of the header) for the current ABI, so I suppose using this without including anything might work if we don't care about other compilers:

#if defined(_ABI64) && _MIPS_SIM == _ABI64

> And __linux__ is of course completely out of the question. Just because it's Linux, the libc doesn't necessarily provide kernel headers.

Ok

James
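A minimal sketch of the header-free check discussed above, assuming GCC or Clang. DEMO_PTRSIZE and demo_ptrsize are hypothetical names, and the non-MIPS fallback exists only so the sketch builds on any host.

```c
/* On MIPS, GCC and Clang predefine _MIPS_SIM and the _ABI* macro for the
 * selected ABI, so no header is needed. The defined() guard matters: inside
 * #if an unknown identifier evaluates to 0, so without it the comparison
 * would read 0 == 0 and succeed on compilers that define neither macro. */
#if defined(_ABI64) && _MIPS_SIM == _ABI64
# define DEMO_PTRSIZE 8
#elif defined(__mips__)
# define DEMO_PTRSIZE 4
#else
/* Not MIPS: fall back to the host pointer size (assumption for this demo). */
# define DEMO_PTRSIZE ((int)sizeof(void *))
#endif

/* Returns the pointer size the preprocessor logic selected. */
int demo_ptrsize(void)
{
    return DEMO_PTRSIZE;
}
```

The trade-off is the one raised in the thread: this relies on compiler predefines, so a compiler that defines _MIPS_SIM but not _ABI64 would silently take the 32-bit branch.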
[FFmpeg-devel] [PATCH] mips/asmdefs: change include guard to read AVUTIL_ instead of AVCODEC_
Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavutil/mips/asmdefs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index 3660e98..04c036e 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -24,8 +24,8 @@
  * assembly (rather than from within .s files).
  */
 
-#ifndef AVCODEC_MIPS_ASMDEFS_H
-#define AVCODEC_MIPS_ASMDEFS_H
+#ifndef AVUTIL_MIPS_ASMDEFS_H
+#define AVUTIL_MIPS_ASMDEFS_H
 
 #ifdef __linux__
 #include <asm/sgidefs.h>
-- 
2.1.4
Re: [FFmpeg-devel] [PATCH v2 3/4] mips: port optimizations to mips n64
On Sat, 2015-03-07 at 10:15 +0100, Michael Niedermayer wrote:
On Sat, Mar 07, 2015 at 02:47:51AM -0300, James Almer wrote:
On 05/03/15 2:40 PM, James Cowgill wrote:

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
new file mode 100644
index 000..4d2922c
--- /dev/null
+++ b/libavutil/mips/asmdefs.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2015 Imagination Technologies Ltd
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * MIPS assembly defines from sys/asm.h but rewritten for use with C inline
+ * assembly (rather than from within .s files).
+ */
+
+#ifndef AVCODEC_MIPS_ASMDEFS_H
+#define AVCODEC_MIPS_ASMDEFS_H
+
+#include <sgidefs.h>
+
+#if _MIPS_SIM == _ABI64

This broke compilation with Android NDK r8 (which apparently doesn't support 64-bit mips).

http://fate.ffmpeg.org/report.cgi?time=20150307052927&slot=mipsel-android-gcc-4.4

CC libavutil/mips/float_dsp_mips.o
mipsel-linux-android-gcc-4.4.3: unrecognized option '-pthreads'
In file included from /home/fate/fate/src/libavcodec/mips/aacdec_mips.h:61,
                 from /home/fate/fate/src/libavcodec/aacdec.c:113:
/home/fate/fate/src/libavutil/mips/asmdefs.h:30:21: error: sgidefs.h: No such file or directory
/home/fate/fate/src/libavutil/mips/asmdefs.h:32:18: warning: _ABI64 is not defined
make: *** [libavcodec/aacdec.o] Error 1

Lovely. It looks like sgidefs.h was only added in Lollipop (even though it's been everywhere else for years):
https://github.com/android/platform_bionic/commit/1c2cf23a0c54619e7a362e1b82b0fb37ec9dd11a

But there's still asm/sgidefs.h defined in the Linux kernel headers which we can use instead (give me a moment).

James
[FFmpeg-devel] [PATCH] mips/asmdefs: use asm/sgidefs.h header on linux
Unfortunately, Android before API 21 (Lollipop) doesn't have the sgidefs.h header, but the Linux kernel does in asm/sgidefs.h, so use that header if we can. Change _ABI64 to _MIPS_SIM_ABI64, which is defined in both headers.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavutil/mips/asmdefs.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index 4d2922c..3660e98 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,13 @@
 #ifndef AVCODEC_MIPS_ASMDEFS_H
 #define AVCODEC_MIPS_ASMDEFS_H
 
+#ifdef __linux__
+#include <asm/sgidefs.h>
+#else
 #include <sgidefs.h>
+#endif
 
-#if _MIPS_SIM == _ABI64
+#if _MIPS_SIM == _MIPS_SIM_ABI64
 # define PTRSIZE 8
 # define PTRLOG 3
 # define PTR_ADDU daddu
-- 
2.1.4
[FFmpeg-devel] [PATCH v2] mips/asmdefs: use asm/sgidefs.h header on linux
Unfortunately, Android before API 21 (Lollipop) doesn't have the sgidefs.h header, but the Linux kernel does have an almost equivalent asm/sgidefs.h which will do, so use that header if we can. Change _ABI64 to _MIPS_SIM_ABI64, which is defined in both headers.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 configure                | 4 ++++
 libavutil/mips/asmdefs.h | 6 +++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 1ea2032..a5ff67c 100755
--- a/configure
+++ b/configure
@@ -1642,6 +1642,7 @@ HEADERS_LIST=
     alsa_asoundlib_h
     altivec_h
     arpa_inet_h
+    asm_sgidefs_h
     asm_types_h
     cdio_paranoia_h
     cdio_paranoia_paranoia_h
@@ -4570,6 +4571,9 @@ EOF
 
 elif enabled mips; then
 
+    check_header asm/sgidefs.h || check_header sgidefs.h || \
+        die either asm/sgidefs.h or sgidefs.h is required on mips
+
     check_inline_asm loongson 'dmult.g $1, $2, $3'
 
     # Enable minimum ISA based on selected options
diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index a3a5ee3..0e911cb 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,13 @@
 #ifndef AVUTIL_MIPS_ASMDEFS_H
 #define AVUTIL_MIPS_ASMDEFS_H
 
+#if HAVE_ASM_SGIDEFS_H
+#include <asm/sgidefs.h>
+#else
 #include <sgidefs.h>
+#endif
 
-#if _MIPS_SIM == _ABI64
+#if _MIPS_SIM == _MIPS_SIM_ABI64
 # define PTRSIZE 8
 # define PTRLOG 3
 # define PTR_ADDU daddu
-- 
2.1.4
Re: [FFmpeg-devel] [PATCH v2] mips/asmdefs: use asm/sgidefs.h header on linux
On Sat, 2015-03-07 at 13:32 +0100, Michael Niedermayer wrote:
> On Sat, Mar 07, 2015 at 10:56:45AM +, James Cowgill wrote:
> > Unfortunately, Android before API 21 (Lollipop) doesn't have the sgidefs.h header, but the Linux kernel does have an almost equivalent asm/sgidefs.h which will do, so use that header if we can. Change _ABI64 to _MIPS_SIM_ABI64, which is defined in both headers.
> >
> > Signed-off-by: James Cowgill james...@cowgill.org.uk
>
> trying to build for android mips:
>
> In file included from ffmpeg/libavcodec/mips/mpegaudiodsp_mips_float.c:58:
> ffmpeg/libavutil/mips/asmdefs.h:30:5: warning: HAVE_ASM_SGIDEFS_H is not defined
> ffmpeg/libavutil/mips/asmdefs.h:33:21: error: sgidefs.h: No such file or directory
> ffmpeg/libavutil/mips/asmdefs.h:36:18: warning: _MIPS_SIM_ABI64 is not defined
>
> I guess you are missing an include of config.h in this

Whoops, sorry - I don't have access to any mips machines at the weekend, so it wasn't tested a huge amount (time for qemu...)

James
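The warnings in that build log follow from how the preprocessor treats identifiers it has never seen: inside #if they evaluate to 0, so the #else branch is taken without any hard error. A small sketch, where HAVE_FEATURE_X is a hypothetical stand-in for HAVE_ASM_SGIDEFS_H when config.h has not been included:

```c
/* HAVE_FEATURE_X is intentionally never defined here, standing in for a
 * configure macro whose defining header was not included. */
#if HAVE_FEATURE_X
# define DEMO_USES_FALLBACK 0
#else
# define DEMO_USES_FALLBACK 1
#endif

/* Returns 1: the undefined macro evaluated to 0, so the #else branch won. */
int demo_uses_fallback(void)
{
    return DEMO_USES_FALLBACK;
}
```

Compilers only mention this when asked to (GCC's -Wundef produces the "is not defined" warning seen above), which is why a missing include of this kind is easy to overlook.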
[FFmpeg-devel] [PATCH v2 3/4] mips: port optimizations to mips n64
This mainly consists of replacing all the pointer arithmetic 'addiu' instructions with PTR_ADDIU which will handle the differences in pointer sizes when compiled on 64 bit mips systems. The header asmdefs.h contains the PTR_ macros which expand to the correct mips instructions to manipulate registers containing pointers. Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavcodec/mips/aacdec_mips.c | 21 +-- libavcodec/mips/aacdec_mips.h | 9 ++--- libavcodec/mips/aacpsdsp_mips.c | 43 +++--- libavcodec/mips/aacpsy_mips.h | 6 ++-- libavcodec/mips/aacsbr_mips.c | 53 +-- libavcodec/mips/aacsbr_mips.h | 17 - libavcodec/mips/ac3dsp_mips.c | 59 --- libavcodec/mips/acelp_filters_mips.c | 13 +++ libavcodec/mips/acelp_vectors_mips.c | 7 ++-- libavcodec/mips/celp_filters_mips.c | 13 +++ libavcodec/mips/celp_math_mips.c | 5 +-- libavcodec/mips/compute_antialias_float.h | 4 ++- libavcodec/mips/fft_mips.c| 13 +++ libavcodec/mips/fmtconvert_mips.c | 6 ++-- libavcodec/mips/lsp_mips.h| 6 ++-- libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++--- libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++--- libavcodec/mips/sbrdsp_mips.c | 45 +++ libavutil/mips/asmdefs.h | 48 + libavutil/mips/float_dsp_mips.c | 21 +-- 20 files changed, 247 insertions(+), 178 deletions(-) create mode 100644 libavutil/mips/asmdefs.h diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c index 93947be..253cdeb 100644 --- a/libavcodec/mips/aacdec_mips.c +++ b/libavcodec/mips/aacdec_mips.c @@ -56,6 +56,7 @@ #include aacdec_mips.h #include libavcodec/aactab.h #include libavcodec/sinewin.h +#include libavutil/mips/asmdefs.h #if HAVE_INLINE_ASM static av_always_inline void float_copy(float *dst, const float *src, int count) @@ -80,7 +81,7 @@ static av_always_inline void float_copy(float *dst, const float *src, int count) lw %[temp5],20(%[src]) \n\t lw %[temp6],24(%[src]) \n\t lw %[temp7],28(%[src]) \n\t -addiu %[src], %[src], 32\n\t +PTR_ADDIU %[src],%[src], 32\n\t sw %[temp0],0(%[dst])
\n\t sw %[temp1],4(%[dst]) \n\t sw %[temp2],8(%[dst]) \n\t @@ -90,7 +91,7 @@ static av_always_inline void float_copy(float *dst, const float *src, int count) sw %[temp6],24(%[dst]) \n\t sw %[temp7],28(%[dst]) \n\t bne %[src], %[loop_end], 1b\n\t -addiu %[dst], %[dst], 32\n\t +PTR_ADDIU %[dst],%[dst], 32\n\t .set pop\n\t : [temp0]=r(temp[0]), [temp1]=r(temp[1]), @@ -250,7 +251,7 @@ static void apply_ltp_mips(AACContext *ac, SingleChannelElement *sce) sw $0, 4(%[p_predTime])\n\t sw $0, 8(%[p_predTime])\n\t sw $0, 12(%[p_predTime]) \n\t -addiu %[p_predTime], %[p_predTime], 16 \n\t +PTR_ADDIU %[p_predTime], %[p_predTime], 16 \n\t : [p_predTime]+r(p_predTime) : @@ -261,7 +262,7 @@ static void apply_ltp_mips(AACContext *ac, SingleChannelElement *sce) __asm__ volatile ( sw $0, 0(%[p_predTime])\n\t -addiu %[p_predTime], %[p_predTime], 4\n\t +PTR_ADDIU %[p_predTime], %[p_predTime], 4\n\t : [p_predTime]+r(p_predTime) : @@ -315,9 +316,9 @@ static av_always_inline void fmul_and_reverse(float *dst, const float *src0, con swc1%[temp9],4(%[ptr1])\n\t swc1%[temp10], 8(%[ptr1])\n\t swc1%[temp11], 12(%[ptr1]) \n\t -addiu %[ptr1], %[ptr1], 16 \n\t -addiu %[ptr2], %[ptr2], -16 \n\t -addiu %[ptr3], %[ptr3], -16 \n\t +PTR_ADDIU %[ptr1], %[ptr1], 16 \n\t +PTR_ADDIU %[ptr2], %[ptr2], -16 \n\t +PTR_ADDIU %[ptr3], %[ptr3], -16 \n\t : [temp0]=f(temp[0]), [temp1]=f(temp[1]), [temp2]=f(temp[2]), [temp3]=f(temp[3]), @@ -358,7 +359,7 @@ static void update_ltp_mips(AACContext *ac, SingleChannelElement *sce) sw $0, 20(%[p_saved_ltp]) \n\t
[FFmpeg-devel] [PATCH v2 1/4] mips/aacdec: remove uses of mips32r2 specific ext instructions
Removing these removes the dependency of this code on mips32r2 which would allow it to be used on processors which have FPU instructions, but not r2 instructions (like the mips64el debian port for instance). Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavcodec/mips/aacdec_mips.h | 49 ++- 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/libavcodec/mips/aacdec_mips.h b/libavcodec/mips/aacdec_mips.h index 9ba3079..c9efdbb 100644 --- a/libavcodec/mips/aacdec_mips.h +++ b/libavcodec/mips/aacdec_mips.h @@ -68,10 +68,10 @@ static inline float *VMUL2_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp3], %[idx], 15 \n\t -ext %[temp4], %[idx], 4, 4\n\t +andi%[temp3], %[idx], 0x0F \n\t +andi%[temp4], %[idx], 0xF0 \n\t sll %[temp3], %[temp3], 2\n\t -sll %[temp4], %[temp4], 2\n\t +srl %[temp4], %[temp4], 2\n\t lwc1%[temp2], 0(%[scale])\n\t lwxc1 %[temp0], %[temp3](%[v]) \n\t lwxc1 %[temp1], %[temp4](%[v]) \n\t @@ -99,14 +99,13 @@ static inline float *VMUL4_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp0], %[idx], 3 \n\t -ext %[temp1], %[idx], 2, 2 \n\t -ext %[temp2], %[idx], 4, 2 \n\t -ext %[temp3], %[idx], 6, 2 \n\t +andi%[temp0], %[idx], 0x03\n\t +andi%[temp1], %[idx], 0x0C\n\t +andi%[temp2], %[idx], 0x30\n\t +andi%[temp3], %[idx], 0xC0\n\t sll %[temp0], %[temp0], 2 \n\t -sll %[temp1], %[temp1], 2 \n\t -sll %[temp2], %[temp2], 2 \n\t -sll %[temp3], %[temp3], 2 \n\t +srl %[temp2], %[temp2], 2 \n\t +srl %[temp3], %[temp3], 4 \n\t lwc1%[temp4], 0(%[scale]) \n\t lwxc1 %[temp5], %[temp0](%[v])\n\t lwxc1 %[temp6], %[temp1](%[v])\n\t @@ -142,14 +141,14 @@ static inline float *VMUL2S_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp0], %[idx], 15 \n\t -ext %[temp1], %[idx], 4, 4 \n\t +andi%[temp0], %[idx], 0x0F \n\t +andi%[temp1], %[idx], 0xF0 \n\t lw %[temp4], 0(%[scale]) \n\t srl %[temp2], %[sign], 1 \n\t sll %[temp3], %[sign], 31 \n\t 
sll %[temp2], %[temp2], 31 \n\t sll %[temp0], %[temp0], 2 \n\t -sll %[temp1], %[temp1], 2 \n\t +srl %[temp1], %[temp1], 2 \n\t lwxc1 %[temp8], %[temp0](%[v]) \n\t lwxc1 %[temp9], %[temp1](%[v]) \n\t xor %[temp5], %[temp4], %[temp2] \n\t @@ -185,22 +184,24 @@ static inline float *VMUL4S_mips(float *dst, const float *v, unsigned idx, __asm__ volatile( lw %[temp0], 0(%[scale]) \n\t -and %[temp1], %[idx], 3 \n\t -ext %[temp2], %[idx], 2, 2 \n\t -ext %[temp3], %[idx], 4, 2 \n\t -ext %[temp4], %[idx], 6, 2 \n\t -sll %[temp1], %[temp1], 2 \n\t -sll %[temp2], %[temp2], 2 \n\t -sll %[temp3], %[temp3], 2 \n\t -sll %[temp4], %[temp4], 2 \n\t +andi%[temp1], %[idx], 0x03 \n\t +andi%[temp2], %[idx], 0x0C \n\t +andi%[temp3], %[idx], 0x30 \n\t +andi%[temp4], %[idx], 0xC0 \n\t +sll %[temp1], %[temp1], 2\n\t +srl %[temp3], %[temp3], 2\n\t +srl %[temp4], %[temp4], 4\n\t lwxc1 %[temp10], %[temp1](%[v])\n\t lwxc1 %[temp11], %[temp2](%[v])\n\t lwxc1 %[temp12], %[temp3](%[v])\n\t lwxc1 %[temp13], %[temp4](%[v])\n\t and %[temp1], %[sign], %[mask] \n\t -ext %[temp2], %[idx], 12, 1 \n\t -ext %[temp3], %[idx], 13, 1 \n\t -ext %[temp4], %[idx], 14, 1 \n\t +srl %[temp2
[FFmpeg-devel] [PATCH v2 2/4] configure, mips: remove MIPS32R2, merging it with MIPSFPU
There are no independent uses of mips32r2 instructions except for the FPU parts. Due to the heavy use of mips32r2 specific fpu extensions, I am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway. Since these fpu instructions are available on mips64 (non-r2), enable them there as well.

Also remove the last occurrence of HAVE_MIPS32R2 (which is coupled to HAVE_MIPSFPU anyway).

mips32r2 is left in the list of options for compatibility so that using --disable-mips32r2 doesn't break anything.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 Makefile | 2 +-
 arch.mak | 1 -
 configure | 18 +-
 libavcodec/mips/ac3dsp_mips.c | 4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 845a274..ca2ce59 100644
--- a/Makefile
+++ b/Makefile
@@ -80,7 +80,7 @@ SUBDIR_VARS := CLEANFILES EXAMPLES FFLIBS HOSTPROGS TESTPROGS TOOLS \
               HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS \
               ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS \
               ALTIVEC-OBJS MMX-OBJS YASM-OBJS \
-              MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS MIPS32R2-OBJS \
+              MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS \
               OBJS SLIBOBJS HOSTOBJS TESTOBJS
 
 define RESET
diff --git a/arch.mak b/arch.mak
index 0e866d8..48bc2d3 100644
--- a/arch.mak
+++ b/arch.mak
@@ -5,7 +5,6 @@
 OBJS-$(HAVE_VFP) += $(VFP-OBJS) $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON) += $(NEON-OBJS) $(NEON-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU) += $(MIPSFPU-OBJS) $(MIPSFPU-OBJS-yes)
-OBJS-$(HAVE_MIPS32R2) += $(MIPS32R2-OBJS) $(MIPS32R2-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR1) += $(MIPSDSPR1-OBJS) $(MIPSDSPR1-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR2) += $(MIPSDSPR2-OBJS) $(MIPSDSPR2-OBJS-yes)
diff --git a/configure b/configure
index d641d9f..ce745d2 100755
--- a/configure
+++ b/configure
@@ -358,7 +358,6 @@ Optimization options (experts only):
   --disable-neon disable NEON optimizations
   --disable-inline-asm disable use of inline assembly
   --disable-yasm disable use of nasm/yasm assembly
-  --disable-mips32r2 disable MIPS32R2 optimizations
   --disable-mipsdspr1 disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2 disable MIPS DSP ASE R2 optimizations
   --disable-mipsfpu disable floating point MIPS optimizations
@@ -1999,7 +1998,6 @@ setend_deps=arm
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
 mipsfpu_deps=mips
-mips32r2_deps=mips
 mipsdspr1_deps=mips
 mipsdspr2_deps=mips
@@ -4569,8 +4567,19 @@ EOF
 elif enabled mips; then
 
     check_inline_asm loongson 'dmult.g $1, $2, $3'
-    enabled mips32r2 && add_cflags -mips32r2 && add_asflags -mips32r2 &&
-        check_inline_asm mips32r2 'rotr $t0, $t1, 1'
+
+    # Enable minimum ISA based on selected options
+    if enabled mips64 && (enabled mipsdspr1 || enabled mipsdspr2); then
+        add_cflags -mips64r2
+        add_asflags -mips64r2
+    elif enabled mips64 && enabled mipsfpu; then
+        add_cflags -mips64
+        add_asflags -mips64
+    elif enabled mipsfpu || enabled mipsdspr1 || enabled mipsdspr2; then
+        add_cflags -mips32r2
+        add_asflags -mips32r2
+    fi
+
     enabled mipsdspr1 && add_cflags -mdsp && add_asflags -mdsp &&
         check_inline_asm mipsdspr1 'addu.qb $t0, $t1, $t2'
     enabled mipsdspr2 && add_cflags -mdspr2 && add_asflags -mdspr2 &&
@@ -5522,7 +5531,6 @@ if enabled arm; then
 fi
 if enabled mips; then
     echo MIPS FPU enabled ${mipsfpu-no}
-    echo MIPS32R2 enabled ${mips32r2-no}
     echo MIPS DSP R1 enabled ${mipsdspr1-no}
     echo MIPS DSP R2 enabled ${mipsdspr2-no}
 fi
diff --git a/libavcodec/mips/ac3dsp_mips.c b/libavcodec/mips/ac3dsp_mips.c
index f33c6f1..bd2a611 100644
--- a/libavcodec/mips/ac3dsp_mips.c
+++ b/libavcodec/mips/ac3dsp_mips.c
@@ -199,7 +199,7 @@ static void ac3_update_bap_counts_mips(uint16_t mant_cnt[16], uint8_t *bap,
 }
 #endif
 
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 static void float_to_fixed24_mips(int32_t *dst, const float *src, unsigned int len)
 {
     const float scale = 1 << 24;
@@ -403,7 +403,7 @@ void ff_ac3dsp_init_mips(AC3DSPContext *c, int bit_exact) {
     c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_mips;
     c->update_bap_counts = ac3_update_bap_counts_mips;
 #endif
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
     c->float_to_fixed24 = float_to_fixed24_mips;
     c->downmix = ac3_downmix_mips;
 #endif
-- 
2.1.4
[FFmpeg-devel] [PATCH v2 4/4] changelog: add mips 64-bit port
---
 Changelog | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Changelog b/Changelog
index 1374cbc..2a5d6b8 100644
--- a/Changelog
+++ b/Changelog
@@ -36,6 +36,7 @@ version next:
 - Canopus HQX decoder
 - RTP depacketization of T.140 text (RFC 4103)
 - VP9 RTP payload format (draft 0) experimental depacketizer
+- Port MIPS optimizations to 64-bit
 
 
 version 2.5:
-- 
2.1.4
[FFmpeg-devel] [PATCH v2 0/4] mips cleanups and port to mips64
Hi,

This is the second version of the mips patches without the ones which were already accepted.

Changes:
- Keep mips32r2 in the list of configure options so it doesn't break anything (although now it's a no-op).
- Drop float dsp patch and just do the normal 64-bit porting. This would be good to do properly in generic code at some point. I have a feeling that using restrict in the way I did could result in undefined behavior in certain cases though.
- Move asmdefs.h to libavutil/mips to allow for the above change.
- Rebase the 64-bit porting and fix the conflicts.
- Add entry in the changelog for the mips 64-bit port.

Thanks,
James

James Cowgill (4):
  mips/aacdec: remove uses of mips32r2 specific ext instructions
  configure, mips: remove MIPS32R2, merging it with MIPSFPU
  mips: port optimizations to mips n64
  changelog: add mips 64-bit port

 Changelog | 1 +
 Makefile | 2 +-
 arch.mak | 1 -
 configure | 18 ++---
 libavcodec/mips/aacdec_mips.c | 21 ++-
 libavcodec/mips/aacdec_mips.h | 58 ++--
 libavcodec/mips/aacpsdsp_mips.c | 43 ++---
 libavcodec/mips/aacpsy_mips.h | 6 ++-
 libavcodec/mips/aacsbr_mips.c | 53 +-
 libavcodec/mips/aacsbr_mips.h | 17 +
 libavcodec/mips/ac3dsp_mips.c | 63 ---
 libavcodec/mips/acelp_filters_mips.c | 13 ---
 libavcodec/mips/acelp_vectors_mips.c | 7 ++--
 libavcodec/mips/celp_filters_mips.c | 13 ---
 libavcodec/mips/celp_math_mips.c | 5 ++-
 libavcodec/mips/compute_antialias_float.h | 4 +-
 libavcodec/mips/fft_mips.c | 13 ---
 libavcodec/mips/fmtconvert_mips.c | 6 +--
 libavcodec/mips/lsp_mips.h | 6 ++-
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++---
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++--
 libavcodec/mips/sbrdsp_mips.c | 45 +++---
 libavutil/mips/asmdefs.h | 48 +++
 libavutil/mips/float_dsp_mips.c | 21 ++-
 24 files changed, 289 insertions(+), 211 deletions(-)
 create mode 100644 libavutil/mips/asmdefs.h
-- 
2.1.4
Re: [FFmpeg-devel] [PATCH 09/12] mips: port optimizations to mips n64
On Wed, 2015-03-04 at 11:52 +0100, Michael Niedermayer wrote:
> On Wed, Mar 04, 2015 at 10:10:15AM +, Nedeljko Babic wrote:
> > LGTM
>
> seems this does not apply cleanly on HEAD
>
> Applying: mips: port optimizations to mips n64
> error: patch failed: libavcodec/mips/acelp_filters_mips.c:82
> error: libavcodec/mips/acelp_filters_mips.c: patch does not apply
> error: patch failed: libavcodec/mips/fmtconvert_mips.c:50
> error: libavcodec/mips/fmtconvert_mips.c: patch does not apply
> Patch failed at 0001 mips: port optimizations to mips n64

Yeah, I'll resend it with my other small changes soon.

James
Re: [FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations
On Wed, 2015-03-04 at 11:08 +, Nedeljko Babic wrote:
> > The assembly versions have a few problems
> > - They only work with mips32r2 enabled
> > - They don't work on 64-bits
> > - They're massive and complex
> >
> > So replace them with C implementations which solve these problems and let GCC magically optimize for different platforms. All the functions are manually unrolled 4 times (like the assembly code). With the addition of a few restrict keywords, the functions produce almost identical assembly to the original versions when compiled with gcc -O3.
> >
> > Since this code now uses no fpu assembly, drop the HAVE_MIPSFPU guard as well.
>
> All improvements of the C code should be put in generic C code so all architectures can benefit from them.
>
> The purpose of this code was to create optimizations for specific architecture. In this way optimizations for mips32r2 architecture are here even without tweaking configure line and even for older compilers.

That's ok until you try to run it on an old MIPS processor and the default FFmpeg options cause lots of SIGILLs, but that's another discussion (and maybe nobody cares :/).

> By putting these optimizations under HAVE_MIPS32R2 problem with building mips64 should be resolved and this can be optimized for mips64 later if needed.

I was thinking about just dropping this patch for the time being and porting a few bits to mips64 like in the other files (the code only uses the mips vi parts of mips32r2).

The code only kept its performance when I unrolled the loops and used av_restrict. Strictly speaking you're not even supposed to use restrict if the arrays could be exactly equal (which is permitted in the contracts for some of these functions).

James
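For readers unfamiliar with the approach being discussed, here is a minimal sketch of a four-times unrolled loop with restrict-qualified pointers. demo_vector_fmul is a hypothetical stand-in and is not the code from the dropped patch; plain C99 restrict is used here in place of av_restrict.

```c
#include <assert.h>

/* Element-wise multiply, manually unrolled four times. `restrict` promises
 * the compiler that dst does not overlap src0 or src1, which lets it reorder
 * the loads and stores freely. */
void demo_vector_fmul(float *restrict dst, const float *restrict src0,
                      const float *restrict src1, int len)
{
    assert(len % 4 == 0);   /* this sketch has no tail loop */
    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

The caveat raised above applies directly: calling this in place, with dst equal to src0, breaks the promise restrict makes, so the behaviour is undefined even though the loop would look harmless without the qualifier.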
Re: [FFmpeg-devel] [PATCH 07/12] mips/aacdec: remove uses of mips32r2 specific ext instructions
On Tue, 2015-03-03 at 12:42 +, Nedeljko Babic wrote:
> > Removing these removes the dependency of this code on mips32r2 which would allow it to be used on processors which have FPU instructions, but not r2 instructions (like the mips64el debian port for instance).
>
> I would be more comfortable if there were two instances of this code: one for mips32r2 and one for mips32 so advantages of using mips32r2 instructions (however small here) are left intact.
>
> On the other hand, since this doesn't change the number of instructions used much (adding at maximum around 100 instructions overall if I am not mistaken) I am ok with this.

Well, I can't see how 'ext' can ever be faster than 'and' (it does more work), so most of these should be no slower anyway. For VMUL4S my version has 2 extra instructions in it, so it could be a bit slower. Does this #if seem ok?

--- a/libavcodec/mips/aacdec_mips.h
+++ b/libavcodec/mips/aacdec_mips.h
@@ -198,9 +198,18 @@ static inline float *VMUL4S_mips(float *dst, const float *v, unsigned idx,
 lwxc1 %[temp12], %[temp3](%[v])\n\t
 lwxc1 %[temp13], %[temp4](%[v])\n\t
 and %[temp1], %[sign], %[mask] \n\t
+#if defined(__mips_isa_rev) && __mips_isa_rev >= 2
 ext %[temp2], %[idx], 12, 1 \n\t
 ext %[temp3], %[idx], 13, 1 \n\t
 ext %[temp4], %[idx], 14, 1 \n\t
+#else
+srl %[temp2], %[idx], 12 \n\t
+srl %[temp3], %[idx], 13 \n\t
+srl %[temp4], %[idx], 14 \n\t
+andi %[temp2], %[temp2], 1 \n\t
+andi %[temp3], %[temp3], 1 \n\t
+andi %[temp4], %[temp4], 1 \n\t
+#endif
 sllv %[sign], %[sign], %[temp2]\n\t
 xor %[temp1], %[temp0], %[temp1]\n\t
 and %[temp2], %[sign], %[mask] \n\t

Thanks,
James
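The equivalence the patch relies on can be checked in plain C. The sketch below mirrors the VMUL2/VMUL4 index computations, where a bit field of idx is extracted and scaled by sizeof(float) to form a byte offset; the function names are invented for this illustration.

```c
#include <stdint.h>

/* Original form: `ext t, idx, 4, 4` extracts bits 4..7, then `sll t, t, 2`
 * scales the field by sizeof(float). */
uint32_t offset_ext_hi4(uint32_t idx)
{
    return ((idx >> 4) & 0xF) << 2;
}

/* Patched form: `andi t, idx, 0xF0` masks in place, then `srl t, t, 2`
 * moves the field down to the same scaled position. */
uint32_t offset_andi_hi4(uint32_t idx)
{
    return (idx & 0xF0) >> 2;
}

/* For the 2-bit field at bits 2..3, `ext` + `sll 2` puts it back where it
 * started, so the mask alone is enough and no shift is needed. */
uint32_t offset_ext_bits2(uint32_t idx)
{
    return ((idx >> 2) & 0x3) << 2;
}

uint32_t offset_andi_bits2(uint32_t idx)
{
    return idx & 0x0C;
}

/* Returns 1 if both pairs agree for every 16-bit idx value. */
int demo_offsets_agree(void)
{
    for (uint32_t idx = 0; idx <= 0xFFFF; idx++) {
        if (offset_ext_hi4(idx) != offset_andi_hi4(idx) ||
            offset_ext_bits2(idx) != offset_andi_bits2(idx))
            return 0;
    }
    return 1;
}
```

This is why most of the replaced sequences are no longer than the originals: the mask-in-place form folds the extract and the scale into one 'andi' plus at most one shift.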
[FFmpeg-devel] [PATCH 07/12] mips/aacdec: remove uses of mips32r2 specific ext instructions
Removing these removes the dependency of this code on mips32r2 which would allow it to be used on processors which have FPU instructions, but not r2 instructions (like the mips64el debian port for instance). Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavcodec/mips/aacdec_mips.h | 49 ++- 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/libavcodec/mips/aacdec_mips.h b/libavcodec/mips/aacdec_mips.h index 9ba3079..c9efdbb 100644 --- a/libavcodec/mips/aacdec_mips.h +++ b/libavcodec/mips/aacdec_mips.h @@ -68,10 +68,10 @@ static inline float *VMUL2_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp3], %[idx], 15 \n\t -ext %[temp4], %[idx], 4, 4\n\t +andi%[temp3], %[idx], 0x0F \n\t +andi%[temp4], %[idx], 0xF0 \n\t sll %[temp3], %[temp3], 2\n\t -sll %[temp4], %[temp4], 2\n\t +srl %[temp4], %[temp4], 2\n\t lwc1%[temp2], 0(%[scale])\n\t lwxc1 %[temp0], %[temp3](%[v]) \n\t lwxc1 %[temp1], %[temp4](%[v]) \n\t @@ -99,14 +99,13 @@ static inline float *VMUL4_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp0], %[idx], 3 \n\t -ext %[temp1], %[idx], 2, 2 \n\t -ext %[temp2], %[idx], 4, 2 \n\t -ext %[temp3], %[idx], 6, 2 \n\t +andi%[temp0], %[idx], 0x03\n\t +andi%[temp1], %[idx], 0x0C\n\t +andi%[temp2], %[idx], 0x30\n\t +andi%[temp3], %[idx], 0xC0\n\t sll %[temp0], %[temp0], 2 \n\t -sll %[temp1], %[temp1], 2 \n\t -sll %[temp2], %[temp2], 2 \n\t -sll %[temp3], %[temp3], 2 \n\t +srl %[temp2], %[temp2], 2 \n\t +srl %[temp3], %[temp3], 4 \n\t lwc1%[temp4], 0(%[scale]) \n\t lwxc1 %[temp5], %[temp0](%[v])\n\t lwxc1 %[temp6], %[temp1](%[v])\n\t @@ -142,14 +141,14 @@ static inline float *VMUL2S_mips(float *dst, const float *v, unsigned idx, float *ret; __asm__ volatile( -andi%[temp0], %[idx], 15 \n\t -ext %[temp1], %[idx], 4, 4 \n\t +andi%[temp0], %[idx], 0x0F \n\t +andi%[temp1], %[idx], 0xF0 \n\t lw %[temp4], 0(%[scale]) \n\t srl %[temp2], %[sign], 1 \n\t sll %[temp3], %[sign], 31 \n\t 
sll %[temp2], %[temp2], 31 \n\t sll %[temp0], %[temp0], 2 \n\t -sll %[temp1], %[temp1], 2 \n\t +srl %[temp1], %[temp1], 2 \n\t lwxc1 %[temp8], %[temp0](%[v]) \n\t lwxc1 %[temp9], %[temp1](%[v]) \n\t xor %[temp5], %[temp4], %[temp2] \n\t @@ -185,22 +184,24 @@ static inline float *VMUL4S_mips(float *dst, const float *v, unsigned idx, __asm__ volatile( lw %[temp0], 0(%[scale]) \n\t -and %[temp1], %[idx], 3 \n\t -ext %[temp2], %[idx], 2, 2 \n\t -ext %[temp3], %[idx], 4, 2 \n\t -ext %[temp4], %[idx], 6, 2 \n\t -sll %[temp1], %[temp1], 2 \n\t -sll %[temp2], %[temp2], 2 \n\t -sll %[temp3], %[temp3], 2 \n\t -sll %[temp4], %[temp4], 2 \n\t +andi%[temp1], %[idx], 0x03 \n\t +andi%[temp2], %[idx], 0x0C \n\t +andi%[temp3], %[idx], 0x30 \n\t +andi%[temp4], %[idx], 0xC0 \n\t +sll %[temp1], %[temp1], 2\n\t +srl %[temp3], %[temp3], 2\n\t +srl %[temp4], %[temp4], 4\n\t lwxc1 %[temp10], %[temp1](%[v])\n\t lwxc1 %[temp11], %[temp2](%[v])\n\t lwxc1 %[temp12], %[temp3](%[v])\n\t lwxc1 %[temp13], %[temp4](%[v])\n\t and %[temp1], %[sign], %[mask] \n\t -ext %[temp2], %[idx], 12, 1 \n\t -ext %[temp3], %[idx], 13, 1 \n\t -ext %[temp4], %[idx], 14, 1 \n\t +srl %[temp2
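For anyone checking the bit manipulation in this patch, here is a minimal C sketch (not part of the patch; the helper names offset_ext and offset_andi are made up for illustration). It assumes the usual meaning of the MIPS32R2 "ext" instruction (extract "size" bits starting at bit "pos") and shows why masking in place with andi and then shifting once gives the same float-table byte offset as ext followed by sll 2.

```c
#include <stdint.h>

/* Old sequence: "ext t, idx, pos, size" then "sll t, t, 2".
 * Extract a bit field, then scale it by 4 to get a byte offset. */
static inline uint32_t offset_ext(uint32_t idx, unsigned pos, unsigned size)
{
    uint32_t field = (idx >> pos) & ((1u << size) - 1u);
    return field << 2;
}

/* New sequence: "andi t, idx, mask" then a single shift that moves the
 * field so that it starts at bit 2.
 * shift > 0 models srl, shift < 0 models sll, shift == 0 means no shift. */
static inline uint32_t offset_andi(uint32_t idx, uint32_t mask, int shift)
{
    uint32_t t = idx & mask;
    return shift >= 0 ? t >> shift : t << -shift;
}
```

The field at bits 2..3 needs no shift at all because it already sits at bit 2, which is why the patch drops one instruction for that operand.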
[FFmpeg-devel] [PATCH 03/12] mips/aacpsdsp: fix definition of ps_decorrelate_mips
Q_fract should have been declared as 'const float*'. Also fix the constness of some local variables affected by this.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/aacpsdsp_mips.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mips/aacpsdsp_mips.c b/libavcodec/mips/aacpsdsp_mips.c
index 4730a7f..06d99d8 100644
--- a/libavcodec/mips/aacpsdsp_mips.c
+++ b/libavcodec/mips/aacpsdsp_mips.c
@@ -277,7 +277,7 @@ static void ps_mul_pair_single_mips(float (*dst)[2], float (*src0)[2], float *sr
 static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
                              float (*ap_delay)[PS_QMF_TIME_SLOTS + PS_MAX_AP_DELAY][2],
-                             const float phi_fract[2], float (*Q_fract)[2],
+                             const float phi_fract[2], const float (*Q_fract)[2],
                              const float *transient_gain,
                              float g_decay_slope,
                              int len)
@@ -285,8 +285,8 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
     float *p_delay = &delay[0][0];
     float *p_out = &out[0][0];
     float *p_ap_delay = &ap_delay[0][0][0];
-    float *p_t_gain = (float*)transient_gain;
-    float *p_Q_fract = &Q_fract[0][0];
+    const float *p_t_gain = transient_gain;
+    const float *p_Q_fract = &Q_fract[0][0];
     float ag0, ag1, ag2;
     float phi_fract0 = phi_fract[0];
     float phi_fract1 = phi_fract[1];
--
2.1.4
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 08/12] configure, mips: remove MIPS32R2, merging it with MIPSFPU
There are no independent uses of mips32r2 instructions except for the FPU parts. Due to the heavy use of mips32r2-specific FPU extensions, I am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway. Since these FPU instructions are available on mips64 (non-r2), enable them there as well.

Also remove the last occurrence of HAVE_MIPS32R2 (which is coupled to HAVE_MIPSFPU anyway).

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 Makefile | 2 +-
 arch.mak | 1 -
 configure | 19 +--
 libavcodec/mips/ac3dsp_mips.c | 4 ++--
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/Makefile b/Makefile
index 845a274..ca2ce59 100644
--- a/Makefile
+++ b/Makefile
@@ -80,7 +80,7 @@ SUBDIR_VARS := CLEANFILES EXAMPLES FFLIBS HOSTPROGS TESTPROGS TOOLS \
                HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS \
                ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS \
                ALTIVEC-OBJS MMX-OBJS YASM-OBJS \
-               MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS MIPS32R2-OBJS \
+               MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS \
                OBJS SLIBOBJS HOSTOBJS TESTOBJS

 define RESET
diff --git a/arch.mak b/arch.mak
index 0e866d8..48bc2d3 100644
--- a/arch.mak
+++ b/arch.mak
@@ -5,7 +5,6 @@ OBJS-$(HAVE_VFP) += $(VFP-OBJS) $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON) += $(NEON-OBJS) $(NEON-OBJS-yes)

 OBJS-$(HAVE_MIPSFPU) += $(MIPSFPU-OBJS) $(MIPSFPU-OBJS-yes)
-OBJS-$(HAVE_MIPS32R2) += $(MIPS32R2-OBJS) $(MIPS32R2-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR1) += $(MIPSDSPR1-OBJS) $(MIPSDSPR1-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR2) += $(MIPSDSPR2-OBJS) $(MIPSDSPR2-OBJS-yes)

diff --git a/configure b/configure
index d037da1..6764830 100755
--- a/configure
+++ b/configure
@@ -358,7 +358,6 @@ Optimization options (experts only):
   --disable-neon           disable NEON optimizations
   --disable-inline-asm     disable use of inline assembly
   --disable-yasm           disable use of nasm/yasm assembly
-  --disable-mips32r2       disable MIPS32R2 optimizations
   --disable-mipsdspr1      disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2      disable MIPS DSP ASE R2 optimizations
   --disable-mipsfpu        disable floating point MIPS optimizations
@@ -1560,7 +1559,6 @@ ARCH_EXT_LIST_ARM=

 ARCH_EXT_LIST_MIPS="
     mipsfpu
-    mips32r2
     mipsdspr1
     mipsdspr2
 "
@@ -1996,7 +1994,6 @@ setend_deps=arm
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM

 mipsfpu_deps=mips
-mips32r2_deps=mips
 mipsdspr1_deps=mips
 mipsdspr2_deps=mips

@@ -4565,8 +4562,19 @@ EOF
 elif enabled mips; then

     check_inline_asm loongson 'dmult.g $1, $2, $3'
-    enabled mips32r2 && add_cflags -mips32r2 && add_asflags -mips32r2 &&
-        check_inline_asm mips32r2 'rotr $t0, $t1, 1'
+
+    # Enable minimum ISA based on selected options
+    if enabled mips64 && (enabled mipsdspr1 || enabled mipsdspr2); then
+        add_cflags -mips64r2
+        add_asflags -mips64r2
+    elif enabled mips64 && enabled mipsfpu; then
+        add_cflags -mips64
+        add_asflags -mips64
+    elif enabled mipsfpu || enabled mipsdspr1 || enabled mipsdspr2; then
+        add_cflags -mips32r2
+        add_asflags -mips32r2
+    fi
+
     enabled mipsdspr1 && add_cflags -mdsp && add_asflags -mdsp &&
         check_inline_asm mipsdspr1 'addu.qb $t0, $t1, $t2'
     enabled mipsdspr2 && add_cflags -mdspr2 && add_asflags -mdspr2 &&
@@ -5512,7 +5520,6 @@ if enabled arm; then
 fi
 if enabled mips; then
     echo MIPS FPU enabled ${mipsfpu-no}
-    echo MIPS32R2 enabled ${mips32r2-no}
     echo MIPS DSP R1 enabled ${mipsdspr1-no}
     echo MIPS DSP R2 enabled ${mipsdspr2-no}
 fi
diff --git a/libavcodec/mips/ac3dsp_mips.c b/libavcodec/mips/ac3dsp_mips.c
index f33c6f1..bd2a611 100644
--- a/libavcodec/mips/ac3dsp_mips.c
+++ b/libavcodec/mips/ac3dsp_mips.c
@@ -199,7 +199,7 @@ static void ac3_update_bap_counts_mips(uint16_t mant_cnt[16], uint8_t *bap,
 }
 #endif

-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 static void float_to_fixed24_mips(int32_t *dst, const float *src, unsigned int len)
 {
     const float scale = 1 << 24;
@@ -403,7 +403,7 @@ void ff_ac3dsp_init_mips(AC3DSPContext *c, int bit_exact) {
     c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_mips;
     c->update_bap_counts = ac3_update_bap_counts_mips;
 #endif
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
     c->float_to_fixed24 = float_to_fixed24_mips;
     c->downmix = ac3_downmix_mips;
 #endif
--
2.1.4
[FFmpeg-devel] [PATCH 04/12] mips/fft: remove some useless assembly
Remove some assembly that the compiler can easily handle optimally on its own. GCC produces almost identical assembly.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/fft_mips.c | 26 ++
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/libavcodec/mips/fft_mips.c b/libavcodec/mips/fft_mips.c
index 691f2db..e12c33e 100644
--- a/libavcodec/mips/fft_mips.c
+++ b/libavcodec/mips/fft_mips.c
@@ -65,26 +65,12 @@ static void ff_fft_calc_mips(FFTContext *s, FFTComplex *z)
     float w_re, w_im;
     float *w_re_ptr, *w_im_ptr;
     const int fft_size = (1 << s->nbits);
-    int s_n = s->nbits;
-    int tem1, tem2;
     float pom, pom1, pom2, pom3;
     float temp, temp1, temp3, temp4;
     FFTComplex * tmpz_n2, * tmpz_n34, * tmpz_n4;
     FFTComplex * tmpz_n2_i, * tmpz_n34_i, * tmpz_n4_i, * tmpz_i;

-    /**
-    *num_transforms = (0x2aab >> (16 - s->nbits)) | 1;
-    */
-    __asm__ volatile (
-        li %[tem1], 16 \n\t
-        sub %[s_n], %[tem1], %[s_n] \n\t
-        li %[tem2], 10923 \n\t
-        srav %[tem2], %[tem2], %[s_n] \n\t
-        ori %[num_t],%[tem2], 1 \n\t
-        : [num_t]=r(num_transforms), [s_n]+r(s_n),
-          [tem1]=r(tem1), [tem2]=r(tem2)
-    );
-
+    num_transforms = (0x2aab >> (16 - s->nbits)) | 1;

     for (n=0; n<num_transforms; n++) {
         offset = ff_fft_offsets_lut[n] << 2;
@@ -214,15 +200,7 @@ static void ff_fft_calc_mips(FFTContext *s, FFTComplex *z)
     n4 = 4;

     for (nbits=4; nbits<=s->nbits; nbits++) {
-        /*
-        * num_transforms = (num_transforms >> 1) | 1;
-        */
-        __asm__ volatile (
-            sra %[num_t], %[num_t], 1 \n\t
-            ori %[num_t], %[num_t], 1 \n\t
-
-            : [num_t] +r (num_transforms)
-        );
+        num_transforms = (num_transforms >> 1) | 1;
         n2 = 2 * n4;
         n34 = 3 * n4;

--
2.1.4
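To make the two replaced expressions easier to check by hand, here is a small standalone C sketch. The function names are hypothetical and exist only for this illustration; the expressions themselves are the ones the removed assembly carried in its comments (0x2aab is 10923, the constant loaded by the "li").

```c
/* First-pass transform count, as in the comment above the removed asm:
 * num_transforms = (0x2aab >> (16 - nbits)) | 1 */
static int initial_num_transforms(int nbits)
{
    return (0x2aab >> (16 - nbits)) | 1;
}

/* Per-pass update, as in the second removed asm block (sra, then ori):
 * num_transforms = (num_transforms >> 1) | 1 */
static int next_num_transforms(int num_transforms)
{
    return (num_transforms >> 1) | 1;
}
```

Both are a shift and an OR, which any optimizing compiler turns into the same two or three instructions the hand-written block used.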
[FFmpeg-devel] [PATCH 05/12] mips/sbrdsp: remove sbr_neg_odd_64_mips
The optimized C version of this code actually runs faster than this version, so remove it.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/sbrdsp_mips.c | 34 --
 1 file changed, 34 deletions(-)

diff --git a/libavcodec/mips/sbrdsp_mips.c b/libavcodec/mips/sbrdsp_mips.c
index d4460ba..c76e709 100644
--- a/libavcodec/mips/sbrdsp_mips.c
+++ b/libavcodec/mips/sbrdsp_mips.c
@@ -58,39 +58,6 @@
 #include "libavcodec/sbrdsp.h"

 #if HAVE_INLINE_ASM
-static void sbr_neg_odd_64_mips(float *x)
-{
-    int Temp1, Temp2, Temp3, Temp4, Temp5;
-    float *x1 = &x[1];
-    float *x_end = x1 + 64;
-
-    /* loop unrolled 4 times */
-    __asm__ volatile (
-        lui %[Temp5], 0x8000 \n\t
-        1: \n\t
-        lw %[Temp1], 0(%[x1]) \n\t
-        lw %[Temp2], 8(%[x1]) \n\t
-        lw %[Temp3], 16(%[x1]) \n\t
-        lw %[Temp4], 24(%[x1]) \n\t
-        xor %[Temp1], %[Temp1], %[Temp5] \n\t
-        xor %[Temp2], %[Temp2], %[Temp5] \n\t
-        xor %[Temp3], %[Temp3], %[Temp5] \n\t
-        xor %[Temp4], %[Temp4], %[Temp5] \n\t
-        sw %[Temp1], 0(%[x1]) \n\t
-        sw %[Temp2], 8(%[x1]) \n\t
-        sw %[Temp3], 16(%[x1]) \n\t
-        sw %[Temp4], 24(%[x1]) \n\t
-        addiu %[x1], %[x1], 32 \n\t
-        bne %[x1], %[x_end], 1b \n\t
-
-        : [Temp1]=r(Temp1), [Temp2]=r(Temp2),
-          [Temp3]=r(Temp3), [Temp4]=r(Temp4),
-          [Temp5]=r(Temp5), [x1]+r(x1)
-        : [x_end]r(x_end)
-        : memory
-    );
-}
-
 static void sbr_qmf_pre_shuffle_mips(float *z)
 {
     int Temp1, Temp2, Temp3, Temp4, Temp5, Temp6;
@@ -920,7 +887,6 @@ static void sbr_hf_apply_noise_3_mips(float (*Y)[2], const float *s_m,
 void ff_sbrdsp_init_mips(SBRDSPContext *s)
 {
 #if HAVE_INLINE_ASM
-    s->neg_odd_64 = sbr_neg_odd_64_mips;
     s->qmf_pre_shuffle = sbr_qmf_pre_shuffle_mips;
     s->qmf_post_shuffle = sbr_qmf_post_shuffle_mips;
 #if HAVE_MIPSFPU
--
2.1.4
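For reference, what the removed routine does is flip the sign bit (0x80000000) of the 32 odd-indexed floats in a 64-element array. A plain C sketch of the same operation is below. It is an illustration only, not the generic FFmpeg implementation, and the name neg_odd_64 and the memcpy-based bit access are my own choices.

```c
#include <stdint.h>
#include <string.h>

/* Negate x[1], x[3], ..., x[63] by toggling the IEEE-754 sign bit,
 * mirroring the lui 0x8000 / xor sequence in the removed assembly. */
static void neg_odd_64(float *x)
{
    for (int i = 1; i < 64; i += 2) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof(bits));   /* read the float's bit pattern */
        bits ^= UINT32_C(1) << 31;            /* flip the sign bit */
        memcpy(&x[i], &bits, sizeof(bits));   /* write it back */
    }
}
```

With a loop this simple the compiler is free to unroll and schedule it for whatever core it targets, which fits the observation in the commit message.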
[FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations
The assembly versions have a few problems - They only work with mips32r2 enabled - They don't work on 64-bits - They're massive and complex So replace them with C implementations which solve these problems and let GCC magically optimize for different platforms. All the functions are manually unrolled 4 times (like the assembly code). With the addition of a few restrict keywords, the functions produce almost identical assembly to the original versions when compiled with gcc -O3. Since this code now uses no fpu assembly, drop the HAVE_MIPSFPU guard as well. Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavutil/mips/float_dsp_mips.c | 354 1 file changed, 72 insertions(+), 282 deletions(-) diff --git a/libavutil/mips/float_dsp_mips.c b/libavutil/mips/float_dsp_mips.c index 06d52dc..31425de 100644 --- a/libavutil/mips/float_dsp_mips.c +++ b/libavutil/mips/float_dsp_mips.c @@ -52,332 +52,122 @@ */ #include config.h +#include libavutil/avassert.h #include libavutil/float_dsp.h -#if HAVE_INLINE_ASM HAVE_MIPSFPU -static void vector_fmul_mips(float *dst, const float *src0, const float *src1, - int len) +// The functions here are basically the same as the C implementations but +// unrolled 4 times to take advantage of pointer alignment + mips fpu registers + +static void vector_fmul_mips( +float *av_restrict dst, const float *av_restrict src0, +const float *av_restrict src1, int len) { int i; -if (len 3) { -for (i = 0; i len; i++) -dst[i] = src0[i] * src1[i]; -} else { -float *d = (float *)dst; -float *d_end = d + len; -float *s0= (float *)src0; -float *s1= (float *)src1; - -float src0_0, src0_1, src0_2, src0_3; -float src1_0, src1_1, src1_2, src1_3; - -__asm__ volatile ( -1: \n\t -lwc1 %[src0_0], 0(%[s0])\n\t -lwc1 %[src1_0], 0(%[s1])\n\t -lwc1 %[src0_1], 4(%[s0])\n\t -lwc1 %[src1_1], 4(%[s1])\n\t -lwc1 %[src0_2], 8(%[s0])\n\t -lwc1 %[src1_2], 8(%[s1])\n\t -lwc1 %[src0_3], 12(%[s0]) \n\t -lwc1 %[src1_3], 12(%[s1]) \n\t -mul.s %[src0_0], %[src0_0], %[src1_0] \n\t 
-mul.s %[src0_1], %[src0_1], %[src1_1] \n\t -mul.s %[src0_2], %[src0_2], %[src1_2] \n\t -mul.s %[src0_3], %[src0_3], %[src1_3] \n\t -swc1 %[src0_0], 0(%[d]) \n\t -swc1 %[src0_1], 4(%[d]) \n\t -swc1 %[src0_2], 8(%[d]) \n\t -swc1 %[src0_3], 12(%[d])\n\t -addiu %[s0], %[s0], 16 \n\t -addiu %[s1], %[s1], 16 \n\t -addiu %[d], %[d], 16 \n\t -bne%[d], %[d_end], 1b \n\t +// input length must be a multiple of 4 +av_assert2(len % 4 == 0); -: [src0_0]=f(src0_0), [src0_1]=f(src0_1), - [src0_2]=f(src0_2), [src0_3]=f(src0_3), - [src1_0]=f(src1_0), [src1_1]=f(src1_1), - [src1_2]=f(src1_2), [src1_3]=f(src1_3), - [d]+r(d), [s0]+r(s0), [s1]+r(s1) -: [d_end]r(d_end) -: memory -); +for (i = 0; i len; i += 4) { +dst[i] = src0[i] * src1[i]; +dst[i + 1] = src0[i + 1] * src1[i + 1]; +dst[i + 2] = src0[i + 2] * src1[i + 2]; +dst[i + 3] = src0[i + 3] * src1[i + 3]; } } -static void vector_fmul_scalar_mips(float *dst, const float *src, float mul, - int len) +static void vector_fmul_scalar_mips( +float *av_restrict dst, const float *av_restrict src, float mul, int len) { -float temp0, temp1, temp2, temp3; -float *local_src = (float*)src; -float *end = local_src + len; +int i; -/* loop unrolled 4 times */ -__asm__ volatile( -.setpush \n\t -.setnoreorder\n\t -1: \n\t -lwc1%[temp0], 0(%[src])\n\t -lwc1%[temp1], 4(%[src])\n\t -lwc1%[temp2], 8(%[src])\n\t -lwc1%[temp3], 12(%[src]) \n\t -addiu %[dst], %[dst], 16 \n\t -mul.s %[temp0], %[temp0], %[mul] \n\t -mul.s %[temp1], %[temp1], %[mul] \n\t -mul.s %[temp2], %[temp2], %[mul] \n\t -mul.s %[temp3], %[temp3], %[mul] \n\t -addiu %[src], %[src], 16 \n\t -swc1%[temp0
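A standalone version of the unrolled multiply from this patch, for anyone who wants to compile it outside the FFmpeg tree and look at the generated code. It assumes a C99 compiler, so plain "restrict" and <assert.h> stand in for av_restrict and av_assert2; the function name is changed to make clear this is not the patched file itself.

```c
#include <assert.h>

/* dst[i] = src0[i] * src1[i], unrolled 4 times.
 * restrict promises the compiler that the three arrays do not overlap,
 * so it can issue all four loads before the first store. */
static void vector_fmul_unrolled(float *restrict dst,
                                 const float *restrict src0,
                                 const float *restrict src1, int len)
{
    /* input length must be a multiple of 4 */
    assert(len % 4 == 0);

    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

Without the restrict qualifiers the compiler has to assume dst may alias the sources and orders each store before the next load, which is the main difference from the hand-written assembly.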
[FFmpeg-devel] [PATCH 10/12] mips: use float* to hold pointer instead of int
This is obviously needed for 64-bit support.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/aacdec_mips.c | 2 +-
 libavcodec/mips/aacpsdsp_mips.c | 12 ++--
 libavcodec/mips/sbrdsp_mips.c | 10 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 5e0a83d..b6eec53 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -344,7 +344,7 @@ static void update_ltp_mips(AACContext *ac, SingleChannelElement *sce)
     if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
         float *p_saved_ltp = saved_ltp + 576;
-        int loop_end1 = (int)(p_saved_ltp + 448);
+        float *loop_end1 = p_saved_ltp + 448;

         float_copy(saved_ltp, saved, 512);

diff --git a/libavcodec/mips/aacpsdsp_mips.c b/libavcodec/mips/aacpsdsp_mips.c
index 1175918..b03cc3f 100644
--- a/libavcodec/mips/aacpsdsp_mips.c
+++ b/libavcodec/mips/aacpsdsp_mips.c
@@ -293,7 +293,7 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
     float phi_fract1 = phi_fract[1];
     float temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9;

-    len = (int)((int*)p_delay + (len << 1));
+    float *p_delay_end = (p_delay + (len << 1));

     /* merged 2 loops */
     __asm__ volatile(
@@ -369,7 +369,7 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
         swc1 %[temp3], 628(%[p_ap_delay]) \n\t
         swc1 %[temp5], -8(%[p_out]) \n\t
         swc1 %[temp6], -4(%[p_out]) \n\t
-        bne %[p_delay],%[len],1b\n\t
+        bne %[p_delay],%[p_delay_end],1b\n\t
         swc1 %[temp6], -4(%[p_out]) \n\t
         .set pop \n\t

@@ -380,7 +380,7 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
           [p_Q_fract]+r(p_Q_fract), [p_t_gain]+r(p_t_gain), [p_out]+r(p_out),
           [ag0]=f(ag0), [ag1]=f(ag1), [ag2]=f(ag2)
         : [phi_fract0]f(phi_fract0), [phi_fract1]f(phi_fract1),
-          [len]r(len), [g_decay_slope]f(g_decay_slope)
+          [p_delay_end]r(p_delay_end), [g_decay_slope]f(g_decay_slope)
         : memory
     );
 }
@@ -400,7 +400,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float (*r)[2],
     float temp0, temp1, temp2, temp3;
     float l_re, l_im, r_re, r_im;

-    len = (int)((int*)l + (len << 1));
+    float *l_end = ((float *)l + (len << 1));

     __asm__ volatile(
         .set push \n\t
@@ -427,7 +427,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float (*r)[2],
         swc1 %[temp0], -8(%[l]) \n\t
         swc1 %[temp2], -8(%[r]) \n\t
         swc1 %[temp1], -4(%[l]) \n\t
-        bne %[l], %[len],1b \n\t
+        bne %[l], %[l_end], 1b \n\t
         swc1 %[temp3], -4(%[r]) \n\t
         .set pop \n\t

@@ -438,7 +438,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float (*r)[2],
           [l_re]=f(l_re), [l_im]=f(l_im),
           [r_re]=f(r_re), [r_im]=f(r_im)
         : [hs0]f(hs0), [hs1]f(hs1), [hs2]f(hs2),
-          [hs3]f(hs3), [len]r(len)
+          [hs3]f(hs3), [l_end]r(l_end)
         : memory
     );
 }
diff --git a/libavcodec/mips/sbrdsp_mips.c b/libavcodec/mips/sbrdsp_mips.c
index 5c21749..9f2d827 100644
--- a/libavcodec/mips/sbrdsp_mips.c
+++ b/libavcodec/mips/sbrdsp_mips.c
@@ -665,14 +665,14 @@ static void sbr_hf_gen_mips(float (*X_high)[2], const float (*X_low)[2],
 static void sbr_hf_g_filt_mips(float (*Y)[2], const float (*X_high)[40][2],
                             const float *g_filt, int m_max, intptr_t ixh)
 {
-    float *p_y, *p_x, *p_g;
+    const float *p_x, *p_g, *loop_end;
+    float *p_y;
     float temp0, temp1, temp2;
-    int loop_end;

-    p_g = (float*)&g_filt[0];
+    p_g = &g_filt[0];
     p_y = &Y[0][0];
-    p_x = (float*)&X_high[0][ixh][0];
-    loop_end = (int)((int*)p_g + m_max);
+    p_x = &X_high[0][ixh][0];
+    loop_end = p_g + m_max;

     __asm__ volatile(
         .set push \n\t
--
2.1.4
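The reason behind this change, as a short C sketch: on an LP64 target such as mips n64 a pointer is 64 bits wide and an int is 32, so storing a loop-end address in an int drops the upper half and the "bne" comparison against the real pointer never matches. Keeping the bound typed as a pointer avoids the problem. The function below is a made-up example (scale_pairs is not an FFmpeg function) that computes its end pointer the same way the patched code computes p_delay_end.

```c
/* Multiply 'len' (re, im) pairs in place by k.
 * The loop bound stays a float pointer, so it is exactly as wide as the
 * pointer it is compared against on both 32-bit and 64-bit targets. */
static void scale_pairs(float *p, int len, float k)
{
    float *end = p + (len << 1);   /* two floats per pair */

    while (p != end)
        *p++ *= k;
}
```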
[FFmpeg-devel] [PATCH 12/12] mips/aaccoder: use variables instead of using register names directly
On mips64, the registers t[4-7] do not exist. Instead of using a lot of #ifdef or defines to handle differing register names, use variables and let GCC allocate the registers automatically (like in the other mips assembly files). In get_band_cost_ESC_mips, t4 and t5 were renamed to t6 and t7 to avoid a variable name conflict. Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavcodec/mips/aaccoder_mips.c | 929 +--- 1 file changed, 477 insertions(+), 452 deletions(-) diff --git a/libavcodec/mips/aaccoder_mips.c b/libavcodec/mips/aaccoder_mips.c index 8595913..ea0bf31 100644 --- a/libavcodec/mips/aaccoder_mips.c +++ b/libavcodec/mips/aaccoder_mips.c @@ -221,6 +221,7 @@ static void quantize_and_encode_band_cost_SQUAD_mips(struct AACEncContext *s, for (i = 0; i size; i += 4) { int curidx; int *in_int = (int *)in[i]; +int t0, t1, t2, t3, t4, t5, t6, t7; qc1 = scaled[i ] * Q34 + 0.4054f; qc2 = scaled[i+1] * Q34 + 0.4054f; @@ -235,31 +236,31 @@ static void quantize_and_encode_band_cost_SQUAD_mips(struct AACEncContext *s, slt%[qc2], $zero, %[qc2] \n\t slt%[qc3], $zero, %[qc3] \n\t slt%[qc4], $zero, %[qc4] \n\t -lw $t0,0(%[in_int])\n\t -lw $t1,4(%[in_int])\n\t -lw $t2,8(%[in_int])\n\t -lw $t3,12(%[in_int]) \n\t -srl$t0,$t0,31 \n\t -srl$t1,$t1,31 \n\t -srl$t2,$t2,31 \n\t -srl$t3,$t3,31 \n\t -subu $t4,$zero, %[qc1] \n\t -subu $t5,$zero, %[qc2] \n\t -subu $t6,$zero, %[qc3] \n\t -subu $t7,$zero, %[qc4] \n\t -movn %[qc1], $t4,$t0 \n\t -movn %[qc2], $t5,$t1 \n\t -movn %[qc3], $t6,$t2 \n\t -movn %[qc4], $t7,$t3 \n\t +lw %[t0], 0(%[in_int])\n\t +lw %[t1], 4(%[in_int])\n\t +lw %[t2], 8(%[in_int])\n\t +lw %[t3], 12(%[in_int]) \n\t +srl%[t0], %[t0], 31 \n\t +srl%[t1], %[t1], 31 \n\t +srl%[t2], %[t2], 31 \n\t +srl%[t3], %[t3], 31 \n\t +subu %[t4], $zero, %[qc1] \n\t +subu %[t5], $zero, %[qc2] \n\t +subu %[t6], $zero, %[qc3] \n\t +subu %[t7], $zero, %[qc4] \n\t +movn %[qc1], %[t4], %[t0] \n\t +movn %[qc2], %[t5], %[t1] \n\t +movn %[qc3], %[t6], %[t2] \n\t +movn %[qc4], %[t7], 
%[t3] \n\t .set pop \n\t : [qc1]+r(qc1), [qc2]+r(qc2), - [qc3]+r(qc3), [qc4]+r(qc4) + [qc3]+r(qc3), [qc4]+r(qc4), + [t0]=r(t0), [t1]=r(t1), [t2]=r(t2), [t3]=r(t3), + [t4]=r(t4), [t5]=r(t5), [t6]=r(t6), [t7]=r(t7) : [in_int]r(in_int) -: t0, t1, t2, t3, - t4, t5, t6, t7, - memory +: memory ); curidx = qc1; @@ -295,6 +296,7 @@ static void quantize_and_encode_band_cost_UQUAD_mips(struct AACEncContext *s, int *in_int = (int *)in[i]; uint8_t v_bits; unsigned int v_codes; +int t0, t1, t2, t3, t4; qc1 = scaled[i ] * Q34 + 0.4054f; qc2 = scaled[i+1] * Q34 + 0.4054f; @@ -305,50 +307,51 @@ static void quantize_and_encode_band_cost_UQUAD_mips(struct AACEncContext *s, .set push \n\t .set noreorder \n\t -ori$t4,$zero, 2 \n\t +ori%[t4], $zero, 2 \n\t ori%[sign],$zero, 0 \n\t -slt$t0,$t4,%[qc1] \n\t -slt$t1,$t4,%[qc2] \n\t -slt$t2,$t4,%[qc3] \n\t -slt$t3,$t4,%[qc4] \n\t -movn %[qc1], $t4,$t0 \n\t -movn %[qc2], $t4,$t1 \n\t -movn %[qc3], $t4,$t2 \n\t -movn %[qc4], $t4,$t3 \n\t -lw $t0,0(%[in_int])\n\t -lw $t1,4(%[in_int])\n\t -lw $t2,8(%[in_int])\n\t -lw $t3,12(%[in_int]) \n\t -slt$t0,$t0,$zero \n\t -movn %[sign],$t0,%[qc1] \n\t -slt$t1,$t1,$zero \n\t -slt$t2,$t2,$zero \n\t -slt$t3,$t3,$zero
[FFmpeg-devel] [PATCH 00/12] mips cleanups and port to mips64
Hi,

This patchset aims to clean up the MIPS optimizations a bit and add support for 64-bit processors.

I haven't attempted specifically to optimize any of this for 64-bit systems, except for the removal of some assembly blocks which GCC can optimize just as well itself. Also I haven't gone through and cleaned up everything, just the bits that make it easier to port to 64-bits or some things that were really bugging me :)

I've run fate on both 32 and 64-bit mips machines and it passes all the tests on both. I don't have a machine with DSP instructions but I managed (with some effort) to run fate using qemu and it passed all the tests there as well.

One thing I was slightly uneasy about in the change I made to the configure script was forcing specific ISA levels unless you pass --disable-xxx to configure. This has a habit of causing the final binaries not to run at all (e.g. I have to disable DSP otherwise I get a lot of SIGILL). Since this was what the code was doing before, I just left it instead of messing up all the MIPS configure options (more than I have done).

Thanks,
James

James Cowgill (12):
  mips/mathops: remove 64-bit code
  mips/float_dsp: replace assembly with C implementations
  mips/aacpsdsp: fix definition of ps_decorrelate_mips
  mips/fft: remove some useless assembly
  mips/sbrdsp: remove sbr_neg_odd_64_mips
  mips/aacdec: refactor out duplicated assembly code
  mips/aacdec: remove uses of mips32r2 specific ext instructions
  configure, mips: remove MIPS32R2, merging it with MIPSFPU
  mips: port optimizations to mips n64
  mips: use float* to hold pointer instead of int
  mips/acelp_filters: fix incorrect register constraint
  mips/aaccoder: use variables instead of using register names directly

 Makefile | 2 +-
 arch.mak | 1 -
 configure | 19 +-
 libavcodec/mips/aaccoder_mips.c | 929 +++---
 libavcodec/mips/aacdec_mips.c | 623
 libavcodec/mips/aacdec_mips.h | 58 +-
 libavcodec/mips/aacpsdsp_mips.c | 61 +-
 libavcodec/mips/aacpsy_mips.h | 6 +-
 libavcodec/mips/aacsbr_mips.c | 53 +-
 libavcodec/mips/aacsbr_mips.h | 17 +-
 libavcodec/mips/ac3dsp_mips.c | 63 +-
 libavcodec/mips/acelp_filters_mips.c | 15 +-
 libavcodec/mips/acelp_vectors_mips.c | 7 +-
 libavcodec/mips/asmdefs.h | 48 ++
 libavcodec/mips/celp_filters_mips.c | 13 +-
 libavcodec/mips/celp_math_mips.c | 5 +-
 libavcodec/mips/compute_antialias_float.h | 4 +-
 libavcodec/mips/fft_mips.c | 39 +-
 libavcodec/mips/fmtconvert_mips.c | 33 +-
 libavcodec/mips/lsp_mips.h | 6 +-
 libavcodec/mips/mathops.h | 26 -
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +-
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 +-
 libavcodec/mips/sbrdsp_mips.c | 89 +--
 libavutil/mips/float_dsp_mips.c | 354 +++-
 25 files changed, 963 insertions(+), 1544 deletions(-)
 create mode 100644 libavcodec/mips/asmdefs.h

--
2.1.4
[FFmpeg-devel] [PATCH 01/12] mips/mathops: remove 64-bit code
GCC is perfectly happy generating optimized multiplication code on its own for 64-bit arches. GCC refuses to optimize the loongson code when in 32-bit mode, so I've left that.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/mathops.h | 26 --
 1 file changed, 26 deletions(-)

diff --git a/libavcodec/mips/mathops.h b/libavcodec/mips/mathops.h
index 368290a..5673fc0 100644
--- a/libavcodec/mips/mathops.h
+++ b/libavcodec/mips/mathops.h
@@ -49,32 +49,6 @@ static inline av_const int64_t MLS64(int64_t d, int a, int b)
 }
 #define MLS64(d, a, b) ((d) = MLS64(d, a, b))

-#elif ARCH_MIPS64
-
-static inline av_const int64_t MAC64(int64_t d, int a, int b)
-{
-    int64_t m;
-    __asm__ (dmult %2, %3 \n\t
-             mflo %1 \n\t
-             daddu %0, %0, %1 \n\t
-             : +r(d), =r(m) : r(a), r(b)
-             : hi, lo);
-    return d;
-}
-#define MAC64(d, a, b) ((d) = MAC64(d, a, b))
-
-static inline av_const int64_t MLS64(int64_t d, int a, int b)
-{
-    int64_t m;
-    __asm__ (dmult %2, %3 \n\t
-             mflo %1 \n\t
-             dsubu %0, %0, %1 \n\t
-             : +r(d), =r(m) : r(a), r(b)
-             : hi, lo);
-    return d;
-}
-#define MLS64(d, a, b) ((d) = MLS64(d, a, b))
-
 #endif

 #endif /* HAVE_INLINE_ASM */
--
2.1.4
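As a rough picture of what the compiler is left with once these blocks are gone: each removed function is a widening multiply followed by a 64-bit add or subtract. The sketch below assumes that reading of the dmult/daddu and dmult/dsubu sequences; the lower-case names mac64 and mls64 are mine and are not FFmpeg identifiers.

```c
#include <stdint.h>

/* d + a * b, with the product computed in 64 bits (dmult + daddu). */
static inline int64_t mac64(int64_t d, int a, int b)
{
    return d + (int64_t)a * b;
}

/* d - a * b, with the product computed in 64 bits (dmult + dsubu). */
static inline int64_t mls64(int64_t d, int a, int b)
{
    return d - (int64_t)a * b;
}
```

The cast on the first operand is what makes the multiplication 64-bit; without it the product would be computed in int and could overflow before being widened.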
[FFmpeg-devel] [PATCH 09/12] mips: port optimizations to mips n64
This mainly consists of replacing all the pointer arithmetic 'addiu' instructions with PTR_ADDIU, which handles the differences in pointer sizes when compiled on 64-bit MIPS systems.

The header asmdefs.h contains the PTR_ macros, which expand to the correct MIPS instructions to manipulate registers containing pointers.

Signed-off-by: James Cowgill <james...@cowgill.org.uk>
---
 libavcodec/mips/aacdec_mips.c | 21 +--
 libavcodec/mips/aacdec_mips.h | 9 ++---
 libavcodec/mips/aacpsdsp_mips.c | 43 +++---
 libavcodec/mips/aacpsy_mips.h | 6 ++--
 libavcodec/mips/aacsbr_mips.c | 53 +--
 libavcodec/mips/aacsbr_mips.h | 17 -
 libavcodec/mips/ac3dsp_mips.c | 59 ---
 libavcodec/mips/acelp_filters_mips.c | 13 +++
 libavcodec/mips/acelp_vectors_mips.c | 7 ++--
 libavcodec/mips/asmdefs.h | 48 +
 libavcodec/mips/celp_filters_mips.c | 13 +++
 libavcodec/mips/celp_math_mips.c | 5 +--
 libavcodec/mips/compute_antialias_float.h | 4 ++-
 libavcodec/mips/fft_mips.c | 13 +++
 libavcodec/mips/fmtconvert_mips.c | 33 -
 libavcodec/mips/lsp_mips.h | 6 ++--
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++---
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++---
 libavcodec/mips/sbrdsp_mips.c | 45 +++
 19 files changed, 250 insertions(+), 181 deletions(-)
 create mode 100644 libavcodec/mips/asmdefs.h

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 909e22b..5e0a83d 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -56,6 +56,7 @@
 #include "aacdec_mips.h"
 #include "libavcodec/aactab.h"
 #include "libavcodec/sinewin.h"
+#include "libavcodec/mips/asmdefs.h"

 #if HAVE_INLINE_ASM
 static av_always_inline void float_copy(float *dst, const float *src, int count)
@@ -80,7 +81,7 @@ static av_always_inline void float_copy(float *dst, const float *src, int count)
         lw %[temp5],20(%[src]) \n\t
         lw %[temp6],24(%[src]) \n\t
         lw %[temp7],28(%[src]) \n\t
-        addiu %[src], %[src], 32\n\t
+        PTR_ADDIU %[src],%[src], 32\n\t
         sw %[temp0],0(%[dst]) \n\t
         sw %[temp1],4(%[dst]) \n\t
         sw
%[temp2],8(%[dst]) \n\t @@ -90,7 +91,7 @@ static av_always_inline void float_copy(float *dst, const float *src, int count) sw %[temp6],24(%[dst]) \n\t sw %[temp7],28(%[dst]) \n\t bne %[src], %[loop_end], 1b\n\t -addiu %[dst], %[dst], 32\n\t +PTR_ADDIU %[dst],%[dst], 32\n\t .set pop\n\t : [temp0]=r(temp[0]), [temp1]=r(temp[1]), @@ -250,7 +251,7 @@ static void apply_ltp_mips(AACContext *ac, SingleChannelElement *sce) sw $0, 4(%[p_predTime])\n\t sw $0, 8(%[p_predTime])\n\t sw $0, 12(%[p_predTime]) \n\t -addiu %[p_predTime], %[p_predTime], 16 \n\t +PTR_ADDIU %[p_predTime], %[p_predTime], 16 \n\t : [p_predTime]+r(p_predTime) : @@ -261,7 +262,7 @@ static void apply_ltp_mips(AACContext *ac, SingleChannelElement *sce) __asm__ volatile ( sw $0, 0(%[p_predTime])\n\t -addiu %[p_predTime], %[p_predTime], 4\n\t +PTR_ADDIU %[p_predTime], %[p_predTime], 4\n\t : [p_predTime]+r(p_predTime) : @@ -315,9 +316,9 @@ static av_always_inline void fmul_and_reverse(float *dst, const float *src0, con swc1%[temp9],4(%[ptr1])\n\t swc1%[temp10], 8(%[ptr1])\n\t swc1%[temp11], 12(%[ptr1]) \n\t -addiu %[ptr1], %[ptr1], 16 \n\t -addiu %[ptr2], %[ptr2], -16 \n\t -addiu %[ptr3], %[ptr3], -16 \n\t +PTR_ADDIU %[ptr1], %[ptr1], 16 \n\t +PTR_ADDIU %[ptr2], %[ptr2], -16 \n\t +PTR_ADDIU %[ptr3], %[ptr3], -16 \n\t : [temp0]=f(temp[0]), [temp1]=f(temp[1]), [temp2]=f(temp[2]), [temp3]=f(temp[3]), @@ -358,7 +359,7 @@ static void update_ltp_mips(AACContext *ac, SingleChannelElement *sce) sw $0, 20(%[p_saved_ltp]) \n\t sw $0, 24
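The contents of asmdefs.h are not visible in this excerpt, so the following is only a guess at how such PTR_ macros are commonly written, not the header from the patch. The assumption is that the compiler predefines _MIPS_SIM and _ABI64 on MIPS targets, and that each macro is a string literal so it can be pasted in front of the operand string inside an __asm__ block through normal literal concatenation. The helper function and example_insn are hypothetical.

```c
/* Sketch under stated assumptions: pick the doubleword or word form of
 * each pointer-arithmetic instruction depending on the ABI. */
#if defined(_ABI64) && defined(_MIPS_SIM) && _MIPS_SIM == _ABI64
#  define PTR_ADDU  "daddu "
#  define PTR_ADDIU "daddiu "
#  define PTR_SUBU  "dsubu "
#else
#  define PTR_ADDU  "addu "
#  define PTR_ADDIU "addiu "
#  define PTR_SUBU  "subu "
#endif

/* Adjacent string literals are concatenated by the compiler, so this is
 * one instruction line as it would appear inside an __asm__ statement. */
static const char example_insn[] = PTR_ADDIU "%[src], %[src], 32 \n\t";

/* Hypothetical helper, present only so the selected mnemonic can be
 * inspected from C. */
static const char *ptr_addiu_mnemonic(void)
{
    return PTR_ADDIU;
}
```

The point of the design is that the assembly source stays identical for o32 and n64; only the mnemonic chosen by the preprocessor changes.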
[FFmpeg-devel] [PATCH 06/12] mips/aacdec: refactor out duplicated assembly code
The float_copy and fmul_and_reverse functions are refactored out from the multiple copies in this file. Signed-off-by: James Cowgill james...@cowgill.org.uk --- libavcodec/mips/aacdec_mips.c | 612 -- 1 file changed, 111 insertions(+), 501 deletions(-) diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c index 5db10f9..909e22b 100644 --- a/libavcodec/mips/aacdec_mips.c +++ b/libavcodec/mips/aacdec_mips.c @@ -58,6 +58,51 @@ #include libavcodec/sinewin.h #if HAVE_INLINE_ASM +static av_always_inline void float_copy(float *dst, const float *src, int count) +{ +// Copy 'count' floats from src to dst +const float *loop_end = src + count; +int temp[8]; + +// count must be a multiple of 8 +av_assert2(count % 8 == 0); + +// loop unrolled 8 times +__asm__ volatile ( +.set push \n\t +.set noreorder \n\t +1: \n\t +lw %[temp0],0(%[src]) \n\t +lw %[temp1],4(%[src]) \n\t +lw %[temp2],8(%[src]) \n\t +lw %[temp3],12(%[src]) \n\t +lw %[temp4],16(%[src]) \n\t +lw %[temp5],20(%[src]) \n\t +lw %[temp6],24(%[src]) \n\t +lw %[temp7],28(%[src]) \n\t +addiu %[src], %[src], 32\n\t +sw %[temp0],0(%[dst]) \n\t +sw %[temp1],4(%[dst]) \n\t +sw %[temp2],8(%[dst]) \n\t +sw %[temp3],12(%[dst]) \n\t +sw %[temp4],16(%[dst]) \n\t +sw %[temp5],20(%[dst]) \n\t +sw %[temp6],24(%[dst]) \n\t +sw %[temp7],28(%[dst]) \n\t +bne %[src], %[loop_end], 1b\n\t +addiu %[dst], %[dst], 32\n\t +.set pop\n\t + +: [temp0]=r(temp[0]), [temp1]=r(temp[1]), + [temp2]=r(temp[2]), [temp3]=r(temp[3]), + [temp4]=r(temp[4]), [temp5]=r(temp[5]), + [temp6]=r(temp[6]), [temp7]=r(temp[7]), + [src]+r(src), [dst]+r(dst) +: [loop_end]r(loop_end) +: memory +); +} + static av_always_inline int lcg_random(unsigned previous_val) { union { unsigned u; int s; } v = { previous_val * 1664525u + 1013904223 }; @@ -92,49 +137,7 @@ static void imdct_and_windowing_mips(AACContext *ac, SingleChannelElement *sce) (ics-window_sequence[0] == ONLY_LONG_SEQUENCE || ics-window_sequence[0] == LONG_START_SEQUENCE)) { 
ac-fdsp-vector_fmul_window(out, saved, buf, lwindow_prev, 512); } else { -{ -float *buf1 = saved; -float *buf2 = out; -int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7; -int loop_end; - -/* loop unrolled 8 times */ -__asm__ volatile ( -.set push \n\t -.set noreorder \n\t -addiu %[loop_end], %[src], 1792 \n\t -1: \n\t -lw %[temp0],0(%[src]) \n\t -lw %[temp1],4(%[src]) \n\t -lw %[temp2],8(%[src]) \n\t -lw %[temp3],12(%[src]) \n\t -lw %[temp4],16(%[src]) \n\t -lw %[temp5],20(%[src]) \n\t -lw %[temp6],24(%[src]) \n\t -lw %[temp7],28(%[src]) \n\t -addiu %[src], %[src], 32\n\t -sw %[temp0],0(%[dst]) \n\t -sw %[temp1],4(%[dst]) \n\t -sw %[temp2],8(%[dst]) \n\t -sw %[temp3],12(%[dst]) \n\t -sw %[temp4],16(%[dst]) \n\t -sw %[temp5],20(%[dst]) \n\t -sw %[temp6],24(%[dst]) \n\t -sw %[temp7],28(%[dst]) \n\t -bne %[src], %[loop_end], 1b\n\t - addiu %[dst], %[dst], 32\n\t -.set pop\n\t - -: [temp0]=r(temp0), [temp1]=r(temp1), - [temp2]=r(temp2), [temp3]=r(temp3), - [temp4]=r(temp4), [temp5]=r(temp5), - [temp6]=r(temp6), [temp7]=r(temp7), - [loop_end]=r(loop_end), [src]+r(buf1), - [dst]+r(buf2) -: -: memory
[FFmpeg-devel] [PATCH 11/12] mips/acelp_filters: fix incorrect register constraint
Change register constraint on the v variable from "=" to "+".

This was causing GCC to think that the v variable was never read, so it did
not initialize it.

This fixes about 20 fate failures on mips64el.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/acelp_filters_mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/mips/acelp_filters_mips.c b/libavcodec/mips/acelp_filters_mips.c
index 98ddc54..c77d37b 100644
--- a/libavcodec/mips/acelp_filters_mips.c
+++ b/libavcodec/mips/acelp_filters_mips.c
@@ -90,7 +90,7 @@ static void ff_acelp_interpolatef_mips(float *out, const float *in,
             "PTR_ADDU %[p_filter_coeffs_m],%[p_filter_coeffs_m], %[prec] \n\t"
             "madd.s %[v],%[v],%[in_val_m], %[fc_val_m] \n\t"

-            : [v] "=f" (v),[p_in_p] "+r" (p_in_p), [p_in_m] "+r" (p_in_m),
+            : [v] "+f" (v),[p_in_p] "+r" (p_in_p), [p_in_m] "+r" (p_in_m),
               [p_filter_coeffs_p] "+r" (p_filter_coeffs_p),
               [in_val_p] "=f" (in_val_p), [in_val_m] "=f" (in_val_m),
               [fc_val_p] "=f" (fc_val_p), [fc_val_m] "=f" (fc_val_m),
--
2.1.4
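The difference between the two constraint modifiers can be sketched without any MIPS instructions. The function below is a hypothetical illustration and not FFmpeg code: the name mac_readwrite is made up, and the asm template is deliberately empty so that it builds on any GCC or Clang target. Only the operand declaration matters. "+" tells the compiler that the asm statement both reads and writes v, so the value computed before the statement has to be live when it is reached. "=" would declare v as write-only, which lets the compiler discard whatever v held beforehand; that is the situation the patch fixes for the madd.s accumulator.

```c
#include <assert.h>

/* Hypothetical illustration of a read-write asm operand; not FFmpeg code. */
static float mac_readwrite(const float *in, const float *coeffs, int n)
{
    float v = 0.0f; /* initial value that the asm statement must see */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        v += in[i] * coeffs[i];
        /* Empty template: no instructions are emitted. "+m" marks v as an
         * operand that is read and then written, so the compiler has to
         * keep the current value of v valid at this point. With "=m" the
         * compiler could treat the previous value of v as dead, as it did
         * with the original "=f" constraint in the patch. */
        __asm__ volatile ("" : "+m" (v));
    }
    return v;
}
```

Called with in = {1, 2, 3} and coeffs = {4, 5, 6}, the function returns 32. The patch itself uses the "f" (floating-point register) constraint letter; "m" is used here only because it is accepted for any lvalue on every target.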
Re: [FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations
On Thu, 2015-02-26 at 13:51 +, Derek Buitenhuis wrote:
> On 2/26/2015 1:42 PM, James Cowgill wrote:
> > The assembly versions have a few problems
> > - They only work with mips32r2 enabled
> > - They don't work on 64-bits
> > - They're massive and complex
> >
> > So replace them with C implementations which solve these problems and let
> > GCC magically optimize for different platforms.
> >
> > All the functions are manually unrolled 4 times (like the assembly code).
> > With the addition of a few restrict keywords, the functions produce almost
> > identical assembly to the original versions when compiled with gcc -O3.
>
> Why have C implementations in the *MIPS* DSP code? That's silly.

Hmm maybe a little. I was just worried that if I moved all the loop unrolling
stuff into generic code it might go slower on other arches I haven't tested.

James
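As a rough idea of what "manually unrolled 4 times" with restrict looks like, here is a minimal sketch of an elementwise multiply. It is not the code from the patch: the name vector_fmul_unrolled4 and its signature are assumptions made for illustration, and the requirement that len be a multiple of 4 is a simplification of this sketch.

```c
#include <assert.h>

/* Hypothetical sketch of a 4x manually unrolled loop; not the patch's code.
 * restrict promises the compiler that dst, src0 and src1 do not overlap,
 * so it may schedule all four loads ahead of the stores. */
static void vector_fmul_unrolled4(float *restrict dst,
                                  const float *restrict src0,
                                  const float *restrict src1, int len)
{
    /* Simplifying assumption of this sketch: no remainder handling. */
    assert(len >= 0 && len % 4 == 0);

    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

Without restrict the compiler has to assume that a store to dst might change a later src0 or src1 element, which limits reordering inside the unrolled body. Whether this form is faster than a plain loop on other architectures is the open question raised in the reply above.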