
[FFmpeg-devel] [PATCH] avcodec/libtwolame: fix mono default bitrate

2019-11-01 Thread James Cowgill

As of libtwolame 0.4.0, 384 kbps is not accepted as a valid bitrate
for encoding mono audio and the maximum bitrate is now halved to 192
kbps to comply with the MP2 standard. Example error:

twolame_init_params(): 384kbps is an invalid bitrate for mono encoding.

Adjust the default bitrate calculation to take this into account.

Signed-off-by: James Cowgill 
---
 libavcodec/libtwolame.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libtwolame.c b/libavcodec/libtwolame.c
index 030f88868f..5ceb3d9f3f 100644
--- a/libavcodec/libtwolame.c
+++ b/libavcodec/libtwolame.c
@@ -78,8 +78,12 @@ static av_cold int twolame_encode_init(AVCodecContext *avctx)
 twolame_set_in_samplerate(s->glopts, avctx->sample_rate);
 twolame_set_out_samplerate(s->glopts, avctx->sample_rate);
 
-if (!avctx->bit_rate)
-avctx->bit_rate = avctx->sample_rate < 28000 ? 160000 : 384000;
+if (!avctx->bit_rate) {
+if ((s->mode == TWOLAME_AUTO_MODE && avctx->channels == 1) || s->mode == TWOLAME_MONO)
+avctx->bit_rate = avctx->sample_rate < 28000 ? 80000 : 192000;
+else
+avctx->bit_rate = avctx->sample_rate < 28000 ? 160000 : 384000;
+}
 
 if (avctx->flags & AV_CODEC_FLAG_QSCALE || !avctx->bit_rate) {
 twolame_set_VBR(s->glopts, TRUE);
-- 
2.24.0.rc1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
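
The default-bitrate selection described in the libtwolame patch above can be sketched on its own. This is a minimal standalone sketch, not the wrapper code itself: `default_mp2_bitrate` and its `force_mono` parameter are hypothetical names, and the low-sample-rate values (80000 and 160000 bit/s) are an assumption that the archived diff lost trailing zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick a default MP2 bitrate in bit/s.
 * Mono gets half the stereo default, because libtwolame 0.4.0
 * rejects 384 kbps for mono encoding. */
static int64_t default_mp2_bitrate(int sample_rate, int channels, int force_mono)
{
    assert(sample_rate > 0 && channels > 0);
    int mono = force_mono || channels == 1;

    if (mono)
        return sample_rate < 28000 ? 80000 : 192000;
    return sample_rate < 28000 ? 160000 : 384000;
}
```

For example, `default_mp2_bitrate(44100, 1, 0)` yields 192000, while the same call with two channels yields 384000.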

[FFmpeg-devel] [PATCH v2] avcodec/arm/sbcenc: avoid callee preserved vfp registers

2019-08-25 Thread James Cowgill

When compiling FFmpeg with GCC-9, seemingly random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.

Fix by reallocating the registers in the two affected functions in
the following way:
 ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
 ff_sbc_analyze_8_neon: q2-q9 => q8-q15

The reason for using these replacements is to keep closely related
sets of registers consecutively numbered, which hopefully makes the
code easier to follow. Since this commit only reallocates
registers, it should have no performance impact.

Signed-off-by: James Cowgill 
---

On 29/07/2019 19:59, Reimar Döffinger wrote:
> Seems sensible to me, though extra points if you or someone has numbers on 
> performance impact.
> To know whether it would be worthwhile to check if it can be optimized...

Sorry for the long delay - been on various holidays.

I did a few tests on my original patch and overall it was about 2%
slower than before. In any case I think this new patch is a better
solution (although the diff is a lot larger). We don't actually need
that many registers in either of these functions, so instead of
pushing the clobbered callee saved registers, we can reallocate all
the registers to avoid them in the first place. This way there is no
performance impact.

I couldn't find any tests for this encoder, but I have tested a few
audio samples with it and verified the output is identical to what it
was before (and with what I get on x86).

 libavcodec/arm/sbcdsp_neon.S | 220 +--
 1 file changed, 110 insertions(+), 110 deletions(-)

diff --git a/libavcodec/arm/sbcdsp_neon.S b/libavcodec/arm/sbcdsp_neon.S
index d83d21d202..914abfb6cc 100644
--- a/libavcodec/arm/sbcdsp_neon.S
+++ b/libavcodec/arm/sbcdsp_neon.S
@@ -38,49 +38,49 @@ function ff_sbc_analyze_4_neon, export=1
 /* TODO: merge even and odd cases (or even merge all four calls to this
  * function) in order to have only aligned reads from 'in' array
  * and reduce number of load instructions */
-vld1.16 {d4, d5}, [r0, :64]!
-vld1.16 {d8, d9}, [r2, :128]!
+vld1.16 {d16, d17}, [r0, :64]!
+vld1.16 {d20, d21}, [r2, :128]!

-vmull.s16   q0, d4, d8
-vld1.16 {d6,  d7}, [r0, :64]!
-vmull.s16   q1, d5, d9
-vld1.16 {d10, d11}, [r2, :128]!
+vmull.s16   q0, d16, d20
+vld1.16 {d18, d19}, [r0, :64]!
+vmull.s16   q1, d17, d21
+vld1.16 {d22, d23}, [r2, :128]!

-vmlal.s16   q0, d6, d10
-vld1.16 {d4, d5}, [r0, :64]!
-vmlal.s16   q1, d7, d11
-vld1.16 {d8, d9}, [r2, :128]!
+vmlal.s16   q0, d18, d22
+vld1.16 {d16, d17}, [r0, :64]!
+vmlal.s16   q1, d19, d23
+vld1.16 {d20, d21}, [r2, :128]!

-vmlal.s16   q0, d4, d8
-vld1.16 {d6,  d7}, [r0, :64]!
-vmlal.s16   q1, d5, d9
-vld1.16 {d10, d11}, [r2, :128]!
+vmlal.s16   q0, d16, d20
+vld1.16 {d18, d19}, [r0, :64]!
+vmlal.s16   q1, d17, d21
+vld1.16 {d22, d23}, [r2, :128]!

-vmlal.s16   q0, d6, d10
-vld1.16 {d4, d5}, [r0, :64]!
-vmlal.s16   q1, d7, d11
-vld1.16 {d8, d9}, [r2, :128]!
+vmlal.s16   q0, d18, d22
+vld1.16 {d16, d17}, [r0, :64]!
+vmlal.s16   q1, d19, d23
+vld1.16 {d20, d21}, [r2, :128]!

-vmlal.s16   q0, d4, d8
-vmlal.s16   q1, d5, d9
+vmlal.s16   q0, d16, d20
+vmlal.s16   q1, d17, d21

 vpadd.s32   d0, d0, d1
 vpadd.s32   d1, d2, d3

 vrshrn.s32  d0, q0, SBC_PROTO_FIXED_SCALE

-vld1.16 {d2, d3, d4, d5}, [r2, :128]!
+vld1.16 {d16, d17, d18, d19}, [r2, :128]!

 vdup.i32   d1, d0[1]  /* TODO: can be eliminated */
 vdup.i32   d0, d0[0]  /* TODO: can be eliminated */

-vmull.s16   q3, d2, d0
-vmull.s16   q4, d3, d0
-vmlal.s16   q3, d4, d1
-vmlal.s16   q4, d5, d1
+vmull.s16   q10, d16, d0
+vmull.s16   q11, d17, d0
+vmlal.s16   q10, d18, d1
+vmlal.s16   q11, d19, d1

-vpadd.s32   d0, d6, d7 /* TODO: can be eliminated */
-vpadd.s32   d1, d8, d9 /* TODO: can be eliminated */
+vpadd.s32   d0, d20, d21 /* TODO: can be eliminated */
+vpadd.s32   d1, d22, d23 /* TODO: can be eliminated */
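
The register choice in the patch above follows from how NEON registers alias and which of them the callee must preserve. The following is a small sketch of that rule in C, assuming the standard ARM calling convention in which d8-d15 are callee-saved; the function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* A NEON quad register qN aliases the double registers d(2N) and d(2N+1). */
static int q_to_first_d(int q)
{
    assert(q >= 0 && q <= 15);
    return 2 * q;
}

/* d8-d15 must be preserved by the callee; other d registers are scratch. */
static bool d_is_callee_saved(int d)
{
    assert(d >= 0 && d <= 31);
    return d >= 8 && d <= 15;
}

/* True if writing qN without saving it first would clobber a callee-saved register. */
static bool q_needs_saving(int q)
{
    int d = q_to_first_d(q);
    return d_is_callee_saved(d) || d_is_callee_saved(d + 1);
}
```

Under this rule q4-q7 need saving and q8-q15 do not, which is why moving the working set to q8 and above avoids any push or pop.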

[FFmpeg-devel] [PATCH] avcodec/arm/sbcenc: save callee preserved vfp registers

2019-07-28 Thread James Cowgill

When compiling FFmpeg with GCC-9, seemingly random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.

Fix by saving the relevant registers on the stack in the affected
functions.

Signed-off-by: James Cowgill 
---
 libavcodec/arm/sbcdsp_neon.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libavcodec/arm/sbcdsp_neon.S b/libavcodec/arm/sbcdsp_neon.S
index d83d21d202..aa03800096 100644
--- a/libavcodec/arm/sbcdsp_neon.S
+++ b/libavcodec/arm/sbcdsp_neon.S
@@ -38,6 +38,8 @@ function ff_sbc_analyze_4_neon, export=1
 /* TODO: merge even and odd cases (or even merge all four calls to this
  * function) in order to have only aligned reads from 'in' array
  * and reduce number of load instructions */
+vpush   {d8-d11}
+
 vld1.16 {d4, d5}, [r0, :64]!
 vld1.16 {d8, d9}, [r2, :128]!
 
@@ -84,6 +86,7 @@ function ff_sbc_analyze_4_neon, export=1
 
 vst1.32 {d0, d1}, [r1, :128]
 
+vpop    {d8-d11}
 bx  lr
 endfunc
 
@@ -91,6 +94,8 @@ function ff_sbc_analyze_8_neon, export=1
 /* TODO: merge even and odd cases (or even merge all four calls to this
  * function) in order to have only aligned reads from 'in' array
  * and reduce number of load instructions */
+vpush   {d8-d15}
+
 vld1.16 {d4, d5}, [r0, :64]!
 vld1.16 {d8, d9}, [r2, :128]!
 
@@ -188,6 +193,7 @@ function ff_sbc_analyze_8_neon, export=1
 
 vst1.32 {d0, d1, d2, d3}, [r1, :128]
 
+vpop    {d8-d15}
 bx  lr
 endfunc
 
-- 
2.22.0


Re: [FFmpeg-devel] fate/hap : add test for hap encoding

2018-04-24 Thread James Cowgill

On 23/04/18 10:11, Carl Eugen Hoyos wrote:
> 2018-03-14 7:31 GMT+01:00, Martin Vignali :
> 
>> In that case we can let the test using "none"
>> compression (bypass the snappy part)
> 
> These tests are also broken, please fix or
> remove them:
> https://buildd.debian.org/status/fetch.php?pkg=ffmpeg=i386=7%3A4.0-1=152218=0
> ("Error 1")

I've had a brief look at this error (and a similar error on s390x) and
it looks like a float rounding issue in some of the functions in
libavcodec/texturedspenc.c. The output from the hap encoder differs by
only a few bits.

i386 fails because it promotes floats to long double when evaluating (at
least on Debian which has SSE disabled), and s390x fails because it
promotes floats to doubles. I think they're the only architectures which
promote floats (although some have not built yet).

I'll probably just ignore these tests for now since I'm not sure what
the best solution is.

James
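
Whether a platform promotes float arithmetic can be checked from C itself. The sketch below reports the standard C99 `FLT_EVAL_METHOD` macro from `<float.h>`; the helper names are hypothetical, and the sketch only describes the evaluation mode, it does not reproduce the hap encoder mismatch.

```c
#include <float.h>

/* Describe a FLT_EVAL_METHOD value as defined by C99. */
static const char *float_eval_description(int method)
{
    switch (method) {
    case 0:  return "float operations evaluated in float precision";
    case 1:  return "float operations evaluated as double";
    case 2:  return "float operations evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* Description for the platform this file is compiled for. */
static const char *this_platform_float_eval(void)
{
    return float_eval_description(FLT_EVAL_METHOD);
}
```

A build that evaluates floats as long double or double would be expected to report mode 2 or mode 1 respectively, matching the i386 and s390x behaviour described above.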

Re: [FFmpeg-devel] [PATCH] avformat/libssh: check the user provided a password before trying to use it

2018-01-11 Thread James Cowgill

Hi,

On 11/06/17 18:47, jamrial at gmail.com (James Almer) wrote:
> Fixes ticket #6413
> 
> Signed-off-by: James Almer 
> ---
> The public key authentication also tries to use the password variable. I
> don't know if NULL is valid in that case or not.
> Perhaps for that one it would be better to replace the current usage of
> legacy API instead.
> 
>  libavformat/libssh.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Could this patch please be applied to the stable branches? Someone using
Debian stable (3.2.9) reported it:
https://bugs.debian.org/886912

It was applied to 3.4 as commit 8ddb6820bd52df6ed616abc3d8be200b126aa8c1.

Thanks,
James

> diff --git a/libavformat/libssh.c b/libavformat/libssh.c
> index 49e92e7516..9e3d4da45e 100644
> --- a/libavformat/libssh.c
> +++ b/libavformat/libssh.c
> @@ -103,7 +103,7 @@ static av_cold int libssh_authentication(LIBSSHContext *libssh, const char *user
>  }
>  }
>  
> -if (!authorized && (auth_methods & SSH_AUTH_METHOD_PASSWORD)) {
> +if (!authorized && password && (auth_methods & SSH_AUTH_METHOD_PASSWORD)) {
>  if (ssh_userauth_password(libssh->session, NULL, password) == SSH_AUTH_SUCCESS) {
>  av_log(libssh, AV_LOG_DEBUG, "Authentication successful with password.\n");
>  authorized = 1;
> 
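
The guard added by the quoted patch can be illustrated without libssh. In this sketch `AUTH_METHOD_PASSWORD`, `stub_userauth_password` and `try_password_auth` are hypothetical stand-ins; the stub is assumed, like the real call, to be unsafe when handed a NULL password.

```c
#include <stddef.h>
#include <string.h>

#define AUTH_METHOD_PASSWORD 0x2  /* hypothetical flag, stands in for SSH_AUTH_METHOD_PASSWORD */

/* Hypothetical stand-in for an authentication call that must not receive NULL. */
static int stub_userauth_password(const char *password)
{
    return strcmp(password, "secret") == 0;
}

/* Returns 1 if authorized, 0 otherwise. Password authentication is only
 * attempted when the user actually supplied a password. */
static int try_password_auth(int authorized, int auth_methods, const char *password)
{
    if (!authorized && password && (auth_methods & AUTH_METHOD_PASSWORD))
        authorized = stub_userauth_password(password);
    return authorized;
}
```

With a NULL password the stub is never reached, so the function simply reports that the session is still unauthorized.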





[FFmpeg-devel] [PATCH] avformat/dashenc: fix min_seg_duration option size

2017-11-18 Thread James Cowgill

In the DASHContext structure, min_seg_duration is declared as an int,
but the AVOption list claimed it was an INT64. Change the option list
to use the correct size, which should fix some initialization errors
seen on big-endian platforms.

Signed-off-by: James Cowgill <jcowg...@debian.org>
---
 libavformat/dashenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/dashenc.c b/libavformat/dashenc.c
index d5554d1df0..ddad3351fd 100644
--- a/libavformat/dashenc.c
+++ b/libavformat/dashenc.c
@@ -1181,7 +1181,7 @@ static const AVOption options[] = {
 { "adaptation_sets", "Adaptation sets. Syntax: id=0,streams=0,1,2 id=1,streams=3,4 and so on", OFFSET(adaptation_sets), AV_OPT_TYPE_STRING, { 0 }, 0, 0, AV_OPT_FLAG_ENCODING_PARAM },
 { "window_size", "number of segments kept in the manifest", OFFSET(window_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, E },
 { "extra_window_size", "number of segments kept outside of the manifest before removing from disk", OFFSET(extra_window_size), AV_OPT_TYPE_INT, { .i64 = 5 }, 0, INT_MAX, E },
-{ "min_seg_duration", "minimum segment duration (in microseconds)", OFFSET(min_seg_duration), AV_OPT_TYPE_INT64, { .i64 = 5000000 }, 0, INT_MAX, E },
+{ "min_seg_duration", "minimum segment duration (in microseconds)", OFFSET(min_seg_duration), AV_OPT_TYPE_INT, { .i64 = 5000000 }, 0, INT_MAX, E },
 { "remove_at_exit", "remove all segments when finished", OFFSET(remove_at_exit), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, E },
 { "use_template", "Use SegmentTemplate instead of SegmentList", OFFSET(use_template), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, E },
 { "use_timeline", "Use SegmentTimeline in SegmentTemplate", OFFSET(use_timeline), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, E },
-- 
2.15.0
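
Why a size mismatch only shows up on big-endian hosts can be seen in a few lines of C. This is a sketch under stated assumptions: the helper names are hypothetical, the value 5000000 is used as an example default, and the code models only which bytes a 32-bit field would see when a 64-bit value is stored at its address (the real mismatch would also overwrite whatever follows the field).

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Store a 64-bit value, then read back only the leading 4 bytes as if the
 * field were a 32-bit int. */
static int32_t read_leading_int32(int64_t stored)
{
    unsigned char raw[sizeof(stored)];
    int32_t field;

    memcpy(raw, &stored, sizeof(stored));
    memcpy(&field, raw, sizeof(field));
    return field;
}
```

On a little-endian host the leading four bytes hold the low half, so the field happens to read back the intended value; on a big-endian host they hold the high half, which is zero for any small value.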


Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-09 Thread James Cowgill

Hi,

On 09/11/17 14:02, Hendrik Leppkes wrote:
> On Thu, Nov 9, 2017 at 1:21 PM, James Cowgill <jcowg...@debian.org> wrote:
>> In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
>> deprecated avcodec_decode_* APIs were reworked so that they called into the
>> new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
>> of prohibiting sending new packets containing data after a drain
>> packet, but in previous versions of FFmpeg this "worked" and some
>> applications relied on it.
>>
>> To restore some compatibility, reset the codec if we receive a new non-drain
>> packet using the old API after draining has completed. While this does
>> not give the same behaviour as the old API did, in the majority of cases
>> it works and it does not require changes to any other part of the decoding
>> code.
>>
>> Fixes ticket #6775
>> Signed-off-by: James Cowgill <jcowg...@debian.org>
>> ---
>>  libavcodec/decode.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
>> index 86fe5aef52..2f1932fa85 100644
>> --- a/libavcodec/decode.c
>> +++ b/libavcodec/decode.c
>> @@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame 
>> *frame,
>>
>>  av_assert0(avci->compat_decode_consumed == 0);
>>
>> +if (avci->draining_done && pkt && pkt->size != 0) {
>> +av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
>> +avcodec_flush_buffers(avctx);
>> +}
>> +
> 
> I don't think this is a good idea. Draining and not flushing
> afterwards is a bug in the calling code, and even before recent
> changes it would result in inconsistent behavior and even crashes
> (with select decoders).

I am fully aware that this will only trigger if the calling code is
buggy. I am trying to avoid silent breakage of those applications doing
this when upgrading to ffmpeg 3.4.

I was looking at the documentation of avcodec_decode_* recently because
of this and I had some trouble deciding if using the API this way was
incorrect. I expect the downstreams affected thought that what they were
doing was fine and then got angry when ffmpeg suddenly "broke" their
code. This patch at least allows some sort of "transitional period"
until downstreams update.

From the perspective of Debian, I could either apply this patch to
ffmpeg, or I would have to go through over 100 reverse dependencies to
see if they abuse the API and then fix them. I currently know of two
(gst-libav1.0 and kodi), but there could be more - especially within
less used packages.

Thanks,
James

[FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-09 Thread James Cowgill

In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
deprecated avcodec_decode_* APIs were reworked so that they called into the
new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
of prohibiting sending new packets containing data after a drain
packet, but in previous versions of FFmpeg this "worked" and some
applications relied on it.

To restore some compatibility, reset the codec if we receive a new non-drain
packet using the old API after draining has completed. While this does
not give the same behaviour as the old API did, in the majority of cases
it works and it does not require changes to any other part of the decoding
code.

Fixes ticket #6775
Signed-off-by: James Cowgill <jcowg...@debian.org>
---
 libavcodec/decode.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 86fe5aef52..2f1932fa85 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
 
 av_assert0(avci->compat_decode_consumed == 0);
 
+if (avci->draining_done && pkt && pkt->size != 0) {
+av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
+avcodec_flush_buffers(avctx);
+}
+
 *got_frame = 0;
 avci->compat_decode = 1;
 
-- 
2.15.0
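
The reset-on-data-after-EOF idea from the patch above can be modelled with a toy state machine. Everything here is hypothetical and heavily simplified: `ToyDecoder`, `toy_flush` and `toy_send` are illustration names, and draining is assumed to complete as soon as the empty packet is seen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified decoder state. */
typedef struct ToyDecoder {
    bool draining_done;   /* set once a drain (empty) packet was handled */
    int  flush_count;     /* how many times the decoder was reset */
} ToyDecoder;

/* Reset the decoder so that it accepts data again. */
static void toy_flush(ToyDecoder *d)
{
    d->draining_done = false;
    d->flush_count++;
}

/* Feed one packet; a size of 0 stands for a drain packet. */
static void toy_send(ToyDecoder *d, size_t pkt_size)
{
    if (d->draining_done && pkt_size != 0)
        toy_flush(d);           /* data after EOF: reset instead of failing */
    if (pkt_size == 0)
        d->draining_done = true;
}
```

A caller that sends data, then a drain packet, then data again triggers exactly one reset, which mirrors the compatibility behaviour the commit message describes.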


[FFmpeg-devel] [PATCH v2] avcodec/arm: Fix SIGBUS on ARM when compiled with binutils 2.29

2017-08-29 Thread James Cowgill

In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
added to the address of a Thumb function (previously nothing was added). This
allows the loaded address to be passed to a BLX instruction and the correct
mode change will occur.

So that the behavior matches in binutils 2.29 and pre-2.29, use .eqv to
pre-calculate the function address without the automatic +1 fixup. Then use
these new symbols as the function addresses to be loaded.

Fixes ticket 6571.
Related binutils bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21458

Signed-off-by: James Cowgill <jcowg...@debian.org>
---
v2:
 Forgot to include the "avcodec/arm" commit message prefix.

 libavcodec/arm/h264idct_neon.S | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/libavcodec/arm/h264idct_neon.S b/libavcodec/arm/h264idct_neon.S
index 4f68bdb9f5..04b1ea583b 100644
--- a/libavcodec/arm/h264idct_neon.S
+++ b/libavcodec/arm/h264idct_neon.S
@@ -20,6 +20,18 @@
 
 #include "libavutil/arm/asm.S"
 
+# In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
+# added to the address of a Thumb function (previously nothing was added).
+#
+# These .eqv are used to pre-calculate the correct address with +CONFIG_THUMB so
+# that ADR will work with both old and new versions of binutils.
+#
+# See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458
+.eqv eqv_ff_h264_idct_add_neon, X(ff_h264_idct_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct_dc_add_neon,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct8_add_neon,    X(ff_h264_idct8_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct8_dc_add_neon, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB
+
 function ff_h264_idct_add_neon, export=1
 vld1.64 {d0-d3},  [r1,:128]
 vmov.i16    q15, #0
@@ -113,8 +125,8 @@ function ff_h264_idct_add16_neon, export=1
 movne   lr,  #0
 cmp lr,  #0
 ite ne
-adrne   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_dc_add_neon
+adreq   lr,  eqv_ff_h264_idct_add_neon
 blx lr
 2:  subs    ip,  ip,  #1
 add r1,  r1,  #32
@@ -138,8 +150,8 @@ function ff_h264_idct_add16intra_neon, export=1
 cmp r8,  #0
 ldrsh   r8,  [r1]
 iteet   ne
-adrne   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_add_neon
+adreq   lr,  eqv_ff_h264_idct_dc_add_neon
 cmpeq   r8,  #0
 blxne   lr
 subs    ip,  ip,  #1
@@ -166,8 +178,8 @@ function ff_h264_idct_add8_neon, export=1
 cmp r8,  #0
 ldrsh   r8,  [r1]
 iteet   ne
-adrne   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_add_neon
+adreq   lr,  eqv_ff_h264_idct_dc_add_neon
 cmpeq   r8,  #0
 blxne   lr
 add r12, r12, #1
@@ -388,8 +400,8 @@ function ff_h264_idct8_add4_neon, export=1
 movne   lr,  #0
 cmp lr,  #0
 ite ne
-adrne   lr,  X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct8_add_neon)    + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct8_dc_add_neon
+adreq   lr,  eqv_ff_h264_idct8_add_neon
 blx lr
 2:  subs    r12, r12, #4
 add r1,  r1,  #128
-- 
2.14.1
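
The interaction between the explicit `+ CONFIG_THUMB` and the assembler's new automatic fixup can be modelled in C. This is a simplified model that follows the commit message's description, not a description of binutils internals; `adr_result` and `blx_enters_thumb` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the address ADR produces for `sym + addend`.
 * auto_thumb_bit models the newer assembler behaviour, where 1 is added
 * automatically when the symbol is a Thumb function. */
static uint32_t adr_result(uint32_t sym, uint32_t addend,
                           bool sym_is_thumb_func, bool auto_thumb_bit)
{
    uint32_t addr = sym + addend;

    if (auto_thumb_bit && sym_is_thumb_func)
        addr += 1;
    return addr;
}

/* BLX to a register enters Thumb state only when bit 0 of the target is set. */
static bool blx_enters_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}
```

In this model the old assembler turns `0x1000 + 1` into `0x1001`, a valid Thumb target, while the new one yields `0x1002`, which has bit 0 clear and no longer points at the start of the function. Taking the address through a plain `.eqv` value, as the patch does, corresponds to calling `adr_result` with `sym_is_thumb_func` set to false.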



[FFmpeg-devel] [PATCH] Fix SIGBUS on ARM when compiled with binutils 2.29

2017-08-28 Thread James Cowgill

In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
added to the address of a Thumb function (previously nothing was added). This
allows the loaded address to be passed to a BLX instruction and the correct
mode change will occur.

So that the behavior matches in binutils 2.29 and pre-2.29, use .eqv to
pre-calculate the function address without the automatic +1 fixup. Then use
these new symbols as the function addresses to be loaded.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458

Fixes ticket 6571.

Signed-off-by: James Cowgill <jcowg...@debian.org>
---
 libavcodec/arm/h264idct_neon.S | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/libavcodec/arm/h264idct_neon.S b/libavcodec/arm/h264idct_neon.S
index 4f68bdb9f5..04b1ea583b 100644
--- a/libavcodec/arm/h264idct_neon.S
+++ b/libavcodec/arm/h264idct_neon.S
@@ -20,6 +20,18 @@
 
 #include "libavutil/arm/asm.S"
 
+# In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
+# added to the address of a Thumb function (previously nothing was added).
+#
+# These .eqv are used to pre-calculate the correct address with +CONFIG_THUMB so
+# that ADR will work with both old and new versions of binutils.
+#
+# See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458
+.eqv eqv_ff_h264_idct_add_neon, X(ff_h264_idct_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct_dc_add_neon,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct8_add_neon,    X(ff_h264_idct8_add_neon) + CONFIG_THUMB
+.eqv eqv_ff_h264_idct8_dc_add_neon, X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB
+
 function ff_h264_idct_add_neon, export=1
 vld1.64 {d0-d3},  [r1,:128]
 vmov.i16    q15, #0
@@ -113,8 +125,8 @@ function ff_h264_idct_add16_neon, export=1
 movne   lr,  #0
 cmp lr,  #0
 ite ne
-adrne   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_dc_add_neon
+adreq   lr,  eqv_ff_h264_idct_add_neon
 blx lr
 2:  subs    ip,  ip,  #1
 add r1,  r1,  #32
@@ -138,8 +150,8 @@ function ff_h264_idct_add16intra_neon, export=1
 cmp r8,  #0
 ldrsh   r8,  [r1]
 iteet   ne
-adrne   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_add_neon
+adreq   lr,  eqv_ff_h264_idct_dc_add_neon
 cmpeq   r8,  #0
 blxne   lr
 subs    ip,  ip,  #1
@@ -166,8 +178,8 @@ function ff_h264_idct_add8_neon, export=1
 cmp r8,  #0
 ldrsh   r8,  [r1]
 iteet   ne
-adrne   lr,  X(ff_h264_idct_add_neon)    + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct_dc_add_neon) + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct_add_neon
+adreq   lr,  eqv_ff_h264_idct_dc_add_neon
 cmpeq   r8,  #0
 blxne   lr
 add r12, r12, #1
@@ -388,8 +400,8 @@ function ff_h264_idct8_add4_neon, export=1
 movne   lr,  #0
 cmp lr,  #0
 ite ne
-adrne   lr,  X(ff_h264_idct8_dc_add_neon) + CONFIG_THUMB
-adreq   lr,  X(ff_h264_idct8_add_neon)    + CONFIG_THUMB
+adrne   lr,  eqv_ff_h264_idct8_dc_add_neon
+adreq   lr,  eqv_ff_h264_idct8_add_neon
 blx lr
 2:  subs    r12, r12, #4
 add r1,  r1,  #128
-- 
2.14.1

[FFmpeg-devel] [PATCH v2] swscale: fix gbrap16 alpha channel issues

2017-08-03 Thread James Cowgill

Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).

Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.

Fixes ticket 6555.

Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
---
 v2
 
 Move declaration of A inside the loop and don't bother initializing it since
 the initial value would never be read.

 libswscale/output.c | 16 
 tests/ref/fate/filter-pixfmts-scale |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 9774e9f327..f30bce8dd3 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -2026,24 +2026,24 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
 const int16_t **lumSrcx, int lumFilterSize,
 const int16_t *chrFilter, const int16_t **chrUSrcx,
 const int16_t **chrVSrcx, int chrFilterSize,
-const int16_t **alpSrc, uint8_t **dest,
+const int16_t **alpSrcx, uint8_t **dest,
 int dstW, int y)
 {
 const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
 int i;
-int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
+int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
 uint16_t **dest16 = (uint16_t**)dest;
 const int32_t **lumSrc  = (const int32_t**)lumSrcx;
 const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
 const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
-int A = 0; // init to silence warning
+const int32_t **alpSrc  = (const int32_t**)alpSrcx;
 
 for (i = 0; i < dstW; i++) {
 int j;
 int Y = -0x40000000;
 int U = -(128 << 23);
 int V = -(128 << 23);
-int R, G, B;
+int R, G, B, A;
 
 for (j = 0; j < lumFilterSize; j++)
 Y += lumSrc[j][i] * (unsigned)lumFilter[j];
@@ -2059,13 +2059,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
 V >>= 14;
 
 if (hasAlpha) {
-A = 1 << 18;
+A = -0x40000000;
 
 for (j = 0; j < lumFilterSize; j++)
 A += alpSrc[j][i] * lumFilter[j];
 
-if (A & 0xF8000000)
-A =  av_clip_uintp2(A, 27);
+A >>= 1;
+A += 0x20002000;
 }
 
 Y -= c->yuv2rgb_y_offset;
@@ -2083,7 +2083,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t *lumFilter,
 dest16[1][i] = B >> 14;
 dest16[2][i] = R >> 14;
 if (hasAlpha)
-dest16[3][i] = A >> 11;
+dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
 }
 if ((!isBE(c->dstFormat)) != (!HAVE_BIGENDIAN)) {
 for (i = 0; i < dstW; i++) {
diff --git a/tests/ref/fate/filter-pixfmts-scale b/tests/ref/fate/filter-pixfmts-scale
index 9b601b71da..dcc34bd4d1 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -23,8 +23,8 @@ gbrap10be   6d89abb9248006c3e9017545e9474654
 gbrap10le   cf974e23f485a10740f5de74a5c8c3df
 gbrap12be   1d9b57766ba9c2192403f43967cb9af0
 gbrap12le   bb1ba1c157717db3dd612a76d38a018e
-gbrap16be   81542b96575d1fe3b239d23899f5ece3
-gbrap16le   6feb8b9da131917abe867e0eaaf07b90
+gbrap16be   c72b935a6e57a8e1c37bff08c2db55b1
+gbrap16le   13eb0e62b1ac9c1c86c81521eaefab5f
 gbrp   dc3387f925f972c61aae7eb23cdc19f0
 gbrp10be   0277d4c3a8498d75e2783fb81379e481
 gbrp10le   f3d70f8ab845c3c9b8f7452e4a6e285a
-- 
2.13.3
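
The alpha computation that the v2 patch introduces can be followed step by step in a standalone sketch. The helper names are hypothetical, `clip_uintp2` is a plain reimplementation rather than the libavutil function, and the bias constants `-0x40000000` and `0x20002000` are assumptions chosen so that the halved bias cancels and `0x2000` remains as the rounding term.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a signed value to the unsigned range [0, 2^bits - 1]. */
static int32_t clip_uintp2(int32_t v, int bits)
{
    const int32_t max = (int32_t)((1u << bits) - 1);

    assert(bits > 0 && bits < 31);
    if (v < 0)
        return 0;
    if (v > max)
        return max;
    return v;
}

/* Compute one 16-bit output alpha sample from `taps` filtered inputs. */
static uint16_t alpha16_sample(const int32_t *src, const int16_t *filter, int taps)
{
    int32_t A = -0x40000000;          /* bias keeps the running sum inside int32_t */

    for (int j = 0; j < taps; j++)
        A += src[j] * filter[j];
    A >>= 1;                          /* assumes an arithmetic right shift */
    A += 0x20002000;                  /* undo the halved bias, add rounding */
    return (uint16_t)(clip_uintp2(A, 30) >> 14);
}
```

Note that only the stored value is clipped, not `A` itself, which matches the explanation given for moving the clip to the final store.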


Re: [FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues

2017-08-03 Thread James Cowgill

Hi,

On 02/08/17 23:21, Michael Niedermayer wrote:
> On Wed, Aug 02, 2017 at 03:32:04PM +0100, James Cowgill wrote:
>> Hi,
>>
>> On 02/08/17 14:18, Michael Niedermayer wrote:
>>> On Tue, Aug 01, 2017 at 02:46:22PM +0100, James Cowgill wrote:
>>>> Fixes filter-pixfmts-scale test failing on big-endian systems due to
>>>> alpSrc not being cast to (const int32_t**).
>>>>
>>>> Also fixes distortions in the output alpha channel values by copying the
>>>> alpha channel code from the rgba64 case found elsewhere in output.c.
>>>>
>>>> Fixes ticket 6555.
>>>>
>>>> Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
>>>> ---
>>>>  libswscale/output.c | 15 ---
>>>>  tests/ref/fate/filter-pixfmts-scale |  4 ++--
>>>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/libswscale/output.c b/libswscale/output.c
>>>> index 9774e9f327..8e5ec0a256 100644
>>>> --- a/libswscale/output.c
>>>> +++ b/libswscale/output.c
>>>> @@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
>>>> *lumFilter,
>>>>  const int16_t **lumSrcx, int lumFilterSize,
>>>>  const int16_t *chrFilter, const int16_t **chrUSrcx,
>>>>  const int16_t **chrVSrcx, int chrFilterSize,
>>>> -const int16_t **alpSrc, uint8_t **dest,
>>>> +const int16_t **alpSrcx, uint8_t **dest,
>>>>  int dstW, int y)
>>>>  {
>>>>  const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
>>>>  int i;
>>>> -int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
>>>> +int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
>>>>  uint16_t **dest16 = (uint16_t**)dest;
>>>>  const int32_t **lumSrc  = (const int32_t**)lumSrcx;
>>>>  const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
>>>>  const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
>>>> -int A = 0; // init to silence warning
>>>> +const int32_t **alpSrc  = (const int32_t**)alpSrcx;
>>>
>>>> +int A = 0x << 14;
>>>
>>> unused value
>>
>> The initial value of A is unused in the old code, but not in the new code.
> 
> IIRC all uses are under hasAlpha and it is written to in that case first

Sorry, you're right. I think I was looking at the code from yuv2rgba64.
I'll send a v2.

Thanks,
James

Re: [FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues

2017-08-02 Thread James Cowgill

Hi,

On 02/08/17 14:18, Michael Niedermayer wrote:
> On Tue, Aug 01, 2017 at 02:46:22PM +0100, James Cowgill wrote:
>> Fixes filter-pixfmts-scale test failing on big-endian systems due to
>> alpSrc not being cast to (const int32_t**).
>>
>> Also fixes distortions in the output alpha channel values by copying the
>> alpha channel code from the rgba64 case found elsewhere in output.c.
>>
>> Fixes ticket 6555.
>>
>> Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
>> ---
>>  libswscale/output.c | 15 ---
>>  tests/ref/fate/filter-pixfmts-scale |  4 ++--
>>  2 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/libswscale/output.c b/libswscale/output.c
>> index 9774e9f327..8e5ec0a256 100644
>> --- a/libswscale/output.c
>> +++ b/libswscale/output.c
>> @@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
>> *lumFilter,
>>  const int16_t **lumSrcx, int lumFilterSize,
>>  const int16_t *chrFilter, const int16_t **chrUSrcx,
>>  const int16_t **chrVSrcx, int chrFilterSize,
>> -const int16_t **alpSrc, uint8_t **dest,
>> +const int16_t **alpSrcx, uint8_t **dest,
>>  int dstW, int y)
>>  {
>>  const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
>>  int i;
>> -int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
>> +int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
>>  uint16_t **dest16 = (uint16_t**)dest;
>>  const int32_t **lumSrc  = (const int32_t**)lumSrcx;
>>  const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
>>  const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
>> -int A = 0; // init to silence warning
>> +const int32_t **alpSrc  = (const int32_t**)alpSrcx;
> 
>> +int A = 0x << 14;
> 
> unused value

The initial value of A is unused in the old code, but not in the new code.

>>  
>>  for (i = 0; i < dstW; i++) {
>>  int j;
>> @@ -2059,13 +2060,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
>> *lumFilter,
>>  V >>= 14;
>>  
>>  if (hasAlpha) {
>> -A = 1 << 18;
>> +A = -0x40000000;
> 
> where does this value come from ?
> it looks copy and pasted from luma, but alpha does not have a black
> level offset as its not luminance

I confess I only know the basics of how these functions work. On the
basis that yuv2gbrp_full_X_c looks like it copies yuv2rgb_X_c_template,
and I would have thought the rgb and gbr cases should be similar, I
copied a number of things from yuv2rgba64_full_X_c_template into this
function. That value and all of the modifications inside the for loop
come from there.

>>  
>>  for (j = 0; j < lumFilterSize; j++)
>>  A += alpSrc[j][i] * lumFilter[j];
>>  
>> -if (A & 0xF8000000)
>> -A =  av_clip_uintp2(A, 27);
>> +A >>= 1;
>> +A += 0x20002000;
>>  }
>>  
>>  Y -= c->yuv2rgb_y_offset;
>> @@ -2083,7 +2084,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
>> *lumFilter,
>>  dest16[1][i] = B >> 14;
>>  dest16[2][i] = R >> 14;
>>  if (hasAlpha)
>> -dest16[3][i] = A >> 11;
>> +dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
> 
> why do you move the cliping code here, this seems unneeded
> outside the removed if()

This is where the clipping code in yuv2rgba64_full_X_c_template is, and
in that function, the value of A is not clipped - only the value stored
in dest.
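
As a rough illustration of clipping only the stored value, the sketch below uses a hypothetical stand-in for av_clip_uintp2() (the name clip_uintp2_sketch and the helper store_alpha_sketch are assumptions, not FFmpeg code). It only shows that clamping to 30 bits and then shifting right by 14 always yields a value that fits in 16 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_clip_uintp2(): clamp a to [0, 2^p - 1]. */
int32_t clip_uintp2_sketch(int32_t a, int p)
{
    const int32_t max = (int32_t)((UINT32_C(1) << p) - 1);
    if (a < 0)
        return 0;
    if (a > max)
        return max;
    return a;
}

/* Clip at store time only, mirroring "av_clip_uintp2(A, 30) >> 14";
 * the accumulator A itself is left untouched by the caller. */
uint16_t store_alpha_sketch(int32_t A)
{
    return (uint16_t)(clip_uintp2_sketch(A, 30) >> 14);
}
```

With this shape, negative accumulators store 0 and overflowing ones store the 16-bit maximum.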

Thanks,
James

[FFmpeg-devel] [PATCH] swscale: fix gbrap16 alpha channel issues

2017-08-01 Thread James Cowgill

Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).

Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.

Fixes ticket 6555.

Signed-off-by: James Cowgill <james.cowg...@imgtec.com>
---
 libswscale/output.c | 15 ---
 tests/ref/fate/filter-pixfmts-scale |  4 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/libswscale/output.c b/libswscale/output.c
index 9774e9f327..8e5ec0a256 100644
--- a/libswscale/output.c
+++ b/libswscale/output.c
@@ -2026,17 +2026,18 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
*lumFilter,
 const int16_t **lumSrcx, int lumFilterSize,
 const int16_t *chrFilter, const int16_t **chrUSrcx,
 const int16_t **chrVSrcx, int chrFilterSize,
-const int16_t **alpSrc, uint8_t **dest,
+const int16_t **alpSrcx, uint8_t **dest,
 int dstW, int y)
 {
 const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->dstFormat);
 int i;
-int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrc;
+int hasAlpha = (desc->flags & AV_PIX_FMT_FLAG_ALPHA) && alpSrcx;
 uint16_t **dest16 = (uint16_t**)dest;
 const int32_t **lumSrc  = (const int32_t**)lumSrcx;
 const int32_t **chrUSrc = (const int32_t**)chrUSrcx;
 const int32_t **chrVSrc = (const int32_t**)chrVSrcx;
-int A = 0; // init to silence warning
+const int32_t **alpSrc  = (const int32_t**)alpSrcx;
+int A = 0x << 14;
 
 for (i = 0; i < dstW; i++) {
 int j;
@@ -2059,13 +2060,13 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
*lumFilter,
 V >>= 14;
 
 if (hasAlpha) {
-A = 1 << 18;
+A = -0x4000;
 
 for (j = 0; j < lumFilterSize; j++)
 A += alpSrc[j][i] * lumFilter[j];
 
-if (A & 0xF800)
-A =  av_clip_uintp2(A, 27);
+A >>= 1;
+A += 0x20002000;
 }
 
 Y -= c->yuv2rgb_y_offset;
@@ -2083,7 +2084,7 @@ yuv2gbrp16_full_X_c(SwsContext *c, const int16_t 
*lumFilter,
 dest16[1][i] = B >> 14;
 dest16[2][i] = R >> 14;
 if (hasAlpha)
-dest16[3][i] = A >> 11;
+dest16[3][i] = av_clip_uintp2(A, 30) >> 14;
 }
 if ((!isBE(c->dstFormat)) != (!HAVE_BIGENDIAN)) {
 for (i = 0; i < dstW; i++) {
diff --git a/tests/ref/fate/filter-pixfmts-scale 
b/tests/ref/fate/filter-pixfmts-scale
index 9b601b71da..dcc34bd4d1 100644
--- a/tests/ref/fate/filter-pixfmts-scale
+++ b/tests/ref/fate/filter-pixfmts-scale
@@ -23,8 +23,8 @@ gbrap10be   6d89abb9248006c3e9017545e9474654
 gbrap10le   cf974e23f485a10740f5de74a5c8c3df
 gbrap12be   1d9b57766ba9c2192403f43967cb9af0
 gbrap12le   bb1ba1c157717db3dd612a76d38a018e
-gbrap16be   81542b96575d1fe3b239d23899f5ece3
-gbrap16le   6feb8b9da131917abe867e0eaaf07b90
+gbrap16be   c72b935a6e57a8e1c37bff08c2db55b1
+gbrap16le   13eb0e62b1ac9c1c86c81521eaefab5f
 gbrp        dc3387f925f972c61aae7eb23cdc19f0
 gbrp10be    0277d4c3a8498d75e2783fb81379e481
 gbrp10le    f3d70f8ab845c3c9b8f7452e4a6e285a
-- 
2.13.3


[FFmpeg-devel] [PATCH] mips/float_dsp: fix vector_fmul_window_mips on mips64

2015-03-18 Thread James Cowgill

Commit dfa920807494 (mips/float_dsp: fix a bug in vector_fmul_window_mips)
fixed vector_fmul_window_mips by unrolling the loop only 4 times, but also
removed the outer C loop and replaced it with assembly branches and pointer
arithmetic. When submitting my 64-bit porting patch I missed this new
assembly which also needed porting.

This patch fixes a bus error in the fate-float-dsp test when run on 64-bit
mips.

Signed-off-by: James Cowgill james...@cowgill.org.uk
Cc: Nedeljko Babic nedeljko.ba...@imgtec.com
---
 libavutil/mips/float_dsp_mips.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavutil/mips/float_dsp_mips.c b/libavutil/mips/float_dsp_mips.c
index a455687..b3a812c 100644
--- a/libavutil/mips/float_dsp_mips.c
+++ b/libavutil/mips/float_dsp_mips.c
@@ -188,10 +188,10 @@ static void vector_fmul_window_mips(float *dst, const 
float *src0,
 lwc1%[wj3],   -12(%[win_j])\n\t
 lwc1%[s0], 8(%[src0_i])\n\t
 lwc1%[s01],12(%[src0_i])   \n\t
-addiu   %[src1_j],-16  \n\t
-addiu   %[win_i],  16  \n\t
-addiu   %[win_j], -16  \n\t
-addiu   %[src0_i], 16  \n\t
+PTR_ADDIU %[src1_j],-16\n\t
+PTR_ADDIU %[win_i],16  \n\t
+PTR_ADDIU %[win_j],-16 \n\t
+PTR_ADDIU %[src0_i],16 \n\t
 swc1%[temp],   0(%[dst_i]) \n\t /* dst[i] = 
s0*wj - s1*wi; */
 swc1%[temp1],  0(%[dst_j]) \n\t /* dst[j] = 
s0*wi + s1*wj; */
 swc1%[temp2],  4(%[dst_i]) \n\t /* dst[i+1] = 
s01*wj1 - s11*wi1; */
@@ -208,8 +208,8 @@ static void vector_fmul_window_mips(float *dst, const float 
*src0,
 swc1%[temp1], -8(%[dst_j]) \n\t /* dst[j-2] = 
s0*wi2 + s1*wj2; */
 swc1%[temp2],  12(%[dst_i])\n\t /* dst[i+2] = 
s01*wj3 - s11*wi3; */
 swc1%[temp3], -12(%[dst_j])\n\t /* dst[j-3] = 
s01*wi3 + s11*wj3; */
-addiu   %[dst_i],  16  \n\t
-addiu   %[dst_j], -16  \n\t
+PTR_ADDIU %[dst_i],16  \n\t
+PTR_ADDIU %[dst_j],-16 \n\t
 bne %[win_i], %[lp_end], 1b\n\t
 : [temp]=f(temp), [temp1]=f(temp1), [temp2]=f(temp2),
   [temp3]=f(temp3), [src0_i]+r(src0_i), [win_i]+r(win_i),
-- 
2.1.4


[FFmpeg-devel] [PATCH v3] mips/asmdefs: use _ABI64 as defined by gcc

2015-03-11 Thread James Cowgill

Unfortunately, android < api 21 (lollipop) doesn't have the sgidefs.h header.
The easiest way around this is to just use the preprocessor definitions from
gcc / clang.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
Hi,

Sorry I forgot about this a little.

I think that doing it this way is better than messing around with different
headers which may not exist. I know it works on GCC and Clang.

Thanks,
James

 libavutil/mips/asmdefs.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index a3a5ee3..fdf82a0 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,7 @@
 #ifndef AVUTIL_MIPS_ASMDEFS_H
 #define AVUTIL_MIPS_ASMDEFS_H
 
-#include <sgidefs.h>
-
-#if _MIPS_SIM == _ABI64
+#if defined(_ABI64) && _MIPS_SIM == _ABI64
 # define PTRSIZE 8 
 # define PTRLOG  3 
 # define PTR_ADDU   daddu 
-- 
2.1.4


Re: [FFmpeg-devel] [PATCH] mips/asmdefs: use asm/sgidefs.h header on linux

2015-03-07 Thread James Cowgill

On Sat, 2015-03-07 at 18:06 +0100, wm4 wrote:
> On Sat, 7 Mar 2015 10:13:23 +
> James Cowgill james...@cowgill.org.uk wrote:
>
>> Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
>> but the linux kernel does in asm/sgidefs.h. So use that header if we can.
>>
>> Change _ABI64 to _MIPS_SIM_ABI64 which is defined in both headers.
>
> What does this header contain? Requiring kernel headers for anything
> but Linux specific syscalls or for building kernel modules is incredibly
> broken.

Yes the correct header on mips is just 'sgidefs.h' and while glibc has
provided it for years, android bionic only added it for lollipop.

This is the kernel header:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/mips/include/uapi/asm/sgidefs.h

The one provided by glibc has a little more stuff but we don't need it.
_MIPS_SIM is defined by GCC (and some older mips compilers) to be equal
to one of the _MIPS_SIM_* constants depending on which ABI is selected.

GCC and Clang also define _ABI* themselves (as well as being defined in
the glibc version of the header) for the current ABI, so I suppose using
this without including anything might work if we don't care about other
compilers:

#if defined(_ABI64) && _MIPS_SIM == _ABI64
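
One reason a defined() guard may matter when no header is included: inside #if, any identifier that is not a defined macro evaluates to 0, so an unguarded comparison of two undefined names is true. The sketch below uses made-up names (SKETCH_SIM, SKETCH_ABI64) that are deliberately never defined; it is an illustration of the preprocessor rule, not the FFmpeg header.

```c
/* Neither SKETCH_SIM nor SKETCH_ABI64 is defined anywhere, so both
 * evaluate to 0 inside #if and the unguarded comparison is 0 == 0. */
#if SKETCH_SIM == SKETCH_ABI64
# define UNGUARDED_MATCH 1
#else
# define UNGUARDED_MATCH 0
#endif

/* With the defined() guard the test fails cleanly when the ABI macro
 * does not exist. */
#if defined(SKETCH_ABI64) && SKETCH_SIM == SKETCH_ABI64
# define GUARDED_MATCH 1
#else
# define GUARDED_MATCH 0
#endif

int unguarded_match(void) { return UNGUARDED_MATCH; }
int guarded_match(void)   { return GUARDED_MATCH; }
```

Without the guard, a compiler that defines neither macro would wrongly take the 64-bit branch.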

> And __linux__ is of course completely out of the question. Just because
> it's Linux, the libc doesn't necessarily provide kernel headers.

James


[FFmpeg-devel] [PATCH] mips/asmdefs: change include guard to read AVUTIL_ instead of AVCODEC_

2015-03-07 Thread James Cowgill

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavutil/mips/asmdefs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index 3660e98..04c036e 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -24,8 +24,8 @@
  * assembly (rather than from within .s files).
  */
 
-#ifndef AVCODEC_MIPS_ASMDEFS_H
-#define AVCODEC_MIPS_ASMDEFS_H
+#ifndef AVUTIL_MIPS_ASMDEFS_H
+#define AVUTIL_MIPS_ASMDEFS_H
 
 #ifdef __linux__
 #include <asm/sgidefs.h>
-- 
2.1.4


Re: [FFmpeg-devel] [PATCH v2 3/4] mips: port optimizations to mips n64

2015-03-07 Thread James Cowgill

On Sat, 2015-03-07 at 10:15 +0100, Michael Niedermayer wrote:
 On Sat, Mar 07, 2015 at 02:47:51AM -0300, James Almer wrote:
  On 05/03/15 2:40 PM, James Cowgill wrote:
   diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
   new file mode 100644
   index 000..4d2922c
   --- /dev/null
   +++ b/libavutil/mips/asmdefs.h
   @@ -0,0 +1,48 @@
   +/*
   + * Copyright (c) 2015 Imagination Technologies Ltd
   + *
   + * This file is part of FFmpeg.
   + *
   + * FFmpeg is free software; you can redistribute it and/or
   + * modify it under the terms of the GNU Lesser General Public
   + * License as published by the Free Software Foundation; either
   + * version 2.1 of the License, or (at your option) any later version.
   + *
   + * FFmpeg is distributed in the hope that it will be useful,
   + * but WITHOUT ANY WARRANTY; without even the implied warranty of
   + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   + * Lesser General Public License for more details.
   + *
   + * You should have received a copy of the GNU Lesser General Public
   + * License along with FFmpeg; if not, write to the Free Software
   + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 
   02110-1301 USA
   + */
   +
   +/**
   + * @file
   + * MIPS assembly defines from sys/asm.h but rewritten for use with C 
   inline
   + * assembly (rather than from within .s files).
   + */
   +
   +#ifndef AVCODEC_MIPS_ASMDEFS_H
   +#define AVCODEC_MIPS_ASMDEFS_H
   +
   +#include <sgidefs.h>
   +
   +#if _MIPS_SIM == _ABI64
  
  This broke compilation with Android NDK r8 (Which apparently doesn't 
  support mips 64 bits).
  http://fate.ffmpeg.org/report.cgi?time=20150307052927&slot=mipsel-android-gcc-4.4
  
  CC  libavutil/mips/float_dsp_mips.o
  mipsel-linux-android-gcc-4.4.3: unrecognized option '-pthreads'
  In file included from /home/fate/fate/src/libavcodec/mips/aacdec_mips.h:61,
   from /home/fate/fate/src/libavcodec/aacdec.c:113:
  /home/fate/fate/src/libavutil/mips/asmdefs.h:30:21: error: sgidefs.h: No 
  such file or directory
  /home/fate/fate/src/libavutil/mips/asmdefs.h:32:18: warning: _ABI64 is 
  not defined
  make: *** [libavcodec/aacdec.o] Error 1

Lovely. It looks like sgidefs.h was only added in lollipop (even though
it's been everywhere else for years):
https://github.com/android/platform_bionic/commit/1c2cf23a0c54619e7a362e1b82b0fb37ec9dd11a

But there's still asm/sgidefs.h defined in the linux kernel headers we
can use instead (give me a moment).

James


[FFmpeg-devel] [PATCH] mips/asmdefs: use asm/sgidefs.h header on linux

2015-03-07 Thread James Cowgill

Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
but the linux kernel does in asm/sgidefs.h. So use that header if we can.

Change _ABI64 to _MIPS_SIM_ABI64 which is defined in both headers.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavutil/mips/asmdefs.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index 4d2922c..3660e98 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,13 @@
 #ifndef AVCODEC_MIPS_ASMDEFS_H
 #define AVCODEC_MIPS_ASMDEFS_H
 
+#ifdef __linux__
+#include <asm/sgidefs.h>
+#else
 #include <sgidefs.h>
+#endif
 
-#if _MIPS_SIM == _ABI64
+#if _MIPS_SIM == _MIPS_SIM_ABI64
 # define PTRSIZE 8 
 # define PTRLOG  3 
 # define PTR_ADDU   daddu 
-- 
2.1.4


[FFmpeg-devel] [PATCH v2] mips/asmdefs: use asm/sgidefs.h header on linux

2015-03-07 Thread James Cowgill

Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
but the linux kernel does have an almost equivalent asm/sgidefs.h which will
do, so use that header if we can.

Change _ABI64 to _MIPS_SIM_ABI64 which is defined in both headers.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 configure| 4 
 libavutil/mips/asmdefs.h | 6 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 1ea2032..a5ff67c 100755
--- a/configure
+++ b/configure
@@ -1642,6 +1642,7 @@ HEADERS_LIST=
 alsa_asoundlib_h
 altivec_h
 arpa_inet_h
+asm_sgidefs_h
 asm_types_h
 cdio_paranoia_h
 cdio_paranoia_paranoia_h
@@ -4570,6 +4571,9 @@ EOF
 
 elif enabled mips; then
 
+check_header asm/sgidefs.h || check_header sgidefs.h || \
+die either asm/sgidefs.h or sgidefs.h is required on mips
+
 check_inline_asm loongson 'dmult.g $1, $2, $3'
 
 # Enable minimum ISA based on selected options
diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
index a3a5ee3..0e911cb 100644
--- a/libavutil/mips/asmdefs.h
+++ b/libavutil/mips/asmdefs.h
@@ -27,9 +27,13 @@
 #ifndef AVUTIL_MIPS_ASMDEFS_H
 #define AVUTIL_MIPS_ASMDEFS_H
 
+#if HAVE_ASM_SGIDEFS_H
+#include <asm/sgidefs.h>
+#else
 #include <sgidefs.h>
+#endif
 
-#if _MIPS_SIM == _ABI64
+#if _MIPS_SIM == _MIPS_SIM_ABI64
 # define PTRSIZE 8 
 # define PTRLOG  3 
 # define PTR_ADDU   daddu 
-- 
2.1.4


Re: [FFmpeg-devel] [PATCH v2] mips/asmdefs: use asm/sgidefs.h header on linux

2015-03-07 Thread James Cowgill

On Sat, 2015-03-07 at 13:32 +0100, Michael Niedermayer wrote:
 On Sat, Mar 07, 2015 at 10:56:45AM +, James Cowgill wrote:
  Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
  but the linux kernel does have an almost equivalent asm/sgidefs.h which will
  do so use that header if we can.
  
  Change _ABI64 to _MIPS_SIM_ABI64 which is defined in both headers.
  
  Signed-off-by: James Cowgill james...@cowgill.org.uk
 
 trying to build for android mips:
 In file included from ffmpeg/libavcodec/mips/mpegaudiodsp_mips_float.c:58:
 ffmpeg/libavutil/mips/asmdefs.h:30:5: warning: HAVE_ASM_SGIDEFS_H is not 
 defined
 ffmpeg/libavutil/mips/asmdefs.h:33:21: error: sgidefs.h: No such file or 
 directory
 ffmpeg/libavutil/mips/asmdefs.h:36:18: warning: _MIPS_SIM_ABI64 is not 
 defined
 
 i guess you are missing a include config.h in this

Whoops, sorry - I don't have access to any mips machines at the weekend so
it wasn't tested a huge amount (time for qemu...)

James


[FFmpeg-devel] [PATCH v2 3/4] mips: port optimizations to mips n64

2015-03-05 Thread James Cowgill

This mainly consists of replacing all the pointer arithmetic 'addiu'
instructions with PTR_ADDIU which will handle the differences in pointer
sizes when compiled on 64 bit mips systems.

The header asmdefs.h contains the PTR_ macros which expand to the correct mips
instructions to manipulate registers containing pointers.
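
A minimal sketch of how such PTR_ macros can work, assuming they are string literals that get spliced into the inline assembly text by string concatenation. The SKETCH_ names are hypothetical, and UINTPTR_MAX is used here only so the example builds on any host; the real header selects on the MIPS ABI instead.

```c
#include <stdint.h>

/* Hypothetical macro names. Pick the pointer-sized add mnemonic at
 * compile time based on how wide a pointer is. */
#if UINTPTR_MAX > 0xFFFFFFFFu
# define SKETCH_PTRSIZE   8
# define SKETCH_PTR_ADDIU "daddiu "
#else
# define SKETCH_PTRSIZE   4
# define SKETCH_PTR_ADDIU "addiu "
#endif

/* Adjacent string literals are concatenated, so the chosen mnemonic is
 * spliced in front of the operand text of the asm line. */
const char *advance_src_asm_sketch(void)
{
    return SKETCH_PTR_ADDIU "%[src], %[src], 32 \n\t";
}

int sketch_ptrsize(void)
{
    return SKETCH_PTRSIZE;
}
```

The assembly source then stays identical for both ABIs; only the macro expansion differs.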

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/aacdec_mips.c | 21 +--
 libavcodec/mips/aacdec_mips.h |  9 ++---
 libavcodec/mips/aacpsdsp_mips.c   | 43 +++---
 libavcodec/mips/aacpsy_mips.h |  6 ++--
 libavcodec/mips/aacsbr_mips.c | 53 +--
 libavcodec/mips/aacsbr_mips.h | 17 -
 libavcodec/mips/ac3dsp_mips.c | 59 ---
 libavcodec/mips/acelp_filters_mips.c  | 13 +++
 libavcodec/mips/acelp_vectors_mips.c  |  7 ++--
 libavcodec/mips/celp_filters_mips.c   | 13 +++
 libavcodec/mips/celp_math_mips.c  |  5 +--
 libavcodec/mips/compute_antialias_float.h |  4 ++-
 libavcodec/mips/fft_mips.c| 13 +++
 libavcodec/mips/fmtconvert_mips.c |  6 ++--
 libavcodec/mips/lsp_mips.h|  6 ++--
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++---
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++---
 libavcodec/mips/sbrdsp_mips.c | 45 +++
 libavutil/mips/asmdefs.h  | 48 +
 libavutil/mips/float_dsp_mips.c   | 21 +--
 20 files changed, 247 insertions(+), 178 deletions(-)
 create mode 100644 libavutil/mips/asmdefs.h

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 93947be..253cdeb 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -56,6 +56,7 @@
 #include aacdec_mips.h
 #include libavcodec/aactab.h
 #include libavcodec/sinewin.h
+#include libavutil/mips/asmdefs.h
 
 #if HAVE_INLINE_ASM
 static av_always_inline void float_copy(float *dst, const float *src, int 
count)
@@ -80,7 +81,7 @@ static av_always_inline void float_copy(float *dst, const 
float *src, int count)
 lw  %[temp5],20(%[src]) \n\t
 lw  %[temp6],24(%[src]) \n\t
 lw  %[temp7],28(%[src]) \n\t
-addiu   %[src],  %[src],  32\n\t
+PTR_ADDIU %[src],%[src],  32\n\t
 sw  %[temp0],0(%[dst])  \n\t
 sw  %[temp1],4(%[dst])  \n\t
 sw  %[temp2],8(%[dst])  \n\t
@@ -90,7 +91,7 @@ static av_always_inline void float_copy(float *dst, const 
float *src, int count)
 sw  %[temp6],24(%[dst]) \n\t
 sw  %[temp7],28(%[dst]) \n\t
 bne %[src],  %[loop_end], 1b\n\t
-addiu   %[dst],  %[dst],  32\n\t
+PTR_ADDIU %[dst],%[dst],  32\n\t
 .set pop\n\t
 
 : [temp0]=r(temp[0]), [temp1]=r(temp[1]),
@@ -250,7 +251,7 @@ static void apply_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 sw  $0,  4(%[p_predTime])\n\t
 sw  $0,  8(%[p_predTime])\n\t
 sw  $0,  12(%[p_predTime])   \n\t
-addiu   %[p_predTime],   %[p_predTime], 16   \n\t
+PTR_ADDIU %[p_predTime], %[p_predTime], 16   \n\t
 
 : [p_predTime]+r(p_predTime)
 :
@@ -261,7 +262,7 @@ static void apply_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 
 __asm__ volatile (
 sw  $0,  0(%[p_predTime])\n\t
-addiu   %[p_predTime],   %[p_predTime], 4\n\t
+PTR_ADDIU %[p_predTime], %[p_predTime], 4\n\t
 
 : [p_predTime]+r(p_predTime)
 :
@@ -315,9 +316,9 @@ static av_always_inline void fmul_and_reverse(float *dst, 
const float *src0, con
 swc1%[temp9],4(%[ptr1])\n\t
 swc1%[temp10],   8(%[ptr1])\n\t
 swc1%[temp11],   12(%[ptr1])   \n\t
-addiu   %[ptr1], %[ptr1],  16  \n\t
-addiu   %[ptr2], %[ptr2],  -16 \n\t
-addiu   %[ptr3], %[ptr3],  -16 \n\t
+PTR_ADDIU %[ptr1],   %[ptr1],  16  \n\t
+PTR_ADDIU %[ptr2],   %[ptr2],  -16 \n\t
+PTR_ADDIU %[ptr3],   %[ptr3],  -16 \n\t
 
 : [temp0]=f(temp[0]), [temp1]=f(temp[1]),
   [temp2]=f(temp[2]), [temp3]=f(temp[3]),
@@ -358,7 +359,7 @@ static void update_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 sw $0,  20(%[p_saved_ltp])   \n\t

[FFmpeg-devel] [PATCH v2 1/4] mips/aacdec: remove uses of mips32r2 specific ext instructions

2015-03-05 Thread James Cowgill

Removing these removes the dependency of this code on mips32r2 which would
allow it to be used on processors which have FPU instructions, but not r2
instructions (like the mips64el debian port for instance).
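
The replacement can be checked with a small C model, sketched here under the assumption that each index field is extracted and then scaled by 4 to form a float offset. The function names are hypothetical; ext_sketch models the mips32r2 "ext" bit-field extract, and offset_with_andi models the "andi" plus single shift that the patch uses instead.

```c
/* Model of "ext rt, rs, pos, size": extract size bits starting at bit pos. */
unsigned ext_sketch(unsigned value, unsigned pos, unsigned size)
{
    return (value >> pos) & ((1u << size) - 1u);
}

/* Old sequence: ext, then "sll 2" to scale the index to a byte offset. */
unsigned offset_with_ext(unsigned idx, unsigned pos, unsigned size)
{
    return ext_sketch(idx, pos, size) << 2;
}

/* New sequence: "andi" with an in-place mask, then one net shift.
 * shift > 0 models srl, shift < 0 models sll, 0 means no shift at all. */
unsigned offset_with_andi(unsigned idx, unsigned mask, int shift)
{
    unsigned masked = idx & mask;
    if (shift > 0)
        return masked >> shift;
    if (shift < 0)
        return masked << -shift;
    return masked;
}
```

For the field at bits 2-3 the mask 0x0C already leaves the value scaled by 4, which would explain why that register needs no shift in the new code.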

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/aacdec_mips.h | 49 ++-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.h b/libavcodec/mips/aacdec_mips.h
index 9ba3079..c9efdbb 100644
--- a/libavcodec/mips/aacdec_mips.h
+++ b/libavcodec/mips/aacdec_mips.h
@@ -68,10 +68,10 @@ static inline float *VMUL2_mips(float *dst, const float *v, 
unsigned idx,
 float *ret;
 
 __asm__ volatile(
-andi%[temp3],  %[idx],   15   \n\t
-ext %[temp4],  %[idx],   4,  4\n\t
+andi%[temp3],  %[idx],   0x0F \n\t
+andi%[temp4],  %[idx],   0xF0 \n\t
 sll %[temp3],  %[temp3], 2\n\t
-sll %[temp4],  %[temp4], 2\n\t
+srl %[temp4],  %[temp4], 2\n\t
 lwc1%[temp2],  0(%[scale])\n\t
 lwxc1   %[temp0],  %[temp3](%[v]) \n\t
 lwxc1   %[temp1],  %[temp4](%[v]) \n\t
@@ -99,14 +99,13 @@ static inline float *VMUL4_mips(float *dst, const float *v, 
unsigned idx,
 float *ret;
 
 __asm__ volatile(
-andi%[temp0],  %[idx],   3   \n\t
-ext %[temp1],  %[idx],   2,  2   \n\t
-ext %[temp2],  %[idx],   4,  2   \n\t
-ext %[temp3],  %[idx],   6,  2   \n\t
+andi%[temp0],  %[idx],   0x03\n\t
+andi%[temp1],  %[idx],   0x0C\n\t
+andi%[temp2],  %[idx],   0x30\n\t
+andi%[temp3],  %[idx],   0xC0\n\t
 sll %[temp0],  %[temp0], 2   \n\t
-sll %[temp1],  %[temp1], 2   \n\t
-sll %[temp2],  %[temp2], 2   \n\t
-sll %[temp3],  %[temp3], 2   \n\t
+srl %[temp2],  %[temp2], 2   \n\t
+srl %[temp3],  %[temp3], 4   \n\t
 lwc1%[temp4],  0(%[scale])   \n\t
 lwxc1   %[temp5],  %[temp0](%[v])\n\t
 lwxc1   %[temp6],  %[temp1](%[v])\n\t
@@ -142,14 +141,14 @@ static inline float *VMUL2S_mips(float *dst, const float 
*v, unsigned idx,
 float *ret;
 
 __asm__ volatile(
-andi%[temp0],  %[idx],   15 \n\t
-ext %[temp1],  %[idx],   4, 4   \n\t
+andi%[temp0],  %[idx],   0x0F   \n\t
+andi%[temp1],  %[idx],   0xF0   \n\t
 lw  %[temp4],  0(%[scale])  \n\t
 srl %[temp2],  %[sign],  1  \n\t
 sll %[temp3],  %[sign],  31 \n\t
 sll %[temp2],  %[temp2], 31 \n\t
 sll %[temp0],  %[temp0], 2  \n\t
-sll %[temp1],  %[temp1], 2  \n\t
+srl %[temp1],  %[temp1], 2  \n\t
 lwxc1   %[temp8],  %[temp0](%[v])   \n\t
 lwxc1   %[temp9],  %[temp1](%[v])   \n\t
 xor %[temp5],  %[temp4], %[temp2]   \n\t
@@ -185,22 +184,24 @@ static inline float *VMUL4S_mips(float *dst, const float 
*v, unsigned idx,
 
 __asm__ volatile(
 lw  %[temp0],   0(%[scale])   \n\t
-and %[temp1],   %[idx],   3   \n\t
-ext %[temp2],   %[idx],   2,  2   \n\t
-ext %[temp3],   %[idx],   4,  2   \n\t
-ext %[temp4],   %[idx],   6,  2   \n\t
-sll %[temp1],   %[temp1], 2   \n\t
-sll %[temp2],   %[temp2], 2   \n\t
-sll %[temp3],   %[temp3], 2   \n\t
-sll %[temp4],   %[temp4], 2   \n\t
+andi%[temp1],  %[idx],   0x03 \n\t
+andi%[temp2],  %[idx],   0x0C \n\t
+andi%[temp3],  %[idx],   0x30 \n\t
+andi%[temp4],  %[idx],   0xC0 \n\t
+sll %[temp1],  %[temp1], 2\n\t
+srl %[temp3],  %[temp3], 2\n\t
+srl %[temp4],  %[temp4], 4\n\t
 lwxc1   %[temp10],  %[temp1](%[v])\n\t
 lwxc1   %[temp11],  %[temp2](%[v])\n\t
 lwxc1   %[temp12],  %[temp3](%[v])\n\t
 lwxc1   %[temp13],  %[temp4](%[v])\n\t
 and %[temp1],   %[sign],  %[mask] \n\t
-ext %[temp2],   %[idx],   12, 1   \n\t
-ext %[temp3],   %[idx],   13, 1   \n\t
-ext %[temp4],   %[idx],   14, 1   \n\t
+srl %[temp2

[FFmpeg-devel] [PATCH v2 2/4] configure, mips: remove MIPS32R2, merging it with MIPSFPU

2015-03-05 Thread James Cowgill

There are no independent uses of mips32r2 instructions except for the
FPU parts. Due to the heavy use of mips32r2 specific fpu extensions, I
am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway.

Since these fpu instructions are available on mips64 (non-r2), enable them
there as well.

Also remove the last occurrence of HAVE_MIPS32R2 (which is coupled to
HAVE_MIPSFPU anyway).

mips32r2 is left in the list of options for compatibility so that using
--disable-mips32r2 doesn't break anything.
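
The minimum-ISA selection this patch adds to configure can be restated as a small decision function. This is only a C paraphrase of the shell logic in the diff, with a hypothetical function name and integer flags standing in for the "enabled" checks.

```c
#include <stddef.h>

/* Returns the ISA flag the new configure logic would add to the C and
 * assembler flags, or NULL when no MIPS extension is enabled. */
const char *min_isa_flag_sketch(int mips64, int mipsfpu,
                                int mipsdspr1, int mipsdspr2)
{
    if (mips64 && (mipsdspr1 || mipsdspr2))
        return "-mips64r2";   /* DSP ASE on 64-bit needs release 2 */
    if (mips64 && mipsfpu)
        return "-mips64";     /* FPU code alone is fine on non-r2 mips64 */
    if (mipsfpu || mipsdspr1 || mipsdspr2)
        return "-mips32r2";   /* 32-bit builds keep the old r2 baseline */
    return NULL;
}
```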

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 Makefile  |  2 +-
 arch.mak  |  1 -
 configure | 18 +-
 libavcodec/mips/ac3dsp_mips.c |  4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 845a274..ca2ce59 100644
--- a/Makefile
+++ b/Makefile
@@ -80,7 +80,7 @@ SUBDIR_VARS := CLEANFILES EXAMPLES FFLIBS HOSTPROGS TESTPROGS 
TOOLS  \
HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS\
ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS \
ALTIVEC-OBJS MMX-OBJS YASM-OBJS   \
-   MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS MIPS32R2-OBJS  \
+   MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS\
OBJS SLIBOBJS HOSTOBJS TESTOBJS
 
 define RESET
diff --git a/arch.mak b/arch.mak
index 0e866d8..48bc2d3 100644
--- a/arch.mak
+++ b/arch.mak
@@ -5,7 +5,6 @@ OBJS-$(HAVE_VFP) += $(VFP-OBJS) $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON)+= $(NEON-OBJS)$(NEON-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU)   += $(MIPSFPU-OBJS)$(MIPSFPU-OBJS-yes)
-OBJS-$(HAVE_MIPS32R2)  += $(MIPS32R2-OBJS)   $(MIPS32R2-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR1) += $(MIPSDSPR1-OBJS)  $(MIPSDSPR1-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR2) += $(MIPSDSPR2-OBJS)  $(MIPSDSPR2-OBJS-yes)
 
diff --git a/configure b/configure
index d641d9f..ce745d2 100755
--- a/configure
+++ b/configure
@@ -358,7 +358,6 @@ Optimization options (experts only):
   --disable-neon   disable NEON optimizations
   --disable-inline-asm disable use of inline assembly
   --disable-yasm   disable use of nasm/yasm assembly
-  --disable-mips32r2   disable MIPS32R2 optimizations
   --disable-mipsdspr1  disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2  disable MIPS DSP ASE R2 optimizations
   --disable-mipsfpudisable floating point MIPS optimizations
@@ -1999,7 +1998,6 @@ setend_deps=arm
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
 mipsfpu_deps=mips
-mips32r2_deps=mips
 mipsdspr1_deps=mips
 mipsdspr2_deps=mips
 
@@ -4569,8 +4567,19 @@ EOF
 elif enabled mips; then
 
 check_inline_asm loongson 'dmult.g $1, $2, $3'
-enabled mips32r2  && add_cflags -mips32r2 && add_asflags -mips32r2 &&
- check_inline_asm mips32r2  'rotr $t0, $t1, 1'
+
+# Enable minimum ISA based on selected options
+if enabled mips64 && (enabled mipsdspr1 || enabled mipsdspr2); then
+add_cflags -mips64r2
+add_asflags -mips64r2
+elif enabled mips64 && enabled mipsfpu; then
+add_cflags -mips64
+add_asflags -mips64
+elif enabled mipsfpu || enabled mipsdspr1 || enabled mipsdspr2; then
+add_cflags -mips32r2
+add_asflags -mips32r2
+fi
+
 enabled mipsdspr1 && add_cflags -mdsp && add_asflags -mdsp &&
  check_inline_asm mipsdspr1 'addu.qb $t0, $t1, $t2'
 enabled mipsdspr2 && add_cflags -mdspr2 && add_asflags -mdspr2 &&
@@ -5522,7 +5531,6 @@ if enabled arm; then
 fi
 if enabled mips; then
 echo MIPS FPU enabled  ${mipsfpu-no}
-echo MIPS32R2 enabled  ${mips32r2-no}
 echo MIPS DSP R1 enabled   ${mipsdspr1-no}
 echo MIPS DSP R2 enabled   ${mipsdspr2-no}
 fi
diff --git a/libavcodec/mips/ac3dsp_mips.c b/libavcodec/mips/ac3dsp_mips.c
index f33c6f1..bd2a611 100644
--- a/libavcodec/mips/ac3dsp_mips.c
+++ b/libavcodec/mips/ac3dsp_mips.c
@@ -199,7 +199,7 @@ static void ac3_update_bap_counts_mips(uint16_t 
mant_cnt[16], uint8_t *bap,
 }
 #endif
 
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 static void float_to_fixed24_mips(int32_t *dst, const float *src, unsigned int 
len)
 {
 const float scale = 1 << 24;
@@ -403,7 +403,7 @@ void ff_ac3dsp_init_mips(AC3DSPContext *c, int bit_exact) {
 c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_mips;
 c->update_bap_counts  = ac3_update_bap_counts_mips;
 #endif
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 c->float_to_fixed24 = float_to_fixed24_mips;
 c->downmix  = ac3_downmix_mips;
 #endif
 #endif
-- 
2.1.4


[FFmpeg-devel] [PATCH v2 4/4] changelog: add mips 64-bit port

2015-03-05 Thread James Cowgill

---
 Changelog | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Changelog b/Changelog
index 1374cbc..2a5d6b8 100644
--- a/Changelog
+++ b/Changelog
@@ -36,6 +36,7 @@ version next:
 - Canopus HQX decoder
 - RTP depacketization of T.140 text (RFC 4103)
 - VP9 RTP payload format (draft 0) experimental depacketizer
+- Port MIPS optimizations to 64-bit
 
 
 version 2.5:
-- 
2.1.4


[FFmpeg-devel] [PATCH v2 0/4] mips cleanups and port to mips64

2015-03-05 Thread James Cowgill

Hi,

This is the second version of the mips patches without the ones which
were already accepted.

Changes:
 - Keep mips32r2 in the list of configure options so it doesn't break
   anything (although now it's a no-op).
 - Drop float dsp patch and just do the normal 64-bit porting.
   This would be good to do properly in generic code at some point. I have
   a feeling that using restrict in the way I did could result in undefined
   behavior in certain cases though.
 - Move asmdefs.h to libavutil/mips to allow for the above change.
 - Rebase the 64-bit porting and fix the conflicts.
 - Add entry in the changelog for the mips 64-bit port.

Thanks,
James

James Cowgill (4):
  mips/aacdec: remove uses of mips32r2 specific ext instructions
  configure, mips: remove MIPS32R2, merging it with MIPSFPU
  mips: port optimizations to mips n64
  changelog: add mips 64-bit port

 Changelog |  1 +
 Makefile  |  2 +-
 arch.mak  |  1 -
 configure | 18 ++---
 libavcodec/mips/aacdec_mips.c | 21 ++-
 libavcodec/mips/aacdec_mips.h | 58 ++--
 libavcodec/mips/aacpsdsp_mips.c   | 43 ++---
 libavcodec/mips/aacpsy_mips.h |  6 ++-
 libavcodec/mips/aacsbr_mips.c | 53 +-
 libavcodec/mips/aacsbr_mips.h | 17 +
 libavcodec/mips/ac3dsp_mips.c | 63 ---
 libavcodec/mips/acelp_filters_mips.c  | 13 ---
 libavcodec/mips/acelp_vectors_mips.c  |  7 ++--
 libavcodec/mips/celp_filters_mips.c   | 13 ---
 libavcodec/mips/celp_math_mips.c  |  5 ++-
 libavcodec/mips/compute_antialias_float.h |  4 +-
 libavcodec/mips/fft_mips.c| 13 ---
 libavcodec/mips/fmtconvert_mips.c |  6 +--
 libavcodec/mips/lsp_mips.h|  6 ++-
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++---
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++--
 libavcodec/mips/sbrdsp_mips.c | 45 +++---
 libavutil/mips/asmdefs.h  | 48 +++
 libavutil/mips/float_dsp_mips.c   | 21 ++-
 24 files changed, 289 insertions(+), 211 deletions(-)
 create mode 100644 libavutil/mips/asmdefs.h

-- 
2.1.4


Re: [FFmpeg-devel] [PATCH 09/12] mips: port optimizations to mips n64

2015-03-04 Thread James Cowgill

On Wed, 2015-03-04 at 11:52 +0100, Michael Niedermayer wrote:
 On Wed, Mar 04, 2015 at 10:10:15AM +, Nedeljko Babic wrote:
  LGTM
 
 seems this does not apply cleanly on HEAD
 Applying: mips: port optimizations to mips n64
 error: patch failed: libavcodec/mips/acelp_filters_mips.c:82
 error: libavcodec/mips/acelp_filters_mips.c: patch does not apply
 error: patch failed: libavcodec/mips/fmtconvert_mips.c:50
 error: libavcodec/mips/fmtconvert_mips.c: patch does not apply
 Patch failed at 0001 mips: port optimizations to mips n64

Yeah, I'll resend it with my other small changes soon.

James


Re: [FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations

2015-03-04 Thread James Cowgill

On Wed, 2015-03-04 at 11:08 +, Nedeljko Babic wrote:
>> The assembly versions have a few problems
>> - They only work with mips32r2 enabled
>> - They don't work on 64-bits
>> - They're massive and complex
>>
>> So replace them with C implementations which solve these problems and let GCC
>> magically optimize for different platforms. All the functions are manually
>> unrolled 4 times (like the assembly code). With the addition of a few restrict
>> keywords, the functions produce almost identical assembly to the original
>> versions when compiled with gcc -O3.
>>
>> Since this code now uses no fpu assembly, drop the HAVE_MIPSFPU guard as well.
>
> All improvements of the C code should be put in generic C code so all
> architectures can benefit from them.
>
> The purpose of this code was to create optimizations for specific architecture.
> In this way optimizations for mips32r2 architecture are here even without
> tweaking configure line and even for older compilers.

That's ok until you try to run it on an old MIPS processor and the
default FFmpeg options cause lots of SIGILLs, but that's another
discussion (and maybe nobody cares :/).

> By putting these optimizations under HAVE_MIPS32R2 problem with building mips64 should
> be resolved and this can be optimized for mips64 later if needed.

I was thinking about just dropping this patch for the time being and
porting a few bits to mips64 like in the other files (the code only uses
the MIPS IV parts of mips32r2). The code only kept its performance when
I unrolled the loops and used av_restrict. Strictly speaking you're not
even supposed to use restrict if the arrays could be exactly equal
(which is permitted in the contracts for some of these functions).

James
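
The aliasing concern in the last paragraph can be sketched in plain C. This is only an illustration with hypothetical helper names (`fmul_may_alias`, `fmul_restrict`), not FFmpeg code: without `restrict` an in-place call is well defined, while the `restrict` version may only be called with non-overlapping arrays.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of FFmpeg. */

/* No restrict: the compiler must assume dst may alias src0 or src1,
 * so an in-place call (dst == src0) is well defined. */
static void fmul_may_alias(float *dst, const float *src0,
                           const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* With restrict: the caller promises dst does not overlap src0 or src1.
 * Passing the same array as dst and src0 would be undefined behaviour,
 * even if it appears to work. len is assumed to be a multiple of 4. */
static void fmul_restrict(float *restrict dst, const float *restrict src0,
                          const float *restrict src1, int len)
{
    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

The unrolled body mirrors the shape of the C replacement in the patch; the promise made by `restrict` is what lets the compiler schedule the four loads and stores freely.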

Re: [FFmpeg-devel] [PATCH 07/12] mips/aacdec: remove uses of mips32r2 specific ext instructions

2015-03-03 Thread James Cowgill

On Tue, 2015-03-03 at 12:42 +, Nedeljko Babic wrote:
> > Removing these removes the dependency of this code on mips32r2 which would
> > allow it to be used on processors which have FPU instructions, but not r2
> > instructions (like the mips64el debian port for instance).
>
> I would be more comfortable if there were two instances of this code: one for
> mips32r2 and one for mips32 so advantages of using mips32r2 instructions
> (however small here) are left intact.
>
> On the other hand, since this doesn't change much number of instructions used
> (adding at maximum around 100 instructions overall if I am not mistaken) I
> am ok with this.

Well I can't see how 'ext' can ever be faster than 'and' (it does more
work) so most of these should be no slower anyway. For VMUL4S my version
has 2 extra instructions in it so it could be a bit slower. Does this
#if seem ok?

--- a/libavcodec/mips/aacdec_mips.h
+++ b/libavcodec/mips/aacdec_mips.h
@@ -198,9 +198,18 @@ static inline float *VMUL4S_mips(float *dst, const float *v, unsigned idx,
 "lwxc1   %[temp12],  %[temp3](%[v])\n\t"
 "lwxc1   %[temp13],  %[temp4](%[v])\n\t"
 "and     %[temp1],   %[sign],  %[mask] \n\t"
+#if defined(__mips_isa_rev) && __mips_isa_rev >= 2
 "ext     %[temp2],   %[idx],   12, 1   \n\t"
 "ext     %[temp3],   %[idx],   13, 1   \n\t"
 "ext     %[temp4],   %[idx],   14, 1   \n\t"
+#else
+"srl     %[temp2],   %[idx],   12  \n\t"
+"srl     %[temp3],   %[idx],   13  \n\t"
+"srl     %[temp4],   %[idx],   14  \n\t"
+"andi    %[temp2],   %[temp2], 1   \n\t"
+"andi    %[temp3],   %[temp3], 1   \n\t"
+"andi    %[temp4],   %[temp4], 1   \n\t"
+#endif
 "sllv    %[sign],    %[sign],  %[temp2]\n\t"
 "xor     %[temp1],   %[temp0], %[temp1]\n\t"
 "and     %[temp2],   %[sign],  %[mask] \n\t"

Thanks,
James

[FFmpeg-devel] [PATCH 07/12] mips/aacdec: remove uses of mips32r2 specific ext instructions

2015-02-26 Thread James Cowgill

Removing these removes the dependency of this code on mips32r2 which would
allow it to be used on processors which have FPU instructions, but not r2
instructions (like the mips64el debian port for instance).

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
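
As a rough check of why the replacement sequences compute the same table offsets, here is a small C model. The function names are hypothetical and `ext_model` only approximates the `ext rt, rs, pos, size` instruction for `size` below 32; it is a sketch, not the code in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MIPS32R2 'ext': take 'size' bits of rs
 * starting at bit 'pos' (size < 32 assumed). */
static uint32_t ext_model(uint32_t rs, unsigned pos, unsigned size)
{
    return (rs >> pos) & ((1u << size) - 1u);
}

/* Old sequence: ext idx, 4, 4 then sll 2 (scale to a float offset). */
static uint32_t offset_with_ext(uint32_t idx)
{
    return ext_model(idx, 4, 4) << 2;
}

/* New sequence: andi idx, 0xF0 then srl 2. */
static uint32_t offset_with_andi(uint32_t idx)
{
    return (idx & 0xF0u) >> 2;
}
```

The same reasoning covers the 2-bit fields: bits 2..3 masked with 0x0C are already scaled by 4, so no shift is needed, and bits 6..7 masked with 0xC0 need a right shift by 4.
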
 libavcodec/mips/aacdec_mips.h | 49 ++-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.h b/libavcodec/mips/aacdec_mips.h
index 9ba3079..c9efdbb 100644
--- a/libavcodec/mips/aacdec_mips.h
+++ b/libavcodec/mips/aacdec_mips.h
@@ -68,10 +68,10 @@ static inline float *VMUL2_mips(float *dst, const float *v, unsigned idx,
 float *ret;
 
 __asm__ volatile(
-"andi    %[temp3],  %[idx],   15   \n\t"
-"ext     %[temp4],  %[idx],   4,  4\n\t"
+"andi    %[temp3],  %[idx],   0x0F \n\t"
+"andi    %[temp4],  %[idx],   0xF0 \n\t"
 "sll     %[temp3],  %[temp3], 2\n\t"
-"sll     %[temp4],  %[temp4], 2\n\t"
+"srl     %[temp4],  %[temp4], 2\n\t"
 "lwc1    %[temp2],  0(%[scale])\n\t"
 "lwxc1   %[temp0],  %[temp3](%[v]) \n\t"
 "lwxc1   %[temp1],  %[temp4](%[v]) \n\t"
@@ -99,14 +99,13 @@ static inline float *VMUL4_mips(float *dst, const float *v, unsigned idx,
 float *ret;
 
 __asm__ volatile(
-"andi    %[temp0],  %[idx],   3   \n\t"
-"ext     %[temp1],  %[idx],   2,  2   \n\t"
-"ext     %[temp2],  %[idx],   4,  2   \n\t"
-"ext     %[temp3],  %[idx],   6,  2   \n\t"
+"andi    %[temp0],  %[idx],   0x03\n\t"
+"andi    %[temp1],  %[idx],   0x0C\n\t"
+"andi    %[temp2],  %[idx],   0x30\n\t"
+"andi    %[temp3],  %[idx],   0xC0\n\t"
 "sll     %[temp0],  %[temp0], 2   \n\t"
-"sll     %[temp1],  %[temp1], 2   \n\t"
-"sll     %[temp2],  %[temp2], 2   \n\t"
-"sll     %[temp3],  %[temp3], 2   \n\t"
+"srl     %[temp2],  %[temp2], 2   \n\t"
+"srl     %[temp3],  %[temp3], 4   \n\t"
 "lwc1    %[temp4],  0(%[scale])   \n\t"
 "lwxc1   %[temp5],  %[temp0](%[v])\n\t"
 "lwxc1   %[temp6],  %[temp1](%[v])\n\t"
@@ -142,14 +141,14 @@ static inline float *VMUL2S_mips(float *dst, const float *v, unsigned idx,
 float *ret;
 
 __asm__ volatile(
-"andi    %[temp0],  %[idx],   15 \n\t"
-"ext     %[temp1],  %[idx],   4, 4   \n\t"
+"andi    %[temp0],  %[idx],   0x0F   \n\t"
+"andi    %[temp1],  %[idx],   0xF0   \n\t"
 "lw      %[temp4],  0(%[scale])  \n\t"
 "srl     %[temp2],  %[sign],  1  \n\t"
 "sll     %[temp3],  %[sign],  31 \n\t"
 "sll     %[temp2],  %[temp2], 31 \n\t"
 "sll     %[temp0],  %[temp0], 2  \n\t"
-"sll     %[temp1],  %[temp1], 2  \n\t"
+"srl     %[temp1],  %[temp1], 2  \n\t"
 "lwxc1   %[temp8],  %[temp0](%[v])   \n\t"
 "lwxc1   %[temp9],  %[temp1](%[v])   \n\t"
 "xor     %[temp5],  %[temp4], %[temp2]   \n\t"
@@ -185,22 +184,24 @@ static inline float *VMUL4S_mips(float *dst, const float *v, unsigned idx,
 
 __asm__ volatile(
 "lw      %[temp0],   0(%[scale])   \n\t"
-"and     %[temp1],   %[idx],   3   \n\t"
-"ext     %[temp2],   %[idx],   2,  2   \n\t"
-"ext     %[temp3],   %[idx],   4,  2   \n\t"
-"ext     %[temp4],   %[idx],   6,  2   \n\t"
-"sll     %[temp1],   %[temp1], 2   \n\t"
-"sll     %[temp2],   %[temp2], 2   \n\t"
-"sll     %[temp3],   %[temp3], 2   \n\t"
-"sll     %[temp4],   %[temp4], 2   \n\t"
+"andi    %[temp1],  %[idx],   0x03 \n\t"
+"andi    %[temp2],  %[idx],   0x0C \n\t"
+"andi    %[temp3],  %[idx],   0x30 \n\t"
+"andi    %[temp4],  %[idx],   0xC0 \n\t"
+"sll     %[temp1],  %[temp1], 2\n\t"
+"srl     %[temp3],  %[temp3], 2\n\t"
+"srl     %[temp4],  %[temp4], 4\n\t"
 "lwxc1   %[temp10],  %[temp1](%[v])\n\t"
 "lwxc1   %[temp11],  %[temp2](%[v])\n\t"
 "lwxc1   %[temp12],  %[temp3](%[v])\n\t"
 "lwxc1   %[temp13],  %[temp4](%[v])\n\t"
 "and     %[temp1],   %[sign],  %[mask] \n\t"
-"ext     %[temp2],   %[idx],   12, 1   \n\t"
-"ext     %[temp3],   %[idx],   13, 1   \n\t"
-"ext     %[temp4],   %[idx],   14, 1   \n\t"
+srl %[temp2

[FFmpeg-devel] [PATCH 03/12] mips/aacpsdsp: fix definition of ps_decorrelate_mips

2015-02-26 Thread James Cowgill

Q_fract should have been declared as 'const float*'.
Also fix the constness of some local variables affected by this.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/aacpsdsp_mips.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mips/aacpsdsp_mips.c b/libavcodec/mips/aacpsdsp_mips.c
index 4730a7f..06d99d8 100644
--- a/libavcodec/mips/aacpsdsp_mips.c
+++ b/libavcodec/mips/aacpsdsp_mips.c
@@ -277,7 +277,7 @@ static void ps_mul_pair_single_mips(float (*dst)[2], float (*src0)[2], float *sr
 
 static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
  float (*ap_delay)[PS_QMF_TIME_SLOTS + PS_MAX_AP_DELAY][2],
- const float phi_fract[2], float (*Q_fract)[2],
+ const float phi_fract[2], const float (*Q_fract)[2],
  const float *transient_gain,
  float g_decay_slope,
  int len)
@@ -285,8 +285,8 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
 float *p_delay = &delay[0][0];
 float *p_out = &out[0][0];
 float *p_ap_delay = &ap_delay[0][0][0];
-float *p_t_gain = (float*)transient_gain;
-float *p_Q_fract = &Q_fract[0][0];
+const float *p_t_gain = transient_gain;
+const float *p_Q_fract = &Q_fract[0][0];
 float ag0, ag1, ag2;
 float phi_fract0 = phi_fract[0];
 float phi_fract1 = phi_fract[1];
-- 
2.1.4

[FFmpeg-devel] [PATCH 08/12] configure, mips: remove MIPS32R2, merging it with MIPSFPU

2015-02-26 Thread James Cowgill

There are no independent uses of mips32r2 instructions except for the
FPU parts. Due to the heavy use of mips32r2 specific fpu extensions, I
am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway.

Since these fpu instructions are available on mips64 (non-r2), enable them
there as well.

Also remove the last occurrence of HAVE_MIPS32R2 (which is coupled to
HAVE_MIPSFPU anyway).

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 Makefile  |  2 +-
 arch.mak  |  1 -
 configure | 19 +--
 libavcodec/mips/ac3dsp_mips.c |  4 ++--
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/Makefile b/Makefile
index 845a274..ca2ce59 100644
--- a/Makefile
+++ b/Makefile
@@ -80,7 +80,7 @@ SUBDIR_VARS := CLEANFILES EXAMPLES FFLIBS HOSTPROGS TESTPROGS TOOLS  \
HEADERS ARCH_HEADERS BUILT_HEADERS SKIPHEADERS\
ARMV5TE-OBJS ARMV6-OBJS ARMV8-OBJS VFP-OBJS NEON-OBJS \
ALTIVEC-OBJS MMX-OBJS YASM-OBJS   \
-   MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS MIPS32R2-OBJS  \
+   MIPSFPU-OBJS MIPSDSPR2-OBJS MIPSDSPR1-OBJS\
OBJS SLIBOBJS HOSTOBJS TESTOBJS
 
 define RESET
diff --git a/arch.mak b/arch.mak
index 0e866d8..48bc2d3 100644
--- a/arch.mak
+++ b/arch.mak
@@ -5,7 +5,6 @@ OBJS-$(HAVE_VFP) += $(VFP-OBJS) $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON)+= $(NEON-OBJS)$(NEON-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU)   += $(MIPSFPU-OBJS)$(MIPSFPU-OBJS-yes)
-OBJS-$(HAVE_MIPS32R2)  += $(MIPS32R2-OBJS)   $(MIPS32R2-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR1) += $(MIPSDSPR1-OBJS)  $(MIPSDSPR1-OBJS-yes)
 OBJS-$(HAVE_MIPSDSPR2) += $(MIPSDSPR2-OBJS)  $(MIPSDSPR2-OBJS-yes)
 
diff --git a/configure b/configure
index d037da1..6764830 100755
--- a/configure
+++ b/configure
@@ -358,7 +358,6 @@ Optimization options (experts only):
   --disable-neon   disable NEON optimizations
   --disable-inline-asm disable use of inline assembly
   --disable-yasm   disable use of nasm/yasm assembly
-  --disable-mips32r2   disable MIPS32R2 optimizations
   --disable-mipsdspr1  disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2  disable MIPS DSP ASE R2 optimizations
   --disable-mipsfpudisable floating point MIPS optimizations
@@ -1560,7 +1559,6 @@ ARCH_EXT_LIST_ARM=
 
 ARCH_EXT_LIST_MIPS=
 mipsfpu
-mips32r2
 mipsdspr1
 mipsdspr2
 
@@ -1996,7 +1994,6 @@ setend_deps=arm
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
 mipsfpu_deps=mips
-mips32r2_deps=mips
 mipsdspr1_deps=mips
 mipsdspr2_deps=mips
 
@@ -4565,8 +4562,19 @@ EOF
 elif enabled mips; then
 
 check_inline_asm loongson '"dmult.g $1, $2, $3"'
-enabled mips32r2  && add_cflags -mips32r2 && add_asflags -mips32r2 &&
- check_inline_asm mips32r2  '"rotr $t0, $t1, 1"'
+
+# Enable minimum ISA based on selected options
+if enabled mips64 && (enabled mipsdspr1 || enabled mipsdspr2); then
+add_cflags -mips64r2
+add_asflags -mips64r2
+elif enabled mips64 && enabled mipsfpu; then
+add_cflags -mips64
+add_asflags -mips64
+elif enabled mipsfpu || enabled mipsdspr1 || enabled mipsdspr2; then
+add_cflags -mips32r2
+add_asflags -mips32r2
+fi
+
 enabled mipsdspr1 && add_cflags -mdsp && add_asflags -mdsp &&
  check_inline_asm mipsdspr1 '"addu.qb $t0, $t1, $t2"'
 enabled mipsdspr2 && add_cflags -mdspr2 && add_asflags -mdspr2 &&
@@ -5512,7 +5520,6 @@ if enabled arm; then
 fi
 if enabled mips; then
 echo MIPS FPU enabled  ${mipsfpu-no}
-echo MIPS32R2 enabled  ${mips32r2-no}
 echo MIPS DSP R1 enabled   ${mipsdspr1-no}
 echo MIPS DSP R2 enabled   ${mipsdspr2-no}
 fi
diff --git a/libavcodec/mips/ac3dsp_mips.c b/libavcodec/mips/ac3dsp_mips.c
index f33c6f1..bd2a611 100644
--- a/libavcodec/mips/ac3dsp_mips.c
+++ b/libavcodec/mips/ac3dsp_mips.c
@@ -199,7 +199,7 @@ static void ac3_update_bap_counts_mips(uint16_t mant_cnt[16], uint8_t *bap,
 }
 #endif
 
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 static void float_to_fixed24_mips(int32_t *dst, const float *src, unsigned int len)
 {
 const float scale = 1 << 24;
@@ -403,7 +403,7 @@ void ff_ac3dsp_init_mips(AC3DSPContext *c, int bit_exact) {
 c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_mips;
 c->update_bap_counts  = ac3_update_bap_counts_mips;
 #endif
-#if HAVE_MIPSFPU && HAVE_MIPS32R2
+#if HAVE_MIPSFPU
 c->float_to_fixed24 = float_to_fixed24_mips;
 c->downmix  = ac3_downmix_mips;
 #endif
-- 
2.1.4

[FFmpeg-devel] [PATCH 04/12] mips/fft: remove some useless assembly

2015-02-26 Thread James Cowgill

Remove some assembly that the compiler can easily handle optimally on its own.
GCC produces almost identical assembly.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
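
For reference, the two expressions that replace the assembly can be exercised on their own. The helper names below are hypothetical; the right shifts match the `srav` and `sra` instructions in the removed blocks, and `nbits` is assumed to lie between 2 and 16.

```c
#include <assert.h>

/* Hypothetical wrappers around the expressions used in the patch. */

/* Initial transform count for an FFT of 2^nbits points. */
static int initial_num_transforms(int nbits)
{
    return (0x2aab >> (16 - nbits)) | 1;
}

/* Per-pass update used inside the nbits loop. */
static int next_num_transforms(int num_transforms)
{
    return (num_transforms >> 1) | 1;
}
```

Both are simple enough that a compiler has no difficulty producing the same few instructions as the hand-written version.
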
 libavcodec/mips/fft_mips.c | 26 ++
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/libavcodec/mips/fft_mips.c b/libavcodec/mips/fft_mips.c
index 691f2db..e12c33e 100644
--- a/libavcodec/mips/fft_mips.c
+++ b/libavcodec/mips/fft_mips.c
@@ -65,26 +65,12 @@ static void ff_fft_calc_mips(FFTContext *s, FFTComplex *z)
 float w_re, w_im;
 float *w_re_ptr, *w_im_ptr;
 const int fft_size = (1 << s->nbits);
-int s_n = s->nbits;
-int tem1, tem2;
 float pom,  pom1,  pom2,  pom3;
 float temp, temp1, temp3, temp4;
 FFTComplex * tmpz_n2, * tmpz_n34, * tmpz_n4;
 FFTComplex * tmpz_n2_i, * tmpz_n34_i, * tmpz_n4_i, * tmpz_i;
 
-/**
-*num_transforms = (0x2aab >> (16 - s->nbits)) | 1;
-*/
-__asm__ volatile (
-li   %[tem1], 16  \n\t
-sub  %[s_n],  %[tem1], %[s_n] \n\t
-li   %[tem2], 10923   \n\t
-srav %[tem2], %[tem2], %[s_n] \n\t
-ori  %[num_t],%[tem2], 1  \n\t
-: [num_t]=r(num_transforms), [s_n]+r(s_n),
-  [tem1]=r(tem1), [tem2]=r(tem2)
-);
-
+num_transforms = (0x2aab >> (16 - s->nbits)) | 1;
 
 for (n=0; n<num_transforms; n++) {
 offset = ff_fft_offsets_lut[n] << 2;
@@ -214,15 +200,7 @@ static void ff_fft_calc_mips(FFTContext *s, FFTComplex *z)
 n4 = 4;
 
 for (nbits=4; nbits<=s->nbits; nbits++) {
-/*
-* num_transforms = (num_transforms >> 1) | 1;
-*/
-__asm__ volatile (
-sra %[num_t], %[num_t], 1   \n\t
-ori %[num_t], %[num_t], 1   \n\t
-
-: [num_t] +r (num_transforms)
-);
+num_transforms = (num_transforms >> 1) | 1;
 n2  = 2 * n4;
 n34 = 3 * n4;
 
-- 
2.1.4

[FFmpeg-devel] [PATCH 05/12] mips/sbrdsp: remove sbr_neg_odd_64_mips

2015-02-26 Thread James Cowgill

The optimized C version of this code actually runs faster than this
version, so remove it.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
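
For readers without the generic code at hand, the removed routine flips the sign of the 32 odd-indexed floats x[1], x[3], ..., x[63]. A portable sketch of that behaviour follows; the function name is hypothetical and this is not the FFmpeg C implementation, which may be written differently.

```c
#include <assert.h>

/* Hypothetical portable equivalent of the removed assembly:
 * negate x[1], x[3], ..., x[63] in place. */
static void neg_odd_64_sketch(float *x)
{
    for (int i = 1; i < 64; i += 2)
        x[i] = -x[i];
}
```

The assembly did this by XOR-ing the sign bit through integer registers; plain negation gives the compiler the same freedom.
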
 libavcodec/mips/sbrdsp_mips.c | 34 --
 1 file changed, 34 deletions(-)

diff --git a/libavcodec/mips/sbrdsp_mips.c b/libavcodec/mips/sbrdsp_mips.c
index d4460ba..c76e709 100644
--- a/libavcodec/mips/sbrdsp_mips.c
+++ b/libavcodec/mips/sbrdsp_mips.c
@@ -58,39 +58,6 @@
 #include "libavcodec/sbrdsp.h"
 
 #if HAVE_INLINE_ASM
-static void sbr_neg_odd_64_mips(float *x)
-{
-int Temp1, Temp2, Temp3, Temp4, Temp5;
-float *x1= &x[1];
-float *x_end = x1 + 64;
-
-/* loop unrolled 4 times */
-__asm__ volatile (
-lui%[Temp5],   0x8000  \n\t
-1: \n\t
-lw %[Temp1],   0(%[x1])\n\t
-lw %[Temp2],   8(%[x1])\n\t
-lw %[Temp3],   16(%[x1])   \n\t
-lw %[Temp4],   24(%[x1])   \n\t
-xor%[Temp1],   %[Temp1],   %[Temp5]\n\t
-xor%[Temp2],   %[Temp2],   %[Temp5]\n\t
-xor%[Temp3],   %[Temp3],   %[Temp5]\n\t
-xor%[Temp4],   %[Temp4],   %[Temp5]\n\t
-sw %[Temp1],   0(%[x1])\n\t
-sw %[Temp2],   8(%[x1])\n\t
-sw %[Temp3],   16(%[x1])   \n\t
-sw %[Temp4],   24(%[x1])   \n\t
-addiu  %[x1],  %[x1],  32  \n\t
-bne%[x1],  %[x_end],   1b  \n\t
-
-: [Temp1]=r(Temp1), [Temp2]=r(Temp2),
-  [Temp3]=r(Temp3), [Temp4]=r(Temp4),
-  [Temp5]=r(Temp5), [x1]+r(x1)
-: [x_end]r(x_end)
-: memory
-);
-}
-
 static void sbr_qmf_pre_shuffle_mips(float *z)
 {
 int Temp1, Temp2, Temp3, Temp4, Temp5, Temp6;
@@ -920,7 +887,6 @@ static void sbr_hf_apply_noise_3_mips(float (*Y)[2], const float *s_m,
 void ff_sbrdsp_init_mips(SBRDSPContext *s)
 {
 #if HAVE_INLINE_ASM
-s->neg_odd_64 = sbr_neg_odd_64_mips;
 s->qmf_pre_shuffle = sbr_qmf_pre_shuffle_mips;
 s->qmf_post_shuffle = sbr_qmf_post_shuffle_mips;
 #if HAVE_MIPSFPU
-- 
2.1.4

[FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations

2015-02-26 Thread James Cowgill

The assembly versions have a few problems
- They only work with mips32r2 enabled
- They don't work on 64-bits
- They're massive and complex

So replace them with C implementations which solve these problems and let GCC
magically optimize for different platforms. All the functions are manually
unrolled 4 times (like the assembly code). With the addition of a few restrict
keywords, the functions produce almost identical assembly to the original
versions when compiled with gcc -O3.

Since this code now uses no fpu assembly, drop the HAVE_MIPSFPU guard as well.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavutil/mips/float_dsp_mips.c | 354 
 1 file changed, 72 insertions(+), 282 deletions(-)

diff --git a/libavutil/mips/float_dsp_mips.c b/libavutil/mips/float_dsp_mips.c
index 06d52dc..31425de 100644
--- a/libavutil/mips/float_dsp_mips.c
+++ b/libavutil/mips/float_dsp_mips.c
@@ -52,332 +52,122 @@
  */
 
 #include "config.h"
+#include "libavutil/avassert.h"
 #include "libavutil/float_dsp.h"
 
-#if HAVE_INLINE_ASM && HAVE_MIPSFPU
-static void vector_fmul_mips(float *dst, const float *src0, const float *src1,
- int len)
+// The functions here are basically the same as the C implementations but
+// unrolled 4 times to take advantage of pointer alignment + mips fpu registers
+
+static void vector_fmul_mips(
+float *av_restrict dst, const float *av_restrict src0,
+const float *av_restrict src1, int len)
 {
 int i;
 
-if (len & 3) {
-for (i = 0; i < len; i++)
-dst[i] = src0[i] * src1[i];
-} else {
-float *d = (float *)dst;
-float *d_end = d + len;
-float *s0= (float *)src0;
-float *s1= (float *)src1;
-
-float src0_0, src0_1, src0_2, src0_3;
-float src1_0, src1_1, src1_2, src1_3;
-
-__asm__ volatile (
-1: \n\t
-lwc1   %[src0_0],  0(%[s0])\n\t
-lwc1   %[src1_0],  0(%[s1])\n\t
-lwc1   %[src0_1],  4(%[s0])\n\t
-lwc1   %[src1_1],  4(%[s1])\n\t
-lwc1   %[src0_2],  8(%[s0])\n\t
-lwc1   %[src1_2],  8(%[s1])\n\t
-lwc1   %[src0_3],  12(%[s0])   \n\t
-lwc1   %[src1_3],  12(%[s1])   \n\t
-mul.s  %[src0_0],  %[src0_0],  %[src1_0]   \n\t
-mul.s  %[src0_1],  %[src0_1],  %[src1_1]   \n\t
-mul.s  %[src0_2],  %[src0_2],  %[src1_2]   \n\t
-mul.s  %[src0_3],  %[src0_3],  %[src1_3]   \n\t
-swc1   %[src0_0],  0(%[d]) \n\t
-swc1   %[src0_1],  4(%[d]) \n\t
-swc1   %[src0_2],  8(%[d]) \n\t
-swc1   %[src0_3],  12(%[d])\n\t
-addiu  %[s0],  %[s0],  16  \n\t
-addiu  %[s1],  %[s1],  16  \n\t
-addiu  %[d],   %[d],   16  \n\t
-bne%[d],   %[d_end],   1b  \n\t
+// input length must be a multiple of 4
+av_assert2(len % 4 == 0);
 
-: [src0_0]=f(src0_0), [src0_1]=f(src0_1),
-  [src0_2]=f(src0_2), [src0_3]=f(src0_3),
-  [src1_0]=f(src1_0), [src1_1]=f(src1_1),
-  [src1_2]=f(src1_2), [src1_3]=f(src1_3),
-  [d]+r(d), [s0]+r(s0), [s1]+r(s1)
-: [d_end]r(d_end)
-: memory
-);
+for (i = 0; i < len; i += 4) {
+dst[i] = src0[i] * src1[i];
+dst[i + 1] = src0[i + 1] * src1[i + 1];
+dst[i + 2] = src0[i + 2] * src1[i + 2];
+dst[i + 3] = src0[i + 3] * src1[i + 3];
 }
 }
 
-static void vector_fmul_scalar_mips(float *dst, const float *src, float mul,
- int len)
+static void vector_fmul_scalar_mips(
+float *av_restrict dst, const float *av_restrict src, float mul, int len)
 {
-float temp0, temp1, temp2, temp3;
-float *local_src = (float*)src;
-float *end = local_src + len;
+int i;
 
-/* loop unrolled 4 times */
-__asm__ volatile(
-.setpush \n\t
-.setnoreorder\n\t
-1:   \n\t
-lwc1%[temp0],   0(%[src])\n\t
-lwc1%[temp1],   4(%[src])\n\t
-lwc1%[temp2],   8(%[src])\n\t
-lwc1%[temp3],   12(%[src])   \n\t
-addiu   %[dst], %[dst], 16   \n\t
-mul.s   %[temp0],   %[temp0],   %[mul]   \n\t
-mul.s   %[temp1],   %[temp1],   %[mul]   \n\t
-mul.s   %[temp2],   %[temp2],   %[mul]   \n\t
-mul.s   %[temp3],   %[temp3],   %[mul]   \n\t
-addiu   %[src], %[src], 16   \n\t
-swc1%[temp0

[FFmpeg-devel] [PATCH 10/12] mips: use float* to hold pointer instead of int

2015-02-26 Thread James Cowgill

This is obviously needed for 64-bit support.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
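
The underlying issue is that an `int` is 32 bits wide on the MIPS n64 ABI while pointers are 64 bits, so storing a loop-end address in an `int` truncates it. A minimal sketch of the pointer-typed pattern the patch switches to (hypothetical function, not taken from the patched files):

```c
#include <assert.h>

/* Hypothetical example: walk 'len' floats using an end pointer as the
 * loop bound. Keeping the bound as a pointer works on both 32-bit and
 * 64-bit ABIs; casting it to int would drop the upper address bits
 * wherever sizeof(int) < sizeof(void *). */
static float sum_until_end(const float *p, int len)
{
    const float *p_end = p + len;
    float acc = 0.0f;

    while (p != p_end)
        acc += *p++;
    return acc;
}
```
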
 libavcodec/mips/aacdec_mips.c   |  2 +-
 libavcodec/mips/aacpsdsp_mips.c | 12 ++--
 libavcodec/mips/sbrdsp_mips.c   | 10 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 5e0a83d..b6eec53 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -344,7 +344,7 @@ static void update_ltp_mips(AACContext *ac, SingleChannelElement *sce)
 
 if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
 float *p_saved_ltp = saved_ltp + 576;
-int loop_end1 = (int)(p_saved_ltp + 448);
+float *loop_end1 = p_saved_ltp + 448;
 
 float_copy(saved_ltp, saved, 512);
 
diff --git a/libavcodec/mips/aacpsdsp_mips.c b/libavcodec/mips/aacpsdsp_mips.c
index 1175918..b03cc3f 100644
--- a/libavcodec/mips/aacpsdsp_mips.c
+++ b/libavcodec/mips/aacpsdsp_mips.c
@@ -293,7 +293,7 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
 float phi_fract1 = phi_fract[1];
 float temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9;
 
-len = (int)((int*)p_delay + (len << 1));
+float *p_delay_end = (p_delay + (len << 1));
 
 /* merged 2 loops */
 __asm__ volatile(
@@ -369,7 +369,7 @@ static void ps_decorrelate_mips(float (*out)[2], float 
(*delay)[2],
 swc1%[temp3],  628(%[p_ap_delay])   \n\t
 swc1%[temp5],  -8(%[p_out]) \n\t
 swc1%[temp6],  -4(%[p_out]) \n\t
-bne %[p_delay],%[len],1b\n\t
+bne %[p_delay],%[p_delay_end],1b\n\t
  swc1   %[temp6],  -4(%[p_out]) \n\t
 .setpop \n\t
 
@@ -380,7 +380,7 @@ static void ps_decorrelate_mips(float (*out)[2], float 
(*delay)[2],
   [p_Q_fract]+r(p_Q_fract), [p_t_gain]+r(p_t_gain), 
[p_out]+r(p_out),
   [ag0]=f(ag0), [ag1]=f(ag1), [ag2]=f(ag2)
 : [phi_fract0]f(phi_fract0), [phi_fract1]f(phi_fract1),
-  [len]r(len), [g_decay_slope]f(g_decay_slope)
+  [p_delay_end]r(p_delay_end), [g_decay_slope]f(g_decay_slope)
 : memory
 );
 }
@@ -400,7 +400,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float (*r)[2],
 float temp0, temp1, temp2, temp3;
 float l_re, l_im, r_re, r_im;
 
-len = (int)((int*)l + (len << 1));
+float *l_end = ((float *)l + (len << 1));
 
 __asm__ volatile(
 .setpush \n\t
@@ -427,7 +427,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float 
(*r)[2],
 swc1%[temp0],  -8(%[l])  \n\t
 swc1%[temp2],  -8(%[r])  \n\t
 swc1%[temp1],  -4(%[l])  \n\t
-bne %[l],  %[len],1b \n\t
+bne %[l],  %[l_end],  1b \n\t
  swc1   %[temp3],  -4(%[r])  \n\t
 .setpop  \n\t
 
@@ -438,7 +438,7 @@ static void ps_stereo_interpolate_mips(float (*l)[2], float 
(*r)[2],
   [l_re]=f(l_re), [l_im]=f(l_im),
   [r_re]=f(r_re), [r_im]=f(r_im)
 : [hs0]f(hs0), [hs1]f(hs1), [hs2]f(hs2),
-  [hs3]f(hs3), [len]r(len)
+  [hs3]f(hs3), [l_end]r(l_end)
 : memory
 );
 }
diff --git a/libavcodec/mips/sbrdsp_mips.c b/libavcodec/mips/sbrdsp_mips.c
index 5c21749..9f2d827 100644
--- a/libavcodec/mips/sbrdsp_mips.c
+++ b/libavcodec/mips/sbrdsp_mips.c
@@ -665,14 +665,14 @@ static void sbr_hf_gen_mips(float (*X_high)[2], const float (*X_low)[2],
 static void sbr_hf_g_filt_mips(float (*Y)[2], const float (*X_high)[40][2],
 const float *g_filt, int m_max, intptr_t ixh)
 {
-float *p_y, *p_x, *p_g;
+const float *p_x, *p_g, *loop_end;
+float *p_y;
 float temp0, temp1, temp2;
-int loop_end;
 
-p_g = (float*)&g_filt[0];
+p_g = &g_filt[0];
 p_y = &Y[0][0];
-p_x = (float*)&X_high[0][ixh][0];
-loop_end = (int)((int*)p_g + m_max);
+p_x = &X_high[0][ixh][0];
+loop_end = p_g + m_max;
 
 __asm__ volatile(
 .setpush\n\t
-- 
2.1.4

[FFmpeg-devel] [PATCH 12/12] mips/aaccoder: use variables instead of using register names directly

2015-02-26 Thread James Cowgill

On mips64, the registers t[4-7] do not exist. Instead of using a lot of #ifdef
or defines to handle differing register names, use variables and let GCC
allocate the registers automatically (like in the other mips assembly files).

In get_band_cost_ESC_mips, t4 and t5 were renamed to t6 and t7 to avoid a
variable name conflict.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
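
The general pattern, shown here as a small sketch rather than code from this patch: instead of naming `$t0` in the template and listing it as a clobber, declare a C variable and bind it as a named operand, so the compiler picks a register that exists under the current ABI. The function name and the non-MIPS fallback are assumptions added so the example builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: extract the sign bit of a 32-bit word. */
static inline uint32_t sign_bit(uint32_t x)
{
#if defined(__mips__)
    uint32_t t0;                       /* compiler-allocated temporary */
    __asm__ ("srl %[t0], %[x], 31 \n\t"
             : [t0] "=r" (t0)          /* output: any general register */
             : [x]  "r"  (x));         /* input */
    return t0;
#else
    return x >> 31;                    /* portable fallback for other targets */
#endif
}
```

With this form no register names appear in the clobber list, which is what removes the dependency on t4-t7.
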
 libavcodec/mips/aaccoder_mips.c | 929 +---
 1 file changed, 477 insertions(+), 452 deletions(-)

diff --git a/libavcodec/mips/aaccoder_mips.c b/libavcodec/mips/aaccoder_mips.c
index 8595913..ea0bf31 100644
--- a/libavcodec/mips/aaccoder_mips.c
+++ b/libavcodec/mips/aaccoder_mips.c
@@ -221,6 +221,7 @@ static void quantize_and_encode_band_cost_SQUAD_mips(struct AACEncContext *s,
 for (i = 0; i < size; i += 4) {
 int curidx;
 int *in_int = (int *)&in[i];
+int t0, t1, t2, t3, t4, t5, t6, t7;
 
 qc1 = scaled[i  ] * Q34 + 0.4054f;
 qc2 = scaled[i+1] * Q34 + 0.4054f;
@@ -235,31 +236,31 @@ static void 
quantize_and_encode_band_cost_SQUAD_mips(struct AACEncContext *s,
 slt%[qc2], $zero,  %[qc2]  \n\t
 slt%[qc3], $zero,  %[qc3]  \n\t
 slt%[qc4], $zero,  %[qc4]  \n\t
-lw $t0,0(%[in_int])\n\t
-lw $t1,4(%[in_int])\n\t
-lw $t2,8(%[in_int])\n\t
-lw $t3,12(%[in_int])   \n\t
-srl$t0,$t0,31  \n\t
-srl$t1,$t1,31  \n\t
-srl$t2,$t2,31  \n\t
-srl$t3,$t3,31  \n\t
-subu   $t4,$zero,  %[qc1]  \n\t
-subu   $t5,$zero,  %[qc2]  \n\t
-subu   $t6,$zero,  %[qc3]  \n\t
-subu   $t7,$zero,  %[qc4]  \n\t
-movn   %[qc1], $t4,$t0 \n\t
-movn   %[qc2], $t5,$t1 \n\t
-movn   %[qc3], $t6,$t2 \n\t
-movn   %[qc4], $t7,$t3 \n\t
+lw %[t0],  0(%[in_int])\n\t
+lw %[t1],  4(%[in_int])\n\t
+lw %[t2],  8(%[in_int])\n\t
+lw %[t3],  12(%[in_int])   \n\t
+srl%[t0],  %[t0],  31  \n\t
+srl%[t1],  %[t1],  31  \n\t
+srl%[t2],  %[t2],  31  \n\t
+srl%[t3],  %[t3],  31  \n\t
+subu   %[t4],  $zero,  %[qc1]  \n\t
+subu   %[t5],  $zero,  %[qc2]  \n\t
+subu   %[t6],  $zero,  %[qc3]  \n\t
+subu   %[t7],  $zero,  %[qc4]  \n\t
+movn   %[qc1], %[t4],  %[t0]   \n\t
+movn   %[qc2], %[t5],  %[t1]   \n\t
+movn   %[qc3], %[t6],  %[t2]   \n\t
+movn   %[qc4], %[t7],  %[t3]   \n\t
 
 .set pop   \n\t
 
 : [qc1]+r(qc1), [qc2]+r(qc2),
-  [qc3]+r(qc3), [qc4]+r(qc4)
+  [qc3]+r(qc3), [qc4]+r(qc4),
+  [t0]=r(t0), [t1]=r(t1), [t2]=r(t2), [t3]=r(t3),
+  [t4]=r(t4), [t5]=r(t5), [t6]=r(t6), [t7]=r(t7)
 : [in_int]r(in_int)
-: t0, t1, t2, t3,
-  t4, t5, t6, t7,
-  memory
+: memory
 );
 
 curidx = qc1;
@@ -295,6 +296,7 @@ static void quantize_and_encode_band_cost_UQUAD_mips(struct AACEncContext *s,
 int *in_int = (int *)&in[i];
 uint8_t v_bits;
 unsigned int v_codes;
+int t0, t1, t2, t3, t4;
 
 qc1 = scaled[i  ] * Q34 + 0.4054f;
 qc2 = scaled[i+1] * Q34 + 0.4054f;
@@ -305,50 +307,51 @@ static void 
quantize_and_encode_band_cost_UQUAD_mips(struct AACEncContext *s,
 .set push  \n\t
 .set noreorder \n\t
 
-ori$t4,$zero,  2   \n\t
+ori%[t4],  $zero,  2   \n\t
 ori%[sign],$zero,  0   \n\t
-slt$t0,$t4,%[qc1]  \n\t
-slt$t1,$t4,%[qc2]  \n\t
-slt$t2,$t4,%[qc3]  \n\t
-slt$t3,$t4,%[qc4]  \n\t
-movn   %[qc1], $t4,$t0 \n\t
-movn   %[qc2], $t4,$t1 \n\t
-movn   %[qc3], $t4,$t2 \n\t
-movn   %[qc4], $t4,$t3 \n\t
-lw $t0,0(%[in_int])\n\t
-lw $t1,4(%[in_int])\n\t
-lw $t2,8(%[in_int])\n\t
-lw $t3,12(%[in_int])   \n\t
-slt$t0,$t0,$zero   \n\t
-movn   %[sign],$t0,%[qc1]  \n\t
-slt$t1,$t1,$zero   \n\t
-slt$t2,$t2,$zero   \n\t
-slt$t3,$t3,$zero

[FFmpeg-devel] [PATCH 00/12] mips cleanups and port to mips64

2015-02-26 Thread James Cowgill

Hi,

This patchset aims to cleanup the MIPS optimizations a bit and add support for
64-bit processors.

I haven't attempted specifically to optimize any of this for 64-bit systems,
except for the removal of some assembly blocks which GCC can optimize just as
well itself. Also I haven't gone through and cleaned up everything, just the
bits that make it easier to port to 64-bits or some things that were really
bugging me :)

I've run fate on both 32 and 64-bit mips machines and it passes all the tests
on both. I don't have a machine with DSP instructions but I managed (with some
effort) to run fate using qemu and it passed all the tests there as well.

One thing I was slightly uneasy about in the change I made to the configure
script was forcing specific ISA levels unless you pass --disable-xxx to
configure. This has a habit of causing the final binaries not to run at all
(eg I have to disable DSP otherwise I get a lot of SIGILL). Since this was what
the code was doing before, I just left it instead of messing up all the MIPS
configure options (more than I have done).

Thanks,
James

James Cowgill (12):
  mips/mathops: remove 64-bit code
  mips/float_dsp: replace assembly with C implementations
  mips/aacpsdsp: fix definition of ps_decorrelate_mips
  mips/fft: remove some useless assembly
  mips/sbrdsp: remove sbr_neg_odd_64_mips
  mips/aacdec: refactor out duplicated assembly code
  mips/aacdec: remove uses of mips32r2 specific ext instructions
  configure, mips: remove MIPS32R2, merging it with MIPSFPU
  mips: port optimizations to mips n64
  mips: use float* to hold pointer instead of int
  mips/acelp_filters: fix incorrect register constraint
  mips/aaccoder: use variables instead of using register names directly

 Makefile  |   2 +-
 arch.mak  |   1 -
 configure |  19 +-
 libavcodec/mips/aaccoder_mips.c   | 929 +++---
 libavcodec/mips/aacdec_mips.c | 623 
 libavcodec/mips/aacdec_mips.h |  58 +-
 libavcodec/mips/aacpsdsp_mips.c   |  61 +-
 libavcodec/mips/aacpsy_mips.h |   6 +-
 libavcodec/mips/aacsbr_mips.c |  53 +-
 libavcodec/mips/aacsbr_mips.h |  17 +-
 libavcodec/mips/ac3dsp_mips.c |  63 +-
 libavcodec/mips/acelp_filters_mips.c  |  15 +-
 libavcodec/mips/acelp_vectors_mips.c  |   7 +-
 libavcodec/mips/asmdefs.h |  48 ++
 libavcodec/mips/celp_filters_mips.c   |  13 +-
 libavcodec/mips/celp_math_mips.c  |   5 +-
 libavcodec/mips/compute_antialias_float.h |   4 +-
 libavcodec/mips/fft_mips.c|  39 +-
 libavcodec/mips/fmtconvert_mips.c |  33 +-
 libavcodec/mips/lsp_mips.h|   6 +-
 libavcodec/mips/mathops.h |  26 -
 libavcodec/mips/mpegaudiodsp_mips_fixed.c |  11 +-
 libavcodec/mips/mpegaudiodsp_mips_float.c |  25 +-
 libavcodec/mips/sbrdsp_mips.c |  89 +--
 libavutil/mips/float_dsp_mips.c   | 354 +++-
 25 files changed, 963 insertions(+), 1544 deletions(-)
 create mode 100644 libavcodec/mips/asmdefs.h

-- 
2.1.4

[FFmpeg-devel] [PATCH 01/12] mips/mathops: remove 64-bit code

2015-02-26 Thread James Cowgill

GCC is perfectly happy generating optimized multiplication code on its own for
64-bit arches. GCC refuses to optimize the loongson code when in 32-bit mode,
so I've left that.

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
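
What the removed 64-bit assembly computes can be written directly in C. The sketch below uses hypothetical function names and is not the generic FFmpeg macro definition; it only shows the arithmetic that the compiler is trusted to optimize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C equivalents of the removed MAC64/MLS64 bodies. */

/* Multiply-accumulate: d + a * b, with the product taken in 64 bits. */
static inline int64_t mac64_sketch(int64_t d, int a, int b)
{
    return d + (int64_t)a * b;
}

/* Multiply-subtract: d - a * b, with the product taken in 64 bits. */
static inline int64_t mls64_sketch(int64_t d, int a, int b)
{
    return d - (int64_t)a * b;
}
```
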
 libavcodec/mips/mathops.h | 26 --
 1 file changed, 26 deletions(-)

diff --git a/libavcodec/mips/mathops.h b/libavcodec/mips/mathops.h
index 368290a..5673fc0 100644
--- a/libavcodec/mips/mathops.h
+++ b/libavcodec/mips/mathops.h
@@ -49,32 +49,6 @@ static inline av_const int64_t MLS64(int64_t d, int a, int b)
 }
 #define MLS64(d, a, b) ((d) = MLS64(d, a, b))
 
-#elif ARCH_MIPS64
-
-static inline av_const int64_t MAC64(int64_t d, int a, int b)
-{
-int64_t m;
-__asm__ (dmult %2, %3 \n\t
- mflo  %1 \n\t
- daddu %0, %0, %1 \n\t
- : +r(d), =r(m) : r(a), r(b)
- : hi, lo);
-return d;
-}
-#define MAC64(d, a, b) ((d) = MAC64(d, a, b))
-
-static inline av_const int64_t MLS64(int64_t d, int a, int b)
-{
-int64_t m;
-__asm__ (dmult %2, %3 \n\t
- mflo  %1 \n\t
- dsubu %0, %0, %1 \n\t
- : +r(d), =r(m) : r(a), r(b)
- : hi, lo);
-return d;
-}
-#define MLS64(d, a, b) ((d) = MLS64(d, a, b))
-
 #endif
 
 #endif /* HAVE_INLINE_ASM */
-- 
2.1.4


[FFmpeg-devel] [PATCH 09/12] mips: port optimizations to mips n64

2015-02-26 Thread James Cowgill

This mainly consists of replacing all the pointer arithmetic 'addiu'
instructions with PTR_ADDIU, which handles the difference in pointer
size when compiled for 64-bit MIPS systems.

The header asmdefs.h contains the PTR_ macros, which expand to the correct MIPS
instructions for manipulating registers that contain pointers.
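
As a rough illustration of the idea, pointer-width macros of this kind could be
defined along the following lines. This is a sketch under stated assumptions,
not the contents of asmdefs.h: the selection test (UINTPTR_MAX) is chosen here
only so the example compiles on any host, and ptr_bump_template is a
hypothetical helper that exposes the resulting string. The macros expand to
string literals, so they concatenate with the operand string that follows them
in an asm template.

```c
#include <stdint.h>

/* Assumed selection rule for this sketch: use the doubleword forms when
 * pointers are wider than 32 bits. The real header may test an ABI macro
 * instead. */
#if UINTPTR_MAX > 0xffffffffu
# define PTR_ADDU  "daddu "
# define PTR_ADDIU "daddiu "
#else
# define PTR_ADDU  "addu "
# define PTR_ADDIU "addiu "
#endif

/* Hypothetical helper: adjacent string literals are concatenated by the
 * compiler, so the macro supplies the mnemonic and the literal supplies
 * the operands. */
static const char *ptr_bump_template(void)
{
    return PTR_ADDIU "%[src], %[src], 32 \n\t";
}
```

Because the mnemonic lives in the macro, the same asm template text serves
both pointer widths without per-site #if blocks.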

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/aacdec_mips.c | 21 +--
 libavcodec/mips/aacdec_mips.h |  9 ++---
 libavcodec/mips/aacpsdsp_mips.c   | 43 +++---
 libavcodec/mips/aacpsy_mips.h |  6 ++--
 libavcodec/mips/aacsbr_mips.c | 53 +--
 libavcodec/mips/aacsbr_mips.h | 17 -
 libavcodec/mips/ac3dsp_mips.c | 59 ---
 libavcodec/mips/acelp_filters_mips.c  | 13 +++
 libavcodec/mips/acelp_vectors_mips.c  |  7 ++--
 libavcodec/mips/asmdefs.h | 48 +
 libavcodec/mips/celp_filters_mips.c   | 13 +++
 libavcodec/mips/celp_math_mips.c  |  5 +--
 libavcodec/mips/compute_antialias_float.h |  4 ++-
 libavcodec/mips/fft_mips.c| 13 +++
 libavcodec/mips/fmtconvert_mips.c | 33 -
 libavcodec/mips/lsp_mips.h|  6 ++--
 libavcodec/mips/mpegaudiodsp_mips_fixed.c | 11 +++---
 libavcodec/mips/mpegaudiodsp_mips_float.c | 25 ++---
 libavcodec/mips/sbrdsp_mips.c | 45 +++
 19 files changed, 250 insertions(+), 181 deletions(-)
 create mode 100644 libavcodec/mips/asmdefs.h

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 909e22b..5e0a83d 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -56,6 +56,7 @@
 #include "aacdec_mips.h"
 #include "libavcodec/aactab.h"
 #include "libavcodec/sinewin.h"
+#include "libavcodec/mips/asmdefs.h"
 
 #if HAVE_INLINE_ASM
 static av_always_inline void float_copy(float *dst, const float *src, int 
count)
@@ -80,7 +81,7 @@ static av_always_inline void float_copy(float *dst, const 
float *src, int count)
 lw  %[temp5],20(%[src]) \n\t
 lw  %[temp6],24(%[src]) \n\t
 lw  %[temp7],28(%[src]) \n\t
-addiu   %[src],  %[src],  32\n\t
+PTR_ADDIU %[src],%[src],  32\n\t
 sw  %[temp0],0(%[dst])  \n\t
 sw  %[temp1],4(%[dst])  \n\t
 sw  %[temp2],8(%[dst])  \n\t
@@ -90,7 +91,7 @@ static av_always_inline void float_copy(float *dst, const 
float *src, int count)
 sw  %[temp6],24(%[dst]) \n\t
 sw  %[temp7],28(%[dst]) \n\t
 bne %[src],  %[loop_end], 1b\n\t
-addiu   %[dst],  %[dst],  32\n\t
+PTR_ADDIU %[dst],%[dst],  32\n\t
 .set pop\n\t
 
 : [temp0]=r(temp[0]), [temp1]=r(temp[1]),
@@ -250,7 +251,7 @@ static void apply_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 sw  $0,  4(%[p_predTime])\n\t
 sw  $0,  8(%[p_predTime])\n\t
 sw  $0,  12(%[p_predTime])   \n\t
-addiu   %[p_predTime],   %[p_predTime], 16   \n\t
+PTR_ADDIU %[p_predTime], %[p_predTime], 16   \n\t
 
 : [p_predTime]+r(p_predTime)
 :
@@ -261,7 +262,7 @@ static void apply_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 
 __asm__ volatile (
 sw  $0,  0(%[p_predTime])\n\t
-addiu   %[p_predTime],   %[p_predTime], 4\n\t
+PTR_ADDIU %[p_predTime], %[p_predTime], 4\n\t
 
 : [p_predTime]+r(p_predTime)
 :
@@ -315,9 +316,9 @@ static av_always_inline void fmul_and_reverse(float *dst, 
const float *src0, con
 swc1%[temp9],4(%[ptr1])\n\t
 swc1%[temp10],   8(%[ptr1])\n\t
 swc1%[temp11],   12(%[ptr1])   \n\t
-addiu   %[ptr1], %[ptr1],  16  \n\t
-addiu   %[ptr2], %[ptr2],  -16 \n\t
-addiu   %[ptr3], %[ptr3],  -16 \n\t
+PTR_ADDIU %[ptr1],   %[ptr1],  16  \n\t
+PTR_ADDIU %[ptr2],   %[ptr2],  -16 \n\t
+PTR_ADDIU %[ptr3],   %[ptr3],  -16 \n\t
 
 : [temp0]=f(temp[0]), [temp1]=f(temp[1]),
   [temp2]=f(temp[2]), [temp3]=f(temp[3]),
@@ -358,7 +359,7 @@ static void update_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 sw $0,  20(%[p_saved_ltp])   \n\t
 sw $0,  24

[FFmpeg-devel] [PATCH 06/12] mips/aacdec: refactor out duplicated assembly code

2015-02-26 Thread James Cowgill

The float_copy and fmul_and_reverse functions are refactored out from the
multiple copies in this file.
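
The contract of the factored-out float_copy helper can be sketched in portable
C as shown here. This is only an illustration of what the assembly does (8
loads followed by 8 stores per iteration); the name float_copy_sketch is
hypothetical, and the patch itself keeps the MIPS inline assembly.

```c
#include <assert.h>

/* Portable C rendering of the helper's contract, for illustration only. */
static void float_copy_sketch(float *dst, const float *src, int count)
{
    const float *loop_end = src + count;

    /* count must be a multiple of 8, as in the patch */
    assert(count % 8 == 0);

    /* Loop unrolled 8 times. A while loop is a choice of this sketch so
     * that count == 0 is harmless. */
    while (src != loop_end) {
        float t0 = src[0], t1 = src[1], t2 = src[2], t3 = src[3];
        float t4 = src[4], t5 = src[5], t6 = src[6], t7 = src[7];
        src += 8;
        dst[0] = t0; dst[1] = t1; dst[2] = t2; dst[3] = t3;
        dst[4] = t4; dst[5] = t5; dst[6] = t6; dst[7] = t7;
        dst += 8;
    }
}
```

Passing the element count as a parameter is what allows one helper to replace
copies that previously hard-coded their own loop bounds.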

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/aacdec_mips.c | 612 --
 1 file changed, 111 insertions(+), 501 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 5db10f9..909e22b 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -58,6 +58,51 @@
 #include libavcodec/sinewin.h
 
 #if HAVE_INLINE_ASM
+static av_always_inline void float_copy(float *dst, const float *src, int count)
+{
+    // Copy 'count' floats from src to dst
+    const float *loop_end = src + count;
+    int temp[8];
+
+    // count must be a multiple of 8
+    av_assert2(count % 8 == 0);
+
+    // loop unrolled 8 times
+    __asm__ volatile (
+        ".set push                               \n\t"
+        ".set noreorder                          \n\t"
+    "1:                                          \n\t"
+        "lw      %[temp0],    0(%[src])          \n\t"
+        "lw      %[temp1],    4(%[src])          \n\t"
+        "lw      %[temp2],    8(%[src])          \n\t"
+        "lw      %[temp3],    12(%[src])         \n\t"
+        "lw      %[temp4],    16(%[src])         \n\t"
+        "lw      %[temp5],    20(%[src])         \n\t"
+        "lw      %[temp6],    24(%[src])         \n\t"
+        "lw      %[temp7],    28(%[src])         \n\t"
+        "addiu   %[src],      %[src],      32    \n\t"
+        "sw      %[temp0],    0(%[dst])          \n\t"
+        "sw      %[temp1],    4(%[dst])          \n\t"
+        "sw      %[temp2],    8(%[dst])          \n\t"
+        "sw      %[temp3],    12(%[dst])         \n\t"
+        "sw      %[temp4],    16(%[dst])         \n\t"
+        "sw      %[temp5],    20(%[dst])         \n\t"
+        "sw      %[temp6],    24(%[dst])         \n\t"
+        "sw      %[temp7],    28(%[dst])         \n\t"
+        "bne     %[src],      %[loop_end], 1b    \n\t"
+        "addiu   %[dst],      %[dst],      32    \n\t"
+        ".set pop                                \n\t"
+
+        : [temp0]"=r"(temp[0]), [temp1]"=r"(temp[1]),
+          [temp2]"=r"(temp[2]), [temp3]"=r"(temp[3]),
+          [temp4]"=r"(temp[4]), [temp5]"=r"(temp[5]),
+          [temp6]"=r"(temp[6]), [temp7]"=r"(temp[7]),
+          [src]"+r"(src), [dst]"+r"(dst)
+        : [loop_end]"r"(loop_end)
+        : "memory"
+    );
+}
+
 static av_always_inline int lcg_random(unsigned previous_val)
 {
 union { unsigned u; int s; } v = { previous_val * 1664525u + 1013904223 };
@@ -92,49 +137,7 @@ static void imdct_and_windowing_mips(AACContext *ac, SingleChannelElement *sce)
 (ics->window_sequence[0] == ONLY_LONG_SEQUENCE || ics->window_sequence[0] == LONG_START_SEQUENCE)) {
 ac->fdsp->vector_fmul_window(out, saved, buf, lwindow_prev, 512);
 } else {
-{
-float *buf1 = saved;
-float *buf2 = out;
-int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
-int loop_end;
-
-/* loop unrolled 8 times */
-__asm__ volatile (
-.set push   \n\t
-.set noreorder  \n\t
-addiu   %[loop_end], %[src],  1792  \n\t
-1:  \n\t
-lw  %[temp0],0(%[src])  \n\t
-lw  %[temp1],4(%[src])  \n\t
-lw  %[temp2],8(%[src])  \n\t
-lw  %[temp3],12(%[src]) \n\t
-lw  %[temp4],16(%[src]) \n\t
-lw  %[temp5],20(%[src]) \n\t
-lw  %[temp6],24(%[src]) \n\t
-lw  %[temp7],28(%[src]) \n\t
-addiu   %[src],  %[src],  32\n\t
-sw  %[temp0],0(%[dst])  \n\t
-sw  %[temp1],4(%[dst])  \n\t
-sw  %[temp2],8(%[dst])  \n\t
-sw  %[temp3],12(%[dst]) \n\t
-sw  %[temp4],16(%[dst]) \n\t
-sw  %[temp5],20(%[dst]) \n\t
-sw  %[temp6],24(%[dst]) \n\t
-sw  %[temp7],28(%[dst]) \n\t
-bne %[src],  %[loop_end], 1b\n\t
- addiu  %[dst],  %[dst],  32\n\t
-.set pop\n\t
-
-: [temp0]=r(temp0), [temp1]=r(temp1),
-  [temp2]=r(temp2), [temp3]=r(temp3),
-  [temp4]=r(temp4), [temp5]=r(temp5),
-  [temp6]=r(temp6), [temp7]=r(temp7),
-  [loop_end]=r(loop_end), [src]+r(buf1),
-  [dst]+r(buf2)
-:
-: memory

[FFmpeg-devel] [PATCH 11/12] mips/acelp_filters: fix incorrect register constraint

2015-02-26 Thread James Cowgill

Change the register constraint on the v variable from "=" to "+". With "=",
GCC assumed that v was never read by the asm block and therefore did not
initialize it.

This fixes about 20 fate failures on mips64el.
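
The difference between the two constraint modifiers can be seen in a small,
target-independent sketch. It is not the MIPS code from this patch: it uses an
empty asm template and the generic "r" register class so that it builds with
GCC or Clang on any architecture, and sum_with_barrier is a hypothetical name.

```c
/* Illustration only: an empty asm template with a read-write ("+") operand.
 * "+r"(acc) tells the compiler that the asm both reads and writes acc, so
 * the initialization acc = 0 and each running total must be kept live.
 * With "=r"(acc) the operand would be write-only, and the compiler would be
 * free to treat the incoming value as dead. */
static int sum_with_barrier(const int *in, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += in[i];
        __asm__ volatile ("" : "+r"(acc));
    }
    return acc;
}
```

An accumulator that an asm block both consumes and updates, like v in the
madd.s instruction of the diff, is the typical case that needs "+".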

Signed-off-by: James Cowgill james...@cowgill.org.uk
---
 libavcodec/mips/acelp_filters_mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/mips/acelp_filters_mips.c 
b/libavcodec/mips/acelp_filters_mips.c
index 98ddc54..c77d37b 100644
--- a/libavcodec/mips/acelp_filters_mips.c
+++ b/libavcodec/mips/acelp_filters_mips.c
@@ -90,7 +90,7 @@ static void ff_acelp_interpolatef_mips(float *out, const 
float *in,
 PTR_ADDU %[p_filter_coeffs_m],%[p_filter_coeffs_m],   %[prec] 
\n\t
 madd.s %[v],%[v],%[in_val_m], %[fc_val_m] 
\n\t
 
-: [v] "=f" (v),[p_in_p] "+r" (p_in_p), [p_in_m] "+r" (p_in_m),
+: [v] "+f" (v),[p_in_p] "+r" (p_in_p), [p_in_m] "+r" (p_in_m),
   [p_filter_coeffs_p] "+r" (p_filter_coeffs_p),
   [in_val_p] "=f" (in_val_p), [in_val_m] "=f" (in_val_m),
   [fc_val_p] "=f" (fc_val_p), [fc_val_m] "=f" (fc_val_m),
-- 
2.1.4


Re: [FFmpeg-devel] [PATCH 02/12] mips/float_dsp: replace assembly with C implementations

2015-02-26 Thread James Cowgill

On Thu, 2015-02-26 at 13:51 +, Derek Buitenhuis wrote:
> On 2/26/2015 1:42 PM, James Cowgill wrote:
> > The assembly versions have a few problems
> > - They only work with mips32r2 enabled
> > - They don't work on 64-bits
> > - They're massive and complex
> >
> > So replace them with C implementations which solve these problems and let
> > GCC magically optimize for different platforms. All the functions are
> > manually unrolled 4 times (like the assembly code). With the addition of a
> > few restrict keywords, the functions produce almost identical assembly to
> > the original versions when compiled with gcc -O3.
>
> Why have C implementations in the *MIPS* DSP code? That's silly.

Hmm maybe a little. I was just worried that if I moved all the loop
unrolling stuff into generic code it might go slower on other arches I
haven't tested.

James
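
The style of C replacement described in the quoted commit message (manual
unrolling by 4 plus restrict) can be sketched as follows. This is an
illustrative example only, not code from the patch; the function name
vector_fmul_unrolled and the requirement that len be a multiple of 4 are
assumptions of the sketch.

```c
/* Hypothetical example: element-wise multiply, manually unrolled 4 times.
 * restrict tells the compiler the three arrays do not overlap, which lets
 * it schedule the loads and stores freely.
 * Assumes len is a non-negative multiple of 4. */
static void vector_fmul_unrolled(float *restrict dst,
                                 const float *restrict src0,
                                 const float *restrict src1, int len)
{
    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

Nothing here is MIPS-specific, which is the point raised in the reply: code in
this form could equally live in the generic DSP implementation.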

