Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2

2016-05-30 Thread Michael Niedermayer
On Wed, May 25, 2016 at 05:40:44PM +0800, 周晓勇 wrote:
> I have fixed the failing functions in h264qpel_mmi; please try the new patch 
> below.
> 
> 
> ---
> From a55e3a93a30226e3f0fb7d2d8ac43e74b5eae5d6 Mon Sep 17 00:00:00 2001
> From: ZhouXiaoyong 
> Date: Wed, 25 May 2016 16:07:38 +0800
> Subject: [PATCH] avcodec/mips/h264qpel_mmi.c: Version 2 of the optimizations
>  for loongson mmi
> 
> 
> 1. No longer use register names directly; tidy the code formatting
> 2. For O32 compatibility, declare address variables as mips_reg and
> operate on them with the PTR_ operators
> 3. Use uld and mtc1 to work around the 3A2000 CPU gslwlc1 bug (a gslwlc1
> instruction extension bug under the O32 ABI)
> 4. h264qpel uses the hpeldsp optimizations
> ---
>  libavcodec/mips/h264qpel_mmi.c | 3824 +++-
>  1 file changed, 2225 insertions(+), 1599 deletions(-)

applied

thanks

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2

2016-05-24 Thread 周晓勇
These functions fail the fate-h264 test under both the O32 and N64 ABIs,
but the earlier optimization (version 1) has too many bugs under the O32 ABI.
I will fix the bugs in put_h264_qpel16_hv_lowpass_mmi and
avg_h264_qpel16_hv_lowpass_mmi in the future.


Have you installed the latest Fedora 21 for Loongson?
http://mirror.lemote.com/fedora/live/Fedora-MATE-Live-2.iso
http://mirror.lemote.com/fedora/live/Fedora-MATE-Live-2.iso.md5
Use this script to make a live USB installer:
http://mirror.lemote.com/fedora/live/makeliveusb
Tip: make sure /dev/sda1 is ext2 or ext3, as PMON cannot boot the kernel
from ext4.

>
>why do these functions not work ?
>
>
>[...]
>-- 
>Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
>The misfortune of the wise is better than the prosperity of the fool.
>-- Epicurus


Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2

2016-05-24 Thread Michael Niedermayer
On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
> 
> 1. No longer use register names directly; tidy the code formatting
> 2. For O32 compatibility, declare address variables as mips_reg and
> operate on them with the PTR_ operators
> 3. Temporarily comment out put_(avg_)h264_qpel16_hv_lowpass_mmi and the
> related functions, which fail FATE testing under the O32 ABI
> 4. Use uld and mtc1 to work around the 3A2000 CPU gslwlc1 bug (a gslwlc1
> instruction extension bug under the O32 ABI)
> 5. The put_pixels_ and avg_pixels_ functions use the hpeldsp optimizations
> instead
> 
> 
> 
> 
> 
> 
> 
> 
> At 2016-05-13 18:05:51, "周晓勇" wrote:
> 
> From 151ccd1cefff1887b58166113e65893bcc2e724d Mon Sep 17 00:00:00 2001
> From: Zhou Xiaoyong 
> Date: Thu, 12 May 2016 10:46:09 +0800
> Subject: [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
> 
> 
> ---
>  libavcodec/mips/h264qpel_init_mips.c |   28 +
>  libavcodec/mips/h264qpel_mmi.c       | 3823 --
>  2 files changed, 2252 insertions(+), 1599 deletions(-)
> 
> 
> diff --git a/libavcodec/mips/h264qpel_init_mips.c b/libavcodec/mips/h264qpel_init_mips.c
> index 92219f8..d97e9cc 100644
> --- a/libavcodec/mips/h264qpel_init_mips.c
> +++ b/libavcodec/mips/h264qpel_init_mips.c
> @@ -133,38 +133,52 @@ static av_cold void h264qpel_init_msa(H264QpelContext *c, int bit_depth)
>  static av_cold void h264qpel_init_mmi(H264QpelContext *c, int bit_depth)
>  {
>  if (8 == bit_depth) {
> +//FIXME put_h264_qpel16_hv_lowpass_mmi
>  c->put_h264_qpel_pixels_tab[0][0] = ff_put_h264_qpel16_mc00_mmi;
>  c->put_h264_qpel_pixels_tab[0][1] = ff_put_h264_qpel16_mc10_mmi;
>  c->put_h264_qpel_pixels_tab[0][2] = ff_put_h264_qpel16_mc20_mmi;
>  c->put_h264_qpel_pixels_tab[0][3] = ff_put_h264_qpel16_mc30_mmi;
>  c->put_h264_qpel_pixels_tab[0][4] = ff_put_h264_qpel16_mc01_mmi;
>  c->put_h264_qpel_pixels_tab[0][5] = ff_put_h264_qpel16_mc11_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[0][6] = ff_put_h264_qpel16_mc21_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[0][7] = ff_put_h264_qpel16_mc31_mmi;
>  c->put_h264_qpel_pixels_tab[0][8] = ff_put_h264_qpel16_mc02_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[0][9] = ff_put_h264_qpel16_mc12_mmi;
>  c->put_h264_qpel_pixels_tab[0][10] = ff_put_h264_qpel16_mc22_mmi;
>  c->put_h264_qpel_pixels_tab[0][11] = ff_put_h264_qpel16_mc32_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[0][12] = ff_put_h264_qpel16_mc03_mmi;
>  c->put_h264_qpel_pixels_tab[0][13] = ff_put_h264_qpel16_mc13_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[0][14] = ff_put_h264_qpel16_mc23_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[0][15] = ff_put_h264_qpel16_mc33_mmi;
>  
> +//FIXME put_h264_qpel16_hv_lowpass_mmi
>  c->put_h264_qpel_pixels_tab[1][0] = ff_put_h264_qpel8_mc00_mmi;
>  c->put_h264_qpel_pixels_tab[1][1] = ff_put_h264_qpel8_mc10_mmi;
>  c->put_h264_qpel_pixels_tab[1][2] = ff_put_h264_qpel8_mc20_mmi;
>  c->put_h264_qpel_pixels_tab[1][3] = ff_put_h264_qpel8_mc30_mmi;
>  c->put_h264_qpel_pixels_tab[1][4] = ff_put_h264_qpel8_mc01_mmi;
>  c->put_h264_qpel_pixels_tab[1][5] = ff_put_h264_qpel8_mc11_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[1][6] = ff_put_h264_qpel8_mc21_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[1][7] = ff_put_h264_qpel8_mc31_mmi;
>  c->put_h264_qpel_pixels_tab[1][8] = ff_put_h264_qpel8_mc02_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[1][9] = ff_put_h264_qpel8_mc12_mmi;
>  c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_mmi;
>  c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_mmi;
>  c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_mmi;
> +#if 0
>  c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_mmi;
> +#endif
>  c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_mmi;
>  
>  c->put_h264_qpel_pixels_tab[2][0] = ff_put_h264_qpel4_mc00_mmi;
> @@ -184,38 +198,52 @@ static av_cold void h264qpel_init_mmi(H264QpelContext *c, int bit_depth)
>  c->put_h264_qpel_pixels_tab[2][14] = ff_put_h264_qpel4_mc23_mmi;
>  c->put_h264_qpel_pixels_tab[2][15] = ff_put_h264_qpel4_mc33_mmi;
>  
> +//FIXME avg_h264_qpel16_hv_lowpass_mmi
>  c->avg_h264_qpel_pixels_tab[0][0] = ff_avg_h264_qpel16_mc00_mmi;
>  c->avg_h264_qpel_pixels_tab[0][1] = ff_avg_h264_qpel16_mc10_mmi;
>  c->avg_h264_qpel_pixels_tab[0][2] = ff_avg_h264_qpel16_mc20_mmi;
>  c->avg_h264_qpel_pixels_tab[0][3] = ff_avg_h264_qpel16_mc30_mmi;
>  
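
For readers following the `#if 0` blocks above: the table slots that are disabled are 6, 9, 10, 11 and 14, that is mc21, mc12, mc22, mc32 and mc23. The sketch below is only an illustration and is not part of the patch. It assumes, judging from the assignments in the diff, that position mcXY is stored at index x + 4*y, and it infers from the FIXME comments that the disabled positions are the ones that go through the hv lowpass. The helper names `qpel_index`, `uses_hv_lowpass` and `disabled_indices` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Table slot for quarter-pel position mcXY, matching the assignments in
 * the diff (mc10 -> 1, mc01 -> 4, mc21 -> 6, mc33 -> 15). */
static int qpel_index(int x, int y)
{
    return x + 4 * y;
}

/* Assumption inferred from the #if 0 blocks: a position needs the hv
 * lowpass when one coordinate is the half-pel position (2) and the
 * other coordinate is non-zero. */
static bool uses_hv_lowpass(int x, int y)
{
    return (x == 2 && y != 0) || (y == 2 && x != 0);
}

/* Collect the table indices of the hv-dependent positions in ascending
 * order; returns how many were written to out. */
static int disabled_indices(int out[16])
{
    int n = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            if (uses_hv_lowpass(x, y)) {
                assert(n < 16);
                out[n++] = qpel_index(x, y);
            }
        }
    }
    return n;
}
```

Under these assumptions the helper yields the same five indices that the patch wraps in `#if 0` for the 16x16 and 8x8 tables.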

Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2

2016-05-23 Thread 周晓勇
That was my mistake; thank you for pointing it out. It should be:


diff --git a/libavcodec/mips/h264qpel_mmi.c b/libavcodec/mips/h264qpel_mmi.c
index d641a51..737c68c 100644
--- a/libavcodec/mips/h264qpel_mmi.c
+++ b/libavcodec/mips/h264qpel_mmi.c
@@ -1901,9 +1901,9 @@ static void put_pixels8_l2_shift5_mmi(uint8_t *dst, int16_t *src16,
 : "memory"
 );
 
-src8  += 2L * src8Stride;
+src8  += 2 * src8Stride;
 src16 += 48;
-dst   += 2L * dstStride;
+dst   += 2 * dstStride;
 } while (h -= 2);
 }
 
@@ -2260,9 +2260,9 @@ static void avg_pixels8_l2_shift5_mmi(uint8_t *dst, int16_t *src16,
 : "memory"
 );
 
-src8  += 2L * src8Stride;
+src8  += 2 * src8Stride;
 src16 += 48;
-dst   += 2L * dstStride;
+dst   += 2 * dstStride;
 } while (b -= 2);
 }
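
As a side note on why dropping the `L` suffix should not change behaviour: the sketch below assumes the strides are `ptrdiff_t`, as in the other function signatures shown in this patch (the full signature of put_pixels8_l2_shift5_mmi is not visible in the hunk). In that case the usual arithmetic conversions already evaluate `2 * stride` at `ptrdiff_t` width. The helper names `advance_two_rows` and `advance_two_rows_long` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a ptrdiff_t operand, the int literal 2 is converted by the usual
 * arithmetic conversions, so the product is as wide as ptrdiff_t. */
static_assert(sizeof(2 * (ptrdiff_t)1) == sizeof(ptrdiff_t),
              "2 * stride is computed at ptrdiff_t width");

/* Hypothetical helper: step a row pointer forward by two rows using a
 * plain int literal, as in the corrected hunk. */
static const uint8_t *advance_two_rows(const uint8_t *p, ptrdiff_t stride)
{
    return p + 2 * stride;
}

/* The same step written with a long literal, as in the first version. */
static const uint8_t *advance_two_rows_long(const uint8_t *p, ptrdiff_t stride)
{
    return p + 2L * stride;
}
```

Both helpers should return the same address for any in-bounds stride, including negative ones, so the plain `2 *` form matches the rest of the file without changing the result.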










At 2016-05-24 03:47:30, "Michael Niedermayer"  wrote:
>On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
>> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
>> 
>> 1. No longer use register names directly; tidy the code formatting
>> 2. For O32 compatibility, declare address variables as mips_reg and
>> operate on them with the PTR_ operators
>> 3. Temporarily comment out put_(avg_)h264_qpel16_hv_lowpass_mmi and the
>> related functions, which fail FATE testing under the O32 ABI
>> 4. Use uld and mtc1 to work around the 3A2000 CPU gslwlc1 bug (a gslwlc1
>> instruction extension bug under the O32 ABI)
>> 5. The put_pixels_ and avg_pixels_ functions use the hpeldsp optimizations
>> instead
>
>[...]
>> @@ -1373,161 +1412,589 @@ static void put_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>>  }
>>  }
>>  
>> -static void put_h264_qpel8_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>> -int dstStride, int srcStride)
>> -{
>> -int16_t _tmp[104];
>> -int16_t *tmp = _tmp;
>> -int i;
>> -src -= 2*srcStride;
>> +static inline void put_h264_qpel8or16_hv1_lowpass_mmi(int16_t *tmp,
>> +const uint8_t *src, ptrdiff_t tmpStride, ptrdiff_t srcStride, int size)
>> +{
>> +int w = (size + 8) >> 2;
>> +double ftmp[11];
>> +uint64_t tmp0;
>> +uint64_t low32;
>> +
>> +src -= 2 * srcStride + 2;
>[...]
>
>> +src8  += 2L * src8Stride;
>> +src16 += 48;
>> +dst   += 2L * dstStride;
>
>why does this use long types  instead of ints while other code uses
>ints ?
>
>> +} while (h -= 2);
>> +}
>> +
>> +static void put_h264_qpel16_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src,
>> +const uint8_t *src2, ptrdiff_t dstStride, ptrdiff_t src2Stride)
>> +{
>> +put_h264_qpel8_h_lowpass_l2_mmi(dst, src, src2, dstStride, src2Stride);
>> +put_h264_qpel8_h_lowpass_l2_mmi(dst + 8, src + 8, src2 + 8, dstStride,
>> +src2Stride);
>> +
>> +src += 8 * dstStride;
>> +dst += 8 * dstStride;
>> +src2 += 8 * src2Stride;
>
>
>
>[...]
>
>-- 
>Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
>I do not agree with what you have to say, but I'll defend to the death your
>right to say it. -- Voltaire


Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2

2016-05-23 Thread Michael Niedermayer
On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
> 
> 1. No longer use register names directly; tidy the code formatting
> 2. For O32 compatibility, declare address variables as mips_reg and
> operate on them with the PTR_ operators
> 3. Temporarily comment out put_(avg_)h264_qpel16_hv_lowpass_mmi and the
> related functions, which fail FATE testing under the O32 ABI
> 4. Use uld and mtc1 to work around the 3A2000 CPU gslwlc1 bug (a gslwlc1
> instruction extension bug under the O32 ABI)
> 5. The put_pixels_ and avg_pixels_ functions use the hpeldsp optimizations
> instead

[...]
> @@ -1373,161 +1412,589 @@ static void put_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>  }
>  }
>  
> -static void put_h264_qpel8_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
> -int dstStride, int srcStride)
> -{
> -int16_t _tmp[104];
> -int16_t *tmp = _tmp;
> -int i;
> -src -= 2*srcStride;
> +static inline void put_h264_qpel8or16_hv1_lowpass_mmi(int16_t *tmp,
> +const uint8_t *src, ptrdiff_t tmpStride, ptrdiff_t srcStride, int size)
> +{
> +int w = (size + 8) >> 2;
> +double ftmp[11];
> +uint64_t tmp0;
> +uint64_t low32;
> +
> +src -= 2 * srcStride + 2;
[...]

> +src8  += 2L * src8Stride;
> +src16 += 48;
> +dst   += 2L * dstStride;

why does this use long types  instead of ints while other code uses
ints ?

> +} while (h -= 2);
> +}
> +
> +static void put_h264_qpel16_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src,
> +const uint8_t *src2, ptrdiff_t dstStride, ptrdiff_t src2Stride)
> +{
> +put_h264_qpel8_h_lowpass_l2_mmi(dst, src, src2, dstStride, src2Stride);
> +put_h264_qpel8_h_lowpass_l2_mmi(dst + 8, src + 8, src2 + 8, dstStride,
> +src2Stride);
> +
> +src += 8 * dstStride;
> +dst += 8 * dstStride;
> +src2 += 8 * src2Stride;



[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire

