Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
On Wed, May 25, 2016 at 05:40:44PM +0800, 周晓勇 wrote:
> I have fixed the faulty functions in h264qpel_mmi; please try the new patch
> below.
>
> ---
> From a55e3a93a30226e3f0fb7d2d8ac43e74b5eae5d6 Mon Sep 17 00:00:00 2001
> From: ZhouXiaoyong
> Date: Wed, 25 May 2016 16:07:38 +0800
> Subject: [PATCH] avcodec/mips/h264qpel_mmi.c: Version 2 of the optimizations
>  for loongson mmi
>
> 1. no longer use the register names directly and optimized code format
> 2. to be compatible with O32, specify type of address variable with mips_reg
>    and handle the address variable with PTR_ operator
> 3. use uld and mtc1 to workaround cpu 3A2000 gslwlc1 bug (gslwlc1 instruction
>    extension bug in O32 ABI)
> 4. h264qpel use hpeldsp optimizations
> ---
>  libavcodec/mips/h264qpel_mmi.c | 3824 +++-
>  1 file changed, 2225 insertions(+), 1599 deletions(-)

applied

thanks

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
These functions fail the fate-h264 test under both the O32 and the N64 ABI,
but the earlier optimization (version 1) had too many bugs under the O32 ABI.
I will fix the bugs in put_h264_qpel16_hv_lowpass_mmi and
avg_h264_qpel16_hv_lowpass_mmi in the future.

Have you installed the latest Fedora 21 for Loongson?
http://mirror.lemote.com/fedora/live/Fedora-MATE-Live-2.iso
http://mirror.lemote.com/fedora/live/Fedora-MATE-Live-2.iso.md5

Use this script to make a live-USB installer:
http://mirror.lemote.com/fedora/live/makeliveusb

Tip: make sure /dev/sda1 is ext2 or ext3, as PMON cannot boot a kernel from
ext4.

>why do these functions not work ?
>
>[...]
>--
>Michael
Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
>
> 1. no longer use the register names directly and optimized code format
> 2. to be compatible with O32, specify type of address variable with
>    mips_reg and handle the address variable with PTR_ operator
> 3. temporarily annotated func put_(avg_)h264_qpel16_hv_lowpass_mmi and
>    related funcs which couldn't pass fate testing in O32 ABI
> 4. use uld and mtc1 to workaround cpu 3A2000 gslwlc1 bug (gslwlc1
>    instruction extension bug in O32 ABI)
> 5. put_pixels_ and avg_pixels_ functions use hpeldsp optimizations instead
>
> At 2016-05-13 18:05:51, "周晓勇" wrote:
>
> From 151ccd1cefff1887b58166113e65893bcc2e724d Mon Sep 17 00:00:00 2001
> From: Zhou Xiaoyong
> Date: Thu, 12 May 2016 10:46:09 +0800
> Subject: [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
>
> ---
>  libavcodec/mips/h264qpel_init_mips.c |   28 +
>  libavcodec/mips/h264qpel_mmi.c       | 3823 --
>  2 files changed, 2252 insertions(+), 1599 deletions(-)
>
> diff --git a/libavcodec/mips/h264qpel_init_mips.c b/libavcodec/mips/h264qpel_init_mips.c
> index 92219f8..d97e9cc 100644
> --- a/libavcodec/mips/h264qpel_init_mips.c
> +++ b/libavcodec/mips/h264qpel_init_mips.c
> @@ -133,38 +133,52 @@ static av_cold void h264qpel_init_msa(H264QpelContext *c, int bit_depth)
>  static av_cold void h264qpel_init_mmi(H264QpelContext *c, int bit_depth)
>  {
>      if (8 == bit_depth) {
> +//FIXME put_h264_qpel16_hv_lowpass_mmi
>          c->put_h264_qpel_pixels_tab[0][0] = ff_put_h264_qpel16_mc00_mmi;
>          c->put_h264_qpel_pixels_tab[0][1] = ff_put_h264_qpel16_mc10_mmi;
>          c->put_h264_qpel_pixels_tab[0][2] = ff_put_h264_qpel16_mc20_mmi;
>          c->put_h264_qpel_pixels_tab[0][3] = ff_put_h264_qpel16_mc30_mmi;
>          c->put_h264_qpel_pixels_tab[0][4] = ff_put_h264_qpel16_mc01_mmi;
>          c->put_h264_qpel_pixels_tab[0][5] = ff_put_h264_qpel16_mc11_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[0][6] = ff_put_h264_qpel16_mc21_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[0][7] = ff_put_h264_qpel16_mc31_mmi;
>          c->put_h264_qpel_pixels_tab[0][8] = ff_put_h264_qpel16_mc02_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[0][9] = ff_put_h264_qpel16_mc12_mmi;
>          c->put_h264_qpel_pixels_tab[0][10] = ff_put_h264_qpel16_mc22_mmi;
>          c->put_h264_qpel_pixels_tab[0][11] = ff_put_h264_qpel16_mc32_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[0][12] = ff_put_h264_qpel16_mc03_mmi;
>          c->put_h264_qpel_pixels_tab[0][13] = ff_put_h264_qpel16_mc13_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[0][14] = ff_put_h264_qpel16_mc23_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[0][15] = ff_put_h264_qpel16_mc33_mmi;
>
> +//FIXME put_h264_qpel16_hv_lowpass_mmi
>          c->put_h264_qpel_pixels_tab[1][0] = ff_put_h264_qpel8_mc00_mmi;
>          c->put_h264_qpel_pixels_tab[1][1] = ff_put_h264_qpel8_mc10_mmi;
>          c->put_h264_qpel_pixels_tab[1][2] = ff_put_h264_qpel8_mc20_mmi;
>          c->put_h264_qpel_pixels_tab[1][3] = ff_put_h264_qpel8_mc30_mmi;
>          c->put_h264_qpel_pixels_tab[1][4] = ff_put_h264_qpel8_mc01_mmi;
>          c->put_h264_qpel_pixels_tab[1][5] = ff_put_h264_qpel8_mc11_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[1][6] = ff_put_h264_qpel8_mc21_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[1][7] = ff_put_h264_qpel8_mc31_mmi;
>          c->put_h264_qpel_pixels_tab[1][8] = ff_put_h264_qpel8_mc02_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[1][9] = ff_put_h264_qpel8_mc12_mmi;
>          c->put_h264_qpel_pixels_tab[1][10] = ff_put_h264_qpel8_mc22_mmi;
>          c->put_h264_qpel_pixels_tab[1][11] = ff_put_h264_qpel8_mc32_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[1][12] = ff_put_h264_qpel8_mc03_mmi;
>          c->put_h264_qpel_pixels_tab[1][13] = ff_put_h264_qpel8_mc13_mmi;
> +#if 0
>          c->put_h264_qpel_pixels_tab[1][14] = ff_put_h264_qpel8_mc23_mmi;
> +#endif
>          c->put_h264_qpel_pixels_tab[1][15] = ff_put_h264_qpel8_mc33_mmi;
>
>          c->put_h264_qpel_pixels_tab[2][0] = ff_put_h264_qpel4_mc00_mmi;
> @@ -184,38 +198,52 @@ static av_cold void h264qpel_init_mmi(H264QpelContext *c, int bit_depth)
>          c->put_h264_qpel_pixels_tab[2][14] = ff_put_h264_qpel4_mc23_mmi;
>          c->put_h264_qpel_pixels_tab[2][15] = ff_put_h264_qpel4_mc33_mmi;
>
> +//FIXME avg_h264_qpel16_hv_lowpass_mmi
>          c->avg_h264_qpel_pixels_tab[0][0] = ff_avg_h264_qpel16_mc00_mmi;
>          c->avg_h264_qpel_pixels_tab[0][1] = ff_avg_h264_qpel16_mc10_mmi;
>          c->avg_h264_qpel_pixels_tab[0][2] = ff_avg_h264_qpel16_mc20_mmi;
>          c->avg_h264_qpel_pixels_tab[0][3] = ff_avg_h264_qpel16_mc30_mmi;
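The `#if 0` blocks in the hunk above leave some table slots unassigned by the MMI init function. A minimal sketch of that pattern follows, assuming (this is not shown in the quoted patch) that a generic initializer has already filled every slot before the optimized one runs, so a skipped slot simply keeps the generic function. All names here (`DemoContext`, `demo_init_c`, `demo_init_fast`, the `IMPL_*` ids) are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical ids so a caller can tell which implementation a slot holds. */
enum { IMPL_GENERIC = 0, IMPL_FAST = 1 };

/* Hypothetical function type; the real qpel functions take pixel pointers. */
typedef int (*demo_fn)(int);

static int generic_impl(int x) { (void)x; return IMPL_GENERIC; }
static int fast_impl(int x)    { (void)x; return IMPL_FAST; }

typedef struct DemoContext {
    demo_fn tab[16];
} DemoContext;

/* Step 1 (assumed): install the portable implementation in every slot. */
static void demo_init_c(DemoContext *c)
{
    for (size_t i = 0; i < sizeof(c->tab) / sizeof(c->tab[0]); i++)
        c->tab[i] = generic_impl;
}

/* Step 2: override only the slots whose optimized version is trusted. */
static void demo_init_fast(DemoContext *c)
{
    c->tab[5] = fast_impl;
#if 0   /* known-bad optimized variant: slot 6 keeps the generic function */
    c->tab[6] = fast_impl;
#endif
    c->tab[7] = fast_impl;
}
```

Calling `demo_init_c` and then `demo_init_fast` on the same context leaves slot 6 pointing at `generic_impl` while slots 5 and 7 point at `fast_impl`, which is the effect the `#if 0` guards are meant to have for the mc21, mc12, mc22, mc32 and mc23 entries.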
Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
That is my fault; thank you for pointing out the mistake. It should be:

diff --git a/libavcodec/mips/h264qpel_mmi.c b/libavcodec/mips/h264qpel_mmi.c
index d641a51..737c68c 100644
--- a/libavcodec/mips/h264qpel_mmi.c
+++ b/libavcodec/mips/h264qpel_mmi.c
@@ -1901,9 +1901,9 @@ static void put_pixels8_l2_shift5_mmi(uint8_t *dst, int16_t *src16,
             : "memory"
         );
 
-        src8 += 2L * src8Stride;
+        src8 += 2 * src8Stride;
         src16 += 48;
-        dst += 2L * dstStride;
+        dst += 2 * dstStride;
     } while (h -= 2);
 }
 
@@ -2260,9 +2260,9 @@ static void avg_pixels8_l2_shift5_mmi(uint8_t *dst, int16_t *src16,
             : "memory"
         );
 
-        src8 += 2L * src8Stride;
+        src8 += 2 * src8Stride;
         src16 += 48;
-        dst += 2L * dstStride;
+        dst += 2 * dstStride;
     } while (b -= 2);
 }

At 2016-05-24 03:47:30, "Michael Niedermayer" wrote:
>On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
>> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
>>
>> 1. no longer use the register names directly and optimized code format
>> 2. to be compatible with O32, specify type of address variable with
>>    mips_reg and handle the address variable with PTR_ operator
>> 3. temporarily annotated func put_(avg_)h264_qpel16_hv_lowpass_mmi and
>>    related funcs which couldn't pass fate testing in O32 ABI
>> 4. use uld and mtc1 to workaround cpu 3A2000 gslwlc1 bug (gslwlc1
>>    instruction extension bug in O32 ABI)
>> 5. put_pixels_ and avg_pixels_ functions use hpeldsp optimizations instead
>
>[...]
>> @@ -1373,161 +1412,589 @@ static void put_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>>      }
>>  }
>>
>> -static void put_h264_qpel8_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>> -        int dstStride, int srcStride)
>> -{
>> -    int16_t _tmp[104];
>> -    int16_t *tmp = _tmp;
>> -    int i;
>> -    src -= 2*srcStride;
>> +static inline void put_h264_qpel8or16_hv1_lowpass_mmi(int16_t *tmp,
>> +        const uint8_t *src, ptrdiff_t tmpStride, ptrdiff_t srcStride, int size)
>> +{
>> +    int w = (size + 8) >> 2;
>> +    double ftmp[11];
>> +    uint64_t tmp0;
>> +    uint64_t low32;
>> +
>> +    src -= 2 * srcStride + 2;
>[...]
>
>> +        src8 += 2L * src8Stride;
>> +        src16 += 48;
>> +        dst += 2L * dstStride;
>
>why does this use long types instead of ints while other code uses
>ints ?
>
>> +    } while (h -= 2);
>> +}
>> +
>> +static void put_h264_qpel16_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src,
>> +        const uint8_t *src2, ptrdiff_t dstStride, ptrdiff_t src2Stride)
>> +{
>> +    put_h264_qpel8_h_lowpass_l2_mmi(dst, src, src2, dstStride, src2Stride);
>> +    put_h264_qpel8_h_lowpass_l2_mmi(dst + 8, src + 8, src2 + 8, dstStride,
>> +            src2Stride);
>> +
>> +    src += 8 * dstStride;
>> +    dst += 8 * dstStride;
>> +    src2 += 8 * src2Stride;
>
>[...]
>
>--
>Michael
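As background to the `2L` versus `2` change above, here is a small C11 sketch of why the `L` suffix adds nothing when the stride is already a `ptrdiff_t`: the usual arithmetic conversions evaluate `2 * stride` in the stride's own type, so the row offset has pointer-difference width on any ABI. The helper name `advance_two_rows` is hypothetical and not part of the patch; the sketch only assumes, as the quoted function signatures suggest, that the strides are declared `ptrdiff_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time check (C11): multiplying a ptrdiff_t by the int literal 2
 * yields a ptrdiff_t, because the literal is converted to the wider (or
 * equal-rank) operand type before the multiplication. */
_Static_assert(_Generic(2 * (ptrdiff_t)1, ptrdiff_t: 1, default: 0),
               "2 * ptrdiff_t is expected to have type ptrdiff_t");

/* Hypothetical helper: step a row pointer forward by two rows.
 * A negative stride (bottom-up layout) works the same way. */
static const uint8_t *advance_two_rows(const uint8_t *p, ptrdiff_t stride)
{
    return p + 2 * stride;
}
```

With a 16-byte stride the helper returns `p + 32`, and with a stride of -16 it returns `p - 32`, with no `long` literal involved.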
Re: [FFmpeg-devel] [PATCH 07/11] avcodec/mips: loongson optimize h264qpel with mmi v2
On Tue, May 17, 2016 at 03:08:13PM +0800, 周晓勇 wrote:
> avcodec/mips/h264qpel_mmi: Version 2 of the optimizations for loongson mmi
>
> 1. no longer use the register names directly and optimized code format
> 2. to be compatible with O32, specify type of address variable with
>    mips_reg and handle the address variable with PTR_ operator
> 3. temporarily annotated func put_(avg_)h264_qpel16_hv_lowpass_mmi and
>    related funcs which couldn't pass fate testing in O32 ABI
> 4. use uld and mtc1 to workaround cpu 3A2000 gslwlc1 bug (gslwlc1
>    instruction extension bug in O32 ABI)
> 5. put_pixels_ and avg_pixels_ functions use hpeldsp optimizations instead

[...]
> @@ -1373,161 +1412,589 @@ static void put_h264_qpel4_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
>      }
>  }
>
> -static void put_h264_qpel8_hv_lowpass_mmi(uint8_t *dst, const uint8_t *src,
> -        int dstStride, int srcStride)
> -{
> -    int16_t _tmp[104];
> -    int16_t *tmp = _tmp;
> -    int i;
> -    src -= 2*srcStride;
> +static inline void put_h264_qpel8or16_hv1_lowpass_mmi(int16_t *tmp,
> +        const uint8_t *src, ptrdiff_t tmpStride, ptrdiff_t srcStride, int size)
> +{
> +    int w = (size + 8) >> 2;
> +    double ftmp[11];
> +    uint64_t tmp0;
> +    uint64_t low32;
> +
> +    src -= 2 * srcStride + 2;
[...]

> +        src8 += 2L * src8Stride;
> +        src16 += 48;
> +        dst += 2L * dstStride;

why does this use long types instead of ints while other code uses
ints ?

> +    } while (h -= 2);
> +}
> +
> +static void put_h264_qpel16_h_lowpass_l2_mmi(uint8_t *dst, const uint8_t *src,
> +        const uint8_t *src2, ptrdiff_t dstStride, ptrdiff_t src2Stride)
> +{
> +    put_h264_qpel8_h_lowpass_l2_mmi(dst, src, src2, dstStride, src2Stride);
> +    put_h264_qpel8_h_lowpass_l2_mmi(dst + 8, src + 8, src2 + 8, dstStride,
> +            src2Stride);
> +
> +    src += 8 * dstStride;
> +    dst += 8 * dstStride;
> +    src2 += 8 * src2Stride;

[...]

--
Michael