On 10/19/2015 5:00 PM, Timothy Gu wrote:
> 4% to 35% faster depending on the width.
> ---
>  libavcodec/x86/huffyuvencdsp.asm   | 31 ++++++++++++++++++++-----------
>  libavcodec/x86/huffyuvencdsp_mmx.c |  8 +++++++-
>  2 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/x86/huffyuvencdsp.asm b/libavcodec/x86/huffyuvencdsp.asm
> index 97de7e9..9625fbe 100644
> --- a/libavcodec/x86/huffyuvencdsp.asm
> +++ b/libavcodec/x86/huffyuvencdsp.asm
> @@ -27,27 +27,27 @@
>
>  section .text
>
> -INIT_MMX mmx
>  ; void ff_diff_bytes_mmx(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
>  ;                        intptr_t w);
> -cglobal diff_bytes, 4,6,0, dst, src1, src2, w, i
> +%macro DIFF_BYTES 0
> +cglobal diff_bytes, 4,6,2, dst, src1, src2, w, i
>      xor               iq, iq
> -    cmp               wq, 16
> +    cmp               wq, mmsize * 2
>      jb .loop2
> -    sub               wq, 15
> +    sub               wq, mmsize * 2 - 1
>  .loop:
> -    mova              m0, [src2q + iq]
> -    mova              m1, [src1q + iq]
> +    movu              m0, [src2q + iq]
> +    movu              m1, [src1q + iq]

If dst and/or src can sometimes be aligned, check how ff_add_hfyu_left_pred (also huffyuvdsp.asm) handles it.
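
To make the suggestion concrete, here is a rough C sketch of a runtime alignment check that picks between aligned and unaligned loads. It is only an illustration of the general idea, not the code in huffyuvdsp.asm: the names diff_bytes_c, all_aligned16 and diff_bytes_dispatch are hypothetical, and it assumes SSE2 intrinsics are available (otherwise it falls back to the scalar loop).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: dst[i] = src1[i] - src2[i], modulo 256. */
static void diff_bytes_c(uint8_t *dst, const uint8_t *src1,
                         const uint8_t *src2, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* Hypothetical helper: nonzero if all three pointers are 16-byte aligned. */
static int all_aligned16(const void *a, const void *b, const void *c)
{
    return (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 15) == 0;
}

/* Hypothetical dispatcher: test alignment once, then run the matching loop. */
static void diff_bytes_dispatch(uint8_t *dst, const uint8_t *src1,
                                const uint8_t *src2, intptr_t w)
{
    intptr_t i = 0;

    assert(w >= 0);
#if defined(__SSE2__)
    if (all_aligned16(dst, src1, src2)) {
        /* Aligned path: roughly what mova maps to with XMM registers. */
        for (; i + 16 <= w; i += 16) {
            __m128i a = _mm_load_si128((const __m128i *)(src1 + i));
            __m128i b = _mm_load_si128((const __m128i *)(src2 + i));
            _mm_store_si128((__m128i *)(dst + i), _mm_sub_epi8(a, b));
        }
    } else {
        /* Unaligned path: roughly what movu maps to. */
        for (; i + 16 <= w; i += 16) {
            __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
            __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_sub_epi8(a, b));
        }
    }
#endif
    /* Scalar tail for the remaining w - i bytes (or everything without SSE2). */
    diff_bytes_c(dst + i, src1 + i, src2 + i, w - i);
}
```

Whether the extra branch pays off depends on how often callers actually pass aligned buffers, so it would need benchmarking against the plain movu version.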