LGTM Mickael
2015-02-04 13:39 GMT+01:00 Christophe Gisquet <christophe.gisq...@gmail.com> : > Hi, > > 2015-02-04 4:55 GMT+01:00 James Almer <jamr...@gmail.com>: > > Original x86 intrinsics code and initial yasm port by Pierre-Edouard > Lepere. > > Refactoring and optimizations by James Almer. > > Add your own copyright to this file then. > > > Width 32 > > 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips > > 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 > skips > > 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips > > > > Width 64 > > 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips > > 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 > skips > > 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 > skips > > Are the first number for each case from before you split out the > restore part? Otherwise, that's gruesome. > > > - void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t > stride_dst, > > - ptrdiff_t stride_src, int16_t > *sao_offset_val, int sao_eo_class, > > - int width, int height); > > + void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t > stride_dst, > > + ptrdiff_t stride_src, int16_t > *sao_offset_val, int sao_eo_class, > > + int width, int height); > > Maybe add a comment on top of that to indicate that _dst is > 16-byte-aligned? > > Also, src and stride_src are so that the buffer is 32-byte-aligned, > because of: > stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE; > dst = lc->edge_emu_buffer + stride_dst + > FF_INPUT_BUFFER_PADDING_SIZE; > in hevc_filter.c, but I'm not sure how much it is a benefit here, or > often it is helping here. Don't hesitate to modify them if need be. > > > +%else ; ARCH_X86_32 > > +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride, > srcstride, a_stride, b_stride, height > > As seen from above, srcstride is constant and is 2*MAX_PB_SIZE + > FF_INPUT_BUFFER_PADDING_SIZE. > That may save you one whole gpr. Not really useful here, but I think > you are more limited for the>8 bits case. > If you want to exploit this, also add it above void (*sao_edge_filter[5]) > > No comment on the actual assembly, it looks fine. > > -- > Christophe > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel