On Thu, May 12, 2016 at 10:01 AM, Benoit Fouet <benoit.fo...@free.fr> wrote:
> Hi, > > I mostly have nits remarks. > > On 11/05/2016 18:39, Matthieu Bouron wrote: > >> From: Matthieu Bouron <matthieu.bou...@stupeflix.com> >> >> > [...] > > diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S >> new file mode 100644 >> index 0000000..13462e3 >> --- /dev/null >> +++ b/libswresample/arm/resample.S >> @@ -0,0 +1,77 @@ >> >> [...] >> >> +function ff_resample_common_apply_filter_x4_float_neon, export=1 >> + vmov.f32 q0, #0.0 @ >> accumulator >> +1: vld1.32 {q1}, [r1]! @ >> src >> + vld1.32 {q2}, [r2]! @ >> filter >> + vmla.f32 q0, q1, q2 @ >> src + {0..3} * filter + {0..3} >> > > nit: the comment could be "accu += src[0..3] . filter[0..3]" > same for the other ones below > > [...] > > + subs r3, #4 @ >> filter_length -= 4 >> + bgt 1b @ >> loop until filter_length >> + vpadd.f32 d0, d0, d1 @ >> pair adding of the 4x32-bit accumulated values >> + vpadd.f32 d0, d0, d0 @ >> pair adding of the 4x32-bit accumulator values >> + vst1.32 {d0[0]}, [r0] @ >> write accumulator >> + mov pc, lr >> +endfunc >> + >> +function ff_resample_common_apply_filter_x8_float_neon, export=1 >> + vmov.f32 q0, #0.0 @ >> accumulator >> +1: vld1.32 {q1}, [r1]! @ >> src1 >> + vld1.32 {q2}, [r2]! @ >> filter1 >> + vld1.32 {q8}, [r1]! @ >> src2 >> + vld1.32 {q9}, [r2]! @ >> filter2 >> + vmla.f32 q0, q1, q2 @ >> src1 + {0..3} * filter1 + {0..3} >> + vmla.f32 q0, q8, q9 @ >> src2 + {0..3} * filter2 + {0..3} >> > > instead of using src1 and src2, you may want to use src[0..3] and src[4..7] > so, if I reuse the formulation I proposed above: > accu += src[0..3] . filter[0..3] > accu += src[4..7] . filter[4..7] > Fixed locally (as well as the other case you mentionned) with: - vmla.f32 q0, q1, q2 @ src1 + {0..3} * filter1 + {0..3} - vmla.f32 q0, q8, q9 @ src2 + {0..3} * filter2 + {0..3} + vmla.f32 q0, q1, q2 @ accumulator += src1 + {0..3} * filter1 + {0..3} + vmla.f32 q0, q8, q9 @ accumulator += src2 + {4..7} * filter2 + {4..7} I prefer to use + {0..3} instead of [0..3] to make the comments consistent with what has been done in swscale/arm. > > + subs r3, #8 @ >> filter_length -= 4 >> > > -= 8 > Fixed locally. > > [...] > > diff --git a/libswresample/arm/resample_init.c >> b/libswresample/arm/resample_init.c >> new file mode 100644 >> index 0000000..c817d03 >> --- /dev/null >> +++ b/libswresample/arm/resample_init.c >> >> [...] >> >> +static int ff_resample_common_##TYPE##_neon(ResampleContext *c, void >> *dest, const void *source, \ >> + int n, int update_ctx) >> \ >> +{ >> \ >> + DELEM *dst = dest; >> \ >> + const DELEM *src = source; >> \ >> + int dst_index; >> \ >> + int index= c->index; >> \ >> + int frac= c->frac; >> \ >> + int sample_index = index >> c->phase_shift; >> \ >> + int x4_aligned_filter_length = c->filter_length & ~3; >> \ >> + int x8_aligned_filter_length = c->filter_length & ~7; >> \ >> + >> \ >> + index &= c->phase_mask; >> \ >> + for (dst_index = 0; dst_index < n; dst_index++) { >> \ >> + FELEM *filter = ((FELEM *) c->filter_bank) + c->filter_alloc * >> index; \ >> + >> \ >> + FELEM2 val=0; >> \ >> + int i = 0; >> \ >> + if (x8_aligned_filter_length >= 8) { >> \ >> + ff_resample_common_apply_filter_x8_##TYPE##_neon(&val, >> &src[sample_index], \ >> + filter, >> x8_aligned_filter_length); \ >> + i += x8_aligned_filter_length; >> \ >> + >> \ >> + } else if (x4_aligned_filter_length >= 4) { >> \ >> > > do you think there could be a gain processing the remainder of the > 8-aligned part through the 4-aligned part of the code? e.g. for a filter > length of 15, that would make: > - one run of the 8-aligned > - one run of the 4-aligned > - 3 C loops > As you stated filter length seems to commonly be 32, I guess that wouldn't > be easy to benchmark, though. > I'll see if I could trigger a case where the filter_length is 15 and come up with a benchmark. If there is a performance gain, would you be ok if it is implemented as a separate patch ? If there is no objection, i'll push the updated patch in one day. Thanks for the review, Matthieu [...] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel