2012/4/5 Ronald S. Bultje <[email protected]>:
> This looks OK. Is there a performance benefit? I assume there isn't
> anything measurable, because the overhead is relatively low?

Yes, the split between prescaled/non-scaled cases shaves about 3 cycles
per call, but that gain is within measurement noise.

However, this makes it clear that such a distinction should be made
(I'm thinking of NEON code here). For the SSSE3 case, the non-scaled
version is ~217 cycles, and the prescaled (producing identical
results) is ~141.

> As for the code, do please document the arrays in rv34dsp.h, so we
> don't have to look at the code to figure out what the difference
> between [0][0] and [1][1] is.

Done. The commit message is also more verbose, but in the end, it
would be interesting to know if it is good enough for someone
implementing code based on this.

Another (clearer?) solution would be to have:
rv40_weight_func rv40_nonscaled_biweight[2];
rv40_weight_func rv40_prescaled_biweight[2];
in RV34DSPContext,
and have function pointers set to the correct values in RV34DecContext.
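
To make that alternative concrete, here is a minimal sketch of what I have in mind. It is not part of the attached patch. The struct names ending in "Sketch", the stub functions, select_biweight() and the rv40_weight_func signature are all placeholders made up for illustration; the real typedef and contexts are the ones in rv34dsp.h and rv34.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed signature, for illustration only; see rv34dsp.h for the real one. */
typedef void (*rv40_weight_func)(uint8_t *dst, const uint8_t *src1,
                                 const uint8_t *src2, int w1, int w2,
                                 ptrdiff_t stride);

/* Trimmed-down stand-in for RV34DSPContext holding the two proposed arrays.
 * The meaning of the [2] index (e.g. block size) is left to the real header. */
typedef struct RV34DSPContextSketch {
    rv40_weight_func rv40_nonscaled_biweight[2];
    rv40_weight_func rv40_prescaled_biweight[2];
} RV34DSPContextSketch;

/* Trimmed-down stand-in for RV34DecContext: it only keeps a pointer to
 * whichever of the two arrays applies to the current weights. */
typedef struct RV34DecContextSketch {
    RV34DSPContextSketch    rdsp;
    const rv40_weight_func *biweight;
} RV34DecContextSketch;

/* Hypothetical stubs; real code would install the C/SSSE3/NEON versions. */
static void stub_nonscaled(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, int w1, int w2,
                           ptrdiff_t stride)
{
    (void)dst; (void)src1; (void)src2; (void)w1; (void)w2; (void)stride;
}

static void stub_prescaled(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, int w1, int w2,
                           ptrdiff_t stride)
{
    (void)dst; (void)src1; (void)src2; (void)w1; (void)w2; (void)stride;
}

/* Fill both arrays, as a DSP init function would. */
static void init_sketch(RV34DecContextSketch *r)
{
    for (int i = 0; i < 2; i++) {
        r->rdsp.rv40_nonscaled_biweight[i] = stub_nonscaled;
        r->rdsp.rv40_prescaled_biweight[i] = stub_prescaled;
    }
    r->biweight = r->rdsp.rv40_nonscaled_biweight;
}

/* Pick the array once, when the weights are computed, so the per-block
 * call site does not need to test for the prescaled case again. */
static void select_biweight(RV34DecContextSketch *r, int prescaled)
{
    assert(prescaled == 0 || prescaled == 1);
    r->biweight = prescaled ? r->rdsp.rv40_prescaled_biweight
                            : r->rdsp.rv40_nonscaled_biweight;
}
```

With this layout the call site would simply be r->biweight[idx](dst, src1, src2, w1, w2, stride), and the array names themselves document which variant is which.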

Christophe

Attachment: 0001-rv40dsp-implement-prescaled-versions-for-biweight.patch
Description: Binary data

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
