On 2/27/2018 9:58 PM, Hendrik Leppkes wrote:
> On Tue, Feb 27, 2018 at 9:35 PM, David Murmann <david.murm...@btf.de> wrote:
>> Quantization scaling seems to be a slight bottleneck,
>> this change allows the compiler to more easily vectorize
>> the loop. This improves total encoding performance in my
>> tests by about 10-20%.
>>
>> Signed-off-by: David Murmann <da...@btf.de>
>> ---
>>   libavcodec/proresenc_anatoliy.c | 12 ++++++++----
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
[...]
>> +    for (j = 0; j < blocks_per_slice; j++) {
>> +        for (i = 0; i < 64; i++) {
>> +            block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
>> +        }
>> +
>> +        for (i = 1; i < 64; i++) {
>> +            int val = block[progressive_scan[i]];
>>               if (val) {
>>                 encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);
>
> Usually, using float is best avoided. Did you test re-factoring the
> loop structure without changing it to float?

Yes. AFAIK the vector instruction sets have no integer division, so for
the integer version the compiler just emits a scalar loop of idivs. That
is quite a bit slower than converting to float, dividing and converting
back, provided the compiler actually vectorizes the float version. In
the general case such a conversion wouldn't be exact, but since the
input values are int16 they fit losslessly into float32 (the 24-bit
mantissa covers every 16-bit integer). On platforms where this
auto-vectorization fails the float path might actually be quite a bit
slower, but I have not seen that in my tests (though I have only tested
on x86_64).
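
To illustrate the difference, here is a minimal standalone sketch of the
two variants being compared (not the actual code from the patch; the
helper names, the int16_t types and the fixed 64-entry block are just
assumptions to keep the example self-contained):

    #include <stdint.h>

    /* Integer version: x86 SIMD has no packed integer divide, so
     * compilers typically fall back to a scalar loop of idiv
     * instructions here. */
    static void quant_int(int16_t *out, const int16_t *in,
                          const int16_t *qmat)
    {
        for (int i = 0; i < 64; i++)
            out[i] = in[i] / qmat[i];
    }

    /* Float version: convert, divide, convert back. Every int16 input
     * is exactly representable in float32, and the divisions can be
     * done with packed instructions (e.g. divps) when the compiler
     * auto-vectorizes, which is where the speedup comes from. */
    static void quant_float(int16_t *out, const int16_t *in,
                            const int16_t *qmat)
    {
        for (int i = 0; i < 64; i++)
            out[i] = (int16_t)((float)in[i] / (float)qmat[i]);
    }
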

--
David Murmann

da...@btf.de
Telefon +49 (0) 221 82008710
Fax +49 (0) 221 82008799

http://btf.de/

--
btf GmbH | Leyendeckerstr. 27, 50825 Köln | +49 (0) 221 82 00 87 10
Geschäftsführer: Philipp Käßbohrer & Matthias Murmann | HR Köln | HRB 74707