Hi,
On Thu, May 10, 2012 at 9:35 AM, Ronald S. Bultje <[email protected]> wrote:
> On Wed, May 9, 2012 at 11:00 PM, Aneesh Dogra <[email protected]> wrote:
>> @@ -433,8 +440,7 @@ static int tta_decode_frame(AVCodecContext *avctx, void
>> *data,
>> // convert to output buffer
>> if (s->bps == 2) {
>> int16_t *samples = (int16_t *)s->frame.data[0];
>> - for (p = s->decode_buffer; p < s->decode_buffer + (framelen *
>> s->channels); p++)
>> - *samples++ = *p;
>> + s->fmt_conv.int32_to_int16_clipped(samples, s->decode_buffer,
>> samples_aligned * s->channels);
>
> Not a bad idea. The downside is that we're still doing a memory
> round-trip, i.e. "write stuff to memory in int32", "read int32,
> convert to int16, write as int16". Is there some way we can write it
> directly as int16, or is there a particular reason not to?
To elaborate a little more, check this code:
    // fixed order prediction
#define PRED(x, k) (int32_t)((((uint64_t)x << k) - x) >> k)
    switch (s->bps) {
    case 1: *p += PRED(*predictor, 4); break;
    case 2:
    case 3: *p += PRED(*predictor, 5); break;
    case 4: *p += *predictor; break;
    }
    *predictor = *p;

    // flip channels
    if (cur_chan < (s->channels-1))
        cur_chan++;
    else {
        // decorrelate in case of stereo integer
        if (s->channels > 1) {
            int32_t *r = p - 1;
            for (*p += *r / 2; r > p - s->channels; r--)
                *r = *(r + 1) - *r;
        }
        cur_chan = 0;
    }

This does:
for 24bps: predict, decorrelate if stereo, store in output buffer,
then load, shift and store in output buffer again
for 16bps mono: predict, store as int32 in temp buffer (although only
16 bits are used)
for 16bps stereo: predict, store as int32 in temp buffer (although only
16 bits are used for the first channel, and 17 bits for each subsequent
channel)
then for both 16bps forms: convert to int16 and store in output buffer
What you want to do:
for 24bps mono: predict, shift, store as int32 in output buffer
for 24bps stereo: predict two samples, decorrelate second, shift,
store both in output buffer
for 16bps mono: predict, store as int16 in output buffer
for 16bps stereo: predict two samples, decorrelate second, and store
both as int16 in output buffer (this could be done for 24bps stereo
also)
This prevents several memory roundtrips and should lead to much better
overall performance.
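
As a rough sketch of the 16bps stereo case (assuming exactly two
channels; the function names here are made up for illustration and are
not in tta.c), the decorrelation can be done on two locals and the
result stored straight to int16. The first function is the quoted
in-buffer loop reduced to two channels, kept only for comparison:

```c
#include <stdint.h>

/* Reference: the quoted in-buffer decorrelation, fixed to 2 channels.
 * buf[0] and buf[1] hold the predicted samples of one stereo frame. */
static void decorrelate_in_buffer(int32_t *buf)
{
    int32_t *p = buf + 1;
    int32_t *r = p - 1;
    for (*p += *r / 2; r > p - 2; r--)
        *r = *(r + 1) - *r;
}

/* Hypothetical direct variant: same arithmetic on two locals, then one
 * store per channel as int16, with no int32 temp buffer in between.
 * Like the original copy loop, this truncates instead of clipping. */
static void decorrelate_store_s16(int32_t c0, int32_t c1, int16_t *out)
{
    c1 += c0 / 2;
    c0  = c1 - c0;
    out[0] = (int16_t)c0;
    out[1] = (int16_t)c1;
}
```

Both produce the same values for in-range input; the second just never
writes the intermediate int32 samples to memory.
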
A pretty typical way of doing this without duplicating too much code
is to make the sample synthesis loop its own function, and make it
take two extra arguments: is_24_bps and is_stereo. Then read the second
sample under an if (is_stereo), and handle the 24bps-specific part the
same way under an if (is_24_bps). Then mark this function
av_always_inline, and make 4 functions that call it with fixed
arguments: synth_24bps_stereo/mono and synth_16bps_stereo/mono. Then
in the main decode function, call one of these 4 depending on settings.
Since the arguments are compile-time constants in each wrapper, the
compiler can drop the untaken branches, so each variant should perform
about as well as a hand-written one. Ask me (or Justin) on IRC for help
if you need more.
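
Something along these lines, as a minimal sketch rather than a drop-in
patch. Assumptions: SynthState, synth() and synth_dispatch() are
hypothetical names, ALWAYS_INLINE is a local stand-in for
av_always_inline so the example compiles on its own, the input is
treated as already entropy-decoded values, only bps 2/3 and 1 or 2
channels are handled, and the adaptive filter stage is left out:

```c
#include <stdint.h>

/* Local stand-in for libavutil's av_always_inline. */
#if defined(__GNUC__)
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define ALWAYS_INLINE inline
#endif

#define PRED(x, k) (int32_t)((((uint64_t)(x) << (k)) - (x)) >> (k))

typedef struct SynthState {
    int32_t pred[2];            /* per-channel predictor state */
} SynthState;

/* Generic loop. is_24_bps and is_stereo are constants at every call
 * site below, so after inlining the untaken branches can be removed. */
static ALWAYS_INLINE void synth(SynthState *st, const int32_t *in, void *out,
                                int nb_frames, int is_24_bps, int is_stereo)
{
    int16_t *out16 = out;
    int32_t *out32 = out;

    for (int i = 0; i < nb_frames; i++) {
        int32_t c0 = *in++ + PRED(st->pred[0], 5);
        int32_t c1 = 0;
        st->pred[0] = c0;

        if (is_stereo) {
            c1 = *in++ + PRED(st->pred[1], 5);
            st->pred[1] = c1;
            c1 += c0 / 2;       /* decorrelate, as in the quoted loop */
            c0  = c1 - c0;
        }

        if (is_24_bps) {        /* multiply by 256 == shift left by 8 */
            *out32++ = c0 * 256;
            if (is_stereo)
                *out32++ = c1 * 256;
        } else {                /* truncating store, no clipping */
            *out16++ = (int16_t)c0;
            if (is_stereo)
                *out16++ = (int16_t)c1;
        }
    }
}

static void synth_16bps_mono(SynthState *st, const int32_t *in, void *out, int n)
{ synth(st, in, out, n, 0, 0); }
static void synth_16bps_stereo(SynthState *st, const int32_t *in, void *out, int n)
{ synth(st, in, out, n, 0, 1); }
static void synth_24bps_mono(SynthState *st, const int32_t *in, void *out, int n)
{ synth(st, in, out, n, 1, 0); }
static void synth_24bps_stereo(SynthState *st, const int32_t *in, void *out, int n)
{ synth(st, in, out, n, 1, 1); }

/* Pick the specialised variant once per frame, outside the hot loop. */
static void synth_dispatch(SynthState *st, const int32_t *in, void *out,
                           int n, int bps, int channels)
{
    if (bps == 3) {
        if (channels == 2) synth_24bps_stereo(st, in, out, n);
        else               synth_24bps_mono(st, in, out, n);
    } else {
        if (channels == 2) synth_16bps_stereo(st, in, out, n);
        else               synth_16bps_mono(st, in, out, n);
    }
}
```

The point of the wrappers is that the per-sample if()s cost nothing at
runtime while the loop body is still written only once.
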
Ronald