> I should add that running DSP tasks will move the CPU frequency to 330MHz,
> so this is probably not the answer to everyone's prayers with regard to
> freeing the CPU to do Xvid decoding or the like. There is a kernel patch to
> not force the CPU to 330MHz (the DSP runs slower) and I'll do some testing
> to see if the DSP task can run in real-time at the lower DSP clock speed.
> Then it will be significantly more useful. 

Right, I've tested the SBC encoder task with the ARM running at 400MHz (and
therefore the DSP running at 133MHz, rather than its top speed of 220MHz
with the ARM running at 330MHz). Thanks qwerty for the link to the patch.
Anyway the task runs and plays music, but there are far too many drop-outs
and the sound gets progressively deeper on the run up to each dropout (due
to the encoder being too slow). So it certainly needs more optimisation
before it could be considered for this role.

> The change which has allowed it to encode an entire song rather than just a
> few seconds was to move the input and output buffers from SDRAM (OMAP main
> memory) to SRAM (DSP fast single access memory). There are probably other
> things which would benefit from being moved, the sbc->priv data (or parts
> thereof) for one. This structure is pretty big so I allocated it in SDRAM,
> but at least parts of it might be better off in faster local memory. This is
> something to look at.

I looked at this yesterday evening (thanks to derf, crashanddie, and others
for answering my C questions), trying to move some parts of the priv
structure to SARAM (sorry for the SRAM typo above). Unfortunately even
moving the bare minimum (the X array) won't work, as there's not enough
SARAM (so dsp_dld tells me). I don't know where it's all gone. Does anyone
have any ideas?

I currently have a fast_in[] array in SARAM to which I copy part of the data
from the slow (SDRAM) X[] array in the sbc_analyze_eight/four() fns before
it's used in the _sbc_analyze_eight/four() fns. These two fns are inlined,
so this memcpy is performed in every loop through the code (called something
like 150,000 times in total for my test file iirc). I'm not sure if the
faster manipulation of the data makes up for the copy overhead (it is a
faster 32bit copy at least). No clocks available, so I'll try removing this
"optimisation" and testing what it sounds like.

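To make the trade-off concrete, here is a minimal sketch of the two variants
in plain C. The names (window_sum_direct, window_sum_copied, FAST_LEN) and
the simple sum standing in for the real analysis maths are made up for
illustration, they are not the actual sbc code. On a host PC both arrays
live in the same memory, so this only shows the structure, not the timing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAST_LEN 80  /* hypothetical window size, not taken from the sbc source */

/* Stand-in for the small buffer that would be placed in SARAM on the DSP. */
static int32_t fast_in[FAST_LEN];

/* Variant A: work on the slow (SDRAM) array directly, no copy. */
int32_t window_sum_direct(const int32_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Variant B: copy the window into the fast buffer, then work on the copy. */
int32_t window_sum_copied(const int32_t *x, size_t n)
{
    assert(n <= FAST_LEN);
    memcpy(fast_in, x, n * sizeof x[0]);
    return window_sum_direct(fast_in, n);
}
```

Variant B still reads every element once from slow memory (during the copy),
so I'd only expect it to win if the analysis code touches each element more
than once after the copy.
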
More importantly, if the whole X array could be placed in SARAM, there'd be
no need for my memcpy anyway and I'd have the benefits of faster access. I'm
not too sure how to analyse the code to work out how much data is allocated
in SARAM (to work out if I'm close to fitting it or have no chance).
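
One rough way to get a feel for it is to add up sizeof() for each array I
want in SARAM and compare the total against a budget. The sketch below does
just that. The dimensions, the struct candidate type and saram_bytes_left()
are hypothetical, and the budget is a parameter because I don't know how
much SARAM is really free. Note that sizeof counts chars, which need not be
8 bits on a DSP, and if the toolchain can produce a linker map file that
should be the more reliable record of what actually landed where.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions, NOT taken from the real sbc priv structure. */
#define X_CHANNELS 2
#define X_LEN      160

/* One array we would like to place in fast memory. */
struct candidate {
    const char *name;
    size_t bytes;   /* size as reported by sizeof */
};

/* Returns how much of the budget is left after all candidates,
 * or -1 if they do not fit. */
long saram_bytes_left(const struct candidate *c, size_t count, size_t budget)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++)
        used += c[i].bytes;
    return used <= budget ? (long)(budget - used) : -1;
}
```

For example, a candidate entry for the X array would be
{ "X", sizeof(int32_t[X_CHANNELS][X_LEN]) }.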

Talking about SARAM, the input and output buffers (which the DSP task uses
for bulk transfers) are in SARAM. This is what I changed to make the task
play in real-time, so it obviously makes a difference. It would be good if
I could avoid having to copy from the input buffer into one of the priv
structure arrays (which holds the PCM data). This is probably not really a
big saving compared to optimising the main loop as the read fn is not called
all that often (~5000 times for my test file), but every little helps and
obviously did before. The input array is currently read into a 2D array. I
need to check the array dimensions and whether I could write the data into
it directly (and place it in SARAM rather than the input array).
The output array has data packed into it, so I'm not sure I'll get any
savings from fiddling with this.
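
Whether the data can land in the 2D array directly depends on the layout. As
a sketch only, assuming the input is interleaved stereo (L/R/L/R...) and the
array is indexed [channel][sample], a straight bulk copy would not match and
one de-interleaving pass is still needed. CHANNELS, SAMPLES and
deinterleave() are names I've made up here, I haven't checked them against
the real priv structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS 2
#define SAMPLES  128  /* hypothetical per-channel block size */

/* Write interleaved input straight into pcm[channel][sample],
 * with no intermediate copy of the input buffer. */
void deinterleave(const int16_t *in, size_t frames,
                  int16_t pcm[CHANNELS][SAMPLES])
{
    assert(frames <= SAMPLES);
    for (size_t i = 0; i < frames; i++)
        for (size_t ch = 0; ch < CHANNELS; ch++)
            pcm[ch][i] = in[i * CHANNELS + ch];
}
```

If the incoming data turned out to be laid out per channel already, the 2D
array is contiguous in C and could in principle be the transfer target
itself.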

There may yet be other little bits of code which would benefit from being
moved to faster memory (or intrinsic-ised). It's just a bit hard to quantify
the memcpy slowdown vs. any possible memory access speedup without any way
of timing individual parts of the code :(

I'm currently revisiting my attempt to re-write the inner loop to use lots
of DSP intrinsics and the like in the hope that this will provide some sort
of speed up. Again to be tested with the mk1 ear ;)

Anyway, that's about where I am. If anyone wants to take a look at the code
and suggest possible locations for optimisations I'm all ears :)

Thanks for reading,

Cheers,


Simon

_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers
