>>>>> "Alan" == Alan W Black <[email protected]> writes:

> David Kuehling wrote:
[..]
>> There is also AFAIK no Linux kernel support, so process switching
>> won't save/restore the simd register file.  So no multi-tasking with
>> SIMD-using programs (at least not without weird side-effects).

> Ok, this would make it very hard to use.  

No, not really.  As you'd be the only one bothering with SIMD
instructions, nobody else would touch the register file, so your
application would run correctly :)

>> As these SIMD instructions are completely proprietary I'd rather not
>> invest time into optimizing your code around them.  What's the
>> problem with synthesis that is so CPU-intensive?  Maybe some bunch of
>> hand-coded MIPS-assembler would already do the job?  Or maybe some
>> algorithmic optimizations can solve the problem without resorting to
>> a "brute force" machine code optimization approach?  Can you point us
>> to the specific C-code that needs tuning?  Looks like a fun problem.

> In statistical parametric synthesis, we use a computationally
> expensive process called MLSA (cst_mlsa.c) which takes about 90% of the
> time to synthesize.  

Looking at 

http://flite.sourcearchive.com/documentation/1.4-release-4/cst__mlsa_8c_source.html

I notice a lot of code does floating point computation?  Is the floating
point part significant to total runtime?  The Jz47xx does not have an
FPU, and the SIMD unit is also integer-only, so that won't help you
much.  Recoding parts in integer arithmetic might help if the float-part
has any significant impact on performance.
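
To illustrate what recoding in integer arithmetic could look like, here
is a minimal fixed-point sketch.  The Q16.16 format and the names q16_t,
q16_mul and q16_dot are made up for this example and do not appear in
cst_mlsa.c.  Whether 16 fractional bits give enough precision for MLSA
is untested.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits.
 * Hypothetical format chosen for illustration only. */
typedef int32_t q16_t;
#define Q16_ONE ((q16_t)1 << 16)

/* Convert to and from double, rounding to nearest.  Only meant for
 * setting up tables offline, not for use in the inner loop. */
static q16_t q16_from_double(double x)
{
    return (q16_t)(x * Q16_ONE + (x >= 0 ? 0.5 : -0.5));
}

static double q16_to_double(q16_t x)
{
    return (double)x / Q16_ONE;
}

/* Multiply two Q16.16 values.  The 64-bit intermediate keeps the full
 * product; the shift drops the extra 16 fractional bits.  Right-shifting
 * a negative value is implementation-defined in C, but is an arithmetic
 * shift on the usual compilers. */
static q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) >> 16);
}

/* Dot product of n coefficients with n samples, the kind of inner loop
 * a digital filter spends its time in.  Accumulating in 64 bits and
 * shifting once at the end avoids rounding after every product. */
static q16_t q16_dot(const q16_t *c, const q16_t *d, int n)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += (int64_t)c[i] * d[i];
    return (q16_t)(acc >> 16);
}
```

The inner loop then needs only integer multiplies, adds and one shift,
which a CPU without FPU executes directly instead of calling soft-float
library routines.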

> It really needs about 800MHz (on an ARM) to be
> real-time. 

There are a lot of ARMs, some without FPU, some with crappy FPU
(e.g. imx51 with non-pipelined VFP unit).  Some distributions'
toolchains won't even emit FPU instructions by default (Debian armel,
I'm looking at you :).  Just to say, "800 MHz ARM" leaves a lot of room
for speculation :)
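
One way to see what a given toolchain does by default is to look at
the compiler's predefined macros.  The sketch below assumes GCC (or a
compatible compiler) targeting 32-bit ARM, where __SOFTFP__ and
__ARM_PCS_VFP are predefined depending on the float ABI in use.  The
function name float_abi is made up for this example.

```c
#include <stdio.h>

/* Describe the floating-point setup the compiler was configured for.
 * Relies on GCC-style predefined macros for 32-bit ARM; on any other
 * target it only reports that the check does not apply. */
static const char *float_abi(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";
#elif defined(__SOFTFP__)
    return "soft-float: FP arithmetic done by library calls";
#elif defined(__ARM_PCS_VFP)
    return "hard-float ABI: FP instructions, FP registers for arguments";
#else
    return "softfp: FP instructions, integer registers for arguments";
#endif
}
```

Printing the result of float_abi() from a test program built with the
distribution's default flags shows which case applies.  Dumping the
macros with "gcc -dM -E - </dev/null" gives the same information
without compiling anything.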

> With restricting the parameter order and number of other interesting
> hacks we can get something fast enough on a 400MHz ARM (an HTC
> TytnII). Although there probably is more optimization possible with
> that algorithm, we've also been looking at different parameterization
> of the speech, that still has the same predictive capabilities
> (e.g. LSPs) but requires less computation for resynthesis.  Although
> *I*'d like the Nanonote to work, 600MHz+ devices (Raspberry
> Pi, MK802 and various Android phones) are probably our real target.

At first glance, HTC TytnII seems to have no FPU, BTW.

Quite remarkable that something as low-data rate as speech-quality audio
is so difficult to generate...

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

