>>>>> "Alan" == Alan W Black <[email protected]> writes:

> David Kuehling wrote:
[..]
>> There is also AFAIK no Linux kernel support, so process switching
>> won't save/restore the simd register file.  So no multi-tasking with
>> SIMD-using programs (at least not without weird side-effects).

> Ok, this would make it very hard to use.

No, not really.  As you'd be the only one bothering with SIMD
instructions, nobody else would touch the register file, so your
application would run correctly :)

>> As these SIMD instructions are completely proprietary I'd rather not
>> invest time into optimizing your code around them.  What's the
>> problem with synthesis that is so CPU-intensive?  Maybe some bunch of
>> hand-coded MIPS-assembler would already do the job?  Or maybe some
>> algorithmic optimizations can solve the problem without resorting to
>> a "brute force" machine code optimization approach?  Can you point us
>> to the specific C-code that needs tuning?  Looks like a fun problem.

> In statistical parametric synthesis, we use a computationally
> expensive process called MLSA (cst_mlsa.c) with takes about 90% of the
> time to synthesize.

Looking at

  http://flite.sourcearchive.com/documentation/1.4-release-4/cst__mlsa_8c_source.html

I notice that a lot of the code does floating-point computation.  Is
the floating-point part significant to total runtime?  The Jz47xx does
not have an FPU, and the SIMD unit is also integer-only, so that won't
help you much.  Recoding parts in integer arithmetic might help if the
float part has any significant impact on performance.

> It really needs about 800MHz (on an ARM) to be
> real-time.

There are a lot of ARMs, some without FPU, some with crappy FPU
(e.g. imx51 with non-pipelined VFP unit).  Some distributions' tool
chains won't even compile FPU instructions by default (Debian armel,
I'm looking at you :).
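
To make the integer-recoding remark above a bit more concrete, here is a
minimal sketch of a multiply-accumulate step in Q15 fixed point.  It is
not taken from cst_mlsa.c; the names q15_t, q15_from_float and q15_dot
are made up for illustration, and whether 16-bit precision is enough for
the MLSA filter has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 fixed point: a value x in [-1, 1) is stored as round(x * 32768).
   All names in this sketch are hypothetical, not part of flite. */
typedef int16_t q15_t;

/* Convert a float in [-1, 1) to Q15.  Meant for setup time only, so the
   soft-float cost is not paid per sample. */
static q15_t q15_from_float(float x)
{
    int32_t v;

    assert(x >= -1.0f && x < 1.0f);
    v = (int32_t)(x * 32768.0f + (x >= 0.0f ? 0.5f : -0.5f));
    if (v > 32767)          /* clamp values that round up to 1.0 */
        v = 32767;
    return (q15_t)v;
}

/* Integer-only dot product of n Q15 coefficients with n Q15 state values.
   Products are Q30 and are summed in 64 bits to avoid overflow; the sum
   is scaled back to Q15 at the end.  The right shift of a negative sum
   is assumed to be arithmetic, which holds for the usual compilers. */
static int32_t q15_dot(const q15_t *coef, const q15_t *state, int n)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += (int32_t)coef[i] * (int32_t)state[i];
    return (int32_t)(acc >> 15);
}
```

The inner loop uses only integer multiplies and adds, which is the kind
of work an FPU-less core handles without soft-float library calls.
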
Just to say, "800 MHz ARM" leaves a lot of room for speculation :)

> With restricting the parameter order and number of other interesting
> hacks we can get something fast enough on a 400MHz ARM (an HTC
> TytnII).  Although there probably is more optimization possible with
> that algorithm, we've also been looking at different parameterization
> of the speech, that still has the same predictive capabilities
> (e.g. LSPs) but require less computation for resynthesis.  The
> Nanonote although *I*'d like that to work, 600MHz+ devices (Raspberry
> PI, MK802 and various android phones) are probably our real target.

At first glance, HTC TytnII seems to have no FPU, BTW.

Quite remarkable that something as low-data rate as speech-quality audio
is so difficult to generate...

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
_______________________________________________ Qi Hardware Discussion List Mail to list (members only): [email protected] Subscribe or Unsubscribe: http://lists.en.qi-hardware.com/mailman/listinfo/discussion

