>>>>> "Alan" == Alan W Black <[email protected]> writes:
[..]

> Hmm, "yes my code is thread safe, except for using global registers"
> :-).  I agree this might be possible and actually work but ... :)

[..]

>> Looking at
>>
>> http://flite.sourcearchive.com/documentation/1.4-release-4/cst__mlsa_8c_source.html
>>
>> I notice a lot of code does floating point computation?  Is the
>> floating point part significant to total runtime?  The Jz47xx does
>> not have an FPU, and the SIMD unit is also integer-only, so that
>> won't help you much.  Recoding parts in integer arithmetic might help
>> if the float-part has any significant impact on performance.

> Yes it is floating point intensive.  The non-mlsa code (the diphones
> and unit selection) code uses almost no floating point and is integer
> optimized, that does run fast enough on the Nanonote, but the
> synthesis quality (and convenience) isn't as good.

Well, floating point emulation easily costs you a factor of 10-100 in
performance.  Why not move the MLSA code to integer math, instead of
looking for low-level SIMD optimizations?

Integer arithmetic will also help on CPUs that *do* have an FPU, as
integer operation latency is generally lower than FPU latency.  Even
with a high-performance FPU, much code may just be stalled, waiting
for results to become available instead of actually doing any
computing.

Isn't 32 bit enough?  What about 64 bit 'long long'?  MIPS doesn't
have native add-with-carry support, so 64-bit adds may be two or three
times as expensive as on ARM, but that is still much cheaper than
doing floating point math.  On the other hand MIPS *does* have
multiply-accumulate with 32 bit inputs and a 64 bit output +
accumulator, if that helps (taking 2 cycles on the jz4720, IIRC).

BTW moving your code from double to float may also improve
performance under floating point emulation a lot.

>> Quite remarkable that something as low-data rate as speech-quality
>> audio is so difficult to generate...
> Well us speech synthesis people seem to try to compete with speech
> recognition people to use more CPU time :-).  That's not quite true
> but speech synthesis researchers rarely (except me) care about end
> processor performance.  Signal generation for models is much harder
> than simple signal reconstruction (like mp3 encoding, or simple LPCs).
>
> It therefore might be more worthwhile to look for a better soft float
> option.  I remember back when we used ipaq 38xx's the floating point
> performance under linux was much worse than the performance under
> WinCE due to a better soft float optimization.  Also I note how our
> statistical synthesizers got much better on Google Nexus 1's when the
> SDK compiler was upgraded to generate better float code.

Maybe it's not better float code, only less standards-compliant float
code?  You can probably improve performance a lot by sacrificing some
accuracy and error checks.

Still, I see no sense in emulating a hardware-optimized,
compactly-stored floating point format on hardware that has no FPU.
Roll your own FP, or better yet, just go for integer math!

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
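
P.S. To make the "just go for integer math" suggestion a bit more
concrete, here is a minimal sketch of the kind of thing meant above.
It assumes a Q16.16 fixed-point format has enough range and precision
for the data in question, which has not been checked against the MLSA
filter.  All names (q16_t, q16_mul, q16_dot, ...) are made up for
illustration and are not taken from flite.  The dot-product loop has
the 32x32->64 multiply-accumulate shape discussed above; whether a
given compiler actually emits the MIPS multiply-accumulate instruction
for it is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: real value = raw / 65536. */
typedef int32_t q16_t;
#define Q16_ONE ((q16_t)1 << 16)

/* Convert a small integer to Q16.16 (must fit in the 16 integer bits). */
static q16_t q16_from_int(int32_t x)
{
    assert(x > -32768 && x < 32768);
    return (q16_t)(x * 65536);
}

/* Multiply two Q16.16 values through a 64-bit intermediate.
 * Right-shifting a negative value is implementation-defined in C;
 * this assumes an arithmetic shift, which is what GCC does. */
static q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) >> 16);
}

/* Dot product with a 64-bit accumulator: each step is a 32x32->64
 * multiply-accumulate, and only one shift is needed at the end.
 * The caller has to make sure the final result fits in Q16.16. */
static q16_t q16_dot(const q16_t *a, const q16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];   /* Q32.32 partial product */
    return (q16_t)(acc >> 16);
}
```

Keeping the accumulator at 64 bit and shifting once at the end avoids
losing precision on every term, at the price of the more expensive
64-bit add mentioned above.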

