Hi,

> Please find attached a BN/AES/SHA1 asm implementation for SH4 and
> MIPS32 little endian systems (common CPUs in SoC). This ASM code have
> been done with the help of the great Andy Polyakov's framework for
> the AES and SHA1 functions.
>
> The gain is almost 2x on SH4 with scheduling optimization, and 33/40%
> on MIPS with all register usage (compared to GCC generated code for
> RSA/AES/SHA1, the very standard ciphers for SSL connections). This
> gain is visible on low power CPU systems.
>
> A lot of tests have been run with this patch on 0.9.8o and 1.0.0a
> openssl trees, so I think I can propose it for the openssl main tree.
> Review and feedback is welcome.
While the effort is absolutely worthy and admirable, the submission can't be accepted *as is* for the following reasons.

1. The protocol is to develop things in the development HEAD branch and back-port them to production branches if appropriate. It's inappropriate to target 0.9.8 or 1.0.0. This is because something might already be implemented in HEAD (there is sha1-mips.pl!), or there might be a simpler way to solve the problem (most notably, a Montgomery multiplication module is simpler to program and gives better or adequate results).

2. perlasm/*-xlate.pl are about supporting multiple assembler flavors, not about inventing one's own. I mean, I don't find it appropriate to, e.g., write 'xor %x,%y' when the real syntax is 'xor $y,$x'. Assembler is also a language, and as such it is a means of communication with implied rules. Basically, you should be able to look at the code and the manual, and they should not contradict each other. In other words, unfiltered output should be compilable by a real assembler. Not to mention that the filter should make sense [as opposed to tweaking something that was designed for a *completely* different purpose].

3. Sheer performance is not always the goal. Most notably, it makes more sense to program AES with compressed tables to mitigate timing attacks, especially on slower CPUs, such as embedded ones.

4. Explore the possibility of improving performance in C with *few-line* inline assembler. Examine the compiler-generated code and see if it uses the instructions you would use. See http://cvs.openssl.org/chngview?cn=19092 for an example.

Things that apply mostly to MIPS.

5. Keep in mind that there are several MIPS ABIs in use. Try to organize the code in a manner that allows it to be used with all of these ABIs. See the ppc modules for an example. Well, the mips3 modules do it, but in an IRIX-specific way. Let's find a way to adapt it for Linux... That's where xlate should come in handy.

6. Reuse mips3 code, most notably mips3-mont.
   It should be possible to make it work on MIPS32 by replacing d[multu|addu|subu] with [multu|addu|subu]. Again, see the ppc modules for an example...

So what happens next? I'd suggest taking smaller steps, one at a time. Let's start by figuring out what it takes to adapt sha1-mips.pl from HEAD for Linux...

Cheers. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
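Point 1 above argues that a Montgomery multiplication module is simpler to program than hand-scheduled bignum code. The following is a rough sketch of why. It shows word-by-word Montgomery multiplication in portable C with 32-bit limbs, which is the limb size a MIPS32 or SH4 port would use. Both inner loops are plain multiply-accumulate over limbs, so these loops are what an assembler version would implement. The names `mont_mul` and `mont_n0` and the `MAX_LIMBS` limit are hypothetical. This is not OpenSSL's own `bn_mul_mont`.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LIMBS 64   /* sketch limit on operand size (64 x 32 = 2048 bits) */

/* n0 = -n^-1 mod 2^32 for odd n, via Newton iteration
 * (inv = n is already correct to 3 bits; each step doubles that). */
uint32_t mont_n0(uint32_t n)
{
    uint32_t inv = n;
    for (int k = 0; k < 4; k++)
        inv *= 2u - n * inv;
    return 0u - inv;
}

/* Hypothetical helper: r = a * b * 2^(-32*num) mod n.
 * Assumes 1 <= num <= MAX_LIMBS, n odd, a < n, b < n, n0 = mont_n0(n[0]).
 * Limbs are 32-bit words, least significant first. r may alias a or b. */
void mont_mul(uint32_t *r, const uint32_t *a, const uint32_t *b,
              const uint32_t *n, uint32_t n0, int num)
{
    uint32_t t[MAX_LIMBS + 2], d[MAX_LIMBS];
    memset(t, 0, sizeof(t));

    for (int i = 0; i < num; i++) {
        /* t += a * b[i] */
        uint64_t c = 0;
        for (int j = 0; j < num; j++) {
            c += (uint64_t)a[j] * b[i] + t[j];
            t[j] = (uint32_t)c;
            c >>= 32;
        }
        c += t[num];
        t[num] = (uint32_t)c;
        t[num + 1] = (uint32_t)(c >> 32);

        /* pick m so that t + m*n is divisible by 2^32, add it, shift down one word */
        uint32_t m = t[0] * n0;
        c = ((uint64_t)m * n[0] + t[0]) >> 32;   /* low word is zero by construction */
        for (int j = 1; j < num; j++) {
            c += (uint64_t)m * n[j] + t[j];
            t[j - 1] = (uint32_t)c;
            c >>= 32;
        }
        c += t[num];
        t[num - 1] = (uint32_t)c;
        t[num] = t[num + 1] + (uint32_t)(c >> 32);
    }

    /* t < 2n here: compute d = t - n and keep whichever lies in [0, n) */
    uint32_t borrow = 0;
    for (int j = 0; j < num; j++) {
        uint64_t diff = (uint64_t)t[j] - n[j] - borrow;
        d[j] = (uint32_t)diff;
        borrow = (uint32_t)(diff >> 63);
    }
    /* mask is all-ones when t < n (the subtraction underflowed), else zero */
    uint32_t mask = 0u - (uint32_t)(t[num] < borrow);
    for (int j = 0; j < num; j++)
        r[j] = (t[j] & mask) | (d[j] & ~mask);
}
```

To compute a*b mod n, you would convert the operands into Montgomery form (x*R mod n, with R = 2^(32*num)), chain `mont_mul` calls, and convert back by multiplying with 1. The final step selects via a mask rather than a branch, which fits the timing concerns raised in point 3.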
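Point 3 above recommends AES with compressed tables as a mitigation against timing attacks. One way to compress classic four-table ("T-table") AES is to store only a single 1 KB table, Te0. The other three tables are derived by byte rotation on every lookup. This shrinks the cache footprint that a cache-timing attack can observe, at the cost of a rotation per lookup. It reduces the leakage but does not make the code constant-time. The sketch below illustrates that one technique. It is not the table layout of OpenSSL's own AES modules.

A few assumptions apply:
- The names `build_te0`, `te` and `round_column` are hypothetical.
- The S-box is computed at start-up only to keep the sketch short.
- State words use the big-endian convention of the classic C code.

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t xtime(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1b : 0x00));
}

/* Generic GF(2^8) multiply; only used while building the table. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

/* S-box entry: multiplicative inverse (0 maps to 0), then the affine map. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int y = 1; x != 0 && y < 256; y++)
        if (gmul(x, (uint8_t)y) == 1) {
            inv = (uint8_t)y;
            break;
        }
    uint8_t s = inv;
    for (int k = 1; k <= 4; k++)
        s ^= (uint8_t)((inv << k) | (inv >> (8 - k)));
    return (uint8_t)(s ^ 0x63);
}

/* The single stored table (1 KB): Te0[x] = {02}S[x], S[x], S[x], {03}S[x]. */
static uint32_t Te0[256];

static void build_te0(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = sbox((uint8_t)x), s2 = xtime(s), s3 = (uint8_t)(s2 ^ s);
        Te0[x] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                 ((uint32_t)s << 8) | s3;
    }
}

/* Te1..Te3 of the four-table code are byte rotations of Te0. */
static uint32_t te(int col, uint8_t x)
{
    uint32_t v = Te0[x];
    return col == 0 ? v : (v >> (8 * col)) | (v << (32 - 8 * col));
}

/* One output column of a full round: SubBytes, ShiftRows, MixColumns
 * and AddRoundKey, with state words in big-endian order. */
static uint32_t round_column(uint32_t s0, uint32_t s1, uint32_t s2,
                             uint32_t s3, uint32_t rk)
{
    return te(0, (uint8_t)(s0 >> 24)) ^ te(1, (uint8_t)(s1 >> 16)) ^
           te(2, (uint8_t)(s2 >> 8)) ^ te(3, (uint8_t)s3) ^ rk;
}
```

On cores without a rotate instruction, each `te()` call costs a couple of extra shifts. That is the usual trade of speed for a smaller and less leaky footprint.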
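Point 4 above suggests starting from C and examining the compiler's output before reaching for assembler. The hot loop of the bignum code is a multiply-accumulate over words. A plain C version with 32-bit words is sketched below. Its contract is modelled on OpenSSL's `bn_mul_add_words`, but `mul_add_words32` is a hypothetical name.

Compile it with a MIPS cross-compiler at `-O2 -S` and read the output. On a classic MIPS32 core you would hope the 32x32->64 product becomes a single `multu` followed by `mflo`/`mfhi`. A full 64-bit multiply or a library call would not be acceptable. Only if the compiler misses that is a few-line inline assembler fragment worth writing.

```c
#include <stdint.h>

/* rp[i] += ap[i] * w for i in [0, num); returns the final carry word. */
uint32_t mul_add_words32(uint32_t *rp, const uint32_t *ap, int num, uint32_t w)
{
    uint64_t carry = 0;
    for (int i = 0; i < num; i++) {
        /* widening 32x32->64 multiply plus two 32-bit addends never overflows 64 bits */
        uint64_t t = (uint64_t)ap[i] * w + rp[i] + carry;
        rp[i] = (uint32_t)t;
        carry = t >> 32;
    }
    return (uint32_t)carry;
}
```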
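Points 5 and 6 above suggest deriving the 32-bit (MIPS32) and 64-bit variants from one source, the way the ppc perlasm modules pick mnemonics from a flavour argument. The real modules do this in Perl. The C sketch below only mirrors the idea as a lookup table plus an emitter that writes the real assembler syntax, destination register first (point 2).

The flavour strings "o32", "n32" and "64", the struct, and the helper names are all hypothetical. The sketch assumes that o32 code targets a MIPS32 core and so uses 32-bit limbs. Because n32 and n64 require a 64-bit core, they can use `d`-prefixed instructions on 64-bit limbs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-flavour mnemonic table used when emitting bignum assembly. */
struct mips_flavour {
    const char *name;
    int word_bytes;                  /* size of one bignum limb in bytes */
    const char *mult, *add, *sub;    /* limb arithmetic */
    const char *load, *store;        /* limb loads/stores (not pointers: n32 pointers stay 32-bit) */
};

static const struct mips_flavour flavours[] = {
    /* o32 on a MIPS32 core: 32-bit limbs, plain 32-bit instructions */
    { "o32", 4, "multu",  "addu",  "subu",  "lw", "sw" },
    /* n32 and n64 require a 64-bit core, so limbs can be 64 bits wide */
    { "n32", 8, "dmultu", "daddu", "dsubu", "ld", "sd" },
    { "64",  8, "dmultu", "daddu", "dsubu", "ld", "sd" },
};

/* Return the table for a flavour name, or NULL if it is unknown. */
static const struct mips_flavour *select_flavour(const char *name)
{
    for (size_t i = 0; i < sizeof(flavours) / sizeof(flavours[0]); i++)
        if (strcmp(flavours[i].name, name) == 0)
            return &flavours[i];
    return NULL;
}

/* Format "<add> rd,rs,rt" in the assembler's own operand order. */
static int emit_add(char *buf, size_t len, const struct mips_flavour *f,
                    const char *rd, const char *rs, const char *rt)
{
    return snprintf(buf, len, "%s\t%s,%s,%s", f->add, rd, rs, rt);
}
```

With this layout, a MIPS32 build is the same generator run with a different flavour. That corresponds to the d[multu|addu|subu] to [multu|addu|subu] replacement described in point 6.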
