SHA1 acceleration for SH4 and MIPS32

Andy Polyakov Fri, 17 Sep 2010 14:29:45 -0700

Hi,

> Please find attached a BN/AES/SHA1 asm implementation for SH4 and
> MIPS32 little endian systems (common CPUs in SoC). This ASM code have
> been done with the help of the great Andy Polyakov's framework for
> the AES and SHA1 functions.
> 
> The gain is almost 2x on SH4 with scheduling optimization, and 33/40%
> on MIPS with all register usage (compared to GCC generated code for
> RSA/AES/SHA1, the very standard ciphers for SSL connections). This
> gain is visible on low power CPU systems.
> 
> A lot of tests have been run with this patch on 0.9.8o and 1.0.0a
> openssl trees, so I think I can propose it for the openssl main tree.
>  Review and feedback is welcome.


While the effort is absolutely worthy and admirable, the submission
can't be accepted *as is* for the following reasons.

1. The protocol is to develop things in the development HEAD branch and
back-port them to production branches if appropriate. It's inappropriate
to target 0.9.8 or 1.0.0. This is because there might be something
implemented already in HEAD (there is sha1-mips.pl!), or there might be
a simpler way to solve the problem (most notably, a Montgomery
multiplication module is simpler to program and gives better or
adequate results).

2. perlasm/*-xlate.pl are about supporting multiple assembler flavors,
not about inventing a new one. I mean I don't find it appropriate to
e.g. write 'xor %x,%y' when the real syntax is 'xor $y,$x'. Assembler is
also a language and as such is a means of communication with implied
rules. Basically you should be able to look at the code and the manual
and they should not contradict each other. Or in other words, unfiltered
output should be compilable by a real assembler. Not to mention that the
filter should make sense [as opposed to tweaking something that was
designed for a *completely* different purpose].

3. Sheer performance is not always the goal. Most notably it makes more
sense to program AES with compressed tables to mitigate timing attacks,
especially on slower CPUs, such as embedded ones.

4. Explore possibilities to improve performance in C with *few-line*
inline assembler. Examine compiler-generated code and see if it uses
instructions you would use. See http://cvs.openssl.org/chngview?cn=19092
for example.
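
As an illustration of point 4 (not the content of the changeset linked
above), here is a minimal sketch of a rotate macro of the kind SHA1
leans on. The MIPS32R2 branch, its guard macros and the helper
sha1_mix() are assumptions made for this example and are untested on
real hardware; every other target falls through to plain C, which most
compilers already turn into a single rotate instruction where one
exists.

```c
#include <stdint.h>

/*
 * ROTATE(a, n): rotate a 32-bit word left by a constant n, 0 < n < 32.
 * The MIPS32R2 branch is an untested assumption for illustration only;
 * all other targets use the portable C expression below it.
 */
#if defined(__GNUC__) && defined(__mips__) && \
    defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
# define ROTATE(a, n) ({ uint32_t ret_;                         \
        __asm__ ("rotr %0,%1,%2"                                \
                 : "=r"(ret_)                                   \
                 : "r"((uint32_t)(a)), "I"(32 - (n)));          \
        ret_; })
#else
# define ROTATE(a, n) \
    ((uint32_t)(((uint32_t)(a) << (n)) | ((uint32_t)(a) >> (32 - (n)))))
#endif

/* Hypothetical helper: the additive part of one SHA-1 round. */
static uint32_t sha1_mix(uint32_t a, uint32_t f, uint32_t e,
                         uint32_t k, uint32_t w)
{
    return ROTATE(a, 5) + f + e + k + w;
}
```

Before keeping even this much assembler, it is worth compiling the
portable branch with -S and checking whether the compiler already emits
the instruction you had in mind.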

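To make point 3 concrete: one common way to shrink the AES lookup
footprint (not necessarily the exact scheme meant above) is to keep a
single 1 KB table and derive the other three by rotation. The sketch
below assumes that approach; aes_compact_init(), te() and
aes_round_column() are hypothetical names. A smaller table touches
fewer cache lines, but this alone does not make the code constant-time.

```c
#include <stdint.h>

static uint8_t  sbox[256];
static uint32_t Te0[256];      /* the only 1 KB table kept in memory */

/* Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication, used only while building tables. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t v, int n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* Rotate right; masking the left shift keeps n == 0 well defined. */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << ((32 - n) & 31));
}

/* Build the S-box and the single combined SubBytes/MixColumns table. */
static void aes_compact_init(void)
{
    for (int x = 0; x < 256; x++) {
        /* Multiplicative inverse as x^254; zero maps to zero. */
        uint8_t inv = 1, base = (uint8_t)x;
        for (int e = 254; e; e >>= 1) {
            if (e & 1)
                inv = gmul(inv, base);
            base = gmul(base, base);
        }
        /* Affine transformation from the AES specification. */
        uint8_t s = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                              rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
        uint8_t s2 = xtime(s);
        uint8_t s3 = (uint8_t)(s2 ^ s);
        sbox[x] = s;
        Te0[x] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                 ((uint32_t)s << 8) | s3;
    }
}

/* Te1, Te2 and Te3 are byte rotations of Te0, computed on the fly. */
static uint32_t te(unsigned k, uint8_t x)
{
    return rotr32(Te0[x], 8 * k);
}

/* One output column of an encryption round using only Te0. */
static uint32_t aes_round_column(uint8_t b0, uint8_t b1, uint8_t b2,
                                 uint8_t b3, uint32_t rk)
{
    return te(0, b0) ^ te(1, b1) ^ te(2, b2) ^ te(3, b3) ^ rk;
}
```

The trade-off is three extra rotates per column against 3 KB less
table data, which tends to matter more on CPUs with small data caches.
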
Things that apply mostly to MIPS.

5. Keep in mind that there are several MIPS ABIs in use. Try to organize
the code in a manner that allows it to be used with these multiple ABIs.
See the ppc modules for an example. Well, the mips3 modules do it, but in
an IRIX-specific way. Let's find a way to adapt it for Linux... That's
where xlate should come in handy.

6. Reuse the mips3 code, most notably mips3-mont. It should be possible
to make it work on MIPS32 by replacing d[multu|addu|subu] with
[multu|addu|subu]. Again, see the ppc modules for an example...
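
The substitution in point 6 amounts to parameterising the instruction
mnemonics by word size. In a perlasm module that would be done with Perl
variables; the sketch below shows the same idea in C purely for
illustration. The names mips_flavour, select_mnemonics() and
emit_mul_add() are hypothetical, and the emitted instruction sequence
is a made-up fragment, not a step taken from mips3-mont.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical flavour selector: 32-bit versus 64-bit word size. */
enum mips_flavour { MIPS_32, MIPS_64 };

/* The three mnemonics that differ only by the leading 'd'. */
struct mnemonics {
    const char *multu;
    const char *addu;
    const char *subu;
};

static struct mnemonics select_mnemonics(enum mips_flavour f)
{
    struct mnemonics m64 = { "dmultu", "daddu", "dsubu" };
    struct mnemonics m32 = { "multu",  "addu",  "subu"  };
    return f == MIPS_64 ? m64 : m32;
}

/*
 * Write an illustrative multiply/accumulate/count-down fragment for
 * the chosen flavour into buf. Returns what snprintf returns.
 */
static int emit_mul_add(char *buf, size_t len, enum mips_flavour f)
{
    struct mnemonics m = select_mnemonics(f);
    return snprintf(buf, len,
                    "\t%s\t$t0,$a3\n"        /* word-size multiply   */
                    "\tmflo\t$t1\n"          /* low half of product  */
                    "\t%s\t$v0,$v0,$t1\n"    /* accumulate           */
                    "\t%s\t$a2,$a2,$t2\n",   /* count down           */
                    m.multu, m.addu, m.subu);
}
```

With the mnemonics held in one place, the body of the module stays
identical for both word sizes and only the selection step changes.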

So what happens next? I'd suggest taking smaller steps at a time. Let's
start with figuring out what it takes to adapt sha1-mips.pl from HEAD
for Linux... Cheers. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
