Hi,

Thank you for your quick feedback, I added some comments,

Vincent

On Thu, Oct 14, 2010 at 11:53 AM, Andy Polyakov <[email protected]> wrote:
> Hi,
>
> There will be more comments later.
>
> > - for SHA1(x2) /SHA256(+40%), not such to say. The SHA256 gain is
> > limited due to the low register count of the SH4
>
> It was not my intention to make you implement SHA256. But since you've
> chosen to do it here it goes.

=> Of course, but since SHA1 is getting really old (~2^51), SHA256 is becoming required for a number of needs.

> 1. The file should have been called sha256-sh4.pl, not sha512-sh4.pl.
> MIPS, as well as number of other modules, are called sha512-*, because
> they either generate *both* SHA512 and SHA256 or SHA512 alone. Modules
> that can't generate SHA512 code should *not* be called sha512-*:-)

=> I fully agree. The goal was to avoid patching the Makefile. Currently, there is a sha512-%.o rule that creates the sha256 code, but no sha256-%.o rule. Correcting that only implies renaming the .pl file and adding a rule to the Makefile, which I wanted to avoid in order to minimize the patch.

> 2. Why do you use tables of small constants? There is 'mov #imm,Rn'
> instruction, where #imm is 8-bit signed value. Works for all [Ss]igma
> constants. As for mask_ff. There is extu.b that does &0xff...

=> This is very important for the SH4 series 200 pipeline: there is only one ALU pipe, so you have to use load/store instructions for optimization. I will take a deeper look at the usage of the extx instructions.

> 3. Position independence is still problem.
>
> > - In SH4 asm, the MOVA is hidden behind a normal mov.l without a base
> > register, so in fact, it is used very often.
>
> Can't confirm this. Well, I can see now that it extensively uses 'mov.l
> @(disp,PC),Rn' for loading constants, but no mova... I.e. following is
> position-independent:
>
>         mov.l   label,rx
>
> label:
>         .long   xxxx
>
> Only[!] as long as xxxx is *not* another label, in which case a
> relocation record is generated voiding position independence.

=> I know. Nevertheless, if you compile code on SH4 with gcc -fPIC, the same applies, so currently I don't see the point of making real PIC code on this CPU. I will study this some more anyway. Note: the mova instruction is an ALU instruction on this CPU.

> > The issue is that the possible offset (256 words forward only from the
> > mov.l) is very small, so it is just not possible to use direct access
> > most of the time.
> > This is normal for this CPU.
>
> Right. So position-independent way to pull K256 address is something
> like following:
>
>         mova    K256,r0
>         bra     skip
>         nop
>
>         .align  4
> K256:
>         .long   ....
>
> skip:
>
> Needless to mention that 'skip' can be something that is already there,
> e.g. sha256asm_loop0.
>
> Looping is not position-independent either, not in SHA1 nor SHA256.
> Position-independent way (for loops larger than 4KB) is following:
>
> loop_start:
>         ...
>         mov.l   loop_size,rx
>         braf    rx
>         nop
> loop_stop:
>         .align  4
> loop_size:
>         .long   loop_start-loop_stop
>
> More to come...
> A.
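
As a sanity check on the 'braf' loop sequence quoted above, here is a minimal Python model of the address arithmetic. It is only a sketch, not SH4 code and not part of the patch. It assumes that 'braf Rn' branches to the address of the braf instruction plus 4 plus Rn. The function names, the offsets 0x40 and 0x1800, and the load addresses are hypothetical values picked for illustration.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical offsets of the labels inside the assembled module.
LOOP_START_OFF = 0x40      # offset of loop_start
BRAF_OFF = 0x1800          # offset of the 'braf rx' instruction
LOOP_STOP_OFF = BRAF_OFF + 4   # loop_stop sits right after the delay-slot nop

# Value stored by '.long loop_start-loop_stop'. It depends only on
# offsets within the module, so it is fixed at assembly time.
LOOP_SIZE = (LOOP_START_OFF - LOOP_STOP_OFF) & MASK32


def braf_target(braf_addr, rn):
    """Assumed model of 'braf Rn': target = braf address + 4 + Rn (mod 2**32)."""
    return (braf_addr + 4 + rn) & MASK32


def resolve_branch(load_base):
    """Address reached by the braf when the module is loaded at load_base."""
    return braf_target(load_base + BRAF_OFF, LOOP_SIZE)


# The branch lands on loop_start wherever the module is loaded.
for base in (0x00400000, 0x7F001000, 0x8C010000):
    print(hex(base), resolve_branch(base) == base + LOOP_START_OFF)
```

In this model the stored displacement is a difference of two labels in the same section, so it does not change with the load address, which is what lets the sequence avoid a relocation record.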
