Andy Polyakov wrote:
| [...]
| H-m-m-m... It's not like I just wrote the note off the top of my head...
| I actually benchmarked 9% improvement with off-by-2 shifts on P4
| workstation available in *my* disposal... Two possibilities: 1) they've
| changed something between steppings and we have different steppings, 2)
| I've benchmarked an intermediate version [most likely prior folding
| reference to %esi in last steps]... This is exactly the kind of thing I
| hate about x86: it's virtually impossible to figure out in advance how
| it turns out in the very end... Well, I suppose I have to beat the
| retreat:-) Which leaves the question about why RC4_INT code was
| performing so poorly on P4 opened...
Which is one of the reasons why I just love developing on AMD64: you don't
have to care about obsolete CPUs. But yes, the situation will change in the
next few years, when many generations of AMD64 CPUs from multiple vendors
are on the market :)

| PIC code must be getting bound to memory interface on P4, which
| is why OpenSSL gains nothing between P4-2 and P4-3...

Like you said in an earlier mail, producing PIC code on AMD64 should pose
no problem at all. I am eager to start working on AES for this arch.

--
Marc Bevand                          http://epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List
