Hi! What follows is far from what I'd consider good programming
practice... First, a trivial table everybody will recognize:
               Intel, SPARCv8, MIPS 32    Alpha, SPARCv9, MIPS 64
sizeof(int)    4                          4
sizeof(long)   4                          8
               (hereafter ILP32)          (hereafter LP64)
No surprises so far... But pay attention now!
From crypto/bf/blowfish.h:
> /* If you make this 'unsigned int' the pointer variants will work on
> * the Alpha, otherwise they will not. Strangly using the '8 byte'
> * BF_LONG and the default 'non-pointer' inner loop is the best configuration
> * for the Alpha */
> #if defined(__sgi)
> # if (_MIPS_SZLONG==64)
> # define BF_LONG unsigned int
> # else
> # define BF_LONG unsigned long
> # endif
> #else
> # define BF_LONG unsigned long
> #endif
i.e. basically we always get long (mind the ILP32/LP64!), with an
exception for IRIX, where we *always* get a 32-bit type (int when long
is 64 bits). And then the next line in the same file:
> #define BF_M 0x3fc
i.e. *two* least-significant bits are cleared.
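To see what the mask buys the pointer variants: 0x3fc is binary 11 1111 1100, so (R>>shift)&BF_M is an 8-bit table index already multiplied by 4, i.e. a byte offset that is only correct for 4-byte entries. The C sketch below checks that identity for the right-shift lookups; ptr_offset() and index_times_4() are illustrative names, and since BF_0..BF_3 are not shown in the excerpt, the shift amounts in the test are arbitrary example values.

```c
#include <assert.h>
#include <stdint.h>

#define BF_M 0x3fc /* from crypto/bf/blowfish.h */

/* Byte offset as computed by the BF_PTR lookups ((R>>shift)&BF_M):
 * an 8-bit table index already multiplied by 4. */
static uint32_t ptr_offset(uint32_t r, unsigned shift)
{
    return (r >> shift) & BF_M;
}

/* The same offset written out explicitly: index * sizeof(32-bit entry). */
static uint32_t index_times_4(uint32_t r, unsigned shift)
{
    assert(shift + 2 < 32); /* keep the shift below the operand width */
    return ((r >> (shift + 2)) & 0xff) * 4;
}
```

For example, with R = 0xdeadbeef and a shift of 22 the top byte 0xde (222) becomes the byte offset 888, always a multiple of 4 and never of anything larger.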
Now let's examine crypto/bf/bf_locl.h:
> /* use BF_PTR2 for intel boxes,
> * BF_PTR for sparc and MIPS/SGI
> * use nothing for Alpha and HP.
> */
> #if !defined(BF_PTR) && !defined(BF_PTR2)
> #define BF_PTR
> #endif
i.e. the comment is effectively *disregarded*, isn't it? Unless told
otherwise, every platform, Alpha and HP included, gets BF_PTR. Right!
Somebody carefully thinks it through, somebody else doesn't give a damn,
and everybody wonders where the bug is... When do we start *listening*
to each other??? Never mind...
Later on:
> #elif defined(BF_PTR)
>
> /* This is normally very good */
>
> #define BF_ENC(LL,R,S,P) \
> LL^=P; \
> LL^= (((*(BF_LONG *)((unsigned char *)&(S[ 0])+((R>>BF_0)&BF_M))+ \
> *(BF_LONG *)((unsigned char *)&(S[256])+((R>>BF_1)&BF_M)))^ \
> *(BF_LONG *)((unsigned char *)&(S[512])+((R>>BF_2)&BF_M)))+ \
> *(BF_LONG *)((unsigned char *)&(S[768])+((R<<BF_3)&BF_M)));
Observe the &BF_M! Remember that it has its *two* LSBs cleared? What
does that mean? It means that on LP64 it's going to generate misaligned
accesses! Indeed, two cleared bits guarantee only 32-bit alignment,
while an 8-byte BF_LONG needs 64-bit alignment, doesn't it? Boom! The
code dumps core with a BUS ERROR. Well, probably not on Alpha (I don't
have one to test on, yet...). But I want to point out that misaligned
access is actually the minor problem here! All the shifts and masks are
chosen on the assumption that S is an array of 32-bit values!!! No
wonder it doesn't work on LP64s...
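Both failure modes can be made concrete with pure offset arithmetic. The sketch below never performs the misaligned load itself (that would be undefined behaviour in C); it only reports which element a BF_PTR-style offset (index * 4) lands on for a given entry size. ptr_landing() and struct landing are hypothetical names, and the 8-byte case models an LP64 BF_LONG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where a BF_PTR-style byte offset really points inside a table. */
struct landing {
    size_t element;  /* element whose storage the offset falls in */
    size_t misalign; /* bytes past the start of that element */
};

/* index: the 8-bit S-box index; entry_size: sizeof(BF_LONG). */
static struct landing ptr_landing(unsigned index, size_t entry_size)
{
    struct landing l;
    size_t offset = (size_t)index * 4; /* what ((R>>n)&BF_M) yields */

    assert(index < 256 && entry_size > 0);
    l.element = offset / entry_size;
    l.misalign = offset % entry_size;
    return l;
}
```

With 4-byte entries, index i lands exactly on S[i]. With 8-byte entries, an odd index lands 4 bytes into an element (the BUS ERROR case), and even an aligned, even index picks S[i/2] instead of S[i], which is the bigger problem.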
> #else
>
> /* This will always work, even on 64 bit machines and strangly enough,
> * on the Alpha it is faster than the pointer versions (both 32 and 64
> * versions of BF_LONG) */
Well, Alpha does have misaligned-access instructions, and they are known
for *hurting* the performance of memory-access-bound algorithms like
this one! So to me it's no miracle that the code below runs (or rather
"ran", once upon a time, when the code was actually compiled with
neither BF_PTR nor BF_PTR2) faster on Alpha.
>
> #define BF_ENC(LL,R,S,P) \
> LL^=P; \
> LL^=((( S[ (int)(R>>24L) ] + \
> S[0x0100+((int)(R>>16L)&0xff)])^ \
> S[0x0200+((int)(R>> 8L)&0xff)])+ \
> S[0x0300+((int)(R )&0xff)])&0xffffffffL;
> #endif
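Why does this index-based variant survive a 64-bit BF_LONG? Additions and XORs only propagate carries upwards, so the low 32 bits of the result depend only on the low 32 bits of the operands, and the final &0xffffffffL recovers exactly the 32-bit answer. The sketch below checks that with uint32_t versus uint64_t tables; f32(), f64() and fill_tables() are illustrative names, and the table contents are arbitrary pseudo-random values, not Blowfish's real S-boxes.

```c
#include <assert.h>
#include <stdint.h>

/* Non-pointer round function with 32-bit S-box entries (ILP32 case). */
static uint32_t f32(const uint32_t S[1024], uint32_t R)
{
    return ((S[R >> 24] + S[0x100 + ((R >> 16) & 0xff)])
            ^ S[0x200 + ((R >> 8) & 0xff)])
           + S[0x300 + (R & 0xff)];
}

/* Same computation with 64-bit entries (LP64 BF_LONG); the final mask
 * discards whatever carried above bit 31. */
static uint64_t f64(const uint64_t S[1024], uint64_t R)
{
    /* The unmasked R >> 24 index assumes R holds only 32 bits. */
    assert(R <= 0xffffffffu);
    return (((S[R >> 24] + S[0x100 + ((R >> 16) & 0xff)])
             ^ S[0x200 + ((R >> 8) & 0xff)])
            + S[0x300 + (R & 0xff)]) & 0xffffffffUL;
}

/* Fill both tables with identical 32-bit values from a simple LCG. */
static void fill_tables(uint32_t s32[1024], uint64_t s64[1024])
{
    uint32_t x = 12345u;
    for (int i = 0; i < 1024; i++) {
        x = x * 1664525u + 1013904223u; /* arithmetic mod 2^32 */
        s32[i] = x;
        s64[i] = x;
    }
}
```

The price of this portability is the extra masking and the explicit index arithmetic on every lookup.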
Ways to solve the puzzle:
- force BF_LONG to unsigned int on *all* platforms;
- pick BF_* depending on sizeof(long);
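Both alternatives could be expressed in the preprocessor roughly as follows. This is a sketch, not a patch: BF_LONG_IS_32BIT and bf_check_config() are hypothetical names, and since sizeof cannot be evaluated in #if, the widths are tested via UINT_MAX/ULONG_MAX from <limits.h>. The fallback to unsigned long is meant for platforms where int is only 16 bits.

```c
#include <assert.h>
#include <limits.h>

/* Alternative 1 (sketch): unsigned int wherever it is 32 bits wide,
 * unsigned long otherwise (e.g. 16-bit int compilers). */
#if UINT_MAX == 0xffffffffUL
# define BF_LONG unsigned int
# define BF_LONG_IS_32BIT 1
#elif ULONG_MAX == 0xffffffffUL
# define BF_LONG unsigned long
# define BF_LONG_IS_32BIT 1
#else
# define BF_LONG unsigned long
# define BF_LONG_IS_32BIT 0
#endif

/* Alternative 2 (sketch): default to BF_PTR only when BF_LONG is
 * 32 bits, since its shifts and BF_M assume 4-byte S-box entries. */
#if !defined(BF_PTR) && !defined(BF_PTR2) && BF_LONG_IS_32BIT
# define BF_PTR
#endif

/* Run-time sanity check: the pointer variant must never be paired
 * with an S-box entry wider than 32 bits. */
static void bf_check_config(void)
{
#ifdef BF_PTR
    assert(sizeof(BF_LONG) * CHAR_BIT == 32);
#endif
}
```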
I myself would vote for the first alternative unless someone can:
- confirm that *(unsigned long *)((unsigned char *)p+i&~7) generates the
unwanted unaligned load instruction on Alpha;
- confirm that the library compiles and works under MS-DOS and that the
development team *cares* about it;
- confirm that int is 16 bits wide under MacOS (as I remember, you can
choose it, at least with the CodeWarrior family; correct me if I'm
wrong);
Tomorrow I'll try to get hold of an Alpha to check things out on and
post a proposed patch. But for today, so long, fellows... Andy.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]