> But getting bn_mul_mont_fpu working on T1 is *not* the goal, because
> performance would be *horrible* (1/10th or worse). The idea implemented in
> the updated sparcv9cap.c is to use this SIGBUS to heuristically detect T1
> and to disable the FP code in favor of the pure IALU bn_mul_mont_int...
>
> ... But wait... The fact that I remember the 1/10th coefficient must mean
> that sparcv9a-mont did work under Solaris on T1. The question is how.
> Chances are that the Solaris kernel transparently fixes the unaligned ldda
> access in its trap handler. Meaning that *if/when* Linux chooses to do the
> same, the above-mentioned heuristic test will fail to detect T1...

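For illustration, the SIGBUS heuristic quoted above could be sketched roughly as follows. This is not the code from sparcv9cap.c, only a minimal sketch of the general trap-and-recover pattern. The names run_probe, faulting_probe and clean_probe are hypothetical, and since the real probe would be a SPARC-only 16-bit ldda, raise(SIGBUS) stands in for it here so the sketch compiles anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t trapped_sig;

/* Record which signal fired and jump back into run_probe(). */
static void on_trap(int sig)
{
    trapped_sig = sig;
    siglongjmp(probe_env, 1);
}

/*
 * Run probe() with SIGBUS and SIGILL temporarily caught.
 * Returns 0 if probe() completed, or the signal number if it trapped.
 * (Hypothetical helper, not an OpenSSL function.)
 */
static int run_probe(void (*probe)(void))
{
    struct sigaction sa, old_bus, old_ill;

    assert(probe != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_trap;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGILL, &sa, &old_ill);

    trapped_sig = 0;
    /* savemask=1 so the signal mask is restored after the jump */
    if (sigsetjmp(probe_env, 1) == 0)
        probe();

    /* Put the previous handlers back in either case. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return (int)trapped_sig;
}

/* Stand-in for the instruction that traps on T1 (assumption: the real
 * probe would execute a 16-bit ldda via inline assembly on SPARC). */
static void faulting_probe(void)
{
    raise(SIGBUS);
}

/* Stand-in for hardware (or a kernel) that handles the access silently. */
static void clean_probe(void)
{
}
```

A caller would treat a nonzero return from run_probe() as "looks like T1, use bn_mul_mont_int". A zero return is exactly the case worried about above: a kernel that emulates the access makes the probe look clean, so the heuristic alone cannot tell T1 apart.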
As it turned out, 16-bit ldda is emulated by the Solaris kernel [but
apparently not by the Linux one]. Secondly [and most importantly], 16-bit
ldda is documented to be implemented in hardware by UltraSPARC T2, meaning
that the test in question will fail on T2. But bn_mul_mont_fpu performance
is suboptimal even on T2, so the procedure should detect it too, not only
T1...

I've examined the glibc code responsible for printing the AT_HWCAP vector
(with the earlier suggested 'env LD_SHOW_AUXV=1 /bin/true'). There is a
_dl_auxv vector filled by kernel/fs/binfmt_elf.c, but it's totally private
to ld-linux.so.2 and not accessible to me...

As a result I've chosen to settle for instrumentation of a pair of VIS1
instructions to detect Tx. See http://cvs.openssl.org/chngview?cn=19738
for further details. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
