> But getting bn_mul_mont_fpu working on T1 is *not* the goal, because
> performance would be *horrible* (1/10th or worse). The idea implemented in
> the updated sparcv9cap.c is to use this SIGBUS to heuristically detect T1
> and to disable the FP code in favor of the pure IALU bn_mul_mont_int...
> 
> ... But wait... The fact that I remember 1/10th coefficient must mean
> that sparcv9a-mont did work under Solaris on T1. Question is how.
> Chances are that Solaris kernel transparently fixes the ldda unaligned
> access in trap handler. Meaning that *if/when* Linux chooses to do the
> same, the above mentioned heuristic test will fail to detect T1...

As it turned out, 16-bit ldda is emulated by the Solaris kernel [but
apparently not by the Linux one]. Secondly [and most importantly], 16-bit
ldda is documented to be implemented in hardware by UltraSPARC T2, meaning
that the test in question will fail to detect T2. But bn_mul_mont_fpu
performance is suboptimal even on T2, so the procedure should detect it
too, not only T1...
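
For reference, the general shape of the SIGBUS heuristic discussed above
can be sketched as below. This is only an illustration, not the code in
sparcv9cap.c: probe_traps() and the two helper functions are hypothetical
names, also catching SIGILL is my assumption, and the stand-in
fake_unaligned_ldda() simply raises SIGBUS where a real probe would
execute the unaligned 16-bit ldda.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_signal;

/* Record which signal fired and unwind back into probe_traps(). */
static void probe_handler(int sig)
{
    probe_signal = sig;
    siglongjmp(probe_env, 1);
}

/*
 * Hypothetical helper: run fn() with SIGBUS/SIGILL trapped.
 * Returns 0 if fn() completed, otherwise the signal number it raised.
 */
static int probe_traps(void (*fn)(void))
{
    struct sigaction sa, old_bus, old_ill;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGILL, &sa, &old_ill);

    probe_signal = 0;
    /* savemask=1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0)
        fn();

    /* Put the previous handlers back before returning. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return (int)probe_signal;
}

/* Stand-in for the unaligned 16-bit ldda; a real probe would be asm. */
static void fake_unaligned_ldda(void) { raise(SIGBUS); }

/* Stand-in for an access the kernel or hardware handles silently. */
static void harmless(void) { }
```

A caller would treat a non-zero result as "trapped, so presumably T1".
As noted above, a zero result proves nothing: both a kernel that emulates
the access and a T2 that implements it in hardware look like harmless().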

I've examined the glibc code responsible for printing the AT_HWCAP vector
(with the earlier suggested 'env LD_SHOW_AUXV=1 /bin/true'). There is a
_dl_auxv vector filled in by the kernel's fs/binfmt_elf.c, but it's totally
private to ld-linux.so.2 and not accessible to me...
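
One route that does not depend on ld-linux.so.2 internals is to read the
same vector from /proc/self/auxv. The sketch below assumes a Linux system
with /proc mounted and native-word-sized entries. auxv_entry, auxv_lookup()
and read_hwcap() are hypothetical names, and this is not what the change
referenced below does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values as defined in <elf.h>; repeated here to stay self-contained. */
#ifndef AT_NULL
#define AT_NULL 0
#endif
#ifndef AT_HWCAP
#define AT_HWCAP 16
#endif

/* One auxiliary vector entry: a (type, value) pair of native words. */
typedef struct {
    unsigned long type;
    unsigned long value;
} auxv_entry;

/*
 * Scan up to n entries, stopping at the AT_NULL terminator.
 * Returns 1 and stores the value in *out if 'type' is found, else 0.
 */
static int auxv_lookup(const auxv_entry *vec, size_t n,
                       unsigned long type, unsigned long *out)
{
    size_t i;

    for (i = 0; i < n && vec[i].type != AT_NULL; i++) {
        if (vec[i].type == type) {
            *out = vec[i].value;
            return 1;
        }
    }
    return 0;
}

/* Assumption: /proc/self/auxv exists and holds at most 64 entries. */
static int read_hwcap(unsigned long *out)
{
    auxv_entry vec[64];
    size_t n;
    FILE *f = fopen("/proc/self/auxv", "rb");

    if (f == NULL)
        return 0;
    n = fread(vec, sizeof(vec[0]), 64, f);
    fclose(f);
    return auxv_lookup(vec, n, AT_HWCAP, out);
}
```

Even where this works it only reports what the kernel chooses to
advertise in AT_HWCAP, so by itself it need not distinguish Tx from other
UltraSPARC parts.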

As a result, I've chosen to settle for instrumenting a pair of VIS1
instructions to detect Tx. See http://cvs.openssl.org/chngview?cn=19738
for further details. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [email protected]
Automated List Manager                           [email protected]
