First of all, I'd dare to assert that the results in the original message are "tainted" by varying oscillator frequency. If you want reliable results, you should benchmark at a fixed oscillator frequency. Note that it's not sufficient to configure the OS to run at a fixed frequency, e.g. by setting cpufreq/scaling_governor to 'performance' on Linux; you have to disable TurboBoost in BIOS as well. Only then can results be meaningfully compared.
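On Linux one can at least check, before benchmarking, what the kernel reports about frequency scaling. The Python sketch below assumes the standard cpufreq sysfs tree: a per-CPU `scaling_governor` file, plus either `intel_pstate/no_turbo` (intel_pstate driver) or `cpufreq/boost` (acpi-cpufreq driver), depending on which driver is loaded. The function names are hypothetical. Since an OS-level setting does not replace disabling TurboBoost in BIOS, a clean report here is necessary but not sufficient.

```python
from pathlib import Path
from typing import Dict, Optional


def read_value(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if it is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def frequency_report(sysfs_root: str = "/sys/devices/system/cpu") -> Dict[str, object]:
    """Summarize cpufreq settings that matter for reproducible benchmarks."""
    root = Path(sysfs_root)
    # One scaling_governor file per CPU, e.g. cpu0/cpufreq/scaling_governor.
    governors = {
        p.parent.parent.name: read_value(p)
        for p in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor"))
    }
    # intel_pstate exposes intel_pstate/no_turbo (1 = turbo disabled);
    # acpi-cpufreq exposes cpufreq/boost (0 = boost disabled).
    no_turbo = read_value(root / "intel_pstate" / "no_turbo")
    boost = read_value(root / "cpufreq" / "boost")
    if no_turbo is not None:
        turbo_off: Optional[bool] = no_turbo == "1"
    elif boost is not None:
        turbo_off = boost == "0"
    else:
        turbo_off = None  # cannot tell from sysfs alone
    return {
        "governors": governors,
        "all_performance": bool(governors)
        and all(g == "performance" for g in governors.values()),
        "turbo_off": turbo_off,
    }


if __name__ == "__main__":
    report = frequency_report()
    print(report)
    if report["turbo_off"] is not True:
        print("warning: turbo/boost may be active; timings may vary with frequency")
```

A `turbo_off` of `None` means the loaded driver exposes neither file, so the script cannot tell either way.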
>> But I'm curious, why there is such a drop in performance of asm code and
>
> In the C case the unrolled loop is entered for lengths of 8 bytes and
> beyond. In assembler the optimized loop is engaged for lengths larger
> than 32...
>
>> what can be done to address that issue?
>
> As Peter implied, the question is whether it's worth the effort. Say you
> improve small-block performance by 60%. But if the operation in question
> takes only 10%, then the net effect would be 6%. Well, I can have a look,
> but please don't hold your breath.

SSH is surely the protocol that would exhibit the shortest packets, upon single-character entry, right? As the whole stream is encrypted (i.e. there is no clear-text data in an SSH stream, right?), the shortest observable packet is directly representative in this context. Now, even the shortest packet seems to be larger than 16 bytes; the smallest I could observe is 48. So the comparison at a 16-byte block size is probably not that relevant. 48 is larger than the above-mentioned 32, so there is hardly any reason to put effort into optimizing for shorter inputs.

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majord...@openssl.org
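The back-of-the-envelope estimate in the quoted reply ("improve small block performance by 60%" for an operation worth 10% of the run gives 6%) is an instance of Amdahl's law. The 6% figure follows if "60% improvement" means 60% less time spent in that operation; if it instead means the operation runs 1.6 times faster, the overall saving is smaller. A minimal Python sketch of both readings (function names are illustrative, not from the thread):

```python
def net_time_saving(fraction: float, time_reduction: float) -> float:
    """Share of total run time saved when an operation taking `fraction`
    of the run has its own time cut by `time_reduction` (0..1)."""
    return fraction * time_reduction


def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup (Amdahl's law) when `fraction` of the run becomes
    `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Reading 1: 60% less time in an operation worth 10% of the run.
print(round(net_time_saving(0.10, 0.60), 4))       # 0.06, i.e. 6% of total time
# Reading 2: the operation gets 1.6x faster.
print(round(1 - 1 / amdahl_speedup(0.10, 1.6), 4))  # 0.0375, about 3.8% of total time
```

Either way the conclusion of the thread holds: a large local gain on an operation that is a small share of the total yields a modest net effect.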