First of all, I'd dare to assert that the results in the originating message
are "tainted" by varying oscillator frequency. If you want reliable results,
you should run the benchmarks at a fixed oscillator frequency. Note that it
is not sufficient to configure the OS to run at a fixed frequency, e.g. by
setting cpufreq/scaling_governor to 'performance' on Linux; you also have to
disable TurboBoost in the BIOS. Only then can one meaningfully compare
results.
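
For what it's worth, the OS half of that check can be scripted. Below is a
minimal sketch in Python, assuming the usual Linux sysfs layout under
/sys/devices/system/cpu (the exact path is my assumption here, only
cpufreq/scaling_governor is named above). The helper names read_governors
and not_pinned are made up for illustration. Note that it says nothing
about TurboBoost, which still has to be checked in the BIOS.

```python
from pathlib import Path

# Assumed sysfs layout: <cpu_root>/cpuN/cpufreq/scaling_governor
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a governor."""
    governors = {}
    for path in sorted(Path(cpu_root).glob(GOVERNOR_GLOB)):
        # path is .../cpuN/cpufreq/scaling_governor, so cpuN is two levels up
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def not_pinned(governors, wanted="performance"):
    """Return the sorted names of CPUs whose governor differs from `wanted`."""
    return sorted(cpu for cpu, gov in governors.items() if gov != wanted)


if __name__ == "__main__":
    govs = read_governors()
    if not govs:
        print("no cpufreq governors found (not Linux, or cpufreq not exposed)")
    for cpu in not_pinned(govs):
        print(f"{cpu}: governor is '{govs[cpu]}', benchmark results may vary")
```

An empty list from not_pinned() only means the governor is set as suggested;
it does not prove that the frequency is actually fixed.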

>> But I'm curious, why there is such a drop in performance of asm code and
> 
> In the C case the unrolled loop is entered for lengths of 8 bytes and
> beyond. In assembler the optimized loop is engaged for lengths larger
> than 32...
> 
>> what can be done to address that issue?
> 
> As Peter implied, the question is whether it's worth the effort. Say you
> improve small-block performance by 60%. But if the operation in question
> takes only 10% of the time, then the net effect would be 6%. Well, I can
> have a look, but please don't hold your breath.
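
To spell out the arithmetic quoted above, here is a small sketch in Python.
The 60% and 10% figures are the hypothetical ones from the quote, not
measurements, and the function names are made up for illustration. The 6%
figure corresponds to reading "60%" as a reduction in the time of the small
block operation. If one instead reads it as 60% higher throughput (1.6x
faster), Amdahl's law gives a smaller overall gain, about 3.75%.

```python
def saved_from_time_reduction(fraction, local_time_saved):
    """Share of total run time saved when a part taking `fraction` of the
    run time becomes `local_time_saved` (0..1) shorter."""
    return fraction * local_time_saved


def saved_from_speedup(fraction, speedup):
    """Same quantity when the improvement is given as a throughput factor,
    e.g. speedup=1.6 for '60% faster' (Amdahl's law)."""
    return fraction * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    fraction = 0.10  # hypothetical share of time spent in the operation
    print(f"60% less time there: {saved_from_time_reduction(fraction, 0.60):.2%} overall")
    print(f"1.6x faster there:   {saved_from_speedup(fraction, 1.6):.2%} overall")
```

Either way the overall gain stays in the single digits, which is the point
of the quoted remark.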

SSH is surely the protocol that would exhibit the shortest packets, namely
upon single-character entry, right? As the whole thing is encrypted (i.e.
there is no clear-text data in the SSH stream, right?), the shortest
observable packet is directly representative in this context. Now, even the
shortest packet seems to be larger than 16 bytes; the smallest I could
observe is 48. So the comparison at 16-byte block size is probably not that
relevant. 48 is larger than the above-mentioned 32, so there is hardly any
reason to put effort into optimizing for shorter inputs.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
