On 9/5/2015 2:50 PM, Leyne, Sean wrote:
> Jim and Boris,
>
>> Something you may want to investigate is replacing the rotate step in
>> the "pure C" implementation of ChaCha20 with either a compiler
>> intrinsic (Microsoft) or a bit of assembler (gcc).  SHA1 has the same
>> issue.  I haven't a clue why: popular crypto algorithms use rotates,
>> and virtually all microprocessors have rotate instructions, but C
>> lacks a rotate operator and the standard libraries neglect to support
>> one.
> Forgive my naïve point of view, but given that the AES instruction set has 
> been built into AMD and Intel CPUs since 2011, why do you feel that it is 
> necessary to push for ChaCha20***?

The goal is to make encryption so cheap that it becomes standard 
(without even an option to turn it off).  But first we need to 
understand the performance issues.  For example, there is a big 
difference between AES encryption and decryption speeds, particularly 
in CBC mode.  It may be due only to the additional bit juggling 
necessary to implement CBC, but it's still something we need to 
understand to evaluate the cost tradeoffs.  With ChaCha20, encryption 
and decryption are the same operation, so the costs are identical.

ChaCha20 -- and virtually every other stream cipher -- is easier to use 
than a block cipher like AES, especially if there is a possibility of 
messages under the block size (16 bytes for AES).  Adding complexity to 
the protocols to handle explicit message padding adds to the cost, even 
when it isn't necessary for larger packets.

Sure, it would be nice if AES-NI made it the clear performance winner 
across the board, but I don't think that will prove to be the case.  It 
might, though...
>
> To my reading, Boris' numbers have shown that AES performance is more than 
> adequate (53.2 AVG seconds to process 256MB = 4+MB/s).
>
> Further, considering that the use case is the encryption of data blocks 
> which would be much smaller than even 1MB, will the performance 
> difference really be noticeable?

If the delta cost for encryption is significant, then it pretty much 
needs to be made optional, which necessarily increases complexity and 
further reduces performance.  I think I explained the lessons from the 
bit-blt chip in the Sun 3/50.  The abbreviated version is that while the 
bit-blt chip was much faster for large operations, most operations were 
small, single character cell transfers, where software had an edge over 
the hardware.  But putting in a test for operation size further slowed 
down small operations, tilting the net gain/loss deeper into loss 
territory.

In other words, numbers in situ count more than abstract performance studies.
>
>
> Sean
>
> *** Separately, with Intel HyperThreaded CPUs, and considering that AES is 
> "on-chip", wouldn't that allow the core processing the encryption to shift 
> focus to the other thread's instructions while the first thread waits for 
> the on-chip AES unit to operate?  In other words, isn't it possible that 
> ChaCha20 is only faster when CPUs are being "single minded" and that real 
> world performance on a server dealing with several tasks might favor CPUs 
> with native AES instructions?

I don't think so, but I haven't been able to find a definitive answer.  
My understanding, and it may be wrong, is that the AES-NI instructions 
aren't really separate hardware but just very complex microcoded 
instructions that operate on internal processor registers, so threading 
doesn't come into it.

By the way, my measurements at NuoDB with hyper-threading were that it 
was worth a plugged nickel -- almost no measurable difference between 
real threads and hyperthreads.  I'm sure Intel has benchmarks that show 
otherwise, but unless you have benchmarks on your own code that show 
something else, I wouldn't make any assumptions about the performance 
benefits of hyperthreading.

Intel's focus on multi-threading tends to be on parallelizing existing 
single-threaded code across cores.  In that case, hyperthreading might 
actually work.  But when you have many more client threads than cores, 
which is the case for database systems using a thread-per-client model, 
you will get quite different results.  At NuoDB, switching from a 
thread-per-client model to a (more or less) fixed pool of worker threads 
caused the performance to take off like a rocket.  But in that model, 
kicking the worker thread pool up to the number of hyper-threads just 
increases the number of stalled threads contending with running threads 
for the same resources.  Not good.

But in any case, Sean, learning about what's actually happening does 
have merit.  Surely you aren't against knowledge, are you?

>
>
>> Here are numbers (seconds):
>>
>>                                                   all       enc
>> AES, BOTAN based code, with AES-NI              531.1      53.2
>> AES, INTEL based code, with AES-NI              544.8      76.6
>> AES, Bouncy Castle (Java), without AES-NI      2071.8    1620.6
>> ChaCha20, Bouncy Castle (Java)                 1712.7    1234.8
>
> ------------------------------------------------------------------------------
> Firebird-Devel mailing list, web interface at 
> https://lists.sourceforge.net/lists/listinfo/firebird-devel

