On Tue, Jul 28, 2020 at 22:31:51 +0200, Joerg Sonnenberger wrote:
> On Tue, Jul 28, 2020 at 03:55:55PM -0400, Josef 'Jeff' Sipek wrote:
> > On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> > > On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > > > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > > > ...
> > > > > I've attached basic benchmark numbers below. The asm variant is using
> > > > > whatever my Threadripper supports in terms of low-level primitives,
> > > > > e.g. AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2,
> > > > > SHA3) or the reference implementations (K12, BLAKE3, BLAKE3*). Test
> > > > > case was hashing a large file (~7GB).
> > > > 
> > > > While these performance measurements are important, it is also
> > > > important to make sure that older (or less "top of the line") hardware
> > > > isn't completely terrible.  For example, it is completely reasonable (at
> > > > least IMO) to still use a Sandy Bridge-era CPU.  Likewise, it is
> > > > reasonable to run hg on an embedded system (although those tend to have
> > > > wimpy I/O as well).
> > > 
> > > Yeah, that's why I included the C variant.
> > 
> > I don't trust compilers *not* to do some massive amount of optimization
> > unless they are told to target an older CPU.  Also, newer CPUs like to do a
> > lot of "magic" to speed things up and keep security researchers employed ;)
> 
> The core of the hash functions isn't by itself very friendly to compiler
> optimisations. It's more a case of how bad the automatic code generation
> will be.

In general, yes, you are correct.  However, it is a big loop and a compiler
can do a decent amount of pipelining.

> > > Just to establish that baseline:
> > > 
> > > SHA1 (asm) 4.8s
> > > SHA1 (C)   10.7s
> > >
> > > So K12 is somewhat slower on a Threadripper, but should be somewhat
> > > faster than hardware without specific acceleration. SHA support on the
> > > Zen1 Threadripper is quite fast.
> > 
> > I think we're in agreement.  The new algo shouldn't be much worse than the
> > existing SHA1.
> > 
> > For the record: when the time comes, I'm willing to collect some hash perf
> > data on slightly older/weaker hw as a sanity check.
> 
> If you have a modern OpenSSL version, you can get the numbers for
> sha256, sha3-256, blake2b512 and blake2s256 easily. K12 for non-vector
> CPUs or short messages can be reasonably approximated as "half of
> sha3-256 time" as that's the primary difference. BLAKE3 is the only one
> that would be more tricky.

I couldn't help myself and I ran it on 3 of my systems.  The results
are...interesting.


Thinkpad T520 (2011 vintage Sandy Bridge i7) with FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             38552.35k   112781.34k   256808.02k   375173.76k   433724.00k   438686.07k
sha256           26402.90k    67908.11k   131459.19k   171880.98k   188168.04k   189348.31k
sha3-256         13828.62k    55332.70k   130879.40k   152721.79k   169003.69k   170909.80k
blake2b512       23551.38k    94267.47k   240451.58k   311282.76k   342076.92k   343736.32k
blake2s256       27319.90k   107783.29k   162617.17k   187314.17k   195500.87k   196625.70k

Server (2009 vintage Xeon) running OmniOS:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             34924.79k   109450.16k   269382.74k   425380.18k   508193.55k   517892.78k
sha256           25028.82k    67870.78k   142345.81k   195099.15k   218278.57k   220632.41k
sha3-256         14281.78k    57115.29k   140363.97k   172130.99k   195810.65k   199223.98k
blake2b512       24261.26k   101366.74k   286062.42k   433886.55k   514296.49k   519682.56k
blake2s256       28521.81k   113023.49k   212133.77k   275259.33k   301708.63k   304365.57k

Raspberry Pi 1B+ running FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1              1084.15k     3711.04k    10173.97k    18059.62k    23704.69k    23738.60k
sha256             942.90k     3105.90k     7843.36k    12770.46k    16318.84k    16399.04k
sha3-256           456.77k     1805.49k     4432.56k     5607.01k     6652.73k     6715.47k
blake2b512         173.38k      694.93k     1646.80k     2107.62k     2320.81k     2328.24k
blake2s256         861.31k     3420.77k     7963.73k    12459.78k    15746.41k    15978.57k
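
For anyone who wants comparable numbers on other hardware without the
openssl binary, here is a rough sketch using only Python's hashlib.  It is
an assumption on my part that this is a fair stand-in: the names ALGOS,
SIZES and throughput() are made up for illustration, the timing loop is much
cruder than OpenSSL's, and interpreter overhead will dominate the small
block sizes, so only the larger columns are worth comparing.

```python
import hashlib
import time

# Labels follow OpenSSL's naming. hashlib's blake2b/blake2s default to
# 64- and 32-byte digests, i.e. blake2b512 and blake2s256.
ALGOS = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "sha3-256": hashlib.sha3_256,
    "blake2b512": hashlib.blake2b,
    "blake2s256": hashlib.blake2s,
}
SIZES = (16, 64, 256, 1024, 8192, 16384)


def throughput(make_hash, size, duration=0.1):
    """Return bytes hashed per second using one-shot hashes of `size` bytes."""
    buf = b"\0" * size
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while True:
        make_hash(buf).digest()
        count += 1
        now = time.perf_counter()
        if now >= deadline:
            break
    return count * size / (now - start)


def main():
    # Print one row per algorithm, in thousands of bytes per second.
    print("type".ljust(12) + "".join(f"{s:>8} bytes" for s in SIZES))
    for name, make_hash in ALGOS.items():
        row = "".join(
            f"{throughput(make_hash, s) / 1000:>13.2f}k" for s in SIZES
        )
        print(name.ljust(12) + row)


if __name__ == "__main__":
    main()
```

Whether hashlib is backed by OpenSSL or by Python's bundled implementations
depends on the build, so treat the output as a sanity check only.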


The difference between blake2b512 and blake2s256 on the Raspberry Pi
is huge.  Overall, blake2s256 seems to be on par or better than sha256.
blake2b512 is pretty good on the 64-bit systems, but completely sucks on the
Pi.  Picking the right algo will be "fun".
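
To put numbers on that, here is a small sketch that normalizes the
16384-byte column of each table against sha1.  The figures are copied from
the tables above; the helper names are hypothetical.  k12_estimate() applies
the "half of sha3-256 time" rule of thumb quoted earlier, which was only
suggested for non-vector CPUs or short messages, so it is a guess and not a
measurement.

```python
# 16384-byte throughput, in the tables' "k" units, copied from above.
MEASURED = {
    "T520": {
        "sha1": 438686.07, "sha256": 189348.31, "sha3-256": 170909.80,
        "blake2b512": 343736.32, "blake2s256": 196625.70,
    },
    "Xeon": {
        "sha1": 517892.78, "sha256": 220632.41, "sha3-256": 199223.98,
        "blake2b512": 519682.56, "blake2s256": 304365.57,
    },
    "Pi 1B+": {
        "sha1": 23738.60, "sha256": 16399.04, "sha3-256": 6715.47,
        "blake2b512": 2328.24, "blake2s256": 15978.57,
    },
}


def relative_to_sha1(row):
    """Return each algorithm's throughput as a fraction of sha1's."""
    return {name: value / row["sha1"] for name, value in row.items()}


def k12_estimate(row):
    """Estimate K12 throughput: half the sha3-256 time means twice the rate."""
    return 2 * row["sha3-256"]


for machine, row in MEASURED.items():
    ratios = relative_to_sha1(row)
    summary = "  ".join(f"{name}={ratio:.2f}" for name, ratio in ratios.items())
    print(f"{machine}: {summary}  K12(est)={k12_estimate(row) / row['sha1']:.2f}")
```

On the Pi, blake2b512 comes out below a tenth of sha1, and blake2s256 is
nearly seven times faster than blake2b512.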

Anyway, sorry for the "distraction" and thanks again for working on this.

Jeff.

-- 
Ready; T=0.01/0.01 17:29:33
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel