On Tue, Jul 28, 2020 at 22:31:51 +0200, Joerg Sonnenberger wrote:
> On Tue, Jul 28, 2020 at 03:55:55PM -0400, Josef 'Jeff' Sipek wrote:
> > On Tue, Jul 28, 2020 at 21:35:51 +0200, Joerg Sonnenberger wrote:
> > > On Tue, Jul 28, 2020 at 03:02:46PM -0400, Josef 'Jeff' Sipek wrote:
> > > > On Sun, Jul 26, 2020 at 18:26:51 +0200, Joerg Sonnenberger wrote:
> > > > ...
> > > > > I've attached basic benchmark numbers below. The asm variant is using
> > > > > whatever my Threadripper supports in terms of low-level primitives,
> > > > > e.g. AVX2 and the SHA extension, either from OpenSSL (BLAKE2, SHA2,
> > > > > SHA3) or the reference implementations (K12, BLAKE3, BLAKE3*). Test
> > > > > case was hashing a large file (~7GB).
> > > >
> > > > While these performance measurements are important, it is also
> > > > important to make sure that older (or less "top of the line") hardware
> > > > isn't completely terrible. For example, it is completely reasonable (at
> > > > least IMO) to still use a Sandy Bridge-era CPU. Likewise, it is
> > > > reasonable to run hg on an embedded system (although those tend to have
> > > > wimpy I/O as well).
> > >
> > > Yeah, that's why I included the C variant.
> >
> > I don't trust compilers *not* to do some massive amount of optimization
> > unless they are told to target an older CPU. Also, newer CPUs like to do a
> > lot of "magic" to speed things up and keep security researchers employed ;)
>
> The core of the hash functions isn't by itself very friendly to compiler
> optimisations. It's more a case of how bad the automatic code generation
> will be.

In general, yes, you are correct. However, it is a big loop and a compiler
can do a decent amount of pipelining.

> > > Just to establish that baseline:
> > >
> > > SHA1 (asm) 4.8s
> > > SHA1 (C) 10.7s
> > >
> > > So K12 is somewhat slower on a Threadripper, but should be somewhat
> > > faster than hardware without specific acceleration. SHA support on the
> > > Zen1 Threadripper is quite fast.
> >
> > I think we're in agreement. The new algo shouldn't be much worse than the
> > existing SHA1.
> >
> > For the record: when the time comes, I'm willing to collect some hash perf
> > data on slightly older/weaker hw as a sanity check.
>
> If you have a modern OpenSSL version, you can get the numbers for
> sha256, sha3-256, blake2b512 and blake2s256 easily. K12 for non-vector
> CPUs or short messages can be reasonably approximated as "half of
> sha3-256 time" as that's the primary difference. BLAKE3 is the only one
> that would be more tricky.

I couldn't help myself and I ran it on 3 of my systems. The results
are...interesting.

Thinkpad T520 (2011 vintage Sandy Bridge i7) with FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            38552.35k   112781.34k   256808.02k   375173.76k   433724.00k   438686.07k
sha256          26402.90k    67908.11k   131459.19k   171880.98k   188168.04k   189348.31k
sha3-256        13828.62k    55332.70k   130879.40k   152721.79k   169003.69k   170909.80k
blake2b512      23551.38k    94267.47k   240451.58k   311282.76k   342076.92k   343736.32k
blake2s256      27319.90k   107783.29k   162617.17k   187314.17k   195500.87k   196625.70k

Server (2009 vintage Xeon) running OmniOS:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1            34924.79k   109450.16k   269382.74k   425380.18k   508193.55k   517892.78k
sha256          25028.82k    67870.78k   142345.81k   195099.15k   218278.57k   220632.41k
sha3-256        14281.78k    57115.29k   140363.97k   172130.99k   195810.65k   199223.98k
blake2b512      24261.26k   101366.74k   286062.42k   433886.55k   514296.49k   519682.56k
blake2s256      28521.81k   113023.49k   212133.77k   275259.33k   301708.63k   304365.57k

Raspberry Pi 1B+ running FreeBSD 12:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha1             1084.15k     3711.04k    10173.97k    18059.62k    23704.69k    23738.60k
sha256            942.90k     3105.90k     7843.36k    12770.46k    16318.84k    16399.04k
sha3-256          456.77k     1805.49k     4432.56k     5607.01k     6652.73k     6715.47k
blake2b512        173.38k      694.93k     1646.80k     2107.62k     2320.81k     2328.24k
blake2s256        861.31k     3420.77k     7963.73k    12459.78k    15746.41k    15978.57k

The difference between blake2b512 and blake2s256 on the Raspberry Pi is
huge: blake2s256 is roughly 5x to 7x faster there. Overall, blake2s256 seems
to be on par with or better than sha256. blake2b512 is pretty good on the
64-bit systems, but completely sucks on the Pi.

Picking the right algo will be "fun".

Anyway, sorry for the "distraction" and thanks again for working on this.

Jeff.

-- 
Ready; T=0.01/0.01 17:29:33
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
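
One way to compare the three machines is to normalize each algorithm against sha1 on the same system, since the thread's working goal is that the new algorithm "shouldn't be much worse than the existing SHA1". The sketch below does that for the 16384-byte column of the results in this message, and adds a K12 estimate using the "half of sha3-256 time" rule of thumb quoted above (so twice the sha3-256 throughput). The names `THROUGHPUT_16K`, `relative_to_sha1` and `estimate_k12` are made up for illustration, and the `k` suffix is assumed to mean thousands of bytes per second. The rule of thumb was stated for non-vector CPUs or short messages, so the K12 figure is only a rough guide, especially on the two x86 machines.

```python
# Throughput at the 16384-byte block size, copied from the results in this
# message. Assumption: the "k" suffix means thousands of bytes per second.
THROUGHPUT_16K = {
    "Thinkpad T520": {
        "sha1": 438686.07, "sha256": 189348.31, "sha3-256": 170909.80,
        "blake2b512": 343736.32, "blake2s256": 196625.70,
    },
    "Xeon server": {
        "sha1": 517892.78, "sha256": 220632.41, "sha3-256": 199223.98,
        "blake2b512": 519682.56, "blake2s256": 304365.57,
    },
    "Raspberry Pi 1B+": {
        "sha1": 23738.60, "sha256": 16399.04, "sha3-256": 6715.47,
        "blake2b512": 2328.24, "blake2s256": 15978.57,
    },
}


def relative_to_sha1(results):
    """Return each algorithm's throughput as a fraction of sha1's."""
    base = results["sha1"]
    return {name: value / base for name, value in results.items()}


def estimate_k12(results):
    """Estimate K12 throughput as twice sha3-256 (i.e. half the time).

    This is only the rule of thumb from the thread, not a measurement.
    """
    return 2.0 * results["sha3-256"]


if __name__ == "__main__":
    for system, results in THROUGHPUT_16K.items():
        ratios = relative_to_sha1(results)
        ratios["k12 (estimated)"] = estimate_k12(results) / results["sha1"]
        print(system)
        # Fastest first, so the candidates closest to sha1 are easy to spot.
        for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
            print(f"  {name:<16} {ratio:5.2f}x sha1")
```

Running it prints one block per system with every algorithm expressed as a multiple of that system's sha1 throughput. Other block sizes can be compared the same way by swapping in a different column.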