Thanks for the comments gents. This is the type of discussion I was aiming at.
On 13 January 2017 at 22:45, Timothy Arceri <timothy.arc...@collabora.com> wrote: > On Fri, 2017-01-13 at 14:32 -0800, Jason Ekstrand wrote: >> On Fri, Jan 13, 2017 at 2:18 PM, Timothy Arceri <timothy.arceri@colla >> bora.com> wrote: >> > On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote: >> > > On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov <vegorov180@gm >> > ail. >> > > com> wrote: >> > > > 13.01.2017 19:51, Emil Velikov пишет: >> > > > > From: Emil Velikov <emil.veli...@collabora.com> >> > > > > >> > > > > At the moment we support 5+ different implementations each >> > with >> > > > > varying >> > > > > amount of bugs - from thread safely problems [1], to outright >> > > > > broken >> > > > > implementation(s) [2] >> > > > > >> > > > > In order to accommodate these we have 150+ lines of configure >> > > > > script and >> > > > > extra two configure toggles. Whist an actual implementation >> > being >> > > > > ~200loc and our current compat wrapping ~250. >> > > >> > > Yes, this is a problem. Especially given that at least one of >> > those >> > > implementations (openssl?) is something that a certain major game >> > > distributor likes to hard-link into things causing interesting >> > and >> > > hard-to-debug problems. I am all for getting rid of the "piles >> > of >> > > different dependencies" approach. >> > > >> > > Also, something I would like to see (maybe a follow-on patch?) >> > would >> > > a change to the mesa internal API to be able to put the SHA >> > context >> > > on the stack and not need to malloc it. It's not really a memory >> > or >> > > cycle-saving thing so much as it leaves one fewer cleanup paths >> > you >> > > have to worry about. >> > > >> > > > > Let's not forget that different people use different code >> > paths, >> > > > > thus >> > > > > effectively makes it harder to test and debug since the >> > default >> > > > > implementation is automatically detected. >> > > > > >> > > > > To minimise all these lovely experiences, import the "100% >> > Public >> > > > > Domain" OpenBSD sha1 implementation. Clearly document any >> > changes >> > > > > needed >> > > > > to get building correctly, since many/most of those can be >> > > > > upstreamed >> > > > > making future syncs easier. >> > > > > >> > > > >> > > > It can hurt performance. OpenSSL implementation is optimized >> > for >> > > > all thinkable architectures and it will use hardware SHA-1 >> > > > instructions on newer CPUs. From https://github.com/openssl/ope >> > nssl >> > > > /blob/master/crypto/sha/asm/sha1-x86_64.pl : >> > > > >> > > > > Current performance is summarized in following table. Numbers >> > are >> > > > > CPU clock cycles spent to process single byte (less is >> > better). >> > > > > >> > > > > x86_64 SSSE3 AVX[2] >> > > > > P4 9.05 - >> > > > > Opteron 6.26 - >> > > > > Core2 6.55 6.05/+8% - >> > > > > Westmere 6.73 5.30/+27% - >> > > > > Sandy Bridge 7.70 6.10/+26% 4.99/+54% >> > > > > Ivy Bridge 6.06 4.67/+30% 4.60/+32% >> > > > > Haswell 5.45 4.15/+31% 3.57/+53% >> > > > > Skylake 5.18 4.06/+28% 3.54/+46% >> > > > > Bulldozer 9.11 5.95/+53% >> > > > > VIA Nano 9.32 7.15/+30% >> > > > > Atom 10.3 9.17/+12% >> > > > > Silvermont 13.1(*) 9.37/+40% >> > > > > Goldmont 8.13 6.42/+27% 1.70/+380%(**) >> > > > >> > > > Quick benchmark on my Haswell of the OpenBSD implementation >> > > > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles >> > per >> > > > byte on 64-bit. But Haswell is a very powerful CPU, on weaker >> > CPUs >> > > > the difference would be probably larger, especially on new CPUs >> > > > that have SHA instruction set. >> > > >> > > Thanks for the numbers. It sounds like, on Haswell, the openSSL >> > > implementation is about 2x as fast which is very useful to know. >> > > However, this isn't on a super perf-critical path. We never use >> > SHA1 >> > > on any draw-time paths; we always use a simpler hash function in >> > > those cases and reserve SHA1 for when we really don't want >> > > collisions. >> > >> > Actually the OpenGL shader cache uses it a draw time to find cached >> > variants. I looked at pulling an implementation into Mesa a while >> > ago >> > but found the perf drop wasn't worth it. >> >> Why doesn't the usual in-memory cache stand as a front-line defense? > > It does :) > >> Could you please be more specific about the perf implications >> you've seen? > > I'm asking for a chance to test before we jump in, its probably not a > big deal and I may even still be able to reduce my use of hashing but > it would be nice to be given a few days to test and even explore > alternatives before jumping on this implementation. > >> Also, which implementation were you linking to that was so much >> faster? > > I didn't test the OpenBSD implementation I tried another small > implementation that claimed it was fast. Pretty much any of the > available libraries were much faster as you would expect from something > that has been tweaked over the years. > Agreed. I've picked the OpenBSD since it was mentioned as compatible wrt licensing. If anyone has some tests/benchmarks and/or other implementations in mind please give them a spin and let us know of the numbers. I really don't mind if we use X over Y, as long as we can minimise the 'fun' experiences mentioned. >> >> > I really like the idea of having an internal implementation but I >> > don't >> > think we should dismiss performance so quickly it would be nice if >> > we >> > could hold this off until more testing can be done. >> > >> > > That said, it's a bit more critical than Emil makes it sound. >> > A >> > > typical Vulkan application may easily create 10k pipelines and >> > each Looks like I was an order of magnitude off in my assumption here. Thanks for the correction. >> > > of those will involve hashing at least about 100B of data (not >> > > include the SPIR-V source). I doubt, however, that this is >> > enough to >> > > really cause a problem given how much other work goes into >> > building a >> > > pipeline. >> > > >> > > Unfortunately, the OpenSSL implementation, while fast, is one of >> > the >> > > ones that is causing problems. One of our favorite game >> > distributors >> > > likes to hard-link against openssl in some of their games and/or >> > > libraries (not sure which). This means that, if mesa tries to >> > > dynamically open libssl, you get mysterious crashes due to slight >> > > differences between the system-installed version and the one that >> > has >> > > been linked into the game. This makes trying to use the OpenSSL >> > > implementation a non-starter without being able to wholesale >> > import >> > > the implementation. >> > > Sounds like exactly what I feared as we got the code merged. Hope we'll get to resolve this before too many people get hit by it. >> > > Emil, I'm fine with this change. I haven't reviewed the details, >> > but >> > > my gut tells me we can eat the perf difference for now. Consider >> > > that an Acked-by if you'd like but it would be good to have >> > someone >> > > review at least the build system stuff. Tyvm. Thanks Emil _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev