Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

Emil Velikov Fri, 13 Jan 2017 16:56:03 -0800

Thanks for the comments gents.

This is the type of discussion I was aiming at.


On 13 January 2017 at 22:45, Timothy Arceri
<timothy.arc...@collabora.com> wrote:
> On Fri, 2017-01-13 at 14:32 -0800, Jason Ekstrand wrote:
>> On Fri, Jan 13, 2017 at 2:18 PM, Timothy Arceri <timothy.arceri@colla
>> bora.com> wrote:
>> > On Fri, 2017-01-13 at 13:59 -0800, Jason Ekstrand wrote:
>> > > On Fri, Jan 13, 2017 at 11:22 AM, Vladislav Egorov <vegorov180@gm
>> > ail.
>> > > com> wrote:
>> > > > 13.01.2017 19:51, Emil Velikov пишет:
>> > > > > From: Emil Velikov <emil.veli...@collabora.com>
>> > > > >
>> > > > > At the moment we support 5+ different implementations each
>> > with
>> > > > > varying
>> > > > > amount of bugs - from thread safely problems [1], to outright
>> > > > > broken
>> > > > > implementation(s) [2]
>> > > > >
>> > > > > In order to accommodate these we have 150+ lines of configure
>> > > > > script and
>> > > > > extra two configure toggles. Whist an actual implementation
>> > being
>> > > > > ~200loc and our current compat wrapping ~250.
>> > >
>> > > Yes, this is a problem.  Especially given that at least one of
>> > those
>> > > implementations (openssl?) is something that a certain major game
>> > > distributor likes to hard-link into things causing interesting
>> > and
>> > > hard-to-debug problems.  I am all for getting rid of the "piles
>> > of
>> > > different dependencies" approach.
>> > >
>> > > Also, something I would like to see (maybe a follow-on patch?)
>> > would
>> > > a change to the mesa internal API to be able to put the SHA
>> > context
>> > > on the stack and not need to malloc it.  It's not really a memory
>> > or
>> > > cycle-saving thing so much as it leaves one fewer cleanup paths
>> > you
>> > > have to worry about.
>> > >
>> > > > > Let's not forget that different people use different code
>> > paths,
>> > > > > thus
>> > > > > effectively makes it harder to test and debug since the
>> > default
>> > > > > implementation is automatically detected.
>> > > > >
>> > > > > To minimise all these lovely experiences, import the "100%
>> > Public
>> > > > > Domain" OpenBSD sha1 implementation. Clearly document any
>> > changes
>> > > > > needed
>> > > > > to get building correctly, since many/most of those can be
>> > > > > upstreamed
>> > > > > making future syncs easier.
>> > > > >
>> > > >
>> > > > It can hurt performance. OpenSSL implementation is optimized
>> > for
>> > > > all thinkable architectures and it will use hardware SHA-1
>> > > > instructions on newer CPUs. From https://github.com/openssl/ope
>> > nssl
>> > > > /blob/master/crypto/sha/asm/sha1-x86_64.pl :
>> > > >
>> > > > > Current performance is summarized in following table. Numbers
>> > are
>> > > > > CPU clock cycles spent to process single byte (less is
>> > better).
>> > > > >
>> > > > >        x86_64        SSSE3        AVX[2]
>> > > > > P4        9.05        -
>> > > > > Opteron    6.26        -
>> > > > > Core2        6.55        6.05/+8%    -
>> > > > > Westmere    6.73        5.30/+27%    -
>> > > > > Sandy Bridge    7.70        6.10/+26%    4.99/+54%
>> > > > > Ivy Bridge    6.06        4.67/+30%    4.60/+32%
>> > > > > Haswell    5.45        4.15/+31%    3.57/+53%
>> > > > > Skylake    5.18        4.06/+28%    3.54/+46%
>> > > > > Bulldozer    9.11        5.95/+53%
>> > > > > VIA Nano    9.32        7.15/+30%
>> > > > > Atom        10.3        9.17/+12%
>> > > > > Silvermont    13.1(*)        9.37/+40%
>> > > > > Goldmont    8.13        6.42/+27%    1.70/+380%(**)
>> > > >
>> > > > Quick benchmark on my Haswell of the OpenBSD implementation
>> > > > compiled with GCC5 -O2: ~8 cycles per byte on 32-bit, ~7 cycles
>> > per
>> > > > byte on 64-bit. But Haswell is a very powerful CPU, on weaker
>> > CPUs
>> > > > the difference would be probably larger, especially on new CPUs
>> > > > that have SHA instruction set.
>> > >
>> > > Thanks for the numbers.  It sounds like, on Haswell, the openSSL
>> > > implementation is about 2x as fast which is very useful to know.
>> > > However, this isn't on a super perf-critical path.  We never use
>> > SHA1
>> > > on any draw-time paths; we always use a simpler hash function in
>> > > those cases and reserve SHA1 for when we really don't want
>> > > collisions.
>> >
>> > Actually the OpenGL shader cache uses it a draw time to find cached
>> > variants. I looked at pulling an implementation into Mesa a while
>> > ago
>> > but found the perf drop wasn't worth it.
>>
>> Why doesn't the usual in-memory cache stand as a front-line defense?
>
> It does :)
>
>>   Could you please be more specific about the perf implications
>> you've seen?
>
> I'm asking for a chance to test before we jump in, its probably not a
> big deal and I may even still be able to reduce my use of hashing but
> it would be nice to be given a few days to test and even explore
> alternatives before jumping on this implementation.
>
>>  Also, which implementation were you linking to that was so much
>> faster?
>
> I didn't test the OpenBSD implementation I tried another small
> implementation that claimed it was fast. Pretty much any of the
> available libraries were much faster as you would expect from something
> that has been tweaked over the years.
>
Agreed. I've picked the OpenBSD since it was mentioned as compatible
wrt licensing.
If anyone has some tests/benchmarks and/or other implementations in
mind please give them a spin and let us know of the numbers. I really
don't mind if we use X over Y, as long as we can minimise the 'fun'
experiences mentioned.

>>
>> > I really like the idea of having an internal implementation but I
>> > don't
>> > think we should dismiss performance so quickly it would be nice if
>> > we
>> > could hold this off until more testing can be done.
>> >
>> > >   That said, it's a bit more critical than Emil makes it sound.
>> > A
>> > > typical Vulkan application may easily create 10k pipelines and
>> > each
Looks like I was an order of magnitude off in my assumption here.
Thanks for the correction.

>> > > of those will involve hashing at least about 100B of data (not
>> > > include the SPIR-V source).  I doubt, however, that this is
>> > enough to
>> > > really cause a problem given how much other work goes into
>> > building a
>> > > pipeline.
>> > >
>> > > Unfortunately, the OpenSSL implementation, while fast, is one of
>> > the
>> > > ones that is causing problems.  One of our favorite game
>> > distributors
>> > > likes to hard-link against openssl in some of their games and/or
>> > > libraries (not sure which).  This means that, if mesa tries to
>> > > dynamically open libssl, you get mysterious crashes due to slight
>> > > differences between the system-installed version and the one that
>> > has
>> > > been linked into the game.  This makes trying to use the OpenSSL
>> > > implementation a non-starter without being able to wholesale
>> > import
>> > > the implementation.
>> > >
Sounds like exactly what I feared as we got the code merged. Hope
we'll get to resolve this before too many people get hit by it.

>> > > Emil, I'm fine with this change.  I haven't reviewed the details,
>> > but
>> > > my gut tells me we can eat the perf difference for now.  Consider
>> > > that an Acked-by if you'd like but it would be good to have
>> > someone
>> > > review at least the build system stuff.
Tyvm.

Thanks
Emil
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] util: import sha1 implementation from OpenBSD

Reply via email to