I made a patch that optimizes the SHA functions on the s390x architecture. The
patch implements the optimized cores using the cipher instructions that were
added to s390x in the message-security-assist (MSA) extensions. The patch
uses the following functions:

KIMD-SHA-1, KLMD-SHA-1 (SHA1)
KIMD-SHA-256, KLMD-SHA-256 (SHA256)
KIMD-SHA-512, KLMD-SHA-512 (SHA512)
KIMD-SHA3-224, KLMD-SHA3-224 (SHA3-224)
KIMD-SHA3-256, KLMD-SHA3-256 (SHA3-256)
KIMD-SHA3-384, KLMD-SHA3-384 (SHA3-384)
KIMD-SHA3-512, KLMD-SHA3-512 (SHA3-512)
KLMD-SHAKE-256 (SHA3-256-SHAKE)

The patch is built on top of the s390x AES patch, so I can't open a merge
request until that patch is merged. However, the code can be found
in my fork, on the s390x-sha branch
<https://git.lysator.liu.se/mamonet/nettle/-/tree/s390x-sha>.
The optimized cores can be enabled either by a fat build or by enabling the
corresponding configuration options (MSA, MSA-X1, MSA-X2, MSA-X6).
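
To illustrate what a fat build has to decide at run time, here is a minimal sketch of testing whether a given KIMD/KLMD function is present, assuming the 16-byte status block returned by the query function (function code 0) is already in hand. The helper name msa_function_available and the FC_* constants are hypothetical and not taken from the patch; the function code values are as I recall them from the z/Architecture documentation and should be verified before use.

```c
#include <assert.h>
#include <stdint.h>

/* Function codes for KIMD/KLMD. These values are an assumption based on
   the z/Architecture documentation; verify them before relying on them. */
enum {
  FC_SHA_1 = 1, FC_SHA_256 = 2, FC_SHA_512 = 3,
  FC_SHA3_224 = 32, FC_SHA3_256 = 33, FC_SHA3_384 = 34, FC_SHA3_512 = 35,
  FC_SHAKE_256 = 37
};

/* status: the 16-byte block stored by the query function (code 0).
   Bit N corresponds to function code N, counting from the most
   significant bit of byte 0. Returns nonzero if the bit is set. */
static int
msa_function_available(const uint8_t status[16], unsigned fc)
{
  assert(fc < 128);
  return (status[fc / 8] & (0x80u >> (fc % 8))) != 0;
}
```

Obtaining the status block itself needs the KIMD or KLMD instruction with function code 0, which is s390x-specific and is left out here so the sketch stays portable.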

Benchmark of this patch using nettle-benchmark (tested on a z15 at 5.2 GHz):

*---------------------------------------------*
| Algorithm  |       C | Hardware-accelerated |
| sha1       |  360.69 |              1735.34 |
| sha224     |  244.63 |              2179.60 |
| sha256     |  244.63 |              2179.74 |
| sha384     |  372.57 |              3464.84 |
| sha512     |  370.82 |              3463.66 |
| sha512-224 |  364.93 |              3382.58 |
| sha512-256 |  373.19 |              3463.23 |
| sha3-224   |  236.50 |              6859.54 |
| sha3-256   |  224.76 |              6656.05 |
| sha3-384   |  173.21 |              5818.89 |
| sha3-512   |  119.79 |              4693.53 |
*---------------------------------------------*
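
For a quick read of the table, the speedup per algorithm is just the hardware-accelerated figure divided by the C figure. A small sketch that computes it from the numbers above; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the benchmark table above (hypothetical type name). */
struct bench_row { const char *name; double c; double hw; };

static const struct bench_row bench_rows[] = {
  { "sha1",       360.69, 1735.34 },
  { "sha224",     244.63, 2179.60 },
  { "sha256",     244.63, 2179.74 },
  { "sha384",     372.57, 3464.84 },
  { "sha512",     370.82, 3463.66 },
  { "sha512-224", 364.93, 3382.58 },
  { "sha512-256", 373.19, 3463.23 },
  { "sha3-224",   236.50, 6859.54 },
  { "sha3-256",   224.76, 6656.05 },
  { "sha3-384",   173.21, 5818.89 },
  { "sha3-512",   119.79, 4693.53 },
};

#define BENCH_ROWS (sizeof(bench_rows) / sizeof(bench_rows[0]))

/* Ratio of the hardware-accelerated figure to the C figure. */
static double
bench_speedup(const struct bench_row *row)
{
  assert(row->c > 0.0);
  return row->hw / row->c;
}

/* Print one line per algorithm with its speedup factor. */
static void
bench_print_speedups(void)
{
  for (size_t i = 0; i < BENCH_ROWS; i++)
    printf("%-10s %6.2fx\n", bench_rows[i].name,
           bench_speedup(&bench_rows[i]));
}
```

Calling bench_print_speedups() lists the ratios; they range from a little under 5x for sha1 to roughly 39x for sha3-512.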

I have a couple of questions about this patch:

Is packing the MSA configuration options into a single option more
convenient than cluttering the option list with one option per MSA extension?

The optimized sha3_update functions store the state buffer in big-endian
order, while the C implementation stores each 64-bit word of the state buffer
in little-endian order. As far as I can see, the state buffer is only used
internally, and since both sha3_update and sha3_digest are optimized, both
follow the same convention, so I think it's okay to keep it like that. Any
opinions here?
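
To make the difference concrete: if the two state layouts ever had to interoperate (for example, mixing an optimized sha3_update with the C sha3_digest), the conversion would amount to a byte swap of each 64-bit word of the state. A minimal sketch, assuming the usual 25-word SHA-3 state; the names below are hypothetical and not part of nettle or the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA3_STATE_WORDS 25 /* 1600-bit Keccak state as 64-bit words */

/* Reverse the byte order of one 64-bit word. */
static uint64_t
swap_bytes64(uint64_t x)
{
  x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x >> 8) & 0x00ff00ff00ff00ffULL);
  x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
  return (x << 32) | (x >> 32);
}

/* Convert a state buffer between the two per-word byte orders.
   The operation is its own inverse, so one helper covers both directions. */
static void
sha3_state_swap_words(uint64_t state[SHA3_STATE_WORDS])
{
  assert(state != NULL);
  for (size_t i = 0; i < SHA3_STATE_WORDS; i++)
    state[i] = swap_bytes64(state[i]);
}
```

As long as update and digest agree on the layout, no such conversion is needed, which is the point of the question above.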

regards,
Mamone
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs