On Sun, Aug 29, 2021 at 5:52 PM Maamoun TK <maamoun...@googlemail.com>
wrote:

> Applying the hardware-accelerated SHA3 instruction to optimize the
> sha3_permute function for the s390x arch has an insignificant impact on
> performance. I'm wondering what we can do to take full advantage of those
> instructions. Optimizing sha3_absorb seems a good way to go, since the
> s390x-specific accelerator implies permuting the state bytes and XOR
> operations, but the downside of implementing this function is handling the
> block size variants for each mode. The s390x arch supports the standard
> block sizes, so we can branch for each standard size in the supported
> modes, but should we consider unexpected block sizes during the
> implementation?
>

I got an almost 12% speedup by optimizing the sha3_permute() function with
the SHA hardware accelerator of s390x. Is it worth adding that assembly
implementation? The patch is attached at the end of this email.

On another topic, are you aware of any CFarm alternative that has an arm64
machine with SHA-256 and SHA3 support, so I can continue optimizing those
functions for the aarch64 architecture, and an x86_64 machine with SHA-NI
support, so I can complete the patch for the sha1_compress_n() function and
maximize the performance of the SHA1 compress function on
hardware-supported architectures?

C s390x/msa_x6/sha3-permute.asm

ifelse(`
   Copyright (C) 2021 Mamone Tarsha
   This file is part of GNU Nettle.

   GNU Nettle is free software: you can redistribute it and/or
   modify it under the terms of either:

     * the GNU Lesser General Public License as published by the Free
       Software Foundation; either version 3 of the License, or (at your
       option) any later version.

   or

     * the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

   or both in parallel, as here.

   GNU Nettle is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program.  If
   not, see http://www.gnu.org/licenses/.
')

C KIMD (COMPUTE INTERMEDIATE MESSAGE DIGEST) is specified in
C "z/Architecture Principles of Operation SA22-7832-12" as follows:
C A function specified by the function code in general register 0 is
C performed.
C General register 1 contains the logical address of the leftmost byte of
C the parameter block in storage.
C The second operand is processed as specified by the function code using
C an initial chaining value in the parameter block, and the result replaces
C the chaining value.

C This implementation uses KIMD-SHA3-512 function.
C The parameter block used for the KIMD-SHA3-512 function has the following
C format:
C *----------------------------------------------*
C |               ICV (200 bytes)                |
C *----------------------------------------------*

C SHA function code
define(`SHA3_512_FUNCTION_CODE', `35')
C Size of block
define(`SHA3_512_BLOCK_SIZE', `72')
C Size of state
define(`SHA3_STATE_SIZE', `200')

.file "sha3-permute.asm"

.text

C void
C sha3_permute(struct sha3_ctx *ctx)

PROLOGUE(nettle_sha3_permute)
    lghi           %r0,SHA3_512_FUNCTION_CODE    C FUNCTION_CODE
    ALLOC_STACK(%r1,SHA3_STATE_SIZE+SHA3_512_BLOCK_SIZE)
.irp idx, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
    mvcin          \idx*8(8,%r1),\idx*8+7(%r2)
.endr
    la             %r4,SHA3_STATE_SIZE (%r1)
    xc             0(SHA3_512_BLOCK_SIZE,%r4),0(%r4)
    lghi           %r5,SHA3_512_BLOCK_SIZE
1:  .long   0xb93e0004                           C kimd %r0,%r4. perform KIMD-SHA operation on data
    brc            1,1b
.irp idx, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
    mvcin          \idx*8(8,%r2),\idx*8+7(%r1)
.endr
    FREE_STACK(SHA3_STATE_SIZE+SHA3_512_BLOCK_SIZE)
    br             RA
EPILOGUE(nettle_sha3_permute)
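
For readers who don't speak s390x assembly, here is a rough C model of the
data movement around the KIMD call. It is only a sketch, not part of the
patch: the names reverse_lanes and absorb_block are made up for
illustration, and the Keccak-f permutation that KIMD runs after the XOR is
not modeled. The assumption is that the state in struct sha3_ctx is kept as
25 uint64_t lanes in host (big-endian) byte order, while KIMD treats the
chaining value as a byte string, so every 8-byte lane is byte-reversed on
the way in and on the way out.

```c
#include <stdint.h>
#include <string.h>

#define SHA3_STATE_LANES 25
#define SHA3_STATE_SIZE 200
#define SHA3_512_BLOCK_SIZE 72

/* Model of the 25 mvcin instructions: copy each 8-byte lane from src to
   dst with its bytes in reverse order. dst and src must not overlap. */
static void
reverse_lanes(uint8_t *dst, const uint8_t *src)
{
  unsigned i, j;
  for (i = 0; i < SHA3_STATE_LANES; i++)
    for (j = 0; j < 8; j++)
      dst[8 * i + j] = src[8 * i + 7 - j];
}

/* Model of the XOR half of a KIMD-SHA3-512 step: the 72-byte message
   block is XORed into the start of the chaining value. The Keccak-f
   permutation that the instruction applies afterwards is NOT modeled. */
static void
absorb_block(uint8_t *icv, const uint8_t *block)
{
  unsigned i;
  for (i = 0; i < SHA3_512_BLOCK_SIZE; i++)
    icv[i] ^= block[i];
}
```

Since the block passed to KIMD is all zeros, the XOR step leaves the
chaining value untouched, and what remains of the instruction is the bare
permutation, which is what sha3_permute() needs.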

regards,
Mamone
