Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-06 Thread Amitay Isaacs
Hi Niels,

On Mon, 2021-12-06 at 22:29 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> 
> > I think the approach should apply to other 64-bit archs (should
> > probably
> > work also on x86_64, where it's sometimes tricky to avoid x86_64
> > instructions clobbering the carry flag when it should be preserved,
> > but
> > probably not so difficult in this case).
> 
> x86_64 version below. I could also trim register usage, so it no
> longer needs to save and restore any registers. On my machine, this
> gives a speedup of 17% for ecc_secp256r1_redc in isolation, 3%
> speedup
> for ecdsa sign and 7% speedup of ecdsa verify.

On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in
isolation, and ~1% speedup for ecdsa sign and verify over the earlier
assembly version.


Amitay.
-- 

Do it ! Move it ! Make it happen ! No one ever sat their way to
success.
C powerpc64/ecc-secp256r1-redc.asm

ifelse(`
   Copyright (C) 2021 Amitay Isaacs & Martin Schwenke, IBM Corporation

   Based on x86_64/ecc-secp256r1-redc.asm

   This file is part of GNU Nettle.

   GNU Nettle is free software: you can redistribute it and/or
   modify it under the terms of either:

 * the GNU Lesser General Public License as published by the Free
   Software Foundation; either version 3 of the License, or (at your
   option) any later version.

   or

 * the GNU General Public License as published by the Free
   Software Foundation; either version 2 of the License, or (at your
   option) any later version.

   or both in parallel, as here.

   GNU Nettle is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program.  If
   not, see http://www.gnu.org/licenses/.
')

C Register usage:

define(`SP', `r1')

define(`RP', `r4')
define(`XP', `r5')

define(`F0', `r3')
define(`F1', `r6')
define(`F2', `r7')
define(`F3', `r8')

define(`U0', `r9')
define(`U1', `r10')
define(`U2', `r11')
define(`U3', `r12')
define(`U4', `r14')
define(`U5', `r15')
define(`U6', `r16')
define(`U7', `r17')

.file "ecc-secp256r1-redc.asm"

C FOLD(x), sets (F3,F2,F1,F0)  <-- [(x << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLD', `
sldi    F0, $1, 32
srdi    F1, $1, 32
subfc   F2, F0, $1
subfe   F3, F1, $1
')

C FOLDC(x), sets (F3,F2,F1,F0)  <-- [((x+c) << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLDC', `
sldi    F0, $1, 32
srdi    F1, $1, 32
addze   F3, $1
subfc   F2, F0, $1
subfe   F3, F1, F3
')

C void ecc_secp256r1_redc (const struct ecc_modulo *p, mp_limb_t *rp, mp_limb_t *xp)
.text
define(`FUNC_ALIGN', `5')
PROLOGUE(_nettle_ecc_secp256r1_redc)

std U4,-32(SP)
std U5,-24(SP)
std U6,-16(SP)
std U7,-8(SP)

ld  U0, 0(XP)
ld  U1, 8(XP)
ld  U2, 16(XP)
ld  U3, 24(XP)
ld  U4, 32(XP)
ld  U5, 40(XP)
ld  U6, 48(XP)
ld  U7, 56(XP)

FOLD(U0)
addc    U1, F0, U1
adde    U2, F1, U2
adde    U3, F2, U3
adde    U4, F3, U4

FOLDC(U1)
addc    U2, F0, U2
adde    U3, F1, U3
adde    U4, F2, U4
adde    U5, F3, U5

FOLDC(U2)
addc    U3, F0, U3
adde    U4, F1, U4
adde    U5, F2, U5
adde    U6, F3, U6

FOLDC(U3)
addc    U4, F0, U4
adde    U5, F1, U5
adde    U6, F2, U6
adde    U7, F3, U7

C If carry, we need to add in
C 2^256 - p = <0xfffffffe, 0xff..ff, 0xffffffff00000000, 1>
li  F0, 0
addze   F0, F0
neg F2, F0
sldi    F1, F2, 32
srdi    F3, F2, 32
li  XP, -2
and F3, F3, XP

addc    U0, F0, U4
adde    U1, F1, U5
adde    U2, F2, U6
adde    U3, F3, U7

std U0, 0(RP)
std U1, 8(RP)
std U2, 16(RP)
std U3, 24(RP)

ld  U4,-32(SP)
ld  U5,-24(SP)
ld  U6,-16(SP)
ld  U7,-8(SP)

blr
EPILOGUE(_nettle_ecc_secp256r1_redc)
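
For readers following the arithmetic, here is a minimal Python sketch (not part of the patch) of what the FOLD steps compute at the integer level. It assumes the standard secp256r1 prime p = 2^256 - 2^224 + 2^192 + 2^96 - 1; the names `fold` and `redc` are illustrative only. Since p + 1 = 2^64 * (2^192 - 2^160 + 2^128 + 2^32) and p = -1 (mod 2^64), adding u0 * p clears the low limb and amounts to adding FOLD(u0) one limb further up.

```python
P = 2**256 - 2**224 + 2**192 + 2**96 - 1   # secp256r1 prime (assumed)
MASK64 = 2**64 - 1

def fold(x):
    """Integer value of FOLD(x): x * (2^192 - 2^160 + 2^128 + 2^32)."""
    return (x << 192) - (x << 160) + (x << 128) + (x << 32)

def redc(u):
    """Montgomery-reduce u < 2^512: returns r < 2^256 with r * 2^256 == u (mod P)."""
    for _ in range(4):
        u0 = u & MASK64
        # P == -1 (mod 2^64), so the quotient limb is u0 itself, and
        # u0 * P == (fold(u0) << 64) - u0 cancels the low limb exactly.
        u = (u + (fold(u0) << 64) - u0) >> 64
    if u >> 256:      # carry out of 256 bits
        u -= P        # same as adding 2^256 - P and dropping the carry
    return u
```

The result is only guaranteed to be below 2^256, not below p, which matches the single conditional correction at the end of the assembly.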
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-07 Thread Niels Möller
Amitay Isaacs writes:

> On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in
> isolation, and ~1% speedup for ecdsa sign and verify over the earlier
> assembly version.

Thanks! Merged to master-updates for ci testing.

I think it should be possible to reduce the number of registers needed and
completely avoid using callee-save registers (load the values now in
U4-U7 one at a time, a bit closer to where they are needed),
and replace F3 with $1 in the FOLD and FOLDC macros.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.


Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2021-12-09 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> Thanks! Merged to master-updates for ci testing.

And now merged to the master branch.

> I think it should be possible to reduce number of needed registers, and
> completely avoid using callee-save registers (load the values now in
> U4-U7 one at a time a bit closer to the place where they are needed in),
> and replace F3 with $1 in the FOLD and FOLDC macros.

Attaching a variant to do this. Passes tests with qemu, but I haven't
benchmarked it on any real hardware.

C powerpc64/ecc-secp256r1-redc.asm

ifelse(`
   Copyright (C) 2021 Amitay Isaacs & Martin Schwenke, IBM Corporation

   Based on x86_64/ecc-secp256r1-redc.asm

   This file is part of GNU Nettle.

   GNU Nettle is free software: you can redistribute it and/or
   modify it under the terms of either:

 * the GNU Lesser General Public License as published by the Free
   Software Foundation; either version 3 of the License, or (at your
   option) any later version.

   or

 * the GNU General Public License as published by the Free
   Software Foundation; either version 2 of the License, or (at your
   option) any later version.

   or both in parallel, as here.

   GNU Nettle is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program.  If
   not, see http://www.gnu.org/licenses/.
')

C Register usage:

define(`RP', `r4')
define(`XP', `r5')

define(`F0', `r3')
define(`F1', `r6')
define(`F2', `r7')
define(`T', `r8')

define(`U0', `r9')
define(`U1', `r10')
define(`U2', `r11')
define(`U3', `r12')

.file "ecc-secp256r1-redc.asm"

C FOLD(x), sets (x,F2,F1,F0)  <-- [(x << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLD', `
sldi    F0, $1, 32
srdi    F1, $1, 32
subfc   F2, F0, $1
subfe   $1, F1, $1
')

C FOLDC(x), sets (x,F2,F1,F0)  <-- [((x+c) << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLDC', `
sldi    F0, $1, 32
srdi    F1, $1, 32
addze   T, $1
subfc   F2, F0, $1
subfe   $1, F1, T
')

C void ecc_secp256r1_redc (const struct ecc_modulo *p, mp_limb_t *rp, mp_limb_t *xp)
.text
define(`FUNC_ALIGN', `5')
PROLOGUE(_nettle_ecc_secp256r1_redc)

ld  U0, 0(XP)
ld  U1, 8(XP)
ld  U2, 16(XP)
ld  U3, 24(XP)

FOLD(U0)
ld  T, 32(XP)
addc    U1, F0, U1
adde    U2, F1, U2
adde    U3, F2, U3
adde    U0, U0, T

FOLDC(U1)
ld  T, 40(XP)
addc    U2, F0, U2
adde    U3, F1, U3
adde    U0, F2, U0
adde    U1, U1, T

FOLDC(U2)
ld  T, 48(XP)
addc    U3, F0, U3
adde    U0, F1, U0
adde    U1, F2, U1
adde    U2, U2, T

FOLDC(U3)
ld  T, 56(XP)
addc    U0, F0, U0
adde    U1, F1, U1
adde    U2, F2, U2
adde    U3, U3, T

C If carry, we need to add in
C 2^256 - p = <0xfffffffe, 0xff..ff, 0xffffffff00000000, 1>
li  F0, 0
addze   F0, F0
neg F2, F0
sldi    F1, F2, 32
srdi    T, F2, 32
li  XP, -2
and T, T, XP

addc    U0, F0, U0
adde    U1, F1, U1
adde    U2, F2, U2
adde    U3, T, U3

std U0, 0(RP)
std U1, 8(RP)
std U2, 16(RP)
std U3, 24(RP)

blr
EPILOGUE(_nettle_ecc_secp256r1_redc)
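
To make the register rotation in this variant easier to follow, here is a rough limb-level emulation in Python. It is a sketch under assumptions, not a tested port: the `Carry` class models only the CA bit and the five carry instructions used here, the prime is assumed to be the standard secp256r1 modulus, and the helper name `redc_limbs` is made up for illustration. Each round folds register U[i], adds F0..F2 into the other three registers, and reuses U[i] itself for the new top limb.

```python
MASK = 2**64 - 1
P = 2**256 - 2**224 + 2**192 + 2**96 - 1   # secp256r1 prime (assumed)

class Carry:
    """Holds the CA bit; each method mimics one PowerPC carry instruction."""
    def __init__(self):
        self.ca = 0

    def _add(self, a, b, cin):
        s = a + b + cin
        self.ca = s >> 64
        return s & MASK

    def addc(self, a, b):  return self._add(a, b, 0)
    def adde(self, a, b):  return self._add(a, b, self.ca)
    def addze(self, a):    return self._add(a, 0, self.ca)
    # subfc/subfe rt, ra, rb compute rb - ra as ~ra + rb + (1 or CA)
    def subfc(self, a, b): return self._add(~a & MASK, b, 1)
    def subfe(self, a, b): return self._add(~a & MASK, b, self.ca)

def redc_limbs(x):
    """x: eight 64-bit limbs, least significant first; returns four limbs."""
    c = Carry()
    u = list(x[:4])                      # U0..U3
    for i in range(4):
        a = u[i]                         # register passed to FOLD/FOLDC
        f0 = (a << 32) & MASK            # sldi F0, $1, 32
        f1 = a >> 32                     # srdi F1, $1, 32
        t = a if i == 0 else c.addze(a)  # FOLDC only: addze T, $1
        f2 = c.subfc(f0, a)              # subfc F2, F0, $1
        top = c.subfe(f1, t)             # subfe $1, F1, T
        t = x[4 + i]                     # ld T, 32+8*i(XP)
        j, k, l = (i + 1) % 4, (i + 2) % 4, (i + 3) % 4
        u[j] = c.addc(f0, u[j])
        u[k] = c.adde(f1, u[k])
        u[l] = c.adde(f2, u[l])
        u[i] = c.adde(top, t)            # folded register takes the new top limb
    # Branch-free correction: add 2^256 - p when the last adde carried out.
    f0 = c.addze(0)                      # li F0, 0 ; addze F0, F0
    f2 = -f0 & MASK                      # neg F2, F0
    f1 = (f2 << 32) & MASK               # sldi F1, F2, 32
    t = (f2 >> 32) & (-2 & MASK)         # srdi T, F2, 32 ; and T, T, XP
    u[0] = c.addc(f0, u[0])
    u[1] = c.adde(f1, u[1])
    u[2] = c.adde(f2, u[2])
    u[3] = c.adde(t, u[3])
    return u
```

After four rounds the rotation has come full circle, so the result sits in U0..U3 in order and can be stored directly, which is why no callee-save registers are needed.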

> Regards,
> /Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.


Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-04 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I think it should be possible to reduce number of needed registers, and
>> completely avoid using callee-save registers (load the values now in
>> U4-U7 one at a time a bit closer to the place where they are needed in),
>> and replace F3 with $1 in the FOLD and FOLDC macros.
>
> Attaching a variant to do this. Passes tests with qemu, but I haven't
> benchmarked it on any real hardware.

Would you like to test and benchmark this on relevant real hardware
before I merge this version?

Code still below, and committed to the branch ppc-secp256-tweaks.

Regards,
/Niels

C powerpc64/ecc-secp256r1-redc.asm

ifelse(`
   Copyright (C) 2021 Amitay Isaacs & Martin Schwenke, IBM Corporation

   Based on x86_64/ecc-secp256r1-redc.asm

   This file is part of GNU Nettle.

   GNU Nettle is free software: you can redistribute it and/or
   modify it under the terms of either:

 * the GNU Lesser General Public License as published by the Free
   Software Foundation; either version 3 of the License, or (at your
   option) any later version.

   or

 * the GNU General Public License as published by the Free
   Software Foundation; either version 2 of the License, or (at your
   option) any later version.

   or both in parallel, as here.

   GNU Nettle is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received copies of the GNU General Public License and
   the GNU Lesser General Public License along with this program.  If
   not, see http://www.gnu.org/licenses/.
')

C Register usage:

define(`RP', `r4')
define(`XP', `r5')

define(`F0', `r3')
define(`F1', `r6')
define(`F2', `r7')
define(`T', `r8')

define(`U0', `r9')
define(`U1', `r10')
define(`U2', `r11')
define(`U3', `r12')

.file "ecc-secp256r1-redc.asm"

C FOLD(x), sets (x,F2,F1,F0)  <-- [(x << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLD', `
sldi    F0, $1, 32
srdi    F1, $1, 32
subfc   F2, F0, $1
subfe   $1, F1, $1
')

C FOLDC(x), sets (x,F2,F1,F0)  <-- [((x+c) << 192) - (x << 160) + (x << 128) + (x << 32)]
define(`FOLDC', `
sldi    F0, $1, 32
srdi    F1, $1, 32
addze   T, $1
subfc   F2, F0, $1
subfe   $1, F1, T
')

C void ecc_secp256r1_redc (const struct ecc_modulo *p, mp_limb_t *rp, mp_limb_t *xp)
.text
define(`FUNC_ALIGN', `5')
PROLOGUE(_nettle_ecc_secp256r1_redc)

ld  U0, 0(XP)
ld  U1, 8(XP)
ld  U2, 16(XP)
ld  U3, 24(XP)

FOLD(U0)
ld  T, 32(XP)
addc    U1, F0, U1
adde    U2, F1, U2
adde    U3, F2, U3
adde    U0, U0, T

FOLDC(U1)
ld  T, 40(XP)
addc    U2, F0, U2
adde    U3, F1, U3
adde    U0, F2, U0
adde    U1, U1, T

FOLDC(U2)
ld  T, 48(XP)
addc    U3, F0, U3
adde    U0, F1, U0
adde    U1, F2, U1
adde    U2, U2, T

FOLDC(U3)
ld  T, 56(XP)
addc    U0, F0, U0
adde    U1, F1, U1
adde    U2, F2, U2
adde    U3, U3, T

C If carry, we need to add in
C 2^256 - p = <0xfffffffe, 0xff..ff, 0xffffffff00000000, 1>
li  F0, 0
addze   F0, F0
neg F2, F0
sldi    F1, F2, 32
srdi    T, F2, 32
li  XP, -2
and T, T, XP

addc    U0, F0, U0
adde    U1, F1, U1
adde    U2, F2, U2
adde    U3, T, U3

std U0, 0(RP)
std U1, 8(RP)
std U2, 16(RP)
std U3, 24(RP)

blr
EPILOGUE(_nettle_ecc_secp256r1_redc)
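
As a quick sanity check on the final "If carry" step, the following Python sketch rebuilds the four correction limbs from the carry bit the same way the li/addze/neg/sldi/srdi/and sequence does, and compares them with 2^256 - p. The prime is assumed to be the standard secp256r1 modulus, and `correction` and `value` are hypothetical helper names.

```python
MASK = 2**64 - 1
P = 2**256 - 2**224 + 2**192 + 2**96 - 1   # secp256r1 prime (assumed)

def correction(carry):
    """Limbs (least significant first) added after the last FOLDC round."""
    f0 = carry                      # li F0, 0 ; addze F0, F0
    f2 = -f0 & MASK                 # neg F2, F0   -> 0 or all ones
    f1 = (f2 << 32) & MASK          # sldi F1, F2, 32
    t = (f2 >> 32) & (-2 & MASK)    # srdi T, F2, 32 ; li XP, -2 ; and T, T, XP
    return [f0, f1, f2, t]

def value(limbs):
    """Integer represented by a little-endian list of 64-bit limbs."""
    return sum(l << (64 * i) for i, l in enumerate(limbs))

print([hex(l) for l in correction(1)])
# ['0x1', '0xffffffff00000000', '0xffffffffffffffff', '0xfffffffe']
print(value(correction(1)) == 2**256 - P)   # True
```

With carry = 0 every limb is zero, so the same four add instructions run unconditionally and no branch is needed.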


-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.


Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-09 Thread Amitay Isaacs
Hi Niels,

On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> 
> > ni...@lysator.liu.se (Niels Möller) writes:
> > 
> > > I think it should be possible to reduce number of needed
> > > registers, and
> > > completely avoid using callee-save registers (load the values now
> > > in
> > > U4-U7 one at a time a bit closer to the place where they are
> > > needed in),
> > > and replace F3 with $1 in the FOLD and FOLDC macros.
> > 
> > Attaching a variant to do this. Passes tests with qemu, but I
> > haven't
> > benchmarked it on any real hardware.
> 
> Would you like to test and benchmark this on relevant real hardware
> before I merge this version?
> 
> Code still below, and committed to the branch ppc-secp256-tweaks.

Compared to the current version in master branch, this version
definitely improves the performance of the reduction code.

On POWER9, the reduction code shows 7% speed up when tested separately.

The improvement in P256 sign/verify is marginal.  Here are the numbers
from hogweed-benchmark on POWER9.

 
    name size   sign/ms verify/ms
   ecdsa  256   11.1013    3.5713  (master)
   ecdsa  256   11.1527    3.6011  (this patch)
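
To put "marginal" into numbers, here is a small Python calculation of the relative gains, reading the two columns as sign/ms and verify/ms (11.1013 and 3.5713 for master, 11.1527 and 3.6011 for this patch). The dictionary layout and the `gain_percent` helper are just for illustration.

```python
# sign/ms and verify/ms as reported by hogweed-benchmark on POWER9
master = {"sign": 11.1013, "verify": 3.5713}
patch = {"sign": 11.1527, "verify": 3.6011}

def gain_percent(old, new):
    """Relative throughput gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0

gains = {op: gain_percent(master[op], patch[op]) for op in master}
for op, g in gains.items():
    print(f"ecdsa 256 {op}: {g:+.2f}%")
# ecdsa 256 sign: +0.46%
# ecdsa 256 verify: +0.83%
```

Both gains are under 1%, far below the 7% seen for the reduction code on its own.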


Amitay.
-- 

People on the net are always telling other people to "get a life." It
would be so much simpler if there were one available under GPL. "If you
use this life, you must tell other people where to get a life of their
own."  - Christopher Davis


Re: powerpc ecc 256 redc (was Re: x86_64 ecc_256_redc)

2022-01-10 Thread Niels Möller
Amitay Isaacs writes:

> Compared to the current version in master branch, this version
> definitely improves the performance of the reduction code.
>
> On POWER9, the reduction code shows 7% speed up when tested separately.
>
> The improvement in P256 sign/verify is marginal.  Here are the numbers
> from hogweed-benchmark on POWER9.
>
>  
>     name size   sign/ms verify/ms
>    ecdsa  256   11.1013    3.5713  (master)
>    ecdsa  256   11.1527    3.6011  (this patch)

Thanks for testing. Committed to the master branch now.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.