Re: Add RSA-OAEP encryption/decryption to Nettle

2024-01-15 Thread Simo Sorce
On Mon, 2024-01-15 at 21:43 +0100, Niels Möller wrote:
> Daiki Ueno  writes:
> 
> > Now that another attack on RSA encryption with PKCS#1 v1.5 padding has
> > been discovered (though Nettle is not vulnerable)[1], it is recommended
> > to avoid using the v1.5 scheme in new applications[2][3], and thus
> > supporting RSA-OAEP in Nettle is becoming more relevant.
> 
> I agree oaep support is desirable.
> 
> > I made some modifications to the existing merge request[4], mainly to
> > make it side-channel safe at decryption:
> > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/60
> 
> Thanks for reviving this issue, and looking into side-channel silence.
> 
> > Could you take a look when you have time?
> 
> Thanks, I've had a look, and it looks pretty good to me. Some comments
> and questions:
> 
> * For tests, would it make some with some test that check that
>   encryption with a given message and randomness gives the expected
>   output? Even better if there are any authoritative testcases for that?
> 
> * Is it useful to have oaep_decode_mgf1 and oaep_encode_mgf1 advertised
>   as public functions, or would it be better to make them internal?
> 
> * Do you see any reasonable (i.e., with a net gain in maintainability)
>   way to share more code between _oaep_sec_decrypt_variable and
>   _pkcs1_sec_decrypt_variable?

Hi Niels,
I did review this part, and to me it seem like it is more maintainable
to keep them separate, they already are tricky as it is, adding more
variability sounds to me would just make them more complex and
difficult to reason about.

HTH,
Simo.

> * For oaep_decode_mgf1, oaep_encode_mgf1, maybe one could let the caller
>   allocate and pass in the appropriate hashing context? Would be easy to
>   do, e.g., in rsa_oaep_sha512_decrypt. But it looks like that would be
>   inconsistent with pss_mgf1, though (which looks like it needs a
>   separate hashing context).
> 
> * I think it was a design mistake to represent RSA ciphertexts as mpz_t
>   rather then octet strings in Nettle's original RSA interfaces. I
>   wonder if it would make sense to let the new functions take
>   octet strings instead?
> 
> Regards,
> /Niels
> 
> -- 
> Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
> Internet email is subject to wholesale government surveillance.
> ___
> nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
> To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se

-- 
Simo Sorce
Distinguished Engineer
RHEL Crypto Team
Red Hat, Inc








___
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se


Re: How to update OpenSSL benchmark glue?

2023-12-05 Thread Simo Sorce
On Tue, 2023-12-05 at 13:17 +1300, Amos Jeffries wrote:
> On 4/12/23 09:05, Niels Möller wrote:
> > Simo Sorce writes:
> > 
> > > Ah you do not need to pass any property for the default provider so you
> > > can pass "" or even NULL.
> > 
> > Thanks, I now have the RSA code updated (on branch update-openssl-bench,
> > if anyone wants to see the details). Initialization is now
> > 
> >ctx->pkey_ctx = EVP_PKEY_CTX_new_from_name (NULL, "RSA", "");
> >if (!ctx->pkey_ctx)
> >  die ("OpenSSL EVP_PKEY_CTX_new_from_name (\"RSA\") failed.\n");
> 
> 
> FWIW, In Squid with OpenSSLv3 we use this:
> 
>   EVP_PKEY_CTX_new_id(EVP_PKEY_RSA, NULL)
> 

EVP_PKEY_CTX_new_from_name is the more proper way in OpenSSL 3.0

> 
> 
> >if (EVP_PKEY_keygen_init (ctx->pkey_ctx) <= 0)
> >  die ("OpenSSL EVP_PKEY_keygen_init failed.\n");
> >if (EVP_PKEY_CTX_set_rsa_keygen_bits(ctx->pkey_ctx, size) <= 0)
> >  die ("OpenSSL EVP_PKEY_CTX_set_rsa_keygen_bits failed.\n");
> >BIGNUM *e = BN_new();
> >BN_set_word(e, 65537);
> >EVP_PKEY_CTX_set1_rsa_keygen_pubexp (ctx->pkey_ctx, e);
> >EVP_PKEY_keygen (ctx->pkey_ctx, >key);
> > 
> > However, when I run this under valgrind (to check the corresponding
> > cleanup code doesn't leak memory), I get an error:
> > 
> >==3016684== Conditional jump or move depends on uninitialised value(s)
> >==3016684==at 0x4B0B824: EVP_PKEY_generate (in 
> > /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
> >==3016684==by 0x10F30A: bench_openssl_rsa_init 
> > (hogweed-benchmark.c:721)
> >==3016684==by 0x10D7AE: bench_alg (hogweed-benchmark.c:153)
> >==3016684==by 0x10D7AE: main (hogweed-benchmark.c:972)
> >==3016684==
> > 

If you can get debug symbols on your machine so we can see where
(file/line number) in openssl this leaks we can easily find out what is
the source.

> > I wonder if that my code missing some initialization, or if that's an
> > openssl problem?
> 
> 
> How was the "ctx" variable created and initialized?
> 
> The new EVP_PKEY logic has a lot of "ctx_is_legacy" checks based on the 
> ctx itself. So that matters now where it did not before.
> 
> 
> > It's also unclear to me when the e bignum above can be
> > deallocated, does EVP_PKEY_CTX_set1_rsa_keygen_pubexp imply a full copy
> > into the context?

Form the manpage:
   EVP_PKEY_CTX_set1_rsa_keygen_pubexp() sets the public exponent value
   for RSA key generation to the value stored in pubexp. Currently it
   should be an odd integer. In accordance with the OpenSSL naming
   convention, the pubexp pointer must be freed independently of the
   EVP_PKEY_CTX (ie, it is internally copied).  If not specified 65537
   is used.


> 
> Quick reading of the source code indicates that yes the context used 
> BN_dup() one way or another.
> 
> 
> > 
> > Next is updating the ecdsa benchmarks, since, e.g.,
> > EC_KEY_new_by_curve_name, generates deprecation warnings.
> > 
> > Regards,
> > /Niels
> > 
> 
> 
> HTH
> Amos
> ___
> nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
> To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se

-- 
Simo Sorce,
DE @ RHEL Crypto Team,
Red Hat, Inc




___
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se


Re: How to update OpenSSL benchmark glue?

2023-11-29 Thread Simo Sorce
On Wed, 2023-11-29 at 07:43 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > This code here generates RSA keys using the modern API:
> > https://github.com/latchset/pkcs11-provider/blob/main/tests/tgenkey.c#L129
> 
> Thanks, I'll look at that (but I'm off for the https://catsworkshop.dev/
> in Copenhagen rest of the week).
> 
> > (Ignore the pkcs11_uri params, that's special stuff for the pkcs11
> > provider).
> 
> What about the first call,
> 
>ctx = EVP_PKEY_CTX_new_from_name(NULL, key_type, "provider=pkcs11");
> 
> What should I pass for the last argument if I just want openssl's
> software implementation, no hardware tokens?

Ah you do not need to pass any property for the default provider so you
can pass "" or even NULL.

Simo.

-- 
Simo Sorce,
DE @ RHEL Crypto Team,
Red Hat, Inc




___
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se


Re: How to update OpenSSL benchmark glue?

2023-11-27 Thread Simo Sorce
On Thu, 2023-11-23 at 21:19 +0100, Niels Möller wrote:
> The hogweed-benchmark code for benchmarking OpenSSL uses several
> functions that have been deprecated in OpenSSL 3.0, like RSA_new(). I've
> spent some hour attempting to update it to non-deprecated functions, but
> I'm having a really hard time navigating the openssl apis and
> documentation.
> 
> I've not yet been able to even generate an RSA key, using openssl
> functions like EVP_PKEY_CTX_new_from_name, EVP_PKEY_keygen_init,
> EVP_PKEY_CTX_set_rsa_keygen_bits, etc.
> 
> Is there anyone on the list familiar with OpenSSL that could help with
> this update?
> 
> And related question: Do you see a value in hogweed-benchmark (and
> nettle-benchmark) running comparative benchmarks to OpenSSL? Otherwise,
> maybe it's not worth the effort to keep and maintain that code.

Hi Niels,
I do see some value and I can help you with that.

This code here generates RSA keys using the modern API:
https://github.com/latchset/pkcs11-provider/blob/main/tests/tgenkey.c#L129

(Ignore the pkcs11_uri params, that's special stuff for the pkcs11
provider).

You just need the "rsa_keygen_bits" parameters for RSA generation.

HTH,
Simo.

-- 
Simo Sorce,
DE @ RHEL Crypto Team,
Red Hat, Inc




___
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se


Re: [Aarch64] Optimize SHA1 Compress

2021-05-14 Thread Simo Sorce
.4s
> +addTMP0.4s,MSG0.4s,CONST3.4s
> +sha1su1MSG1.4s,MSG0.4s
> +sha1su0MSG2.4s,MSG3.4s,MSG0.4s
> +
> +C Rounds 60-63
> +sha1h  SFP(E0),SFP(ABCD)
> +sha1p  QFP(ABCD),SFP(E1),TMP1.4s
> +addTMP1.4s,MSG1.4s,CONST3.4s
> +sha1su1MSG2.4s,MSG1.4s
> +sha1su0MSG3.4s,MSG0.4s,MSG1.4s
> +
> +C Rounds 64-67
> +sha1h  SFP(E1),SFP(ABCD)
> +sha1p  QFP(ABCD),SFP(E0),TMP0.4s
> +addTMP0.4s,MSG2.4s,CONST3.4s
> +sha1su1MSG3.4s,MSG2.4s
> +sha1su0MSG0.4s,MSG1.4s,MSG2.4s
> +
> +C Rounds 68-71
> +sha1h  SFP(E0),SFP(ABCD)
> +sha1p  QFP(ABCD),SFP(E1),TMP1.4s
> +addTMP1.4s,MSG3.4s,CONST3.4s
> +sha1su1MSG0.4s,MSG3.4s
> +
> +C Rounds 72-75
> +sha1h      SFP(E1),SFP(ABCD)
> +sha1p  QFP(ABCD),SFP(E0),TMP0.4s
> +
> +C Rounds 76-79
> +sha1h  SFP(E0),SFP(ABCD)
> +sha1p  QFP(ABCD),SFP(E1),TMP1.4s
> +
> +C Combine state
> +addE0.4s,E0.4s,E0_SAVED.4s
> +addABCD.4s,ABCD.4s,ABCD_SAVED.4s
> +
> +C Store state
> +st1{ABCD.4s},[STATE]
> +st1{E0.s}[0],[x2]
> +
> +ret
> +EPILOGUE(nettle_sha1_compress)
> diff --git a/arm64/machine.m4 b/arm64/machine.m4
> index e69de29b..7df62bcc 100644
> --- a/arm64/machine.m4
> +++ b/arm64/machine.m4
> @@ -0,0 +1,7 @@
> +C Get 32-bit floating-point register from vector register
> +C SFP(VR)
> +define(`SFP',``s'substr($1,1,len($1))')
> +
> +C Get 128-bit floating-point register from vector register
> +C QFP(VR)
> +define(`QFP',``q'substr($1,1,len($1))')
> 

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add AES Key Wrap (RFC 3394) in Nettle

2021-03-29 Thread Simo Sorce
On Mon, 2021-03-29 at 19:32 +0200, Niels Möller wrote:
> Nicolas Mora  writes:
> 
> > > The new feature also needs documentation, will you look into that once
> > > code, and in particular the interfaces, are solid?
> > > 
> > Definitely!
> > What do you think the documentation should look like? Should it be
> > near paragraph 7.2.1? Like
> > 
> > 7.2.1.1 AES Key Wrap
> 
> That's one possibility, but I think it would also be natural to put it
> somewhere under or close to "7.4. Authenticated encryption and
> associated data", even though there's no associated data. That section
> could perhaps be retitled to "Authenticated encryption" to generalize
> it?
> 
> Or possibly under "7.3 Cipher modes", if it's too different from the
> AEAD constructions.

FWIW NIST considers this a category on it's own under key management.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: SHA1 Collision Detection

2020-11-02 Thread Simo Sorce
On Mon, 2020-11-02 at 14:40 +0100, Neal H. Walfield wrote:
> Hi Simo,
> 
> On Mon, 02 Nov 2020 14:31:34 +0100,
> Simo Sorce wrote:
> > On Mon, 2020-11-02 at 12:53 +0100, Neal H. Walfield wrote:
> > This change would have to be conditional as it will break compatibility
> > for the very use case you mention, data at rest saved moons ago.
> 
> I see two ways forward.
> 
> If I recall the C calling conventions correctly, it should be possible
> to change sha1_digest from returning void to returning an int in a
> backwards compatible way.  Then, a user of the return code would check
> some function at run time to see whether the function really returns
> an int.
> 
> That's ugly.  For us (Sequoia [1]), we'd be happy to have a parallel
> API.
> 
>   [1] https://sequoia-pgp.org/
> 
> For SHA1, Nettle has the following functions [2]:
> 
>   void sha1_init (struct sha1_ctx *ctx)
>   void sha1_update (struct sha1_ctx *ctx, size_t length, const uint8_t *data)
>   void sha1_digest (struct sha1_ctx *ctx, size_t length, uint8_t *digest)
> 
>   [2] 
> http://www.lysator.liu.se/~nisse/nettle/nettle.html#Legacy-hash-functions
> 
> So we could add:
> 
>   void sha1_collision_detection_init(...);
>   void sha1_collision_detection_update (struct sha1_ctx *ctx, size_t length, 
> const uint8_t *data)
>   error_code_t sha1_collision_detection_digest (struct sha1_ctx *ctx, size_t 
> length, uint8_t *digest)
>   
> What do you think?  Or, am I misunderstanding what you mean by
> breaking compatibility?

This is a possible way to go, you did not misunderstand.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: SHA1 Collision Detection

2020-11-02 Thread Simo Sorce
On Mon, 2020-11-02 at 12:53 +0100, Neal H. Walfield wrote:
> Hi,
> 
> It's well known that SHA-1 is broken.  I don't want to save it.  But,
> particularly when dealing with data at rest, there are cases where one
> has to use SHA-1.  It would be nice if Nettle integrated SHA-1
> collision detection to make that a tiny bit safer:
> 
>   https://github.com/cr-marcstevens/sha1collisiondetection
> 
> That library is under the MIT license, and apparently detects known
> attacks against SHA-1:
> 
>   [The routines] will compute the SHA-1 hash of any given file and
>   additionally will detect cryptanalytic collision attacks against
>   SHA-1 present in each file. It is very fast and takes less than
>   twice the amount of time as regular SHA-1.
> 
>   More specifically they will detect any cryptanalytic collision
>   attack against SHA-1 using any of the top 32 SHA-1 disturbance
>   vectors with probability 1: ...
> 
>   The possibility of false positives can be neglected as the
>   probability is smaller than 2^-90.
> 
> Thanks,
> 
> :) Neal

This change would have to be conditional as it will break compatibility
for the very use case you mention, data at rest saved moons ago.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: V3 [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-13 Thread Simo Sorce
On Thu, 2020-03-12 at 21:53 +0100, Niels Möller wrote:
> "H.J. Lu"  writes:
> 
> > Here is the updated patch.
> 
> This V3 patch looks pretty nice to me. 
> 
> But I'm a bit confused by the use of ASM_X86_ENDBR. The instruction is
> added to entry points, via the PROLOGUE macro, but not to other branch
> targets, e.g., loop labels in the assembly files. Is that not needed
> (because branches are direct, without going via PLT indirection)? Or
> will the assembler insert some ENDBR instructions automatically?

Hi Niels,
ENDBR is only needed at CALL targets or indirect jump targets, so not
needed for direct call. Actually theoretical harmful to add any ENDBR
you do not need, as you give more landing sites.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: V3 [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 14:59 -0700, H.J. Lu wrote:
> On Mon, Mar 9, 2020 at 2:42 PM Simo Sorce  wrote:
> > On Mon, 2020-03-09 at 14:31 -0700, H.J. Lu wrote:
> > > On Mon, Mar 9, 2020 at 2:15 PM Simo Sorce  wrote:
> > > > On Mon, 2020-03-09 at 12:46 -0700, H.J. Lu wrote:
> > > > > On Mon, Mar 9, 2020 at 12:22 PM Simo Sorce  wrote:
> > > > > > On Mon, 2020-03-09 at 15:19 -0400, Simo Sorce wrote:
> > > > > > > On Mon, 2020-03-09 at 11:56 -0700, H.J. Lu wrote:
> > > > > > > > On Mon, Mar 9, 2020 at 11:19 AM Simo Sorce  
> > > > > > > > wrote:
> > > > > > > > > On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> > > > > > > > > > Simo Sorce  writes:
> > > > > > > > > > 
> > > > > > > > > > > The patchset i solder than I did remember, April 2019
> > > > > > > > > > > But I recall running at least one version of it on our 
> > > > > > > > > > > CET emulator @
> > > > > > > > > > > Red Hat.
> > > > > > > > > > 
> > > > > > > > > > Sorry I forgot to followup on that. It seems only the first 
> > > > > > > > > > easy cleanup
> > > > > > > > > > patch, "Add missing EPILOGUEs in assembly files", was 
> > > > > > > > > > applied back then.
> > > > > > > > > > 
> > > > > > > > > > Do you remember why you used GNU_CET_SECTION() explicitly 
> > > > > > > > > > in .asm files,
> > > > > > > > > > rather than using an m4 divert?
> > > > > > > > > 
> > > > > > > > > Not really I do not recall anymore, but I think there was a 
> > > > > > > > > reason, as
> > > > > > > > > I recall you made that comment back then and it "didn't work 
> > > > > > > > > out" when
> > > > > > > > > I tried is the memory I have of it.
> > > > > > > > > Might have to do with differences in how it lays out the code 
> > > > > > > > > when done
> > > > > > > > > via m4 divert, but not 100% sure.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > m4 divert  requires much less changes.   Here is the updated 
> > > > > > > > patch with
> > > > > > > > ASM_X86_ENDBR, ASM_X86_MARK_CET_ALIGN and ASM_X86_MARK_CET.
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > Two comments on your patch.
> > > > > > > 
> > > > > > > 1. It is an error to align based on architecture. All GNU Notes 
> > > > > > > MUST be
> > > > > > > aligned 8 bytes. Since 2018 GNU Libc ignores misaligned notes.
> > > > > > 
> > > > > > Ah nevermind this point, misunderstanding with my libc expert, the 4
> > > > > > bytes alignment is ok on 32 bit code.
> > > > > > 
> > > > > > > 2. It is better to use .pushsection .popsection pairs around the 
> > > > > > > note
> > > > > > > instead of .section because of the side effects of using .section
> > > > > 
> > > > > Done.
> > > > > 
> > > > > > > The m4 divert looks smaller impact, feel free to lift the Gnu Note
> > > > > > > section in my patch #3 and place it into your patch if you want. 
> > > > > > > My
> > > > > > > code also made it more explicit what all the sections values 
> > > > > > > actually
> > > > > > > mean which will help in long term maintenance if someone else 
> > > > > > > need to
> > > > > > > change anything (like for example changing to enable only 
> > > > > > > ShadowStack
> > > > > > > vs IBT).
> > > > > > > 
> > > > > 
> > > > > Since CET support requires all objects are marked for CET,  CET 
> > > > > marker on
> > > > > assembly sources is controlled by comp

Re: V3 [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 14:31 -0700, H.J. Lu wrote:
> On Mon, Mar 9, 2020 at 2:15 PM Simo Sorce  wrote:
> > On Mon, 2020-03-09 at 12:46 -0700, H.J. Lu wrote:
> > > On Mon, Mar 9, 2020 at 12:22 PM Simo Sorce  wrote:
> > > > On Mon, 2020-03-09 at 15:19 -0400, Simo Sorce wrote:
> > > > > On Mon, 2020-03-09 at 11:56 -0700, H.J. Lu wrote:
> > > > > > On Mon, Mar 9, 2020 at 11:19 AM Simo Sorce  wrote:
> > > > > > > On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> > > > > > > > Simo Sorce  writes:
> > > > > > > > 
> > > > > > > > > The patchset i solder than I did remember, April 2019
> > > > > > > > > But I recall running at least one version of it on our CET 
> > > > > > > > > emulator @
> > > > > > > > > Red Hat.
> > > > > > > > 
> > > > > > > > Sorry I forgot to followup on that. It seems only the first 
> > > > > > > > easy cleanup
> > > > > > > > patch, "Add missing EPILOGUEs in assembly files", was applied 
> > > > > > > > back then.
> > > > > > > > 
> > > > > > > > Do you remember why you used GNU_CET_SECTION() explicitly in 
> > > > > > > > .asm files,
> > > > > > > > rather than using an m4 divert?
> > > > > > > 
> > > > > > > Not really I do not recall anymore, but I think there was a 
> > > > > > > reason, as
> > > > > > > I recall you made that comment back then and it "didn't work out" 
> > > > > > > when
> > > > > > > I tried is the memory I have of it.
> > > > > > > Might have to do with differences in how it lays out the code 
> > > > > > > when done
> > > > > > > via m4 divert, but not 100% sure.
> > > > > > > 
> > > > > > 
> > > > > > m4 divert  requires much less changes.   Here is the updated patch 
> > > > > > with
> > > > > > ASM_X86_ENDBR, ASM_X86_MARK_CET_ALIGN and ASM_X86_MARK_CET.
> > > > > > 
> > > > > > 
> > > > > 
> > > > > Two comments on your patch.
> > > > > 
> > > > > 1. It is an error to align based on architecture. All GNU Notes MUST 
> > > > > be
> > > > > aligned 8 bytes. Since 2018 GNU Libc ignores misaligned notes.
> > > > 
> > > > Ah nevermind this point, misunderstanding with my libc expert, the 4
> > > > bytes alignment is ok on 32 bit code.
> > > > 
> > > > > 2. It is better to use .pushsection .popsection pairs around the note
> > > > > instead of .section because of the side effects of using .section
> > > 
> > > Done.
> > > 
> > > > > The m4 divert looks smaller impact, feel free to lift the Gnu Note
> > > > > section in my patch #3 and place it into your patch if you want. My
> > > > > code also made it more explicit what all the sections values actually
> > > > > mean which will help in long term maintenance if someone else need to
> > > > > change anything (like for example changing to enable only ShadowStack
> > > > > vs IBT).
> > > > > 
> > > 
> > > Since CET support requires all objects are marked for CET,  CET marker on
> > > assembly sources is controlled by compiler options, not by configure 
> > > option.
> > > Also linker can merge multiple .note.gnu.property sections in a single
> > > input file:
> > > 
> > > [hjl@gnu-cfl-1 tmp]$ cat p.s
> > > .pushsection ".note.gnu.property", "a"
> > > .p2align 3
> > > .long 1f - 0f
> > > .long 4f - 1f
> > > .long 5
> > > 0:
> > > .asciz "GNU"
> > > 1:
> > > .p2align 3
> > > .long 0xc002
> > > .long 3f - 2f
> > > 2:
> > > .long 1
> > > 3:
> > > .p2align 3
> > > 4:
> > > .popsection
> > > .pushsection ".note.gnu.property", "a"
> > > .p2align 3
> > > .long 1f - 0f
> > > .long 4f - 1f
> > > .long 5
> > > 0:
> > > .asciz "GNU"
> > > 1:
> > > .p2align 3
> > > .long 0x

Re: V3 [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 12:46 -0700, H.J. Lu wrote:
> On Mon, Mar 9, 2020 at 12:22 PM Simo Sorce  wrote:
> > On Mon, 2020-03-09 at 15:19 -0400, Simo Sorce wrote:
> > > On Mon, 2020-03-09 at 11:56 -0700, H.J. Lu wrote:
> > > > On Mon, Mar 9, 2020 at 11:19 AM Simo Sorce  wrote:
> > > > > On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> > > > > > Simo Sorce  writes:
> > > > > > 
> > > > > > > The patchset i solder than I did remember, April 2019
> > > > > > > But I recall running at least one version of it on our CET 
> > > > > > > emulator @
> > > > > > > Red Hat.
> > > > > > 
> > > > > > Sorry I forgot to followup on that. It seems only the first easy 
> > > > > > cleanup
> > > > > > patch, "Add missing EPILOGUEs in assembly files", was applied back 
> > > > > > then.
> > > > > > 
> > > > > > Do you remember why you used GNU_CET_SECTION() explicitly in .asm 
> > > > > > files,
> > > > > > rather than using an m4 divert?
> > > > > 
> > > > > Not really I do not recall anymore, but I think there was a reason, as
> > > > > I recall you made that comment back then and it "didn't work out" when
> > > > > I tried is the memory I have of it.
> > > > > Might have to do with differences in how it lays out the code when 
> > > > > done
> > > > > via m4 divert, but not 100% sure.
> > > > > 
> > > > 
> > > > m4 divert  requires much less changes.   Here is the updated patch with
> > > > ASM_X86_ENDBR, ASM_X86_MARK_CET_ALIGN and ASM_X86_MARK_CET.
> > > > 
> > > > 
> > > 
> > > Two comments on your patch.
> > > 
> > > 1. It is an error to align based on architecture. All GNU Notes MUST be
> > > aligned 8 bytes. Since 2018 GNU Libc ignores misaligned notes.
> > 
> > Ah nevermind this point, misunderstanding with my libc expert, the 4
> > bytes alignment is ok on 32 bit code.
> > 
> > > 2. It is better to use .pushsection .popsection pairs around the note
> > > instead of .section because of the side effects of using .section
> 
> Done.
> 
> > > The m4 divert looks smaller impact, feel free to lift the Gnu Note
> > > section in my patch #3 and place it into your patch if you want. My
> > > code also made it more explicit what all the sections values actually
> > > mean which will help in long term maintenance if someone else need to
> > > change anything (like for example changing to enable only ShadowStack
> > > vs IBT).
> > > 
> 
> Since CET support requires all objects are marked for CET,  CET marker on
> assembly sources is controlled by compiler options, not by configure option.
> Also linker can merge multiple .note.gnu.property sections in a single
> input file:
> 
> [hjl@gnu-cfl-1 tmp]$ cat p.s
> .pushsection ".note.gnu.property", "a"
> .p2align 3
> .long 1f - 0f
> .long 4f - 1f
> .long 5
> 0:
> .asciz "GNU"
> 1:
> .p2align 3
> .long 0xc002
> .long 3f - 2f
> 2:
> .long 1
> 3:
> .p2align 3
> 4:
> .popsection
> .pushsection ".note.gnu.property", "a"
> .p2align 3
> .long 1f - 0f
> .long 4f - 1f
> .long 5
> 0:
> .asciz "GNU"
> 1:
> .p2align 3
> .long 0xc002
> .long 3f - 2f
> 2:
> .long 2
> 3:
> .p2align 3
> 4:
> .popsection
> [hjl@gnu-cfl-1 tmp]$ as -o p.o p.s -mx86-used-note=no
> [hjl@gnu-cfl-1 tmp]$ readelf -n p.o
> 
> Displaying notes found in: .note.gnu.property
>   OwnerData size Description
>   GNU  0x0010 NT_GNU_PROPERTY_TYPE_0
>   Properties: x86 feature: IBT
>   GNU  0x0010 NT_GNU_PROPERTY_TYPE_0
>   Properties: x86 feature: SHSTK
> [hjl@gnu-cfl-1 tmp]$ ld -r p.o
> [hjl@gnu-cfl-1 tmp]$ readelf -n a.out
> 
> Displaying notes found in: .note.gnu.property
>   OwnerData size Description
>   GNU  0x0010 NT_GNU_PROPERTY_TYPE_0
>   Properties: x86 feature: IBT, SHSTK
> [hjl@gnu-cfl-1 tmp]$
> 
> New properties can be added without changing CET marker.
> 
> Here is the updated patch.

This patch looks good to me.
Unfortunately I never received the original email creating the thred,
did you send other patches too ?
Or is the prologue stuff sufficient to pass test suite in CET emulator?

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 15:19 -0400, Simo Sorce wrote:
> On Mon, 2020-03-09 at 11:56 -0700, H.J. Lu wrote:
> > On Mon, Mar 9, 2020 at 11:19 AM Simo Sorce  wrote:
> > > On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> > > > Simo Sorce  writes:
> > > > 
> > > > > The patchset i solder than I did remember, April 2019
> > > > > But I recall running at least one version of it on our CET emulator @
> > > > > Red Hat.
> > > > 
> > > > Sorry I forgot to followup on that. It seems only the first easy cleanup
> > > > patch, "Add missing EPILOGUEs in assembly files", was applied back then.
> > > > 
> > > > Do you remember why you used GNU_CET_SECTION() explicitly in .asm files,
> > > > rather than using an m4 divert?
> > > 
> > > Not really I do not recall anymore, but I think there was a reason, as
> > > I recall you made that comment back then and it "didn't work out" when
> > > I tried is the memory I have of it.
> > > Might have to do with differences in how it lays out the code when done
> > > via m4 divert, but not 100% sure.
> > > 
> > 
> > m4 divert  requires much less changes.   Here is the updated patch with
> > ASM_X86_ENDBR, ASM_X86_MARK_CET_ALIGN and ASM_X86_MARK_CET.
> > 
> > 
> 
> Two comments on your patch.
> 
> 1. It is an error to align based on architecture. All GNU Notes MUST be
> aligned 8 bytes. Since 2018 GNU Libc ignores misaligned notes.

Ah nevermind this point, misunderstanding with my libc expert, the 4
bytes alignment is ok on 32 bit code.

> 2. It is better to use .pushsection .popsection pairs around the note
> instead of .section because of the side effects of using .section

> The m4 divert looks smaller impact, feel free to lift the Gnu Note
> section in my patch #3 and place it into your patch if you want. My
> code also made it more explicit what all the sections values actually
> mean which will help in long term maintenance if someone else need to
> change anything (like for example changing to enable only ShadowStack
> vs IBT).
> 
> Simo.
> 

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 11:56 -0700, H.J. Lu wrote:
> On Mon, Mar 9, 2020 at 11:19 AM Simo Sorce  wrote:
> > On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> > > Simo Sorce  writes:
> > > 
> > > > The patchset i solder than I did remember, April 2019
> > > > But I recall running at least one version of it on our CET emulator @
> > > > Red Hat.
> > > 
> > > Sorry I forgot to followup on that. It seems only the first easy cleanup
> > > patch, "Add missing EPILOGUEs in assembly files", was applied back then.
> > > 
> > > Do you remember why you used GNU_CET_SECTION() explicitly in .asm files,
> > > rather than using an m4 divert?
> > 
> > Not really I do not recall anymore, but I think there was a reason, as
> > I recall you made that comment back then and it "didn't work out" when
> > I tried is the memory I have of it.
> > Might have to do with differences in how it lays out the code when done
> > via m4 divert, but not 100% sure.
> > 
> 
> m4 divert  requires much less changes.   Here is the updated patch with
> ASM_X86_ENDBR, ASM_X86_MARK_CET_ALIGN and ASM_X86_MARK_CET.
> 
> 

Two comments on your patch.

1. It is an error to align based on architecture. All GNU Notes MUST be
aligned 8 bytes. Since 2018 GNU Libc ignores misaligned notes.

2. It is better to use .pushsection .popsection pairs around the note
instead of .section because of the side effects of using .section

The m4 divert looks smaller impact, feel free to lift the Gnu Note
section in my patch #3 and place it into your patch if you want. My
code also made it more explicit what all the sections values actually
mean which will help in long term maintenance if someone else need to
change anything (like for example changing to enable only ShadowStack
vs IBT).

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 19:03 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > The patchset i solder than I did remember, April 2019
> > But I recall running at least one version of it on our CET emulator @
> > Red Hat.
> 
> Sorry I forgot to followup on that. It seems only the first easy cleanup
> patch, "Add missing EPILOGUEs in assembly files", was applied back then.
> 
> Do you remember why you used GNU_CET_SECTION() explicitly in .asm files,
> rather than using an m4 divert?

Not really I do not recall anymore, but I think there was a reason, as
I recall you made that comment back then and it "didn't work out" when
I tried is the memory I have of it.
Might have to do with differences in how it lays out the code when done
via m4 divert, but not 100% sure.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Mon, 2020-03-09 at 08:33 -0700, H.J. Lu wrote:
> On Mon, Mar 9, 2020 at 5:36 AM Simo Sorce  wrote:
> > On Sat, 2020-03-07 at 17:49 +0100, Niels Möller wrote:
> > > "H.J. Lu"  writes:
> > > 
> > > > Intel Control-flow Enforcement Technology (CET):
> > > > 
> > > > https://software.intel.com/en-us/articles/intel-sdm
> > > > 
> > > > contains shadow stack (SHSTK) and indirect branch tracking (IBT).  When
> > > > CET is enabled, ELF object files must be marked with .note.gnu.property
> > > > section.  Also when IBT is enabled, all indirect branch targets must
> > > > start with ENDBR instruction.
> > > > 
> > > > This patch adds X86_ENDBR and the CET marker to config.m4.in when CET
> > > > is enabled.  It updates PROLOGUE with X86_ENDBR.
> > > 
> > > I'd like to have a look at what gcc produces. How is it enabled with
> > > gcc? In the docs, I find
> > > 
> > >   -mshstk
> > > 
> > > The -mshstk option enables shadow stack built-in functions from x86
> > > Control-flow Enforcement Technology (CET).
> > > 
> > > but when I try compiling a trivial function,
> > > 
> > >   $ cat foo-cet.c
> > >   int foo(void) {return 0;}
> > >   $ gcc -save-temps -c -mshstk foo-cet.c
> > > 
> > > I get no endbr instruction and no note in the foo-cet.s. I'm using
> > > gcc-8.3. I do get an
> > > 
> > >   .section .note.GNU-stack,"",@progbits
> > > 
> > > corresponding to Nettle's ASM_MARK_NOEXEC_STACK
> > > 
> > > > --- a/config.m4.in
> > > > +++ b/config.m4.in
> > > > @@ -8,6 +8,10 @@ define(, <@ASM_ALIGN_LOG@>)dnl
> > > >  define(, <@W64_ABI@>)dnl
> > > >  define(, <@ASM_RODATA@>)dnl
> > > >  define(, <@ASM_WORDS_BIGENDIAN@>)dnl
> > > > +define(,<@X86_ENDBR@>)dnl
> > > > +divert(1)
> > > > +@X86_GNU_PROPERTY@
> > > > +divert
> > > >  divert(1)
> > > >  @ASM_MARK_NOEXEC_STACK@
> > > >  divert
> > > 
> > > You can put the two properties in the same m4 divert. Also, please
> > > rename the autoconf substitutions with ASM_ prefix, and something more
> > > descriptive than X64_GNU_PROPERTY. E.g., ASM_X86_ENDBR and
> > > ASM_X86_MARK_CET.
> > > 
> > > > diff --git a/configure.ac b/configure.ac
> > > > index ba3ab7c6..e9ed630c 100644
> > > > --- a/configure.ac
> > > > +++ b/configure.ac
> > > > @@ -803,6 +803,82 @@ EOF
> > > >ASM_ALIGN_LOG="$nettle_cv_asm_align_log"
> > > >  fi
> > > > 
> > > > +dnl  Define
> > > > +dnl  1. X86_ENDBR for endbr32/endbr64.
> > > > +dnl  2. X86_GNU_PROPERTY to add a .note.gnu.property section to mark
> > > > +dnl  Intel CET support if needed.
> > > > +dnl.section ".note.gnu.property", "a"
> > > > +dnl.p2align POINTER-ALIGN
> > > > +dnl.long 1f - 0f
> > > > +dnl.long 4f - 1f
> > > > +dnl.long 5
> > > > +dnl 0:
> > > > +dnl.asciz "GNU"
> > > > +dnl 1:
> > > > +dnl.p2align POINTER-ALIGN
> > > > +dnl.long 0xc002
> > > > +dnl.long 3f - 2f
> > > > +dnl 2:
> > > > +dnl.long 3
> > > > +dnl 3:
> > > > +dnl.p2align POINTER-ALIGN
> > > > +dnl 4:
> > > 
> > > No need to repeat the definition in full in this comment. And as I think
> > > I've said before, I'm a bit surprised that it needs to be this verbose.
> > > 
> > > > +AC_CACHE_CHECK([if Intel CET is enabled],
> > > > +  [nettle_cv_asm_x86_intel_cet],
> > > > +  [AC_TRY_COMPILE([
> > > > +#ifndef __CET__
> > > > +#error Intel CET is not enabled
> > > > +#endif
> > > > +  ], [],
> > > > +  [nettle_cv_asm_x86_intel_cet=yes],
> > > > +  [nettle_cv_asm_x86_intel_cet=no])])
> > > > +if test "$nettle_cv_asm_x86_intel_cet" = yes; then
> > > > +  case $ABI in
> > > > +  32|standard)
> > > > +X86_ENDBR=endbr32
> > > > +p2align=2
> > > > +;;
> > > > +  64)
> > > > +X86_ENDBR=endbr64
> > > > +p2align=3
> > >

Re: [PATCH] x86: Add X86_ENDBR and CET marker to config.m4.in

2020-03-09 Thread Simo Sorce
On Sat, 2020-03-07 at 17:49 +0100, Niels Möller wrote:
> "H.J. Lu"  writes:
> 
> > Intel Control-flow Enforcement Technology (CET):
> > 
> > https://software.intel.com/en-us/articles/intel-sdm
> > 
> > contains shadow stack (SHSTK) and indirect branch tracking (IBT).  When
> > CET is enabled, ELF object files must be marked with .note.gnu.property
> > section.  Also when IBT is enabled, all indirect branch targets must
> > start with ENDBR instruction.
> > 
> > This patch adds X86_ENDBR and the CET marker to config.m4.in when CET
> > is enabled.  It updates PROLOGUE with X86_ENDBR.
> 
> I'd like to have a look at what gcc produces. How is it enabled with
> gcc? In the docs, I find
> 
>   -mshstk
> 
> The -mshstk option enables shadow stack built-in functions from x86
> Control-flow Enforcement Technology (CET).
> 
> but when I try compiling a trivial function,
> 
>   $ cat foo-cet.c 
>   int foo(void) {return 0;}
>   $ gcc -save-temps -c -mshstk foo-cet.c 
> 
> I get no endbr instruction and no note in the foo-cet.s. I'm using
> gcc-8.3. I do get an
> 
>   .section .note.GNU-stack,"",@progbits
> 
> corresponding to Nettle's ASM_MARK_NOEXEC_STACK
> 
> > --- a/config.m4.in
> > +++ b/config.m4.in
> > @@ -8,6 +8,10 @@ define(, <@ASM_ALIGN_LOG@>)dnl
> >  define(, <@W64_ABI@>)dnl
> >  define(, <@ASM_RODATA@>)dnl
> >  define(, <@ASM_WORDS_BIGENDIAN@>)dnl
> > +define(,<@X86_ENDBR@>)dnl
> > +divert(1)
> > +@X86_GNU_PROPERTY@
> > +divert
> >  divert(1)
> >  @ASM_MARK_NOEXEC_STACK@
> >  divert
> 
> You can put the two properties in the same m4 divert. Also, please
> rename the autoconf substitutions with ASM_ prefix, and something more
> descriptive than X64_GNU_PROPERTY. E.g., ASM_X86_ENDBR and
> ASM_X86_MARK_CET.
> 
> > diff --git a/configure.ac b/configure.ac
> > index ba3ab7c6..e9ed630c 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -803,6 +803,82 @@ EOF
> >ASM_ALIGN_LOG="$nettle_cv_asm_align_log"
> >  fi
> >  
> > +dnl  Define
> > +dnl  1. X86_ENDBR for endbr32/endbr64.
> > +dnl  2. X86_GNU_PROPERTY to add a .note.gnu.property section to mark
> > +dnl  Intel CET support if needed.
> > +dnl.section ".note.gnu.property", "a"
> > +dnl.p2align POINTER-ALIGN
> > +dnl.long 1f - 0f
> > +dnl.long 4f - 1f
> > +dnl.long 5
> > +dnl 0:
> > +dnl.asciz "GNU"
> > +dnl 1:
> > +dnl.p2align POINTER-ALIGN
> > +dnl.long 0xc002
> > +dnl.long 3f - 2f
> > +dnl 2:
> > +dnl.long 3
> > +dnl 3:
> > +dnl.p2align POINTER-ALIGN
> > +dnl 4:
> 
> No need to repeat the definition in full in this comment. And as I think
> I've said before, I'm a bit surprised that it needs to be this verbose.
> 
> > +AC_CACHE_CHECK([if Intel CET is enabled],
> > +  [nettle_cv_asm_x86_intel_cet],
> > +  [AC_TRY_COMPILE([
> > +#ifndef __CET__
> > +#error Intel CET is not enabled
> > +#endif
> > +  ], [],
> > +  [nettle_cv_asm_x86_intel_cet=yes],
> > +  [nettle_cv_asm_x86_intel_cet=no])])
> > +if test "$nettle_cv_asm_x86_intel_cet" = yes; then
> > +  case $ABI in
> > +  32|standard)
> > +X86_ENDBR=endbr32
> > +p2align=2
> > +;;
> > +  64)
> > +X86_ENDBR=endbr64
> > +p2align=3
> > +;;
> > +  x32)
> > +X86_ENDBR=endbr64
> > +p2align=2
> > +;;
> > +  esac
> > +  AC_CACHE_CHECK([if .note.gnu.property section is needed],
> > +[nettle_cv_asm_x86_gnu_property],
> > +[AC_TRY_COMPILE([
> > +#if !defined __ELF__ || !defined __CET__
> > +#error GNU property is not needed
> > +#endif
> > +], [],
> > +[nettle_cv_asm_x86_gnu_property=yes],
> > +[nettle_cv_asm_x86_gnu_property=no])])
> > +else
> > +  nettle_cv_asm_x86_gnu_property=no
> > +fi
> > +if test "$nettle_cv_asm_x86_gnu_property" = yes; then
> > +  X86_GNU_PROPERTY="
> > +   .section \".note.gnu.property\", \"a\"
> > +   .p2align $p2align
> > +   .long 1f - 0f
> > +   .long 4f - 1f
> > +   .long 5
> > +0:
> > +   .asciz \"GNU\"
> > +1:
> > +   .p2align $p2align
> > +   .long 0xc002
> > +   .long 3f - 2f
> > +2:
> > +   .long 3
> > +3:
> > +   .p2align $p2align
> >

Re: rsa too slow?

2019-12-02 Thread Simo Sorce
On Mon, 2019-12-02 at 13:24 +0100, Nikos Mavrogiannopoulos wrote:
> Hi,
>  I got pinged by someone testing the performance of TLS handshakes and
> it seems that gnutls/nettle with RSA is significantly slower than
> openssl. On the other hand, secp256r1 and ed25519 are faster. (btw.
> both openssl and gnutls/nettle are slower than rusttls).

FYI last time I checked rusttls it does not employ any countermeasure,
not even blinding, easy to be fast that way.

>  Nevertheless
> the RSA caught my attention because I had the impression that nettle
> was at some point equivalent if not faster. I see that the hogweed
> benchmark values in nettle show a 3x difference in signing for the TR
> version and ~2x for the unprotected. Going back to 3.1 did not affect
> that. Was that always the case? If not any ideas what could have
> caused that? Did we miss some optimizations? (from a quick review of
> openssl' RSA code, I see that smooth CRT RSA was added relatively
> recently, but could that get such a big performance benefit?)

Would you be able to measure OpenSSL's RSA from a release before the
smooth CRt was added ?

> name size   sign/ms verify/ms
>  rsa 20480.8881   27.1422
>rsa (openssl) 20481.4249   45.2295
> 
>   rsa-tr 20480.4257   29.1152
> rsa-tr (openssl) 20481.3735   46.1692
> 
> regards,
> Nikos
> ___
> nettle-bugs mailing list
> nettle-bugs@lists.lysator.liu.se
> http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Nettle ECC code question

2019-10-31 Thread Simo Sorce
On Thu, 2019-10-31 at 15:07 +0300, Dmitry Eremin-Solenikov wrote:
> Hello,
> 
> I've noticed the following typical code sequence:
> 
> ecc_modp_mul(ecc, t, x, y);
> cy = mpn_sub_n(dest, t, ecc->p.m, ecc->p.size);
> cnd_copy(cy, dest, t, ecc->p.size);
> 
> What is the benefit of this piece of code over the following one?
> 
> ecc_modp_mul(ecc, t, x, y);
> memcpy(dest, t, ecc->p.size * sizeof(mp_limb_t));
> 
> Does mpn_sun_n/cnd_copy add any form of side channel attach protection?

cnd_copy provides side-channel protection when you want to make the
copy conditional.

In this case the copy is conditional to the carry being returned from
the subtraction, your code does not look equivalent.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add check for ECC at point 0

2019-07-02 Thread Simo Sorce
On Tue, 2019-07-02 at 22:12 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > On Wed, 2019-05-15 at 10:48 -0400, Simo Sorce wrote:
> > > On Wed, 2019-05-15 at 11:42 +0200, Niels Möller wrote:
> > > > Simo Sorce  writes:
> > > > 
> > > > > Attached find patch that adds points checks to the ECDH test case.
> > > > > Let me know if that's ok or if you prefer a whole new test.
> 
> Merged now.

Thank you!

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: ANNOUNCE: Nettle-3.5

2019-06-26 Thread Simo Sorce
On Wed, 2019-06-26 at 11:23 +0200, Niels Möller wrote:
> Nikos Mavrogiannopoulos  writes:
> 
> >  The x86_64/sha_ni directory is not included in the distribution, and
> > thus compilation fails when --enable-fat is provided. I update my
> > previous patch to add `make distcheck` to include enable-fat, so that
> > missing files from distribution are caught earlier.
> 
> Ouch. I'll have to make an updated release later today or tomorrow.
> 
> Thanks for finding it quickly. I've run at least a dozen of different
> tests on the rc1 tarball, but apparently none with x86_64 --enable-fat
> :-(
> 
> And problem is of course not limited to fat builds, it's just that
> --enable-fat happens to make it visible at compile time.

Niels, it may be nice to use gitlab and it's CI features to have tests
with all configurations to catch these things.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc




___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add check for ECC at point 0

2019-05-17 Thread Simo Sorce
On Fri, 2019-05-17 at 08:47 +0200, Nikos Mavrogiannopoulos wrote:
> On Wed, 2019-05-15 at 10:48 -0400, Simo Sorce wrote:
> > On Wed, 2019-05-15 at 11:42 +0200, Niels Möller wrote:
> > > Simo Sorce  writes:
> > > 
> > > > Attached find patch that adds points checks to the ECDH test
> > > > case.
> > > > Let me know if that's ok or if you prefer a whole new test.
> > > 
> > > I think it's ok to have it in the same file.
> > > 
> > > > -static void
> > > > -set_point (struct ecc_point *p,
> > > > -  const char *x, const char *y)
> > > > +static int
> > > > +ret_set_point (struct ecc_point *p,
> > > > +   const char *x, const char *y)
> > > >  {
> > > 
> > > I think it's nicer to just change set_point to return int, and wrap
> > > all existing calls in ASSERT, e.g,
> > > 
> > > -  set_point (, ax, ay);
> > > +  ASSERT (set_point (, ax, ay));
> > > 
> > > in test_dh. Or name functions as int set_point(...), void
> > > set_point_or_die (...), but I think ASSERT is still clearer, in
> > > this
> > > case.
> > 
> > Ok, will change.
> > 
> > > > +  test_public_key ("(0,0) with secp-192r1", &_nettle_secp_192r1,
> > > > "0", "0", 0);
> > > > +  test_public_key (
> > > > +"(P,0) with secp-192r1", &_nettle_secp_192r1,
> > > > +"6277101735386680763835789423207666416083908700390324961279"
> > > > ,
> > > > +"0", 0);
> > > 
> > > Any particular reason the tests are all for secp_192r1 ?
> > 
> > Less copy-pasting as the numbers are smaller, the curve used really
> > makes no difference.
> > 
> > Nioks,
> > is the fact we do not enable 192r1 in some distribution a problem?
> 
> I replied in private previously,

sorry, never received that reply.

>  making a point that in fedora we
> remove the code and disable everything but secp256r1, 384r1 and 521r1.
> So any tests that use 192r1 or 224r1 will not be executed at all in
> that platform.

Understood, are you asking to add some tests with other curves ?

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add check for ECC at point 0

2019-05-16 Thread Simo Sorce
On Wed, 2019-05-15 at 10:48 -0400, Simo Sorce wrote:
> On Wed, 2019-05-15 at 11:42 +0200, Niels Möller wrote:
> > Simo Sorce  writes:
> > 
> > > Attached find patch that adds points checks to the ECDH test case.
> > > Let me know if that's ok or if you prefer a whole new test.
> > 
> > I think it's ok to have it in the same file.
> > 
> > > -static void
> > > -set_point (struct ecc_point *p,
> > > -const char *x, const char *y)
> > > +static int
> > > +ret_set_point (struct ecc_point *p,
> > > +   const char *x, const char *y)
> > >  {
> > 
> > I think it's nicer to just change set_point to return int, and wrap
> > all existing calls in ASSERT, e.g,
> > 
> > -  set_point (, ax, ay);
> > +  ASSERT (set_point (, ax, ay));
> > 
> > in test_dh. Or name functions as int set_point(...), void
> > set_point_or_die (...), but I think ASSERT is still clearer, in this
> > case.
> 
> Ok, will change.

Attached patch that does this.
Nothing else was changed.

> > > +  test_public_key ("(0,0) with secp-192r1", &_nettle_secp_192r1, "0", 
> > > "0", 0);
> > > +  test_public_key (
> > > +"(P,0) with secp-192r1", &_nettle_secp_192r1,
> > > +"6277101735386680763835789423207666416083908700390324961279",
> > > +"0", 0);
> > 
> > Any particular reason the tests are all for secp_192r1 ?
> 
> Less copy-pasting as the numbers are smaller, the curve used really
> makes no difference.
> 
> Nioks,
> is the fact we do not enable 192r1 in some distribution a problem?
> 
> Simo.
> 

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From 458e69336f89e818580d3b1674a560ea39880c14 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Mon, 13 May 2019 15:24:56 -0400
Subject: [PATCH] Add tests that exercise public key checks for ECDH

When performing ECDH the peer provided public key needs to be checked
for validity. FIPS requires basic tests be performed to insure the
provided points are in fact on the selected curve. Those checks already
exists in the ecc_point_set() function.
Add an explicit test that checks the boundaries so that any regression
in checks will be caught.

Signed-off-by: Simo Sorce 
---
 testsuite/ecdh-test.c | 58 ---
 1 file changed, 49 insertions(+), 9 deletions(-)

diff --git a/testsuite/ecdh-test.c b/testsuite/ecdh-test.c
index 2bfffd68..0b39319d 100644
--- a/testsuite/ecdh-test.c
+++ b/testsuite/ecdh-test.c
@@ -31,20 +31,21 @@
 
 #include "testutils.h"
 
-static void
-set_point (struct ecc_point *p,
-	   const char *x, const char *y)
+static int
+set_point (struct ecc_point *p, const char *x, const char *y)
 {
   mpz_t X, Y;
+  int ret;
+
   mpz_init_set_str (X, x, 0);
   mpz_init_set_str (Y, y, 0);
-  if (!ecc_point_set (p, X, Y))
-die ("Test point not on curve!\n");
+  ret = ecc_point_set (p, X, Y);
 
   mpz_clear (X);
   mpz_clear (Y);
+  return ret;
 }
-  
+
 static void
 set_scalar (struct ecc_scalar *s,
 	const char *x)
@@ -102,15 +103,15 @@ test_dh (const char *name, const struct ecc_curve *ecc,
   ecc_scalar_init (_priv, ecc);
   set_scalar (_priv, a_priv);
   ecc_point_init (, ecc);
-  set_point (, ax, ay);
+  ASSERT (set_point (, ax, ay));
 
   ecc_scalar_init (_priv, ecc);
   set_scalar (_priv, b_priv);
   ecc_point_init (, ecc);
-  set_point (, bx, by);
+  ASSERT (set_point (, bx, by));
 
   ecc_point_init (, ecc);
-  set_point (, sx, sy);
+  ASSERT (set_point (, sx, sy));
 
   ecc_point_init (, ecc);
 
@@ -135,9 +136,48 @@ test_dh (const char *name, const struct ecc_curve *ecc,
   ecc_point_clear ();  
 }
 
+static void
+test_public_key (const char *label, const struct ecc_curve *ecc,
+ const char *x, const char *y, int expect_success)
+{
+  struct ecc_point P;
+  int ret;
+
+  ecc_point_init (, ecc);
+  ret = set_point (, x, y);
+
+  if (!ret && expect_success)
+die ("Test point '%s' not on curve!\n", label);
+
+  if (ret && !expect_success)
+die ("Expected failure to set point '%s'!", label);
+
+  ecc_point_clear ();
+}
+
 void
 test_main(void)
 {
+  test_public_key ("(0,0) with secp-192r1", &_nettle_secp_192r1, "0", "0", 0);
+  test_public_key (
+"(P,0) with secp-192r1", &_nettle_secp_192r1,
+"6277101735386680763835789423207666416083908700390324961279",
+"0", 0);
+  test_public_key (
+"(0,P) with secp-192r1", &_nettle_secp_192r1, "0",
+"6277101735386680763835789423207666416083908700390324961279",
+0);
+  test_public_key (
+"(P,P) with secp-192r

Re: Add check for ECC at point 0

2019-05-15 Thread Simo Sorce
On Wed, 2019-05-15 at 11:42 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > Attached find patch that adds points checks to the ECDH test case.
> > Let me know if that's ok or if you prefer a whole new test.
> 
> I think it's ok to have it in the same file.
> 
> > -static void
> > -set_point (struct ecc_point *p,
> > -  const char *x, const char *y)
> > +static int
> > +ret_set_point (struct ecc_point *p,
> > +   const char *x, const char *y)
> >  {
> 
> I think it's nicer to just change set_point to return int, and wrap
> all existing calls in ASSERT, e.g,
> 
> -  set_point (, ax, ay);
> +  ASSERT (set_point (, ax, ay));
> 
> in test_dh. Or name functions as int set_point(...), void
> set_point_or_die (...), but I think ASSERT is still clearer, in this
> case.

Ok, will change.

> > +  test_public_key ("(0,0) with secp-192r1", &_nettle_secp_192r1, "0", "0", 
> > 0);
> > +  test_public_key (
> > +"(P,0) with secp-192r1", &_nettle_secp_192r1,
> > +"6277101735386680763835789423207666416083908700390324961279",
> > +"0", 0);
> 
> Any particular reason the tests are all for secp_192r1 ?

Less copy-pasting as the numbers are smaller, the curve used really
makes no difference.

Nioks,
is the fact we do not enable 192r1 in some distribution a problem?

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add check for ECC at point 0

2019-05-13 Thread Simo Sorce
On Mon, 2019-05-13 at 08:44 -0400, Simo Sorce wrote:
> On Sat, 2019-05-11 at 10:00 +0200, Niels Möller wrote:
> > Simo Sorce  writes:
> > 
> > > While reviewing FIPS requirements for public key checks in Ephemeral
> > > Diffie-Hellman key exchanges it came out that FIPS requires checks that
> > > the public key point is not the (0, 0) coordinate and nettle is not
> > > doing it (only checks that neither point is negative.
> > 
> > ecc_point_set also checks that the point is on the curve, i.e.,
> > satisfies the curve equation. That should rule out (0, 0), except if we
> > have some curve with constant term b == 0, which I don't think makes
> > sense.
> 
> Ah you are right the later check would catch it.
> I was just following the checks FIPS explicitly requires in order and
> didn't think about that ...
> 
> > Not sure how FIPS requirements are formulated, but maybe it would be
> > better to add a test case to check that ecc_point_set rejects (0,0) ?
> 
> Yes, I will drop my patch and add a test case.

Attached find patch that adds points checks to the ECDH test case.
Let me know if that's ok or if you prefer a whole new test.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From a720da816f238e2a51747ab3df22715d8f134453 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Mon, 13 May 2019 15:24:56 -0400
Subject: [PATCH] Add tests that exercise public key checks for ECDH

When performing ECDH the peer provided public key needs to be checked
for validity. FIPS requires basic tests be performed to insure the
provided points are in fact on the selected curve. Those checks already
exists in the ecc_point_set() function.
Add an explicit test that checks the boundaries so that any regression
in checks will be caught.

Signed-off-by: Simo Sorce 
---
 testsuite/ecdh-test.c | 61 ++-
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/testsuite/ecdh-test.c b/testsuite/ecdh-test.c
index 2bfffd68..9a029c61 100644
--- a/testsuite/ecdh-test.c
+++ b/testsuite/ecdh-test.c
@@ -31,20 +31,30 @@
 
 #include "testutils.h"
 
-static void
-set_point (struct ecc_point *p,
-	   const char *x, const char *y)
+static int
+ret_set_point (struct ecc_point *p,
+   const char *x, const char *y)
 {
   mpz_t X, Y;
+  int ret;
+
   mpz_init_set_str (X, x, 0);
   mpz_init_set_str (Y, y, 0);
-  if (!ecc_point_set (p, X, Y))
-die ("Test point not on curve!\n");
+  ret = ecc_point_set (p, X, Y);
 
   mpz_clear (X);
   mpz_clear (Y);
+  return ret;
+}
+
+static void
+set_point (struct ecc_point *p,
+	   const char *x, const char *y)
+{
+  if (!ret_set_point (p, x, y))
+die ("Test point not on curve!\n");
 }
-  
+
 static void
 set_scalar (struct ecc_scalar *s,
 	const char *x)
@@ -135,9 +145,48 @@ test_dh (const char *name, const struct ecc_curve *ecc,
   ecc_point_clear ();  
 }
 
+static void
+test_public_key (const char *label, const struct ecc_curve *ecc,
+ const char *x, const char *y, int expect_success)
+{
+  struct ecc_point P;
+  int ret;
+
+  ecc_point_init (, ecc);
+  ret = ret_set_point (, x, y);
+
+  if (!ret && expect_success)
+die ("Test point '%s' not on curve!\n", label);
+
+  if (ret && !expect_success)
+die ("Expected failure to set point '%s'!", label);
+
+  ecc_point_clear ();
+}
+
 void
 test_main(void)
 {
+  test_public_key ("(0,0) with secp-192r1", &_nettle_secp_192r1, "0", "0", 0);
+  test_public_key (
+"(P,0) with secp-192r1", &_nettle_secp_192r1,
+"6277101735386680763835789423207666416083908700390324961279",
+"0", 0);
+  test_public_key (
+"(0,P) with secp-192r1", &_nettle_secp_192r1, "0",
+"6277101735386680763835789423207666416083908700390324961279",
+0);
+  test_public_key (
+"(P,P) with secp-192r1", &_nettle_secp_192r1,
+"6277101735386680763835789423207666416083908700390324961279",
+"6277101735386680763835789423207666416083908700390324961279",
+0);
+  test_public_key ("(1,2) with secp-192r1", &_nettle_secp_192r1, "1", "2", 0);
+  test_public_key ("(X,Y) with secp-192r1", &_nettle_secp_192r1,
+"1050363442265225480786760666329560655512990381040021438562",
+"5298249600854377235107392014200406283816103564916230704184",
+1);
+
   test_dh ("secp-192r1", &_nettle_secp_192r1,
 	   "3406157206141798348095184987208239421004566462391397236532",
 	   "1050363442265225480786760666329560655512990381040021438562",
-- 
2.21.0

___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Add check for ECC at point 0

2019-05-13 Thread Simo Sorce
On Sat, 2019-05-11 at 10:00 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > While reviewing FIPS requirements for public key checks in Ephemeral
> > Diffie-Hellman key exchanges it came out that FIPS requires checks that
> > the public key point is not the (0, 0) coordinate and nettle is not
> > doing it (only checks that neither point is negative.
> 
> ecc_point_set also checks that the point is on the curve, i.e.,
> satisfies the curve equation. That should rule out (0, 0), except if we
> have some curve with constant term b == 0, which I don't think makes
> sense.

Ah you are right the later check would catch it.
I was just following the checks FIPS explicitly requires in order and
didn't think about that ...

> Not sure how FIPS requirements are formulated, but maybe it would be
> better to add a test case to check that ecc_point_set rejects (0,0) ?

Yes, I will drop my patch and add a test case.

Simo. 

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Add check for ECC at point 0

2019-05-06 Thread Simo Sorce
While reviewing FIPS requirements for public key checks in Ephemeral
Diffie-Hellman key exchanges it came out that FIPS requires checks that
the public key point is not the (0, 0) coordinate and nettle is not
doing it (only checks that neither point is negative.

Add this check as we never want to allow this point in any case.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From e64d88c0144c22148acb9a24a277c587f084240b Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Mon, 6 May 2019 10:47:49 -0400
Subject: [PATCH] Check ECC coordinates are not zero

Signed-off-by: Simo Sorce 
---
 ecc-point.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/ecc-point.c b/ecc-point.c
index 31e3115a..5c4b782a 100644
--- a/ecc-point.c
+++ b/ecc-point.c
@@ -57,12 +57,16 @@ ecc_point_set (struct ecc_point *p, const mpz_t x, const mpz_t y)
   mp_size_t size;  
   mpz_t lhs, rhs;
   mpz_t t;
+  int x_sgn, y_sgn;
   int res;
 
   size = p->ecc->p.size;
-  
-  if (mpz_sgn (x) < 0 || mpz_limbs_cmp (x, p->ecc->p.m, size) >= 0
-  || mpz_sgn (y) < 0 || mpz_limbs_cmp (y, p->ecc->p.m, size) >= 0)
+  x_sgn = mpz_sgn (x);
+  y_sgn = mpz_sgn (y);
+
+  if ((x_sgn == 0 && y_sgn == 0) ||
+  x_sgn < 0 || mpz_limbs_cmp (x, p->ecc->p.m, size) >= 0
+  || y_sgn < 0 || mpz_limbs_cmp (y, p->ecc->p.m, size) >= 0)
 return 0;
 
   mpz_init (lhs);
-- 
2.20.1

___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Intel CET protection

2019-04-27 Thread Simo Sorce
Oh sorry I did not see this email and the previous before sending my
new patches.

About git, generally git am wants to be applied in order, but I think
there may be a fuzz option to git too, never really investigated as I
usually apply all patches to a branch (or pull that branch directly
from PR branches) and then git cherry-pick if I want a specific patch
on master.

On Sat, 2019-04-27 at 09:29 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > Ok attached find new patches,
> > they address all concerns except for adding the CET_SECTION macro
> > automagically to all asm files.
> 
> Ah, one more thing:
> 
> > +define(,
> > + > +<.pushsection .note.gnu.property,"a"
> 
> How portable is .pushsection? If we ensure that notes are last, plain
> .section should be enough, I think.

No it needs to be .pushsection, when I was using just .section the
alignment of the property was incorrect (16 instead of 8) and glibc
gurus told me to use .pushsection to properly deal with that.
If you have/want CET you also have a modern enough GCC and Assembler
that support -fcf-protection and .pushsection so I do not think it is a
problem, if you are not using NGU As, you won't be using --enable-cet-
protection either

> > --- a/x86_64/sha3-permute.asm
> > +++ b/x86_64/sha3-permute.asm
> > @@ -107,6 +107,7 @@ define(, <
> > 
> > C sha3_permute(struct sha3_state *ctx)
> > .text
> > +GNU_CET_SECTION()
> > ALIGN(16)
> >  PROLOGUE(nettle_sha3_permute)
> > W64_ENTRY(1, 16)
> 
> This placement between .text and the prologue depends on .pushsection /
> .popsection. I think it should be moved last, just like in the other
> files, either explicitly or by means of a divert in some of the included
> m4 files.

Yes I had already caught this mistake, it is fixed in the patch series
I sent a few minutes ago.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Intel CET protection

2019-04-27 Thread Simo Sorce
On Fri, 2019-04-26 at 22:31 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > I understand this is not a high bar, and I will fold the segment note
> > in if you think that is what we should do, but I wanted to make you
> > aware of why I did not do the same as what we do with the stack note.
> 
> I think we should aim to make all files "cet-compliant" if we do it att
> all. After all, the common case is to have a libnettle.so, and then any
> single object file missing the annotation will make the linker disable
> the feature, if I've understood it correctly. I agree it should be
> disabled by default until we're more confident in test coverage.

Yes we should have all files CET compliant, the idea of having a
distinct macro was to make it easy to catch submissions of new files
that lack it.

> BTW, do you know how that works with late loading using dlopen? One
> could have a process running in CET-mode, which dynamically loads an
> so-file with code lacking endbr instructions and corresponding
> annotation.

Yes I had the same question so I asked to our glibc gurus. The only
protection that is problematic is IBT as it requires insertion of new
instructions, so there is a mode where you can mark specific pages so
that the CPU will ignore missing endbr instruction exclusively for
executable code on those pages. So you can load an entire library via
dlopen() on specially marked pages and code running on those will have
no protection while the rest of the code will do. It is an all-or
nothing at the library level at that point.
It is not clear to me if this is fully supported today in glibc yet.

I guess in theory this could be done for individual libraries brought
in at execution time by the dynamic linker, but I think that is not
done and instead the whole binary is either enabled or disabled
currently.

> If we think about it as an arch-specific thing, which I guess we should,
> then maybe the m4 divert should be in x86_64/machine.m4 and
> x86/machine.m4, not asm.m4.

Makes sense, I can move it, do you still want it to be automagically
added to all assembly files?

> > That is funny, I actually used CET_ENDBR to make it easier to find for
> > others grepping as binutils also uses a _CET_ENDBR macro, it sounded
> > more consistent
> 
> I agree your name is better for readability, even if less amusing.
> 
> > > I'd like to understand what's missing. Maybe we can fix it more
> > > explicitly? 
> > 
> > I do not think we can easily fix it manually, it is mostly other
> > section notes that the gcc compiler adds when it fortifies C code. But
> > those notes do not really apply to handcrafted assembly.
> 
> [...]
> > So this flag
> > is basically just saying to the compiler that it should add whatever is
> > appropriate (which may change depending on compiler flags) because our
> > code is good as is.
> 
> Since the command line flag is passed with -Wa, it tells the *assembler*
> to add notes.

True.

>  Which ones? Is it based only on the command line, say,
> $(CFLAGS) contains -fharden-foo makes the assembler produces a "foo"
> note. Or is it based on what's actually in the assembly input file?

It is not based on what is in the assembly input file afaik.
It generates "GNU Build Attribute" notes.

readelf show a single additional Owner attribute named GA$3a1
with Data size always set to 0x10 and applied to each section of the
library corresponding to assembly files that cover the memory range
occupied by said library.

I have no more information than that at the moment, but I can ask if
you want me to dig through it.

> Is there a risk that it automatically generates a note promising
> something about the assembly code which we don't actually fulfill?

I was told that's not the case, I think it just sets bare notes that
basically do not assert anything specific, it just makes the tools that
check for those notes happy.

> Is there any documentation? I see that it is mentioned in the binutils-2.31
> release announcement, but I've not found it mentioned in the Gas manual.

Couldn't find anything either. Only some fedora wiki change proposal
page that mentions them.

> > https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
> > (I do no have a better link)
> 
> Looks like reasonable doc. (Closest on wikpedia seems to be
> https://en.wikipedia.org/wiki/Control-flow_integrity).

Yeah I found this page too on wikipedia but is is it too generic (it
also referenced the link above)

> > See above explanation and let me know if that changes your opinion,
> > otherwise I will do this.
> > 
> > > > +ALIGN(8)
> > > > +.long 1f - 0f
> > > > +.long 4f - 1f
> > &g

Re: Intel CET protection

2019-04-26 Thread Simo Sorce
Ok attached find new patches,
they address all concerns except for adding the CET_SECTION macro
automagically to all asm files.
I also added a patch to deal with the missing epilogues.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From 8c41a89ed3ef913bc8a12f8e6d21edf3627007ee Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Tue, 23 Apr 2019 18:03:35 -0400
Subject: [PATCH 1/3] Add Intel CET protection support

In upcoming processors Intel will make available Control-Flow
Enforcement Technology, which is comprised of two hardware
countermeasures against Return-Oriented Programming attacks.

The first is called Shadow Stack and checks that return from function
calls are not tampered with by keeping a shadow stack that cannot be
modified by applications. This measure requires no code changes (except
for code that intentionally modifies the return pointer on the stack).

The second is called Indirect Branch Tracking and is used to insure only
targets of indirect jumps are actually jumped to. This requires
modification of code to insert a special instruction that identifies a
valid indirect jump target. When enforcement is turned on, if an indirect
jump does not end on this special instruction the cpu raises an exception.
These instructions are noops on older CPU models so it is safe to use
them in all x86(_64) code.

To enable these protections GCC also introduces a new GNU property note
section that marks a piece of code as CET ready.
If the note is in place the dynamic linker will be able to confirm that
all loaded libraries support CET and will turn on CET protection for the
binary.

The changes here consist mostly in adding the GNU property note section
to all x86(_64) assembly files and the proper ENDBRANCH instruction for
the function entrypoints which is where other code calls into via
indirect call.

Signed-off-by: Simo Sorce 
---
 asm.m4| 36 ++-
 config.m4.in  |  2 ++
 configure.ac  | 17 +
 x86/aes-decrypt-internal.asm  |  1 +
 x86/aes-encrypt-internal.asm  |  1 +
 x86/arcfour-crypt.asm |  1 +
 x86/camellia-crypt-internal.asm   |  1 +
 x86/md5-compress.asm  |  1 +
 x86/sha1-compress.asm |  1 +
 x86_64/aes-decrypt-internal.asm   |  1 +
 x86_64/aes-encrypt-internal.asm   |  1 +
 x86_64/aesni/aes-decrypt-internal.asm |  1 +
 x86_64/aesni/aes-encrypt-internal.asm |  1 +
 x86_64/camellia-crypt-internal.asm|  1 +
 x86_64/chacha-core-internal.asm   |  1 +
 x86_64/fat/cpuid.asm  |  2 +-
 x86_64/gcm-hash8.asm  |  1 +
 x86_64/md5-compress.asm   |  1 +
 x86_64/memxor.asm |  1 +
 x86_64/memxor3.asm|  1 +
 x86_64/poly1305-internal.asm  |  1 +
 x86_64/salsa20-core-internal.asm  |  1 +
 x86_64/salsa20-crypt.asm  |  1 +
 x86_64/serpent-decrypt.asm|  2 ++
 x86_64/serpent-encrypt.asm|  2 ++
 x86_64/sha1-compress.asm  |  1 +
 x86_64/sha256-compress.asm|  1 +
 x86_64/sha3-permute.asm   |  1 +
 x86_64/sha512-compress.asm|  1 +
 x86_64/sha_ni/sha1-compress.asm   |  1 +
 x86_64/sha_ni/sha256-compress.asm |  1 +
 x86_64/umac-nh-n.asm  |  1 +
 x86_64/umac-nh.asm|  1 +
 33 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/asm.m4 b/asm.m4
index 8da47201..9622906e 100644
--- a/asm.m4
+++ b/asm.m4
@@ -32,7 +32,8 @@ define(,<>)dnl
 define(,
 <.globl C_NAME($1)
 DECLARE_FUNC(C_NAME($1))
-C_NAME($1):>)
+C_NAME($1):
+CET_ENDBR>)
 
 define(,
 ,
   m4exit(1)>)>)
 define(, , <$1>)>)
 
+dnl GNU properties section to enable CET protections macros
+
+dnl GNU Poperty type
+define(, <5>)
+dnl GNU Program Property Type range
+define(GNU_PROPERTY_X86_UINT32_AND_LO, <0xc002>)
+dnl Indirect Branch Tracking
+define(, <0x01>)
+dnl Shadow Stack
+define(, <0x02>)
+
+dnl NOTE: GNU Property sections MUST have alignment of 8
+define(,
+,<>)>)
+
 dnl Struct defining macros
 
 dnl STRUCTURE(prefix) 
diff --git a/config.m4.in b/config.m4.in
index 11f90a40..c3ebad60 100644
--- a/config.m4.in
+++ b/config.m4.in
@@ -8,6 +8,8 @@ define(, <@ASM_ALIGN_LOG@>)dnl
 define(, <@W64_ABI@>)dnl
 define(, <@ASM_RODATA@>)dnl
 define(, <@ASM_WORDS_BIGENDIAN@>)dnl
+define(, <@ASM_CET_PROTECTION@>)dnl
+define(, <@ASM_CET_ENDBR@>)dnl
 divert(1)
 @ASM_MARK_NOEXEC_STACK@
 divert
diff --git a/configure.ac b/configure.ac
index 00d2bf5d..6e12bb1f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -93,6 +93,10 @@ AC_ARG_ENABLE(mini-gmp,
   AC_HELP_STRING([--enable-mini-gmp], [Enable mini-gmp, used instead of libgmp.]),,
   [enable_mini_gmp=no])
 
+AC_ARG_ENABLE(cet-protection,
+  AC_HELP_STRING([--enable-cet-protection], [Enab

Re: Intel CET protection

2019-04-26 Thread Simo Sorce
On Fri, 2019-04-26 at 10:45 +0200, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > the attached patches have been used to successfully enable and test
> > Intel CET support in an Intel emulator on SDV hardware.
> 
> Thanks.
> 
> > GCC already has all the needed support to create CET hardened code,
> > however the hand-coded assembly needs to be changed to conform.
> > Without these changes all the binaries that load nettle will otherwise
> > have CET disabled, as it is an all-or-nothing at the binary level and
> > missing ENDBRANCH instruction cause the program to terminate on
> > indirect jump/call instructions.
> 
> For the .note.gnu.property thing (which is per object file, right?), I
> think it's better to do it in the same place as ASM_MARK_NOEXEC_STACK.
> That's substituted in config.m4.in, and added at the end of each asm file
> using m4 divert.

So the reason I did not do that an instead explicitly added a statement
in the individual .asm files is that you cannot just add the note on
any random file.
The note states that the file uses CET instructions for all indirect
jump targets.
So, it just so happens now that no indirect jumps (except for the
target of the initial call into the functions) are used in our
handcrafted assembly, but that is not a given in the future.

So want I mean to say is that I took an explicit macro to mean the file
was actually analyzed and made CET compliant.

If we embed the macro magically instead this "inspection" part will be
missing and we may end up marking assembly files that do nbot in fact
comply with CET and will make the program segfault (that's how the
exception is surfaced) when run.

I understand this is not a high bar, and I will fold the segment note
in if you think that is what we should do, but I wanted to make you
aware of why I did not do the same as what we do with the stack note.

> The endbr instructions in the prologue are in the right place, as far as
> I can tell. And I'd be very tempted to rename the macro CET_ENDBR to
> COME_FROM, see https://en.wikipedia.org/wiki/COMEFROM ;-)

That is funny, I actually used CET_ENDBR to make it easier to find for
others grepping as binutils also uses a _CET_ENDBR macro, it sounded
more consistent to use something with both "CET" and ENDBR in it,
easier to find for someone looking and figuring out hwat it is about.

> > The second patch is used to make the system happy when hardening flags
> > are enabled in gcc, as it generates the appropriate section information
> > that tells the linker all is good.
> 
> I'd like to understand what's missing. Maybe we can fix it more
> explicitly? 

I do not think we can easily fix it manually, it is mostly other
section notes that the gcc compiler adds when it fortifies C code. But
those notes do not really apply to handcrafted assembly. So this flag
is basically just saying to the compiler that it should add whatever is
appropriate (which may change depending on compiler flags) because our
code is good as is.
It silence some checking tools mostly, so I do not think it is worth
wasting too much time trying to figure out what notes should be added,
the compiler is happy to do it for us.

> > Finally while looking at the assembly I noticed that some functions
> > have a PROLOGUE() defined but not an EPILOGUE() macro defined in their
> > .asm files. It is unclear to me if this is an error or intentional so
> > didn't touch those, it doesn't affect functionality for this patch
> > anyway.
> 
> I think it's an error. Except in the files in arm/fat with lines like
> 
>   dnl PROLOGUE(_nettle_chacha_core) picked up by configure
> 
> I find three offending files, using
> 
>   $ diff <(git grep -c '^ *EPILOGUE') <(git grep -c '^ *PROLOGUE')
>   49c49
>   < x86_64/poly1305-internal.asm:2
>   ---
>   > x86_64/poly1305-internal.asm:3
>   51a52,53
>   > x86_64/serpent-decrypt.asm:1
>   > x86_64/serpent-encrypt.asm:1
> 
> have you seen any others?

No, those are pretty much the places where I noticed it.
Would you want an additional patch that adds those EPILOGUES ?

> Some minor comments below.
> 
> > From de1b9bfeb4f8ad9a6bf8608c4b8c727dba315982 Mon Sep 17 00:00:00 2001
> > From: Simo Sorce 
> > Date: Tue, 23 Apr 2019 18:03:35 -0400
> > Subject: [PATCH 1/2] Add Intel CET protection support
> > 
> > In upcoming processors Intel will make available Control-Flow
> > Enforcement Technology, which is comprised of two hardware
> > countermeasures against ROP based attacks.
> 
> Please spell out ROP. It would be good with a link to further
> information in some reasonable place in the code.

Should I add a link in asm.m4 to this ?

https://software.intel.com/sites/default/files/managed/4d/2a/con

Intel CET protection

2019-04-25 Thread Simo Sorce
Hello,
the attached patches have been used to successfully enable and test
Intel CET support in an Intel emulator on SDV hardware.

The patches are minimally intrusive and enable to use a future
countermeasure that is very useful as it makes ROP attacks very hard to
carry out.

GCC already has all the needed support to create CET hardened code,
however the hand-coded assembly needs to be changed to conform.
Without these changes all the binaries that load nettle will otherwise
have CET disabled, as it is an all-or-nothing at the binary level and
missing ENDBRANCH instruction cause the program to terminate on
indirect jump/call instructions.

The second patch is used to make the system happy when hardening flags
are enabled in gcc, as it generates the appropriate section information
that tells the linker all is good.

Unfortunately I do not have actual tests for this feature (one of the
reasons why it is behind a configure flag even though it is safe to add
the code on any x86 hardware) because the only real way to test this is
to run on hardware or emulators that cause the segfaults on errors. But
we can add a simple test later once hardware becomes available.

Finally while looking at the assembly I noticed that some functions
have a PROLOGUE() defined but not an EPILOGUE() macro defined in their
.asm files. It is unclear to me if this is an error or intentional so
didn't touch those, it doesn't affect functionality for this patch
anyway.

HTH,
Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From de1b9bfeb4f8ad9a6bf8608c4b8c727dba315982 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Tue, 23 Apr 2019 18:03:35 -0400
Subject: [PATCH 1/2] Add Intel CET protection support

In upcoming processors Intel will make available Control-Flow
Enforcement Technology, which is comprised of two hardware
countermeasures against ROP based attacks.

The first is called Shadow Stack and checks that return from function
calls are not tampered with by keeping a shadow stack that cannot be
modified by aplications. This measure requires no code changes (except
for code that intentionally modifes the return pointer on the stack).

The second is called Indirect Branch Tracking and is used to insure only
targets of indirect jumps are actually jumped to. This requires
modification of code to insert a special instruction that identifies a
valid indirect jump target. When enforcement is turned on, if an indirect
jump does not end on this special instruction the cpu raises an exception.
These instructions are noops on older CPU models so it is safe to use
them in all x86(_64) code.

To enable these protections gcc also inroduces a new GNU property note
section that marks a piece of code as CET ready.
If the note is in place the dynamic linker will be able to confirm that
all loaded libraries support CET and will turn on CET protection for the
binary.

The changes here consist mostly in adding the GNU property note section
to all x86(_64) assembly files and the proper ENDBRANCH instruction for
the function entrypoints which is where other code calls into via
indirect call.

Signed-off-by: Simo Sorce 
---
 asm.m4| 25 -
 config.m4.in  |  2 ++
 configure.ac  | 17 +
 x86/aes-decrypt-internal.asm  |  1 +
 x86/aes-encrypt-internal.asm  |  1 +
 x86/arcfour-crypt.asm |  1 +
 x86/camellia-crypt-internal.asm   |  1 +
 x86/md5-compress.asm  |  1 +
 x86/sha1-compress.asm |  1 +
 x86_64/aes-decrypt-internal.asm   |  4 +++-
 x86_64/aes-encrypt-internal.asm   |  3 ++-
 x86_64/aesni/aes-decrypt-internal.asm |  1 +
 x86_64/aesni/aes-encrypt-internal.asm |  1 +
 x86_64/camellia-crypt-internal.asm|  1 +
 x86_64/chacha-core-internal.asm   |  3 ++-
 x86_64/fat/cpuid.asm  |  2 +-
 x86_64/gcm-hash8.asm  |  1 +
 x86_64/md5-compress.asm   |  1 +
 x86_64/memxor.asm |  1 +
 x86_64/memxor3.asm|  1 +
 x86_64/poly1305-internal.asm  |  1 +
 x86_64/salsa20-core-internal.asm  |  1 +
 x86_64/salsa20-crypt.asm  |  1 +
 x86_64/serpent-decrypt.asm|  2 ++
 x86_64/serpent-encrypt.asm|  2 ++
 x86_64/sha1-compress.asm  |  1 +
 x86_64/sha256-compress.asm|  1 +
 x86_64/sha3-permute.asm   |  1 +
 x86_64/sha512-compress.asm|  1 +
 x86_64/sha_ni/sha1-compress.asm   |  1 +
 x86_64/sha_ni/sha256-compress.asm |  1 +
 x86_64/umac-nh-n.asm  |  1 +
 x86_64/umac-nh.asm|  1 +
 33 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/asm.m4 b/asm.m4
index 8da47201..c6db52e5 100644
--- a/asm.m4
+++ b/asm.m4
@@ -32,7 +32,8 @@ define(,<>)dnl
 define(,
 <.globl C_NAME($1)
 DECLARE_FUNC(C_NAME($1))
-C_NAME($1):>)
+C_NAME($1)

Re: SIV-CMAC

2019-04-17 Thread Simo Sorce
On Wed, 2019-04-17 at 20:27 +0200, Nikos Mavrogiannopoulos wrote:
> +  static const union nettle_block16 const_zero = { .b = {
> + 0x00, 0x00, 0x00, 0x00,  0x00, 0x00, 0x00, 0x00,
> + 0x00, 0x00, 0x00, 0x00,  0x00, 0x00, 0x00, 0x00 }

You could save some space/eyes by using .b = 0 (assuming we can depend
on modern C99 semantics) or also just via .u64 = { 0, 0 }

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-25 Thread Simo Sorce
On Sun, 2019-03-24 at 19:57 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> 
> > Simo Sorce  writes:
> > 
> > > I am attaching all 3 patches anew as I also fixed the other issues you
> > > mentioned in a previous email.
> > 
> > Thanks. I'm about to merge. I've run cross-compile+qemu tests also on
> > big-endian mips, and it seems to work fine.
> 
> Merged to master branch now. Thanks!

Thanks a lot!
I see you also fixed the _ctx -> _key suffix in the docs, sorry for not
catching it myself, and thank you for dealing with it!

Cheers,
Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-20 Thread Simo Sorce
On Wed, 2019-03-20 at 14:46 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > On Wed, 2019-03-20 at 06:14 +0100, Niels Möller wrote:
> > > And another possible trick for big-endian is to do an "opposite-endian"
> > > left shift as
> > > 
> > >   ((x & 0x7f7f7f7f7f7f7f7f) << 1) | ((x & 0x8080808080808080) >> 15)
> > >   where this bit is the carry out ^
> > 
> > This would allow us to avoid copies at the cost of more complicated code.
> > 
> > Which do you prefer? using endian.h where available? Or having two
> > separate codepaths depending on the endianess of the machine ?
> 
> If it matters for performance, use the fastest variant. Using separate
> implementations of xts_shift, with #if:s depending on endianness and
> compiler support, is fine.
> 
> I'd expect the opposite-endian shift to be more efficient when bswap is
> particularly slow, and implemented in terms of shifting and masking.
> 
> A bit difficult to determine, though. Neither existence of endian.h
> macros or __builtin_bswap64 implies that the byte swapping is cheap. Are
> there any interesting platforms these days that lack an efficient bswap
> instruction? And are big-endian? Does mips have a bswap instruction?

In the end I went with the opposit-endian swapping solution and two
separate implementations for LE and BE.
My reasoning is that the compiler can definitely better optimize the LE
version and the BE version done this way is probably not slower than
using byteswapping even when optimized bswap is available.
The secondary reason is that I feel this version is more readable.

I am attaching all 3 patches anew as I also fixed the other issues you
mentioned in a previous email.
Namely improved the non-stealing case for encryption/decryption by
removing the duplicate last block handling, and changed the memclearing
memxor with a memset.

HTH,
Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From cd6960b2bc8f380ad2784d0f45969382aa178298 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Thu, 4 Oct 2018 14:38:50 -0400
Subject: [PATCH 1/3] Add support for XTS encryption mode

XEX encryption mode with tweak and ciphertext stealing (XTS) is
standardized in IEEE 1619 and generally used for storage devices.

Signed-off-by: Simo Sorce 
---
 Makefile.in|   5 +-
 nettle.texinfo | 147 ++-
 testsuite/.gitignore   |   1 +
 testsuite/.test-rules.make |   3 +
 testsuite/Makefile.in  |   2 +-
 testsuite/xts-test.c   | 173 +++
 xts-aes128.c   |  77 ++
 xts-aes256.c   |  77 ++
 xts.c  | 202 +
 xts.h  | 124 +++
 10 files changed, 806 insertions(+), 5 deletions(-)
 create mode 100644 testsuite/xts-test.c
 create mode 100644 xts-aes128.c
 create mode 100644 xts-aes256.c
 create mode 100644 xts.c
 create mode 100644 xts.h

diff --git a/Makefile.in b/Makefile.in
index 83250cf3..440de9f7 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -135,7 +135,8 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 umac32.c umac64.c umac96.c umac128.c \
 		 version.c \
 		 write-be32.c write-le32.c write-le64.c \
-		 yarrow256.c yarrow_key_event.c
+		 yarrow256.c yarrow_key_event.c \
+		 xts.c xts-aes128.c xts-aes256.c
 
 hogweed_SOURCES = sexp.c sexp-format.c \
 		  sexp-transport.c sexp-transport-format.c \
@@ -206,7 +207,7 @@ HEADERS = aes.h arcfour.h arctwo.h asn1.h blowfish.h \
 	  pgp.h pkcs1.h pss.h pss-mgf1.h realloc.h ripemd160.h rsa.h \
 	  salsa20.h sexp.h \
 	  serpent.h sha.h sha1.h sha2.h sha3.h twofish.h \
-	  umac.h yarrow.h poly1305.h
+	  umac.h yarrow.h xts.h poly1305.h
 
 INSTALL_HEADERS = $(HEADERS) version.h @IF_MINI_GMP@ mini-gmp.h
 
diff --git a/nettle.texinfo b/nettle.texinfo
index 9806bdc1..e79cb08c 100644
--- a/nettle.texinfo
+++ b/nettle.texinfo
@@ -2001,7 +2001,8 @@ Book mode, @acronym{ECB}), leaks information.
 
 Besides @acronym{ECB}, Nettle provides several other modes of operation:
 Cipher Block Chaining (@acronym{CBC}), Counter mode (@acronym{CTR}), Cipher
-Feedback (@acronym{CFB} and @acronym{CFB8}) and a couple of @acronym{AEAD}
+Feedback (@acronym{CFB} and @acronym{CFB8}), XEX-based tweaked-codebook mode
+with ciphertext stealing (@acronym{XTS}) and a couple of @acronym{AEAD}
 modes (@pxref{Authenticated encryption}).  @acronym{CBC} is widely used, but
 there are a few subtle issues of information leakage, see, e.g.,
 @uref{http://www.kb.cert.org/vuls/id/958563, @acronym{SSH} @acronym{CBC}
@@ -2016,6 +2017,7 @@ authenticate the message.
 * CBC:: 
 * CTR:: 
 * CFB and CFB8::
+* XTS::
 @end menu
 
 @node CBC, CTR, Cipher modes, Cipher modes
@@ -2187,7 +21

Re: Implement XTS block cipher mode

2019-03-20 Thread Simo Sorce
On Wed, 2019-03-20 at 06:14 +0100, Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> 
> > 3. Big-endian system, no __builtin_bswap64. Here we can either use the
> >current code, with byte accesses only. Or attempt to define byteswap
> >without builtins and follow 2. I'd lean towards using the current
> >code, unless there's some system where something more complicated is
> >known to improve performance.
> 
> And another possible trick for big-endian is to do an "opposite-endian"
> left shift as
> 
>   ((x & 0x7f7f7f7f7f7f7f7f) << 1) | ((x & 0x8080808080808080) >> 15)
>   where this bit is the carry out ^

This would allow us to avoid copies at the cost of more complicated code.

Which do you prefer? using endian.h where available? Or having two
separate codepaths depending on the endianess of the machine ?

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-20 Thread Simo Sorce
On Tue, 2019-03-19 at 22:36 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > Just for curiosity, would it be ok change,
> > LE_READ_UINT64/LE_WRITE_UINT64 to use uint64_t and have the macro use
> > __bswap64 if available or fall back to the original unoptimized code ?
> 
> I think LE_READ_UINT64 and related macros should be left as is,
> and used for unaligned reads and writes.
> 
> For xts_shift, there are three cases, and I think they could be handled
> like this:
> 
> 1. Little-endian system:
>Use 64-bit reads and writes, no swaps needed.
> 
> 2. Big-endian system, __builtin_bswap64 detected (HAVE_BUILTIN_BSWAP64):
>Use 64-bit reads and writes + __builtin_bswap64.
> 
> 3. Big-endian system, no __builtin_bswap64. Here we can either use the
>current code, with byte accesses only. Or attempt to define byteswap
>without builtins and follow 2. I'd lean towards using the current
>code, unless there's some system where something more complicated is
>known to improve performance.
> 
> In case (1) or (2), use 64-bit operations exclusively, also for the
> carry logic. And then the macros and #ifdefs should be arranged to make
> it not too ugly. E.g., cases (1) and (2) could perhaps be combined, with
> a BSWAP64_IF_BE macro or the like. Or a macro like
> LE_READ_ALIGNED_UINT64 (with an uint64_t * argument)
> 
> Only current use of __builtin_bswap64 is in ctr.c, the ctr_fill16
> function.
> 
> An older example is in salsa20-core-internal.c, with a LE_SWAP32 macro.
> That one could take advantage of __builtin_bswap32, if available. I
> don't know if the compiler can recognize the current expression as a
> byte swap.
> 
> And it probably is more important to optimize xts_shift if/when one also
> arranges the code to produces multiple blocks of T_k values at a time
> (say, 32 blocks, 512 bytes of stack-allocated temporary storage),
> without reading the intermediate values back from memory to registers.
> That has been a significant optimization for both ctr mode and cbc
> decrypt.

So the way I would do this is by using the glibc macros le64toh() and
htole64(), they already handle to do the right thing depending on which
architecture we are on.

If it is ok to make configure if those macros are available
(/usr/include/endian.h) I would use them or provide replacements that
just call into the existing macros. 

> I haven't reviewed the new version of the patch yet, I hope to get to
> that in a day or two.
> 
> Regards,
> /Niels
> 

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-19 Thread Simo Sorce
Hi Niels,
attached find two patches, comments inline.

On Tue, 2019-03-19 at 07:31 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > New patch attached, the diff has also been applied as an additional commit 
> > to my xts_exploded tree in gitlab.
> 
> Thanks. Looks pretty good, a few more comments below.
> 
> Id din't look so closely at the tests, but it would be good with to test
> inplace opertion also with the aes-specific functions.
> 
> For performance (which we don't need to focus that much on for the
> initial version), one thing to do would be to fill a buffer with
> consecutive T_k values, and do the encrypt/decrypt and memxor operations
> on several blocks at a time.
> 
> > +For a plaintext length that is a perfect multiple of the XTS block size:
> > +@example
> > +T_1 = E_k2(IV) MUL a^0
> > +C_1 = E_k1(P_1 XOR T_1) XOR T_1
> > +
> > +@dots{}
> > +
> > +T_n = E_k2(IV) MUL a^(n-1)
> > +C_n = E_k1(P_n XOR T_n) XOR T_n
> > +@end example
> 
> Nit: I'd write the T updates as
> 
>  T_1 = E_k2(IV)
>  T_k = T_(k-1) MUL a
> 
> since this is how to implement it, and no powering operation is needed.

Ok changed it.

> > +The functions @var{encf} @var{decf} are of type
> > +
> > +@code{void f (void *@var{ctx}, size_t @var{length}, uint8_t *@var{dst},
> > +const uint8_t *@var{src})},
> 
> It seems the manual doesn't really have an entry for the
> nettle_cipher_func type. Maybe it should. Anyway, the type for the first
> argument is _const_ void *.

Fixed.

> > +/* shift one and XOR with 0x87. */
> > +/* src and dest can point to the same buffer for in-place operations */
> > +static void
> > +xts_shift(union nettle_block16 *dst,
> > +  const union nettle_block16 *src)
> > +{
> > +  uint8_t carry = src->b[15] >> 7;
> > +  uint64_t b0 = LE_READ_UINT64(src->b);
> > +  uint64_t b1 = LE_READ_UINT64(src->b+8);
> > +  b1 = (b1 << 1) | (b0 >> 63);
> > +  b0 = b0 << 1;
> > +  LE_WRITE_UINT64(dst->b, b0);
> > +  LE_WRITE_UINT64(dst->b+8, b1);
> > +  dst->b[0] ^= 0x87 & -carry;
> > +}
> 
> Looks right. It can be optimized later on.

Just for curiosity, would it be ok change,
LE_READ_UINT64/LE_WRITE_UINT64 to use uint64_t and have the macro use
__bswap64 if available or fall back to the original unoptimized code ?

(__bswap64 already knows how to use __builtin_bswap64 or falls back to
similar code as in LE_READ_UINT64)

Or do we need a new macro that expects aligned input/output and leave
the existing macro alone, to be used for unaligned input/output ? 

> > +/*
> > + * prev is the block to steal from
> > + * curr is the input block to the last step
> > + * length is the partial block length
> > + * dst is the destination partial block
> > + * src is the source partial block
> > + *
> > + * In the Encryption case:
> > + *   prev -> the output of the N-1 encryption step
> > + *   curr -> the input to the Nth step (will be encrypted as Cn-1)
> > + *   dst  -> the final Cn partial block
> > + *   src  -> the final Pn partial block
> > + *
> > + * In the decryption case:
> > + *   prev -> the output of the N-1 decryption step
> > + *   curr -> the input to the Nth step (will be decrypted as Pn-1)
> > + *   dst  -> the final Pn partial block
> > + *   src  -> the final Cn partial block
> > + */
> > +static void
> > +xts_steal(uint8_t *prev, uint8_t *curr,
> > + size_t length, uint8_t *dst, const uint8_t *src)
> > +{
> > +  /* copy the remaining in the current input block */
> > +  memcpy(curr, src, length);
> > +  /* fill the current block with the last blocksize - length
> > +   * bytes of the previous block */
> > +  memcpy([length], [length], XTS_BLOCK_SIZE - length);
> > +
> > +  /* This must be last or inplace operations will break
> > +   * copy 'length' bytes of the previous block in the
> > +   * destination block, which is the final partial block
> > +   * returned to the caller */
> > +  memcpy(dst, prev, length);
> > +}
> 
> This is a bit confusing; first we write to the curr block, using data
> from src and prev. And next we copy from prev to dst; is that the
> swapping of the last two blocks? Maybe this could be made simpler, and a
> copy eliminatedif the main loops were
> 
>   for (;length >= 2*XTS_BLOCK_SIZE;
> 
> For the arguments, it looks like prev could be const, and curr could be
> nettle_block16 (but changing the latter is perhaps a bit pointless,
> since we only treat it as a byte array with no obvio

Re: [RFC] optimized poly1305 and ABI

2019-03-17 Thread Simo Sorce
On Sun, 2019-03-17 at 12:56 +0300, Yuriy M. Kaminskiy wrote:
> On 14.03.2019 00:31, Simo Sorce wrote:
> > On Thu, 2019-03-14 at 00:25 +0300, Yuriy M. Kaminskiy wrote:
> > > On 12.03.2019 15:02, Yuriy M. Kaminskiy wrote:
> > > And it is embedded in struct chacha_poly1305_ctx and poly1305_aes_ctx, 
> > > which looks like
> > > part of public (and used) low-level ABI.
> > > 
> > > (nettle-meta.h interface would be safe wrt struct size changes, but so 
> > > far everything I've looked
> > > at - including gnutls - was not using it :-()
> > 
> > FWIW, I wouldn't feel blocked by an ABI break in Nettle.
> 
> Breaking ABI in the library that is used in another libraries is always 
> problematic.
> 
> Scenario: $app links to libgnutls.so.1 and libnettle.so.1 (and libgnutls.so.1 
> linked
> against libnettle.so.1; then libnettle.so.2 installed and libgnutls.so.1 
> rebuilt
> against new nettle; what will happen with $app?
> 
> (Especially since nettle does not use versioned symbols)
> 
> So, you either bump libgnutls soname too (and you must rebuild all apps to 
> take advantage of it)
> [also it triggers same problem with libraries that uses libgnutls],
> or you add Conflict/Breaks in libnettle2 (and you must rebuild all libraries 
> and apps
> to be able to even install libnettle2).

No, if app is built against nettle all you need to do is bump nettle's
so name and rebuild gnutls, the app will have to be rebuilt against the
new nettle only as gnutls completely masks nettle behind it's apis and
offers a stable ABI even when nettle ABI changes.

> (And both renders new libnettle unusable for stable-backports.)

Yes, backports may be an issue, but nettle broke the ABI previously,
and it offers no ABI guarntee.

> When you are forced to break ABI, it is good point to think: can it be 
> avoided,
> and how can this be prevented in the future?

This is not possible with Nettle as it uses explicit structures
exported to the caller, to have a stable ABI nettle would need to
change how it deals with context structures by allocating them on the
heap and making them opaque. I do not see this happening.

> (poly1305 is not only algo that may require altering context structure for 
> optimized
> implementation [e.g. bitsliced or vectorized aes]).
> 
> E.g. openssl made all structures opaque, and I believe it is correct 
> long-term solution.

Openssl is a higher level library. We can discuss a wrapper library
around nettle that offers a stable API/ABI and could be used by GnuTLS
perhaps.

> (Well, I've thought about a way, although not very nice: keep old version 
> internally,
> add separate {chacha,aes}_poly1305_encrypt_v2, #define $foo $foo_v2 in 
> headers; you'll need
> to rebuild all directly dependent libraries and apps to take advantage of new 
> implementation,
> but not necessarily whole system; also it is not 146% safe [lib$a built 
> against old version,
> lib$b built against new version, lib$a allocates struct chacha20_poly13_ctx 
> and passes pointer
> to lib$b; libb calls $foo_v2, BOOM], but I doubt anyone uses libnettle this 
> way in practice).

Yes, as you described this method is broken, you would have to have
explicit different APIs or introduce symbol versioning.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-16 Thread Simo Sorce
On Sat, 2019-03-16 at 16:26 +0300, Yuriy M. Kaminskiy wrote:
> On 03/15/19 16:33 , Simo Sorce wrote:
> > On Fri, 2019-03-15 at 09:27 -0400, Simo Sorce wrote:
> > > > I think this is the same as block_mulx, in cmac.c. (Also same byte
> > > > order, right?)
> > > 
> > > Looks the same indeed, should I share it? Just copy it from cmac?
> > > Something else?
> > 
> > Turns out the algorithm is not equivalent, as the shift is applied to
> > the array as if it were a big 128bit little endian value, the endianess
> > of the two is different.
> > 
> > I changed the implementation to a much simpler form that show the
> > difference:
> > 
> > /* shift one and XOR with 0x87. */
> > /* src and dest can point to the same buffer for in-place operations */
> > static void
> > xts_shift(union nettle_block16 *dst,
> >   const union nettle_block16 *src)
> > {
> >   uint8_t carry = src->b[15] >> 7;
> >   dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
> >   dst->u64[0] = src->u64[0] << 1;
> >   dst->b[0] ^= 0x87 & -carry;
> 
> Nitpick: mixing different-sized access (esp. writes) to same memory is
> problematic for modern cpus (it confuses speculative engine):

In what way does it confuses speculation ?

>   uint64_t carry = src->u64[1] >> 63;
>   dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
>   dst->u64[0] = src->u64[0] << 1;
>   dst->u64[0] ^= 0x87 & -carry;

This is equivalent, but I would like to know what is the exact problem
before changing. Confusing speculation may actually be a feature in
this day and age :-)

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 17:46 -0400, Simo Sorce wrote:
> On Fri, 2019-03-15 at 22:33 +0100, Niels Möller wrote:
> > Simo Sorce  writes:
> > 
> > > Turns out the algorithm is not equivalent, as the shift is applied to
> > > the array as if it were a big 128bit little endian value, the endianess
> > > of the two is different.
> > 
> > Ah, I see. 
> > 
> > > /* shift one and XOR with 0x87. */
> > > /* src and dest can point to the same buffer for in-place operations */
> > > static void
> > > xts_shift(union nettle_block16 *dst,
> > >   const union nettle_block16 *src)
> > > {
> > >   uint8_t carry = src->b[15] >> 7;
> > >   dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
> > >   dst->u64[0] = src->u64[0] << 1;
> > >   dst->b[0] ^= 0x87 & -carry;
> > > }
> > 
> > This will then work only on little-endian systems?
> > 
> > I think it would be nice with a structure like
> > 
> >   b0 = src->u64[0]; b1 = src->u64[1];  /* Load inputs */
> >   ... swap if big-endian ...
> >   uint64_t carry = (b1 >> 63);
> >   b1 = (b1 << 1) | (b0 >> 63)
> >   b0 = (b0 << 1) ^ (0x87 & -carry);
> >   ... swap if big-endian ...
> >   dst->u64[0] = b0; dst->u64[1] = b1;  /* Store output */
> > 
> > I.e., no memory accesses smaller than 64-bits.
> > 
> > Possibly with load + swap and swap + store done with some
> > system-dependent macros.
> > 
> > But it's not essential for a first version of xts; copying block_mulx
> > and just replacing READ_UINT64 with LE_READ_UINT64 and similarly for
> > WRITE would be ok for now. There are more places with potential for
> > micro-optimizations related to endianness. While I think the
> > READ/WRITE_UINT macros are adequate in most places where unaligned
> > application data is read and written by C code.
> 
> I will add the macros to swap endianess, and resend a new version.

New patch attached, the diff has also been applied as an additional commit to 
my xts_exploded tree in gitlab.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From a17e0ce7aeebe0d3db46d03ef857a622831fff4a Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Thu, 4 Oct 2018 14:38:50 -0400
Subject: [PATCH] Add support for XTS encryption mode

XEX encryption mode with tweak and ciphertext stealing (XTS) is
standardized in IEEE 1619 and generally used for storage devices.

Signed-off-by: Simo Sorce 
---
 Makefile.in|   5 +-
 nettle.texinfo | 147 +-
 testsuite/.gitignore   |   1 +
 testsuite/.test-rules.make |   3 +
 testsuite/Makefile.in  |   2 +-
 testsuite/xts-test.c   | 173 +++
 xts-aes128.c   |  75 ++
 xts-aes256.c   |  75 ++
 xts.c  | 205 +
 xts.h  | 117 +
 10 files changed, 798 insertions(+), 5 deletions(-)
 create mode 100644 testsuite/xts-test.c
 create mode 100644 xts-aes128.c
 create mode 100644 xts-aes256.c
 create mode 100644 xts.c
 create mode 100644 xts.h

diff --git a/Makefile.in b/Makefile.in
index 83250cf3..440de9f7 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -135,7 +135,8 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 umac32.c umac64.c umac96.c umac128.c \
 		 version.c \
 		 write-be32.c write-le32.c write-le64.c \
-		 yarrow256.c yarrow_key_event.c
+		 yarrow256.c yarrow_key_event.c \
+		 xts.c xts-aes128.c xts-aes256.c
 
 hogweed_SOURCES = sexp.c sexp-format.c \
 		  sexp-transport.c sexp-transport-format.c \
@@ -206,7 +207,7 @@ HEADERS = aes.h arcfour.h arctwo.h asn1.h blowfish.h \
 	  pgp.h pkcs1.h pss.h pss-mgf1.h realloc.h ripemd160.h rsa.h \
 	  salsa20.h sexp.h \
 	  serpent.h sha.h sha1.h sha2.h sha3.h twofish.h \
-	  umac.h yarrow.h poly1305.h
+	  umac.h yarrow.h xts.h poly1305.h
 
 INSTALL_HEADERS = $(HEADERS) version.h @IF_MINI_GMP@ mini-gmp.h
 
diff --git a/nettle.texinfo b/nettle.texinfo
index 9806bdc1..c44302bc 100644
--- a/nettle.texinfo
+++ b/nettle.texinfo
@@ -2001,7 +2001,8 @@ Book mode, @acronym{ECB}), leaks information.
 
 Besides @acronym{ECB}, Nettle provides several other modes of operation:
 Cipher Block Chaining (@acronym{CBC}), Counter mode (@acronym{CTR}), Cipher
-Feedback (@acronym{CFB} and @acronym{CFB8}) and a couple of @acronym{AEAD}
+Feedback (@acronym{CFB} and @acronym{CFB8}), XEX-based tweaked-codebook mode
+with ciphertext stealing (@acronym{XTS}) and a couple of @acronym{AEAD}
 modes (@pxref{Authenticated encryption}).  @acronym{CBC} is widely used, but
 there are a few subtle issues of information leakage, 

Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 22:33 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > Turns out the algorithm is not equivalent, as the shift is applied to
> > the array as if it were a big 128bit little endian value, the endianess
> > of the two is different.
> 
> Ah, I see. 
> 
> > /* shift one and XOR with 0x87. */
> > /* src and dest can point to the same buffer for in-place operations */
> > static void
> > xts_shift(union nettle_block16 *dst,
> >   const union nettle_block16 *src)
> > {
> >   uint8_t carry = src->b[15] >> 7;
> >   dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
> >   dst->u64[0] = src->u64[0] << 1;
> >   dst->b[0] ^= 0x87 & -carry;
> > }
> 
> This will then work only on little-endian systems?
> 
> I think it would be nice with a structure like
> 
>   b0 = src->u64[0]; b1 = src->u64[1];  /* Load inputs */
>   ... swap if big-endian ...
>   uint64_t carry = (b1 >> 63);
>   b1 = (b1 << 1) | (b0 >> 63)
>   b0 = (b0 << 1) ^ (0x87 & -carry);
>   ... swap if big-endian ...
>   dst->u64[0] = b0; dst->u64[1] = b1;  /* Store output */
> 
> I.e., no memory accesses smaller than 64-bits.
> 
> Possibly with load + swap and swap + store done with some
> system-dependent macros.
> 
> But it's not essential for a first version of xts; copying block_mulx
> and just replacing READ_UINT64 with LE_READ_UINT64 and similarly for
> WRITE would be ok for now. There are more places with potential for
> micro-optimizations related to endianness. While I think the
> READ/WRITE_UINT macros are adequate in most places where unaligned
> application data is read and written by C code.

I will add the macros to swap endianess, and resend a new version.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 16:33 -0400, Simo Sorce wrote:
> On Fri, 2019-03-15 at 09:27 -0400, Simo Sorce wrote:
> > > I think this is the same as block_mulx, in cmac.c. (Also same byte
> > > order, right?)
> > 
> > Looks the same indeed, should I share it? Just copy it from cmac?
> > Something else?
> 
> Turns out the algorithm is not equivalent, as the shift is applied to
> the array as if it were a big 128bit little endian value, the endianess
> of the two is different.
> 
> I changed the implementation to a much simpler form that show the
> difference:
> 
> /* shift one and XOR with 0x87. */
> /* src and dest can point to the same buffer for in-place operations */
> static void
> xts_shift(union nettle_block16 *dst,
>   const union nettle_block16 *src)
> {
>   uint8_t carry = src->b[15] >> 7;
>   dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
>   dst->u64[0] = src->u64[0] << 1;
>   dst->b[0] ^= 0x87 & -carry;
> }
> 
> Simo.

Attached find patch with all the changes I understood we agreed on in
the thread.

For easier review I also have a tree with the original patch and
discrete steps in form of separate commits that I rebased and merged
together:
https://gitlab.com/simo5/nettle/tree/xts_exploded

Note the documentation already warns that at least a block of data is
required and less than 16 bytes of input is illegal, so no change has
been applied in that regard.

HTH,
Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

From 3484f9085fcdfd72df0e54eae7fd0e23fa4c5446 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Thu, 4 Oct 2018 14:38:50 -0400
Subject: [PATCH] Add support for XTS encryption mode

XEX encryption mode with tweak and ciphertext stealing (XTS) is
standardized in IEEE 1619 and generally used for storage devices.

Signed-off-by: Simo Sorce 
---
 Makefile.in|   5 +-
 nettle.texinfo | 147 ++-
 testsuite/.gitignore   |   1 +
 testsuite/.test-rules.make |   3 +
 testsuite/Makefile.in  |   2 +-
 testsuite/xts-test.c   | 173 +++
 xts-aes128.c   |  75 ++
 xts-aes256.c   |  75 ++
 xts.c  | 201 +
 xts.h  | 117 +
 10 files changed, 794 insertions(+), 5 deletions(-)
 create mode 100644 testsuite/xts-test.c
 create mode 100644 xts-aes128.c
 create mode 100644 xts-aes256.c
 create mode 100644 xts.c
 create mode 100644 xts.h

diff --git a/Makefile.in b/Makefile.in
index 83250cf3..440de9f7 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -135,7 +135,8 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 umac32.c umac64.c umac96.c umac128.c \
 		 version.c \
 		 write-be32.c write-le32.c write-le64.c \
-		 yarrow256.c yarrow_key_event.c
+		 yarrow256.c yarrow_key_event.c \
+		 xts.c xts-aes128.c xts-aes256.c
 
 hogweed_SOURCES = sexp.c sexp-format.c \
 		  sexp-transport.c sexp-transport-format.c \
@@ -206,7 +207,7 @@ HEADERS = aes.h arcfour.h arctwo.h asn1.h blowfish.h \
 	  pgp.h pkcs1.h pss.h pss-mgf1.h realloc.h ripemd160.h rsa.h \
 	  salsa20.h sexp.h \
 	  serpent.h sha.h sha1.h sha2.h sha3.h twofish.h \
-	  umac.h yarrow.h poly1305.h
+	  umac.h yarrow.h xts.h poly1305.h
 
 INSTALL_HEADERS = $(HEADERS) version.h @IF_MINI_GMP@ mini-gmp.h
 
diff --git a/nettle.texinfo b/nettle.texinfo
index 9806bdc1..c44302bc 100644
--- a/nettle.texinfo
+++ b/nettle.texinfo
@@ -2001,7 +2001,8 @@ Book mode, @acronym{ECB}), leaks information.
 
 Besides @acronym{ECB}, Nettle provides several other modes of operation:
 Cipher Block Chaining (@acronym{CBC}), Counter mode (@acronym{CTR}), Cipher
-Feedback (@acronym{CFB} and @acronym{CFB8}) and a couple of @acronym{AEAD}
+Feedback (@acronym{CFB} and @acronym{CFB8}), XEX-based tweaked-codebook mode
+with ciphertext stealing (@acronym{XTS}) and a couple of @acronym{AEAD}
 modes (@pxref{Authenticated encryption}).  @acronym{CBC} is widely used, but
 there are a few subtle issues of information leakage, see, e.g.,
 @uref{http://www.kb.cert.org/vuls/id/958563, @acronym{SSH} @acronym{CBC}
@@ -2016,6 +2017,7 @@ authenticate the message.
 * CBC:: 
 * CTR:: 
 * CFB and CFB8::
+* XTS::
 @end menu
 
 @node CBC, CTR, Cipher modes, Cipher modes
@@ -2187,7 +2189,7 @@ last three arguments define the source and destination area for the
 operation.
 @end deffn
 
-@node CFB and CFB8, , CTR, Cipher modes
+@node CFB and CFB8, XTS, CTR, Cipher modes
 @comment  node-name,  next,  previous,  up
 @subsection Cipher Feedback mode
 
@@ -2340,6 +2342,147 @@ conventions. The last three arguments define the source and destination
 area for the operation.
 @end deffn
 
+@node XTS, , CFB and CFB8, Cipher modes
+@comment  node-name,  next,  

Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 09:27 -0400, Simo Sorce wrote:
> > I think this is the same as block_mulx, in cmac.c. (Also same byte
> > order, right?)
> 
> Looks the same indeed, should I share it? Just copy it from cmac?
> Something else?

Turns out the algorithm is not equivalent, as the shift is applied to
the array as if it were a big 128bit little endian value, the endianess
of the two is different.

I changed the implementation to a much simpler form that show the
difference:

/* shift one and XOR with 0x87. */
/* src and dest can point to the same buffer for in-place operations */
static void
xts_shift(union nettle_block16 *dst,
  const union nettle_block16 *src)
{
  uint8_t carry = src->b[15] >> 7;
  dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
  dst->u64[0] = src->u64[0] << 1;
  dst->b[0] ^= 0x87 & -carry;
}

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 17:30 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > > But it looks like one has to pass the complete message to one call?
> > 
> > Yes, due to ciphertext stealing, XTS needs to know what are the last
> > two blocks, or at the very least needs to withhold the last processed
> > block in order to be able to change it if a final partial block is
> > provided. This means inputs and outputs would not be symmetrical and I
> > felt it would make it somewhat hard to deal with as an API.
> > In general XTS is used for block storage and the input is always fully
> > available (and relatively small, either around 512 bytes, or 4k).
> 
> I see. Can you mention this (and the fact that messages must be at least
> one full block) in the manual?

I will.

> Would it make sense to rename functions to xts_encrypt_message,
> xts_decrypt_message, to emphasize that they operate on complete
> messages? That would be analogous with the ccm message functions.

Good idea.

> > > So the second key, with E_k2, is only ever used to encrypt the IV? If
> > > you add a set_iv function, that could do this encryption and only store
> > > E_k2(IV).
> > 
> > What would be the advantage ? 
> 
> With an api for complete messages only, my suggestion makes little
> sense.
> 
> > Looks the same indeed, should I share it? Just copy it from cmac?
> > Something else?
> 
> Would be nice to share it, but since it's so short, duplication is no
> big deal.
> 
> Also re-reading the cmac version, block_mulx, I think it's unfortunate
> to use READ_UINT64 and WRITE_UINT64, since arguments are aligned. It
> would be preferable to load 64-bit values and use __builtin_bswap64 when
> needed and available (see ctr.c for a similar hack). But that's an
> independent improvement.

I can make a new function and attempt to make these improvements there,
then it can either be shared or copied back to cmac at a later time.

Simo.
 
-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: Implement XTS block cipher mode

2019-03-15 Thread Simo Sorce
On Fri, 2019-03-15 at 13:14 +0100, Niels Möller wrote:
> Simo Sorce  writes:
> 
> > The attached patch implements the XTS block cipher mode, as specified
> > in IEEE P1619. The interface is split into a generic pair of functions
> > for encryption and decryption and additional AES-128/AES-256 variants.
> 
> Thanks. Sorry for the late response.
> 
> > The function signatures follows the same pattern used by other block-
> > cipher modes like ctr, cfb, ccm, etc...
> 
> But it looks like one has to pass the complete message to one call?

Yes, due to ciphertext stealing, XTS needs to know what are the last
two blocks, or at the very least needs to withhold the last processed
block in order to be able to change it if a final partial block is
provided. This means inputs and outputs would not be symmetrical and I
felt it would make it somewhat hard to deal with as an API.
In general XTS is used for block storage and the input is always fully
available (and relatively small, either around 512 bytes, or 4k).

> Other modes support incremental encryption (with the requirement that
> all calls but the last must be an integral number of blocks). I.e.,
> calling sequence like
> 
> xts_aes128_set_key
> xts_aes128_set_iv
> xts_aes128_encrypt ... // 1 or more times
> xts_aes128_set_iv   // Start new message
> xts_aes128_encrypt ... // 1 or more times
> 
> 
> > +The @code{n} plaintext blocks are transformed into @code{n} ciphertext 
> > blocks
> > +@code{C_1},@dots{} @code{C_n} as follows.
> > +
> > +For a plaintext length that is a perfect multiple of the XTS block size:
> > +@example
> > +T_1 = E_k2(IV) MUL a^0
> > +C_1 = E_k1(P_1 XOR T_1) XOR T_1
> > +
> > +@dots{}
> > +
> > +T_n = E_k2(IV) MUL a^(n-1)
> > +C_n = E_k1(P_n XOR T_n) XOR T_n
> > +@end example
> > +
> > +For any other plaintext lengths:
> > +@example
> > +T_1 = E_k2(IV) MUL a^0
> > +C_1 = E_k1(P_1 XOR T_1) XOR T_1
> > +
> > +@dots{}
> > +
> > +T_(n-2) = E_k2(IV) MUL a^(n-3)
> > +C_(n-2) = E_k1(P_(n-2) XOR T_(n-2)) XOR T_(n-2)
> > +
> > +T_(n-1) = E_k2(IV) MUL a^(n-2)
> > +CC_(n-1) = E_k1(P_(n-1) XOR T_(n-1)) XOR T_(n-1)
> > +
> > +T_n = E_k2(IV) MUL a^(n-1)
> > +PP = [1..m]Pn | [m+1..128]CC_(n-1)
> > +C_(n-1) = E_k1(PP XOR T_n) XOR T_n
> > +
> > +C_n = [1..m]CC_(n-1)
> > +@end example
> 
> So the second key, with E_k2, is only ever used to encrypt the IV? If
> you add a set_iv function, that could do this encryption and only store
> E_k2(IV).

What would be the advantage ? I guess it may make sense if we were to
allow to call the encryption function multiple times, but as explained
above I am not sure this is necessarily desirable.
It may also risk misuse where people set the same IV for all encryption
operations, that would be catastrophic, but probably can be handled by
clearing the stored IV when the encryption is finalized.

> > --- /dev/null
> > +++ b/xts.c
> > @@ -0,0 +1,219 @@
> 
> [...]
> > +static void
> > +xts_shift(uint8_t *T)
> > +{
> > +  uint8_t carry;
> > +  uint8_t i;
> > +
> > +  for (i = 0, carry = 0; i < XTS_BLOCK_SIZE; i++)
> > +{
> > +  uint8_t msb = T[i] & 0x80;
> > +  T[i] = T[i] << 1;
> > +  T[i] |= carry;
> > +  carry = msb >> 7;
> > +}
> > +  if (carry)
> > +T[0] ^= 0x87;
> > +}
> 
> I think this is the same as block_mulx, in cmac.c. (Also same byte
> order, right?)

Looks the same indeed, should I share it? Just copy it from cmac?
Something else?

> Since the block size is fixed to 128 bits, I think it makes sense to use
> the nettle_block16 type for all blocks but the application's src and
> destination areas. Then we get proper alignment, and can easily use
> operations on larger units.

Ok.

> BTW, for side-channel silence, we should change 
> 
>   if (carry)
> T[0] ^= 0x87;
> 
> to something like
> 
>   T[0] ^= 0x87 & - carry;
> 
> (and similarly for the cmac version).

I can do it for xts.c, and provide a separate patch for cmac.c too, or
use a common function for both and handle it there.

> > +  fblen = length - (length % XTS_BLOCK_SIZE);
> > +  XTSENC(twk_ctx, T, tweak);
> > +
> > +  /* the zeroth power of alpha is the initial ciphertext value itself, so 
> > we
> > +   * skip shifting and do it at the end of each block operation instead */
> > +  for (i = 0; i < fblen; i += XTS_BLOCK_SIZE)
> > +{
> 
> In other places, loops like this are often written as
> 
>   for (; length >= BLOCK_SIZE; 
>length -= BLOCK_SIZE, src +

Re: [RFC] optimized poly1305 and ABI

2019-03-13 Thread Simo Sorce
On Thu, 2019-03-14 at 00:25 +0300, Yuriy M. Kaminskiy wrote:
> On 12.03.2019 15:02, Yuriy M. Kaminskiy wrote:
> > Then I will probably take a look at poly1305
> 
> ... and it looks problematic; porting poly1305/armv6 is possible, but there 
> won't
> be much improvement over generic C code:
> 
> $ poly1305-opt/bin/poly1305-util bench
> 8192 byte(s):
>   neon, 15114.40 ns per call,   1.8 ns/byte
>  armv6, 31944.33 ns per call,   3.9 ns/byte
> generic/32, 39088.50 ns per call,   4.8 ns/byte
> 
> neon (and other simd implementations) is much faster, but requires larger 
> state; current nettle's
> struct poly1305_ctx is 56 bytes, optimized versions requires up to 192 bytes.
> 
> And it is embedded in struct chacha_poly1305_ctx and poly1305_aes_ctx, which 
> looks like
> part of public (and used) low-level ABI.
> 
> (nettle-meta.h interface would be safe wrt struct size changes, but so far 
> everything I've looked
> at - including gnutls - was not using it :-()

FWIW, I wouldn't feel blocked by an ABI break in Nettle.
Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: sec_powm.c:293: GNU MP assertion failed: enb >= windowsize

2019-01-07 Thread Simo Sorce
They seem minor enough not to really need a new release IMO, people can
get them from master as needed. Or maybe you can commit patches on the
3.4 branch so people can find them more easily if they really care.

I have a question though, why C89 support ?
C99 is already almost 20y old already ...

Regards,
Simo.

On Sun, 2019-01-06 at 11:25 +0100, Niels Möller wrote:
> Jeffrey Walton  writes:
> 
> > I'm trying to build Nettle 3.4.1 on Fedora 29, x64 (fully patched).
> > Self tests are failing at:
> > 
> > PASS: rsa-sec-decrypt
> > sec_powm.c:293: GNU MP assertion failed: enb >= windowsize
> > ../run-tests: line 57: 22997 Aborted (core dumped) "$1" $testflags
> > FAIL: rsa-compute-root
> > PASS: dsa
> > PASS: dsa-keygen
> 
> (This was a test with -DNDEBUG, a configuration not covered by the
> gitlab ci testers). Turned out to be a bug in the test code,
> 
>   assert (mpz_invert(key->d, pub->e, phi));
> 
> Fixed (on the release-3.4-fixes branch) with commit
> https://git.lysator.liu.se/nettle/nettle/commit/73d3c6d5586cc0fd81eab081078144d621de07b4
> 
> There was another -DNDEBUG breakage in examples/nettle-openssl.c, which
> was fixed on master long ago, but which I overlooked when cherry-picking
> bugfixes for nettle-3.4.1. Also fixed now.
> 
> Question for the list: I'm now aware of two bugs in nettle-3.4.1:
> 
> 1. Accidental use of c99 loops, breaking builds with c89 compilers.
> 
> 2. Incorrect asserts, affecting tests and benchmark code when compield
>with -DNDEBUG, but not the libraries themselves.
> 
> Should I make a 3.4.2 release fixing these problems?
> 
> Regards,
> /Niels
> 

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc


___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Implement XTS block cipher mode

2018-10-08 Thread Simo Sorce
Hello,

The attached patch implements the XTS block cipher mode, as specified
in IEEE P1619. The interface is split into a generic pair of functions
for encryption and decryption and additional AES-128/AES-256 variants.

The function signatures follows the same pattern used by other block-
cipher modes like ctr, cfb, ccm, etc...

Basic tests using a small selection of NIST CAVS vectors are provided.

XTS is use in several disk encryption algorithms and programs because
it allows to use a block cipher even when the input length is not a
perfect multiple of the block cipher length by using ciphertext
stealing.

Thanks to Daiki Ueno for initial review.

Feedback is appreciated.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
From 50e8114c87875244a4a75ab4824931ca30efd844 Mon Sep 17 00:00:00 2001
From: Simo Sorce 
Date: Thu, 4 Oct 2018 14:38:50 -0400
Subject: [PATCH] Add support for XTS encryption mode

XEX encryption mode with tweak and ciphertext stealing (XTS) is
standardized in IEEE 1619 and generally used for storage devices.

Signed-off-by: Simo Sorce 
---
 Makefile.in|   5 +-
 nettle.texinfo | 146 -
 testsuite/.gitignore   |   1 +
 testsuite/.test-rules.make |   3 +
 testsuite/Makefile.in  |   2 +-
 testsuite/xts-test.c   | 169 
 xts-aes128.c   |  75 +
 xts-aes256.c   |  75 +
 xts.c  | 219 +
 xts.h  | 117 
 10 files changed, 807 insertions(+), 5 deletions(-)
 create mode 100644 testsuite/xts-test.c
 create mode 100644 xts-aes128.c
 create mode 100644 xts-aes256.c
 create mode 100644 xts.c
 create mode 100644 xts.h

diff --git a/Makefile.in b/Makefile.in
index d4fa628a..3d24503c 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -134,7 +134,8 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 umac32.c umac64.c umac96.c umac128.c \
 		 version.c \
 		 write-be32.c write-le32.c write-le64.c \
-		 yarrow256.c yarrow_key_event.c
+		 yarrow256.c yarrow_key_event.c \
+		 xts.c xts-aes128.c xts-aes256.c
 
 hogweed_SOURCES = sexp.c sexp-format.c \
 		  sexp-transport.c sexp-transport-format.c \
@@ -202,7 +203,7 @@ HEADERS = aes.h arcfour.h arctwo.h asn1.h blowfish.h \
 	  pgp.h pkcs1.h pss.h pss-mgf1.h realloc.h ripemd160.h rsa.h \
 	  salsa20.h sexp.h \
 	  serpent.h sha.h sha1.h sha2.h sha3.h twofish.h \
-	  umac.h yarrow.h poly1305.h
+	  umac.h yarrow.h xts.h poly1305.h
 
 INSTALL_HEADERS = $(HEADERS) nettle-stdint.h version.h @IF_MINI_GMP@ mini-gmp.h
 
diff --git a/nettle.texinfo b/nettle.texinfo
index 9806bdc1..880a7539 100644
--- a/nettle.texinfo
+++ b/nettle.texinfo
@@ -2001,7 +2001,8 @@ Book mode, @acronym{ECB}), leaks information.
 
 Besides @acronym{ECB}, Nettle provides several other modes of operation:
 Cipher Block Chaining (@acronym{CBC}), Counter mode (@acronym{CTR}), Cipher
-Feedback (@acronym{CFB} and @acronym{CFB8}) and a couple of @acronym{AEAD}
+Feedback (@acronym{CFB} and @acronym{CFB8}), XEX-based tweaked-codebook mode
+with ciphertext stealing (@acronym{XTS}) and a couple of @acronym{AEAD}
 modes (@pxref{Authenticated encryption}).  @acronym{CBC} is widely used, but
 there are a few subtle issues of information leakage, see, e.g.,
 @uref{http://www.kb.cert.org/vuls/id/958563, @acronym{SSH} @acronym{CBC}
@@ -2016,6 +2017,7 @@ authenticate the message.
 * CBC:: 
 * CTR:: 
 * CFB and CFB8::
+* XTS::
 @end menu
 
 @node CBC, CTR, Cipher modes, Cipher modes
@@ -2187,7 +2189,7 @@ last three arguments define the source and destination area for the
 operation.
 @end deffn
 
-@node CFB and CFB8, , CTR, Cipher modes
+@node CFB and CFB8, XTS, CTR, Cipher modes
 @comment  node-name,  next,  previous,  up
 @subsection Cipher Feedback mode
 
@@ -2340,6 +2342,146 @@ conventions. The last three arguments define the source and destination
 area for the operation.
 @end deffn
 
+@node XTS, , CFB and CFB8, Cipher modes
+@comment  node-name,  next,  previous,  up
+@subsection XEX-based tweaked-codebook mode with ciphertext stealing
+
+@cindex XEX-based tweaked-codebook mode with ciphertext stealing
+@cindex XTS Mode
+
+
+XEX-based tweaked-codebook mode with ciphertext stealing (@acronym{XTS}) is
+a block mode like (@acronym{CBC}) but tweaked to be able to encrypt partial
+blocks via a technique called ciphertext stealing, where the last complete
+block of ciphertext is split and part returned as the last block and part
+used as plaintext for the second to last block.
+This mode is principally used to encrypt data at rest where it is not possible
+to store additional metadata or blocks larger than the plain text. The most
+common usage is for disk encryption. Due to the fact that ciphertext expansion
+is not possible, data is not authenticated. This mode should not be used where
+authentication