Loss of performance in RDRAND and RDSEED?

2021-01-02 Thread Jeffrey Walton
Hi Everyone,

I was performing some benchmarking today. On a Skylake Core i5-6400
machine, in the past (May 30, 2020), I would see these performance
numbers:

  RDRAND: 67 MB/s, ~38 cpb
  RDSEED: 24 MB/s, ~105 cpb

I ran the same benchmarks today (January 2, 2021) and the benchmark
program reported:

  RDRAND: 7 MB/s, ~360 cpb
  RDSEED: 7 MB/s, ~360 cpb

I checked out the same code from the past (May 30, 2020) and the
numbers stayed the same:

  RDRAND: 7 MB/s, ~360 cpb
  RDSEED: 7 MB/s, ~360 cpb

SSE2, SSE4, AVX, AES-NI, SHA-NI, etc are OK.
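
A loop like the following is enough to reproduce this kind of
measurement (a sketch only, not the exact benchmark program; build with
-mrdrnd on GCC or Clang):

    #include <immintrin.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        unsigned long long v, sink = 0;
        const size_t count = 1U << 20;      /* 1M draws = 8 MB of output */
        struct timespec t0, t1;
        size_t i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < count; i++) {
            while (!_rdrand64_step(&v))
                ;                           /* retry on transient failure */
            sink ^= v;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f MB/s (sink=%llx)\n", count * 8 / secs / 1e6, sink);
        return 0;
    }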

The hardware is the same, but the OS was upgraded from Fedora 32 to
Fedora 33. The kernel and possibly intel-microcode have changed
between May 2020 and January 2021.

I'm aware of this problem with AMD's RDRAND and RDSEED, but it doesn't
affect Intel machines:
https://bugzilla.kernel.org/show_bug.cgi?id=85911 (so there should not
be any remediations in place).

My question is, is anyone aware of what may be responsible for the
performance loss?

Thanks in advance,

Jeff


Re: another testmgr question

2019-05-24 Thread Jeffrey Walton
On Fri, May 24, 2019 at 4:47 AM Christophe Leroy
 wrote:
> ...
> > As I already mentioned in another thread somewhere, this morning in the
> > shower I realised that this may be useful if you have no expectation of
> > the length itself. But it's still a pretty specific use case which was
> > never considered for our hardware. And our HW doesn't seem to be alone in
> > this.
> > Does shaXXXsum or md5sum use the kernel crypto API though?
>
> The ones from libkcapi do (http://www.chronox.de/libkcapi.html)

And they can be loaded into OpenSSL through the afalg interface.

There are lots of potential use cases. I would not bet that no one is using them.

Jeff


Re: another testmgr question

2019-05-23 Thread Jeffrey Walton
On Thu, May 23, 2019 at 4:06 PM Eric Biggers  wrote:
>
> On Thu, May 23, 2019 at 01:07:25PM +, Pascal Van Leeuwen wrote:
> >
> > I'm running into some trouble with some random vectors that do *zero*
> > length operations. Now you can go all formal about how the API does
> > not explictly disallow this, but how much sense does it really make
> > to essentially encrypt, hash or authenticate absolutely *nothing*?
> >
> > It makes so little sense that we never bothered to support it in any
> > of our hardware developed over the past two decades ... and no
> > customer has ever complained about this, to the best of my knowledge.
> >
> > Can't you just remove those zero length tests?
>
> For hashes this is absolutely a valid case.  Try this:
>
> $ touch file
> $ sha256sum file
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  file
>
> That shows the SHA-256 digest of the empty message.
>
> For AEADs it's a valid case too.  You still get an authenticated ciphertext even
> if the plaintext and/or AAD are empty, telling you that the (plaintext, AAD)
> pair is authentically from someone with the key.
>
> It's really only skciphers (length preserving encryption) where it's
> questionable, since for those an empty input can only map to an empty output.
>
> Regardless of what we do, I think it's really important that the behavior is
> *consistent*, so users see the same behavior no matter what implementation of
> the algorithm is used.
>
> Allowing empty messages works out naturally for most skcipher implementations,
> and it also conceptually simplifies the length restrictions of the API (e.g. for
> most block cipher modes: just need nbytes % blocksize == 0, as opposed to that
> *and* nbytes != 0).  So that seems to be how we ended up with it.
>
> If we do change this, IMO we need to make it the behavior for all
> implementations, not make it implementation-defined.
>
> Note that it's not necessary that your *hardware* supports empty messages, since
> you can simply do this in the driver instead:
>
> if (req->cryptlen == 0)
> return 0;

+1. It seems like a firmware update for the hardware or a software
update to the driver is the way to proceed.

Why isn't the driver able to work around the hardware bugs?
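
Something as simple as the following in the driver's encrypt/decrypt
entry points would do it (a rough sketch using the generic skcipher
hooks; my_hw_queue_request is a stand-in for the real hardware path):

    static int my_skcipher_encrypt(struct skcipher_request *req)
    {
        /* A zero-length input maps to a zero-length output, so there
         * is nothing for the hardware to do; complete it in software. */
        if (req->cryptlen == 0)
            return 0;

        return my_hw_queue_request(req);
    }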

I don't think it is wise to remove tests from the Test Manager.

Jeff


Re: [PATCH -next] hwrng: make symbol 'optee_rng_id_table' static

2019-02-20 Thread Jeffrey Walton
On Wed, Feb 20, 2019 at 4:23 AM Wei Yongjun  wrote:
>
> Fixes the following sparse warning:
>
> drivers/char/hw_random/optee-rng.c:265:35: warning:
>  symbol 'optee_rng_id_table' was not declared. Should it be static?

Static limits visibility to the current translation unit. Static is
like private visibility.

Perhaps the question is whether it should be declared extern so other
translation units can find the symbol. extern is like public
visibility.
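
A two-file sketch of the difference:

    /* file1.c */
    static int private_counter;    /* visible only within file1.c */
    int public_counter;            /* visible to other translation units */

    /* file2.c */
    extern int public_counter;     /* resolves to the definition in file1.c */
    /* extern int private_counter;    would fail to link: the symbol is hidden */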

Jeff


Re: [PATCH 3/3] crypto: siphash - drop _aligned variants

2018-10-08 Thread Jeffrey Walton
On Tue, Oct 9, 2018 at 2:00 AM Ard Biesheuvel  wrote:
>
> On 9 October 2018 at 06:11, Jason A. Donenfeld  wrote:
> > Hi Ard,
> > ...
> > As you might expect, when compiling in __siphash_unaligned and
> > __siphash_aligned on the x86 at the same time, __siphash_unaligned is
> > replaced with just "jmp __siphash_aligned", as gcc recognized that
> > indeed the same code is generated.
> >
> Yeah, I noticed something similar on arm64, although we do get a stack
> frame there.
>
> > However, on platforms where get_unaligned_* does do something
> > different, it looks to me like this patch now always calls the
> > unaligned code, even when the input data _is_ an aligned address
> > already, which is worse behaviour than before. While it would be
> > possible for the get_unaligned_* function headers to also detect this
> > and fallback to the faster version at compile time, by the time
> > get_unaligned_* is used in this patch, it's no longer in the header,
> > but rather in siphash.c, which means the compiler no longer knows that
> > the address is aligned, and so we hit the slow path. This especially
> > impacts architectures like MIPS, for example. This is why the original
> > code, prior to this patch, checks the alignment in the .h and then
> > selects which codepath afterwards. So while this patch might handle
> > the ARM use case, it seems like a regression on all other platforms.
> > See, for example, the struct passing in net/core/secure_seq.c, which
> > sends intentionally aligned and packed structs to siphash, which then
> > benefits from using the faster instructions on certain platforms.
> >
> > It seems like what you're grappling with on the ARM side of things is
> > that CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS only half means what it
> > says on some ISAs, complicating this logic. It seems like the ideal
> > thing to do, given that, would be to just not set
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on those, so that we can fall
> > back to the unaligned path always, like this patch suggests. Or if
> > that's _too_ drastic, perhaps introduce another variable like
> > CONFIG_MOSTLY_EFFICIENT_UNALIGNED_ACCESS.
> >
> Perhaps we should clarify better what
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS means.
>
> One could argue that it means there is no point in reorganizing your
> data to make it appear aligned, because the unaligned accessors are
> cheap. Instead, it is used as a license to cast unaligned pointers to
> any type (which C does not permit btw), even in the example.

I recommend avoiding this strategy. One of the libraries I help with
used a similar strategy and was constantly putting out one-off fires
when GCC assumed, say, 4- or 8-byte alignment. Integer code was
fine. The problems did not surface until vectorization at -O3, when the
misaligned buffers started causing exceptions.

To be clear, there were very few problems. A problem might surface with GCC
4.9 on ARM in one function; then surface again with GCC 5.1 on
x86_64 in another function; and then surface again under Cygwin for
another function with GCC 6.3.

The pattern was finally gutted in favor of the classic approach: treat
the data as unaligned and walk the buffer OR'ing bytes into a datatype, or
memcpy it into aligned datatypes. Modern compilers recognize the
pattern, and it will be optimized the way you hope.
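
In other words, something like this sketch (with the usual caveat that
the result is in native byte order, so add a byte swap if the data is
specified as little or big endian):

    #include <stdint.h>
    #include <string.h>

    /* Portable load of a 64-bit value from a possibly unaligned address.
     * Modern compilers collapse the memcpy into a single load on targets
     * where unaligned access is cheap, and into byte loads elsewhere. */
    static inline uint64_t load_u64(const void *p)
    {
        uint64_t v;
        memcpy(&v, p, sizeof(v));
        return v;
    }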

Older GCCs, like, say, GCC 4.3, may not do as well. But that is the
price paid for portability and bug-free code. And nowadays those old
GCCs and Clangs are getting rarer. There's no sense in doing
something quickly if you can't arrive at the correct result or you
crash at runtime.

> So in the case of siphash, that would mean always taking the unaligned
> path if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set, or only for
> unaligned data if it is not.

Jeff


Re: [PATCH] crypto: remove speck

2018-08-06 Thread Jeffrey Walton
On Mon, Aug 6, 2018 at 7:04 PM, Jason A. Donenfeld  wrote:
> These are unused, undesired, and have never actually been used by
> anybody. The original authors of this code have changed their mind about
> its inclusion. Therefore, this patch removes it.

I think it may be unwise to completely discard Speck for several
reasons. The biggest ones for me are:

  - political concerns addressed by other ciphers
  - high quality lightweight block cipher implementation
  - some regulated industries will need it for their problem domains

It seems to me the political concerns were addressed by not using
Speck for Android. I don't believe HPolyC and Speck are in competition.
Instead they provide the user with a choice, which is usually a good
thing.

I also think allowing politics a heavy hand endangers other ciphers
like SM3 and SM4. I would advise against removing them just because
they are Chinese ciphers. I suppose the same could be argued for North
Korea and Jipsam and Pilsung (if North Korea ever offers their
ciphers).

I think the contributions from Eric, Ard and others led to a high quality
implementation of Speck. High quality implementations that "just
work" everywhere on multiple platforms are rather hard to come by.
The kernel's unified implementation ensures lots of folks don't go
making lots of mistakes when rolling their own.

There are verticals that will need a choice or an alternative like Speck.
US Aerospace, US Automotive and US Hoteliers come to mind. US
Financial may use them too (they have some trading platforms with
absurd requirements that make Simon and Speck appear bloated and
overweight). Some of the verticals are going to need an alternative
that meets technical and security goals and passes the audits.

Choice is a good thing. Users need choices for technical, regulatory
and legal reasons.

Jeff


Re: Does /dev/urandom now block until initialised ?

2018-07-23 Thread Jeffrey Walton
On Mon, Jul 23, 2018 at 11:16 AM, Theodore Y. Ts'o  wrote:
> On Mon, Jul 23, 2018 at 04:43:01AM +0100, Ken Moffat wrote:
>> ...
> One of the reasons why I didn't see the problem when I was developing
> the remediation patch for CVE-2018-1108 is because I run Debian
> testing, which doesn't have this particular Red Hat patch.

Off-topic, I'm kind of surprised it took that long to fix it (if I am
parsing things correctly).

I believe Stephan Mueller wrote up the weakness a couple of years ago.
He's the one who explained the interactions to me. Mueller was even
cited at https://github.com/systemd/systemd/issues/4167.

It is too bad Mueller did not receive credit for it in the CVE database.

Jeff


Re: [PATCH] random: add a config option to trust the CPU's hwrng

2018-07-17 Thread Jeffrey Walton
On Tue, Jul 17, 2018 at 9:43 PM, Theodore Ts'o  wrote:
> This gives the user building their own kernel (or a Linux
> distribution) the option of deciding whether or not to trust the CPU's
> hardware random number generator (e.g., RDRAND for x86 CPU's) as being
> correctly implemented and not having a back door introduced (perhaps
> courtesy of a Nation State's law enforcement or intelligence
> agencies).

+1.

Allowing the user to set local policy is a good idea. Thanks for that.


Re: PBKDF2 support in the linux kernel

2018-05-26 Thread Jeffrey Walton
On Thu, May 24, 2018 at 5:11 AM, Stephan Mueller  wrote:
> On Thursday, 24 May 2018 at 10:33:07 CEST, Rafael J. Wysocki wrote:
>
> Hi Rafael,
>
>> So the problem is that Yu would like to use this for hibernation encryption
>> done entirely in the kernel.
>
> But why do you need to perform PBKDF in kernel space?

I may be mis-parsing things, but using audited kernel code is a matter
of governance and good security engineering. I don't believe it is
a matter of laziness.

If they were to add their own userland code, then they would surely be
criticized for rolling their own implementation.

Jeff


Re: [PATCH v2 0/5] crypto: Speck support

2018-04-24 Thread Jeffrey Walton
On Tue, Apr 24, 2018 at 12:11 PM, Jason A. Donenfeld  wrote:
> Can we please not Speck?
>
> It was just rejected by the ISO/IEC.
>
> https://twitter.com/TomerAshur/status/988659711091228673

Yeah, but here was the reason given
(https://www.wikitribune.com/story/2018/04/20/internet/67004/67004/):

A source at an International Organization for Standardization (ISO)
meeting of expert delegations in Wuhan, China, told WikiTribune
that the U.S. delegation, including NSA officials, refused to provide
the standard level of technical information to proceed.

Jeff


Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-15 Thread Jeffrey Walton
On Thu, Feb 15, 2018 at 8:04 AM, Stephan Mueller  wrote:
> On Thursday, 15 February 2018 at 13:45:53 CET, Harsh Jain wrote:
>
>> > Could you please elaborate what you mean with "partial tag" support?
>>
>> Here is the catch, Calculation of tag depends on total payload length
>> atleast for shaX, gcm,ccm mode on which I have worked.
>>
>> If we take an example of shaX. It appends 1 special block at the end of user
>> data which includes total input length in bit. Refer
>> "sha1_base_do_finalize" Suppose we have 32 byte and we break this in 2 IOCB
>> of 16 bytes each. Expected result : 32 encrypted bytes + sha auth tag
>> considering length 32 bytes. What we will  get : 16 bytes + sha auth tag
>> considering length 16 bytes + 16 encrypted bytes + another sha tag
>> considering 16 bytes.
>
> As AF_ALG for AEAD is implemented, there is no stream support where the hash
> is calculated at the end. This is even not supported in the current AEAD API
> of the kernel crypto API as far as I see. The only "stream-like" support is
> that you can invoke multiple separate sendmsg calls to provide the input data
> for the AEAD. But once you call recvmsg, the ciphertext and the tag is
> calculated and thus the recvmsg is akin to a hash_final operation.

If you follow Bernstein's protocol design philosophy, then messages
should be no larger than about 4K in size. From
https://nacl.cr.yp.to/valid.html:

This is one of several reasons [1] that callers should (1) split
all data into packets sent through the network; (2) put a
small global limit on packet length; and (3) separately
encrypt and authenticate each packet.

With the [1] link being
https://groups.google.com/forum/#!original/boring-crypto/BpUmNMXKMYQ/EEwAIeQdjacJ

Jeff


Re: [PATCH 0/5] crypto: Speck support

2018-02-12 Thread Jeffrey Walton
On Mon, Feb 12, 2018 at 2:19 PM, Eric Biggers  wrote:
> Hi all,
>
> On Fri, Feb 09, 2018 at 07:07:01PM -0500, Jeffrey Walton wrote:
>> > Hi Jeffrey,
>> >
>> > I see you wrote the SPECK implementation in Crypto++, and you are treating the
>> > words as big endian.
>> >
>> > Do you have a reference for this being the "correct" order?  Unfortunately the
>> > authors of the cipher failed to mention the byte order in their paper.  And they
>> > gave the test vectors as words, so the test vectors don't clarify it either.
>> >
>> > I had assumed little endian words, but now I am having second thoughts...  And
>> > to confuse things further, it seems that some implementations (including the
>> > authors own implementation for the SUPERCOP benchmark toolkit [1]) even consider
>> > the words themselves in the order (y, x) rather than the more intuitive (x, y).
>> >
>> > [1] https://github.com/iadgov/simon-speck-supercop/blob/master/crypto_stream/speck128128ctr/ref/stream.c
>> >
>> > In fact, even the reference code from the paper treats pt[0] as y and pt[1] as
>> > x, where 'pt' is a u64 array -- although that being said, it's not shown how the
>> > actual bytes should be translated to/from those u64 arrays.
>> >
>> > I'd really like to avoid people having to add additional versions of SPECK later
>> > for the different byte and word orders...
>>
>> Hi Eric,
>>
>> Yeah, this was a point of confusion for us as well. After the sidebar
>> conversations I am wondering about the correctness of Crypto++
>> implementation.
>>
>
> We've received another response from one of the Speck creators (Louis Wingers)
> that (to summarize) the intended byte order is little endian, and the intended
> word order is (y, x), i.e. 'y' is at a lower memory address than 'x'.  Or
> equivalently: the test vectors given in the original paper need to be read as
> byte arrays from *right-to-left*.
>
> (y, x) is not the intuitive order, but it's not a huge deal.  The more important
> thing is that we don't end up with multiple implementations with different byte
> and/or word orders.
>
> So, barring any additional confusion, I'll send a revised version of this
> patchset that flips the word order.  Jeff would need to flip both the byte and
> word orders in his implementation in Crypto++ as well.

Thanks Eric.

Yeah, the (y,x) explains a lot of the confusion, and explains the
modification I needed in my GitHub clone of the IAD Team's SUPERCOP to
arrive at test vector results. My clone is available at
https://github.com/noloader/simon-speck-supercop.

So let me ask you... Given the Speck-128(128) test vector from Appendix C:

Key: 0f0e0d0c0b0a0908 0706050403020100
Plaintext: 6c61766975716520 7469206564616d20
Ciphertext: a65d985179783265 7860fedf5c570d18

Will the Linux implementation arrive at the published result, or will
it arrive at a different result? I guess what I am asking is: where is
the presentation detail going to be handled?

A related question is, will the kernel be parsing just the key as
(y,x), or will all parameters be handled as (y,x)? At this point I
believe it only needs to apply to the key but I did not investigate
the word swapping in detail because I was chasing the test vector.
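
To make the question concrete, here is the byte layout I get if I read
your description literally (little-endian words, (y, x) order, the
published hex strings read right-to-left as byte arrays). This is my
understanding only, not something I have verified against the kernel
code:

    /* Speck128/128 Appendix C vector, laid out as bytes in memory */
    static const uint8_t key[16] = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,  /* K[0] = 0x0706050403020100 */
        0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f   /* K[1] = 0x0f0e0d0c0b0a0908 */
    };
    static const uint8_t plaintext[16] = {
        0x20, 0x6d, 0x61, 0x64, 0x65, 0x20, 0x69, 0x74,  /* y = 0x7469206564616d20 */
        0x20, 0x65, 0x71, 0x75, 0x69, 0x76, 0x61, 0x6c   /* x = 0x6c61766975716520 */
    };
    static const uint8_t ciphertext[16] = {
        0x18, 0x0d, 0x57, 0x5c, 0xdf, 0xfe, 0x60, 0x78,  /* y = 0x7860fedf5c570d18 */
        0x65, 0x32, 0x78, 0x79, 0x51, 0x98, 0x5d, 0xa6   /* x = 0xa65d985179783265 */
    };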

Jeff


Re: [PATCH 0/5] crypto: Speck support

2018-02-09 Thread Jeffrey Walton
On Thu, Feb 8, 2018 at 4:01 PM, Eric Biggers  wrote:
> On Wed, Feb 07, 2018 at 08:47:05PM -0500, Jeffrey Walton wrote:
>> On Wed, Feb 7, 2018 at 7:09 PM, Eric Biggers  wrote:
>> > Hello,
>> >
>> > This series adds Speck support to the crypto API, including the Speck128
>> > and Speck64 variants.  Speck is a lightweight block cipher that can be
>> > much faster than AES on processors that don't have AES instructions.
>> >
>> > We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
>> > option for dm-crypt and fscrypt on Android, for low-end mobile devices
>> > with older CPUs such as ARMv7 which don't have the Cryptography
>> > Extensions.  Currently, such devices are unencrypted because AES is not
>> > fast enough, even when the NEON bit-sliced implementation of AES is
>> > used.  Other AES alternatives such as Blowfish, Twofish, Camellia,
>> > Cast6, and Serpent aren't fast enough either; it seems that only a
>> > modern ARX cipher can provide sufficient performance on these devices.
>> >
>> > This is a replacement for our original proposal
>> > (https://patchwork.kernel.org/patch/10101451/) which was to offer
>> > ChaCha20 for these devices.  However, the use of a stream cipher for
>> > disk/file encryption with no space to store nonces would have been much
>> > more insecure than we thought initially, given that it would be used on
>> > top of flash storage as well as potentially on top of F2FS, neither of
>> > which is guaranteed to overwrite data in-place.
>> >
>> > ...
>> > Thus, patch 1 adds a generic implementation of Speck, and the following
>> > patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
>> > NEON-accelerated implementation is much faster than the generic
>> > implementation and therefore is the implementation that would primarily
>> > be used in practice on the devices we are targeting.
>> >
>> > There is no AArch64 implementation added, since such CPUs are likely to
>> > have the Cryptography Extensions, allowing the use of AES.
>>
>> +1 on SPECK.
>> ...
>
> Hi Jeffrey,
>
> I see you wrote the SPECK implementation in Crypto++, and you are treating the
> words as big endian.
>
> Do you have a reference for this being the "correct" order?  Unfortunately the
> authors of the cipher failed to mention the byte order in their paper.  And they
> gave the test vectors as words, so the test vectors don't clarify it either.
>
> I had assumed little endian words, but now I am having second thoughts...  And
> to confuse things further, it seems that some implementations (including the
> authors own implementation for the SUPERCOP benchmark toolkit [1]) even consider
> the words themselves in the order (y, x) rather than the more intuitive (x, y).
>
> [1] https://github.com/iadgov/simon-speck-supercop/blob/master/crypto_stream/speck128128ctr/ref/stream.c
>
> In fact, even the reference code from the paper treats pt[0] as y and pt[1] as
> x, where 'pt' is a u64 array -- although that being said, it's not shown how the
> actual bytes should be translated to/from those u64 arrays.
>
> I'd really like to avoid people having to add additional versions of SPECK later
> for the different byte and word orders...

Hi Eric,

Yeah, this was a point of confusion for us as well. After the sidebar
conversations I am wondering about the correctness of Crypto++
implementation.

As a first step here is the official test vector for Speck-128(128)
from Appendix C, p. 42 (https://eprint.iacr.org/2013/404.pdf):

Speck128/128
Key: 0f0e0d0c0b0a0908 0706050403020100
Plaintext: 6c61766975716520 7469206564616d20
Ciphertext: a65d985179783265 7860fedf5c570d18

We had some confusion over the presentation. Here is what the Simon
and Speck team sent when I asked about it, what gets plugged into the
algorithm, and how it gets plugged in:



On Mon, Nov 20, 2017 at 10:50 AM,  wrote:
> ...
> I'll explain the problem you have been having with our test vectors.
>
> The key is:  0x0f0e0d0c0b0a0908 0x0706050403020100
> The plaintext is:  6c61766975716520 7469206564616d20
> The ciphertext is:  a65d985179783265 7860fedf5c570d18
>
> The problem is essentially one of what goes where and we probably could
> have done a better job explaining things.
>
> For the key, with two words, K=(K[1],K[0]).  With three words K=(K[2],K[1],K[0]),
> with four words K=(K[3],K[2],K[1],K[0]).
>
> So for the test vector you should have K[0]= 0x0706050403020100, K[1]= 0x0f0e0d0c0b0a0908
> which

Re: [PATCH 0/5] crypto: Speck support

2018-02-07 Thread Jeffrey Walton
On Wed, Feb 7, 2018 at 7:09 PM, Eric Biggers  wrote:
> Hello,
>
> This series adds Speck support to the crypto API, including the Speck128
> and Speck64 variants.  Speck is a lightweight block cipher that can be
> much faster than AES on processors that don't have AES instructions.
>
> We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
> option for dm-crypt and fscrypt on Android, for low-end mobile devices
> with older CPUs such as ARMv7 which don't have the Cryptography
> Extensions.  Currently, such devices are unencrypted because AES is not
> fast enough, even when the NEON bit-sliced implementation of AES is
> used.  Other AES alternatives such as Blowfish, Twofish, Camellia,
> Cast6, and Serpent aren't fast enough either; it seems that only a
> modern ARX cipher can provide sufficient performance on these devices.
>
> This is a replacement for our original proposal
> (https://patchwork.kernel.org/patch/10101451/) which was to offer
> ChaCha20 for these devices.  However, the use of a stream cipher for
> disk/file encryption with no space to store nonces would have been much
> more insecure than we thought initially, given that it would be used on
> top of flash storage as well as potentially on top of F2FS, neither of
> which is guaranteed to overwrite data in-place.
>
> Speck has been somewhat controversial due to its origin.  Nevertheless,
> it has a straightforward design (it's an ARX cipher), and it appears to
> be the leading software-optimized lightweight block cipher currently,
> with the most cryptanalysis.  It's also easy to implement without side
> channels, unlike AES.  Moreover, we only intend Speck to be used when
> the status quo is no encryption, due to AES not being fast enough.
>
> We've also considered a novel length-preserving encryption mode based on
> ChaCha20 and Poly1305.  While theoretically attractive, such a mode
> would be a brand new crypto construction and would be more complicated
> and difficult to implement efficiently in comparison to Speck-XTS.
>
> Thus, patch 1 adds a generic implementation of Speck, and the following
> patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
> NEON-accelerated implementation is much faster than the generic
> implementation and therefore is the implementation that would primarily
> be used in practice on the devices we are targeting.
>
> There is no AArch64 implementation added, since such CPUs are likely to
> have the Cryptography Extensions, allowing the use of AES.

+1 on SPECK.

It's a nice cipher that runs fast. It is nice because the security
engineering and parameter selection are well specified, and you can
push the margins as low as you like. It does not guess at security
parameters like some of the other ciphers used in dm-crypt.

On a modern 6th-gen Core i5 I've seen numbers as low as these:
SPECK-64/128 runs around 2.1 cpb, and SPECK-128/256 runs around 2.4
cpb.

I've already done some work for a US contractor who wanted/needed
SPECK for a possible NASA contract. NASA is looking at SPECK for some
satellite comms.

Jeff


Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-24 Thread Jeffrey Walton
On Wed, Jan 24, 2018 at 4:05 AM, Yury Norov  wrote:
>
> ...
> With all that, this example code:
>
> static int __init 128bit_test(void)
> {
> __uint128_t v;
> __uint128_t addr;
> __uint128_t val = (__uint128_t) 0x1234567890abc;
> ...

In case it matters, you can check for GCC support of the 128-bit types
in userland with:

#if (__SIZEOF_INT128__ >= 16)
   ...
#endif

Also see https://gcc.gnu.org/ml/gcc-help/2015-08/msg00185.html .
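
A minimal userland sketch of the check in action (a hypothetical
example, unrelated to the kernel patch; __SIZEOF_INT128__ is measured
in bytes):

    #include <stdio.h>

    int main(void)
    {
    #if (__SIZEOF_INT128__ >= 16)
        __uint128_t v = ((__uint128_t)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
        printf("high = 0x%016llx, low = 0x%016llx\n",
               (unsigned long long)(v >> 64), (unsigned long long)v);
    #else
        puts("no __uint128_t support");
    #endif
        return 0;
    }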

Jeff


Re: [PATCH] fscrypt: add support for ChaCha20 contents encryption

2017-12-08 Thread Jeffrey Walton
> Still, a stream cipher is sufficient to protect data confidentiality in
> the event of a single point-in-time permanent offline compromise of the
> disk, which currently is the primary threat model for fscrypt.  Thus,
> when the alternative is quite literally *no encryption*, we might as
> well use a stream cipher.

The "single point in time" requirement is kind of interesting. I
believe you are saying the scheme lacks semantic security.

Forgive my ignorance... Does that mean this cipher should not be used
when backups are in effect; or sync'ing to  happens?

Jeff

On Thu, Dec 7, 2017 at 8:38 PM, Eric Biggers  wrote:
> From: Eric Biggers 
>
> fscrypt currently only supports AES encryption.  However, many low-end
> mobile devices still use older CPUs such as ARMv7, which do not support
> the AES instructions (the ARMv8 Cryptography Extensions).  This results
> in very poor AES performance, even if the NEON bit-sliced implementation
> is used.  Roughly 20-40 MB/s is a typical number, in comparison to
> 300-800 MB/s on CPUs that support the AES instructions.  Switching from
> AES-256 to AES-128 only helps by about 30%.
>
> The result is that vendors don't enable encryption on these devices,
> leaving users unprotected.
>
> A performance difference of similar magnitude can also be observed on
> x86, between CPUs with and without the AES-NI instruction set.
>
> This patch provides an alternative to AES by updating fscrypt to support
> the ChaCha20 stream cipher (RFC7539) for contents encryption.  ChaCha20
> was designed to have a large security margin, to be efficient on
> general-purpose CPUs without dedicated instructions, and to be
> vectorizable.  It is already supported by the Linux crypto API,
> including a vectorized implementation for ARM using NEON instructions,
> and vectorized implementations for x86 using SSSE3 or AVX2 instructions.
>
> On 32-bit ARM processors with NEON support, ChaCha20 is about 3.2 times
> faster than AES-128-XTS (chacha20-neon vs. xts-aes-neonbs).  Without
> NEON support, ChaCha20 is about 1.5 times as fast (chacha20-generic vs.
> xts(aes-asm)).  The improvement over AES-256-XTS is even greater.
>
> Note that stream ciphers are not an ideal choice for disk encryption,
> since each data block has to be encrypted with the same IV each time it
> is overwritten.  Consequently, an adversary who observes the ciphertext
> both before and after a write can trivially recover the keystream if
> they can guess one of the plaintexts.  Moreover, an adversary who can
> write to the ciphertext can flip arbitrary bits in the plaintext, merely
> by flipping the corresponding bits in the ciphertext.  A block cipher
> operating in the XTS or CBC-ESSIV mode provides some protection against
> these types of attacks -- albeit not full protection, which would at
> minimum require the use an authenticated encryption mode with nonces.
>
> Unfortunately, we are unaware of any block cipher which performs as well
> as ChaCha20, has a similar or greater security margin, and has been
> subject to as much public security analysis.  We do not consider Speck
> to be a viable alternative at this time.
>
> Still, a stream cipher is sufficient to protect data confidentiality in
> the event of a single point-in-time permanent offline compromise of the
> disk, which currently is the primary threat model for fscrypt.  Thus,
> when the alternative is quite literally *no encryption*, we might as
> well use a stream cipher.
>
> We offer ChaCha20 rather than the reduced-round variants ChaCha8 or
> ChaCha12 because ChaCha20 has a much higher security margin, and we are
> primarily targeting CPUs where ChaCha20 is fast enough, in particular
> CPUs that have vector instructions such as NEON or SSSE3.  Also, the
> crypto API currently only supports ChaCha20.  Still, if ChaCha8 and/or
> ChaCha12 support were to be added to the crypto API, it would be
> straightforward to support them in fscrypt too.
>
> Currently, stream ciphers cannot be used for filenames encryption with
> fscrypt because all filenames in a directory have to be encrypted with
> the same IV.  Therefore, we offer ChaCha20 for contents encryption only.
> Filenames encryption still must use AES-256-CTS-CBC.  This is acceptable
> because filenames encryption is not as performance-critical as contents
> encryption.
>
> ...


Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel

2017-10-13 Thread Jeffrey Walton
On Fri, Oct 13, 2017 at 3:09 PM, Linus Torvalds
 wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
>  wrote:
>>
>> This could be fixed by s/vmovdqa/vmovdqu change like bellow, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.

+1, good engineering.

AVX2 requires 32-byte buffer alignment in some places. It is trickier
than this use case because __BIGGEST_ALIGNMENT__ doubled, but a lot of
code still assumes 16 bytes.
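
For anyone following along, the userspace analogue of the
vmovdqa/vmovdqu distinction is the aligned vs. unaligned intrinsic.
A tiny sketch, nothing to do with the kernel code in question (build
with -mavx2):

    #include <immintrin.h>

    /* Copy 32 bytes without assuming anything about alignment.
     * The *_loadu_*/*_storeu_* intrinsics emit vmovdqu and tolerate any
     * address; the aligned forms (vmovdqa) fault on misaligned data. */
    static void copy32(void *dst, const void *src)
    {
        __m256i v = _mm256_loadu_si256((const __m256i *)src);
        _mm256_storeu_si256((__m256i *)dst, v);
    }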

Jeff


Re: Poor RNG performance on Ryzen

2017-07-21 Thread Jeffrey Walton
On Fri, Jul 21, 2017 at 3:12 AM, Oliver Mangold  wrote:
> Hi,
>
> I was wondering why reading from /dev/urandom is much slower on Ryzen than
> on Intel, and did some analysis. It turns out that the RDRAND instruction is
> at fault, which takes much longer on AMD.
>
> if I read this correctly:
>
> --- drivers/char/random.c ---
> 862 spin_lock_irqsave(&crng->lock, flags);
> 863 if (arch_get_random_long(&v))
> 864 crng->state[14] ^= v;
> 865 chacha20_block(&crng->state[0], out);
>
> one call to RDRAND (with 64-bit operand) is issued per computation of a
> chacha20 block. According to the measurements I did, it seems on Ryzen this
> dominates the time usage:

AMD's implementations of RDRAND and RDSEED are simply slow. The problem dates
back to Bulldozer. While Intel can produce random numbers at about 10
cycles per byte, AMD regularly takes thousands of cycles for one byte.
Bulldozer was measured at 4100 cycles per byte.

It also appears AMD uses the same circuit for random numbers for both
RDRAND and RDSEED. Both are equally fast (or equally slow).

Here are some benchmarks if you are interested:
https://www.cryptopp.com/wiki/RDRAND#Performance .

Jeff


Re: [RFC PATCH v12 3/4] Linux Random Number Generator

2017-07-21 Thread Jeffrey Walton
Hi Ted,

Snipping one comment:

> Practically no one uses /dev/random.  It's essentially a deprecated
> interface; the primary interfaces that have been recommended for well
> over a decade is /dev/urandom, and now, getrandom(2).  We only need
> 384 bits of randomness every 5 minutes to reseed the CRNG, and that's
> plenty even given the very conservative entropy estimation currently
> being used.

The statement about /dev/random being deprecated is not well
documented. A quick search is not turning up the expected results.

The RANDOM(4) man page provides competing (conflicting?) information:

   When read, the /dev/random device will return random bytes only  within
   the estimated number of bits of noise in the entropy pool.  /dev/random
   should be suitable for uses that need very high quality randomness such
   as  one-time  pad  or  key generation...

We regularly test the /dev/random generator by reading 10K bytes in
non-blocking mode, discarding them, and then asking for 16 bytes in
blocking mode. We also compress the output as a poor man's fitness test.
We are interested in how robust the generator is, how well it performs
under stress, and how well it recovers.
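
The test is roughly the following sketch (simplified; the real harness
also times the reads and compresses the output):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char drain[10240], block[16];
        int fd;

        /* Drain the pool with a large non-blocking read and discard it. */
        fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
        if (fd >= 0) {
            (void)read(fd, drain, sizeof(drain));
            close(fd);
        }

        /* Then ask for 16 bytes with a blocking read. */
        fd = open("/dev/random", O_RDONLY);
        if (fd < 0 || read(fd, block, sizeof(block)) != (ssize_t)sizeof(block))
            perror("blocking read");
        else
            puts("got 16 bytes");
        if (fd >= 0)
            close(fd);
        return 0;
    }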

After draining, it often takes minutes for the generator to produce 16
bytes. On Debian-based systems the experiment usually fails unless
rng-tools is installed. The failures occur even on systems with
hardware-based generators like rdrand and rdseed. I've witnessed the
failure on i686, x86_64, ARM and MIPS.

We recently suggested the GCC compile farm install rng-tools because
we were witnessing the problem on their machines. Cf.
https://lists.tetaneutral.net/pipermail/cfarm-users/2017-July/30.html .
I've even seen vendors recommend wiring /dev/random to /dev/urandom
because of the entropy depletion problems. That's a big no-no
according to https://lwn.net/Articles/525459/.

The failures have always left me with an uncomfortable feeling because
there are so many damn programs out there that do their own thing.
Distros don't perform a SecArch review before packaging, so problems
lie in wait.

If the generator is truly deprecated, then it may be prudent to remove
it completely or remove it from userland. Otherwise, improve its
robustness. At minimum, update the documentation.

Jeff

On Thu, Jul 20, 2017 at 11:08 PM, Theodore Ts'o  wrote:
> On Thu, Jul 20, 2017 at 09:00:02PM +0200, Stephan Müller wrote:
>> I concur with your rationale where de-facto the correlation is effect is
>> diminished and eliminated with the fast_pool and the minimal entropy
>> estimation of interrupts.
>>
>> But it does not address my concern. Maybe I was not clear, please allow me to
>> explain it again.
>>
>> We have lots of entropy in the system which is discarded by the 
>> aforementioned
>> approach (if a high-res timer is present -- without it all bets are off 
>> anyway
>> and this should be covered in a separate discussion). At boot time, this 
>> issue
>> is fixed by injecting 256 interrupts in the CRNG and consider it seeded.
>>
>> But at runtime, were we still need entropy to reseed the CRNG and to supply /
>> dev/random. The accounting of entropy at runtime is much too conservative...
>
> Practically no one uses /dev/random.  It's essentially a deprecated
> interface; the primary interfaces that have been recommended for well
> over a decade is /dev/urandom, and now, getrandom(2).  We only need
> 384 bits of randomness every 5 minutes to reseed the CRNG, and that's
> plenty even given the very conservative entropy estimation currently
> being used.
>
> This was deliberate.  I care a lot more that we get the initial
> boot-time CRNG initialization right on ARM32 and MIPS embedded
> devices, far, far, more than I care about making plenty of
> information-theoretic entropy available at /dev/random on an x86
> system.  Further, I haven't seen an argument for the use case where
> this would be valuable.
>
> If you don't think they count because ARM32 and MIPS don't have a
> high-res timer, then you have very different priorities than I do.  I
> will point out that numerically there are huge number of these devices
> --- and very, very few users of /dev/random.
>
>> You mentioned that you are super conservative for interrupts due to timer
>> interrupts. In all measurements on the different systems I conducted, I have
>> not seen that the timer triggers an interrupt picked up by
>> add_interrupt_randomness.
>
> Um, the timer is the largest number of interrupts on my system.  Compare:
>
> CPU0   CPU1   CPU2   CPU3
>  LOC:6396552603886565586466057102   Local timer interrupts
>
> with the number of disk related interrupts:
>
>  120:  21492 139284  405131705886   PCI-MSI 376832-edge  
> ahci[:00:17.0]
>
> ... and add_interrupt_randomness() gets called for **every**
> interrupt.  On an mostly idle machine (I was in meetings most of
> today) it's not surprising that time interrupts dominate.  That
> doesn

Re: [PATCH 4/6] fscrypt: verify that the correct master key was supplied

2017-07-14 Thread Jeffrey Walton
On Wed, Jul 12, 2017 at 5:00 PM, Eric Biggers  wrote:
> From: Eric Biggers 
>
>
> Solve the problem for v2 encryption policies by storing a "hash" of the
> master encryption key in the encryption xattr and verifying it before
> accepting the user-provided key.
> ...

Forgive my ignorance... Doesn't that set up an oracle so an attacker
can query keys?

It seems like the problem lies deeper in the design. Namely, the
caching and sharing of keys.

Jeff


Re: [PATCH] random: silence compiler warnings and fix race

2017-06-21 Thread Jeffrey Walton
On Tue, Jun 20, 2017 at 7:38 PM, Theodore Ts'o  wrote:
> On Tue, Jun 20, 2017 at 11:49:07AM +0200, Jason A. Donenfeld wrote:
>> ...
>>> I more or less agree with you that we should just turn this on for all
>>> users and they'll just have to live with the spam and report odd
>>> entries, and overtime we'll fix all the violations.
>
> There seems to be a fundamental misapprehension that it will be easy
> to "fix all the violations".  For certain hardware types, this is
> not easy, and the "eh, let them get spammed until we get around to
> fixing it" attitude is precisely what I was pushing back against.

I can't speak for others, but for me: I think they will fall into
three categories:

 1. easy to fix
 2. difficult to fix
 3. unable to fix

(1) is low-hanging fruit, and those cases will probably (hopefully?) be
cleared easily.  Like systemd on x86_64 with rdrand and rdseed.
There's no reason for systemd to find itself starved of entropy on
that platform (cf. http://github.com/systemd/systemd/issues/4167).

Organizations that find themselves in (3) can choose to keep using the board or
server and accept the risk, or they can choose to remediate it in
another way. The "other way" may include a capital expenditure and a
hardware refresh.

The central point is, they know about the risk and they can make the decision.

Jeff


Re: [PATCH] random: silence compiler warnings and fix race

2017-06-20 Thread Jeffrey Walton
On Tue, Jun 20, 2017 at 5:36 AM, Theodore Ts'o  wrote:
> On Tue, Jun 20, 2017 at 10:53:35AM +0200, Jason A. Donenfeld wrote:
>> > Suppressing all messages for all configurations cast a wider net than
>> > necessary. Configurations that could potentially be detected and fixed
>> > likely will go unnoticed. If the problem is not brought to light, then
>> > it won't be fixed.
>>
>> I more or less agree with you that we should just turn this on for all
>> users and they'll just have to live with the spam and report odd
>> entries, and overtime we'll fix all the violations.
>
> Fix all the problems *how*?  If you are on an old system which doesn't
> a hardware random number generator, and which doesn't have a high
> resolution cycle counter, and may not have a lot of entropy easily
> harvestable from the environment, there may not be a lot you can do.
> Sure, you can pretend that the cache (which by the way is usually
> determinstic) is ***so*** complicated that no one can figure it out,
> and essentially pretend that you have entropy when you probably don't;
> that just simply becomes a different way of handwaving and suppressing
> the warning messages.
>
>> But I think there's another camp that would mutiny in the face of this
>> kind of hubris.
>
> Blocking the boot for hours and hours until we have enough entropy to
> initialize the CRNG is ***not*** an acceptable way of making the
> warning messages go away.  Do that and the users **will** mutiny.
>
> It's this sort of attitude which is why Linus has in the past said
> that security people are sometimes insane

I don't believe it has anything to do with insanity. It's sound
security engineering.

Are there compelling reasons a single dmesg warning cannot be provided?

A single message avoids spamming the logs. It also informs the system
owner of the problem. An individual or organization can then take
action based on their risk posture. Finally, it avoids the kernel
making policy decisions for a user or organization.

Jeff


Re: [PATCH] random: silence compiler warnings and fix race

2017-06-20 Thread Jeffrey Walton
On Tue, Jun 20, 2017 at 4:14 AM, Jason A. Donenfeld  wrote:
>...
> Specifically, I added `depends on DEBUG_KERNEL`. This means that these
> useful warnings will only poke other kernel developers. This is probably
> exactly what we want. If the various associated developers see a warning
> coming from their particular subsystem, they'll be more motivated to
> fix it. Ordinary users on distribution kernels shouldn't see the
> warnings or the spam at all, since typically users aren't using
> DEBUG_KERNEL.

I think it is a bad idea to suppress all messages from a security
engineering point of view.

Many folks don't run debug kernels. Most of the users who want or need
to know of the issues won't realize it's happening. Consider that the
reason we learned of systemd's problems was the dmesg warnings.

Suppressing all messages for all configurations casts a wider net than
necessary. Configurations that could potentially be detected and fixed
likely will go unnoticed. If the problem is not brought to light, then
it won't be fixed.

I feel like the kernel is making policy decisions for some
organizations. For those who have hardware that is effectively
unfixable, the organization has to decide what to do based on its
risk aversion. They may decide to live with the risk, or they may
decide to refresh the hardware. However, without information on the
issue, they may not even realize they have an actionable item.

Jeff


Re: [kernel-hardening] Re: [PATCH v4 06/13] iscsi: ensure RNG is seeded before use

2017-06-17 Thread Jeffrey Walton
On Fri, Jun 16, 2017 at 11:45 PM, Lee Duncan  wrote:
> On 06/16/2017 05:41 PM, Jason A. Donenfeld wrote:
>> Hi Lee,
>>
>> On Fri, Jun 16, 2017 at 11:58 PM, Lee Duncan  wrote:
>>> It seems like what you are doing is basically "good", i.e. if there is
>>> not enough random data, don't use it. But what happens in that case? The
>>> authentication fails? How does the user know to wait and try again?
>>
>> The process just remains in interruptible (kill-able) sleep until
>> there is enough entropy, so the process doesn't need to do anything.
>> If the waiting is interrupted by a signal, it returns -ESYSRESTART,
>> which follows the usual semantics of restartable syscalls.
>>
> In your testing, how long might a process have to wait? Are we talking
> seconds? Longer? What about timeouts?
>
> Sorry, but your changing something that isn't exactly broken, so I just
> want to be sure we're not introducing some regression, like clients
> can't connect the first 5 minutes are a reboot.

CHAP (https://www.rfc-editor.org/rfc/rfc1994.txt) and iSCSI
(https://www.ietf.org/rfc/rfc3720.txt) require random values. If iSCSI
is operating without them, it seems like something is broken. From RFC
3720, Section 8.2.1, CHAP Considerations:

   When CHAP is performed over a non-encrypted channel, it is vulnerable
   to an off-line dictionary attack.  Implementations MUST support use
   of up to 128 bit random CHAP secrets, including the means to generate
   such secrets and to accept them from an external generation source.
   Implementations MUST NOT provide secret generation (or expansion)
   means other than random generation.

CHAP actually has a weaker requirement since it only requires _unique_
(and not _random_). From RFC 1994, Section 2.3, Design Requirements:

   Each challenge value SHOULD be unique, since repetition of a
   challenge value in conjunction with the same secret would permit an
   attacker to reply with a previously intercepted response.  Since it
   is expected that the same secret MAY be used to authenticate with
   servers in disparate geographic regions, the challenge SHOULD exhibit
   global and temporal uniqueness.

But it's not clear to me how to ensure uniqueness when it's based on
randomness from the generators.

Jeff


Re: [PATCH v4 13/13] random: warn when kernel uses unseeded randomness

2017-06-08 Thread Jeffrey Walton
On Tue, Jun 6, 2017 at 1:48 PM, Jason A. Donenfeld  wrote:
> This enables an important dmesg notification about when drivers have
> used the crng without it being seeded first. Prior, these errors would
> occur silently, and so there hasn't been a great way of diagnosing these
> types of bugs for obscure setups. By adding this as a config option, we
> can leave it on by default, so that we learn where these issues happen,
> in the field, will still allowing some people to turn it off, if they
> really know what they're doing and do not want the log entries.
>
> However, we don't leave it _completely_ by default. An earlier version
> of this patch simply had `default y`. I'd really love that, but it turns
> out, this problem with unseeded randomness being used is really quite
> present and is going to take a long time to fix. Thus, as a compromise
> between log-messages-for-all and nobody-knows, this is `default y`,
> except it is also `depends on DEBUG_KERNEL`. This will ensure that the
> curious see the messages while others don't have to.

Please forgive my ignorance... What do the last two sentences mean exactly?

If I am running a production Debian, Fedora or Ubuntu kernel, will a
message be present if a violation occurs? By "violation" I mean a policy
violation, that is, a generator being used before it's operational.

Sunlight is the best disinfectant. At least one message should be
logged to ensure the issue is known. But it's not clear to me, from
parsing the last two sentences, whether that is what happens.

Jeff


Re: [PATCH v3 02/13] random: add get_random_{bytes,u32,u64,int,long,once}_wait family

2017-06-05 Thread Jeffrey Walton
On Mon, Jun 5, 2017 at 8:50 PM, Jason A. Donenfeld  wrote:
> These functions are simple convenience wrappers that call
> wait_for_random_bytes before calling the respective get_random_*
> function.

It may be advantageous to add a timeout, too.

There have been a number of times I did not want to wait an INFINITE
amount of time for a completion (in another context).
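
A hypothetical sketch of what I have in mind, modeled on the wrapper
below (wait_for_random_bytes_timeout() does not exist; the point is
just to bound the sleep and return -ETIMEDOUT so the caller can decide
what to do):

    static inline int get_random_bytes_wait_timeout(void *buf, int nbytes,
                                                    unsigned long timeout)
    {
        int ret = wait_for_random_bytes_timeout(timeout);
        if (unlikely(ret))
            return ret;    /* -ERESTARTSYS or -ETIMEDOUT */
        get_random_bytes(buf, nbytes);
        return 0;
    }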

Jeff

> Signed-off-by: Jason A. Donenfeld 
> ---
>  include/linux/net.h|  2 ++
>  include/linux/once.h   |  2 ++
>  include/linux/random.h | 25 +
>  3 files changed, 29 insertions(+)
>
> diff --git a/include/linux/net.h b/include/linux/net.h
> index abcfa46a2bd9..dda2cc939a53 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -274,6 +274,8 @@ do {  
>   \
>
>  #define net_get_random_once(buf, nbytes)   \
> get_random_once((buf), (nbytes))
> +#define net_get_random_once_wait(buf, nbytes)  \
> +   get_random_once_wait((buf), (nbytes))
>
>  int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec,
>size_t num, size_t len);
> diff --git a/include/linux/once.h b/include/linux/once.h
> index 285f12cb40e6..9c98aaa87cbc 100644
> --- a/include/linux/once.h
> +++ b/include/linux/once.h
> @@ -53,5 +53,7 @@ void __do_once_done(bool *done, struct static_key *once_key,
>
>  #define get_random_once(buf, nbytes)\
> DO_ONCE(get_random_bytes, (buf), (nbytes))
> +#define get_random_once_wait(buf, nbytes)
> \
> +   DO_ONCE(get_random_bytes_wait, (buf), (nbytes))  \
>
>  #endif /* _LINUX_ONCE_H */
> diff --git a/include/linux/random.h b/include/linux/random.h
> index e29929347c95..4aecc339558d 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -58,6 +58,31 @@ static inline unsigned long get_random_long(void)
>  #endif
>  }
>
> +/* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes).
> + * Returns the result of the call to wait_for_random_bytes. */
> +static inline int get_random_bytes_wait(void *buf, int nbytes)
> +{
> +   int ret = wait_for_random_bytes();
> +   if (unlikely(ret))
> +   return ret;
> +   get_random_bytes(buf, nbytes);
> +   return 0;
> +}
> +
> +#define declare_get_random_var_wait(var) \
> +   static inline int get_random_ ## var ## _wait(var *out) { \
> +   int ret = wait_for_random_bytes(); \
> +   if (unlikely(ret)) \
> +   return ret; \
> +   *out = get_random_ ## var(); \
> +   return 0; \
> +   }
> +declare_get_random_var_wait(u32)
> +declare_get_random_var_wait(u64)
> +declare_get_random_var_wait(int)
> +declare_get_random_var_wait(long)
> +#undef declare_get_random_var
> +
>  unsigned long randomize_page(unsigned long start, unsigned long range);
>
>  u32 prandom_u32(void);


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-03 Thread Jeffrey Walton
On Sun, Jun 4, 2017 at 1:48 AM, Stephan Müller  wrote:
> On Friday, 2 June 2017 at 16:59:56 CEST, Jason A. Donenfeld wrote:
>
>> Alternatively, I'm open to other solutions people might come up with.
>
> How about stirring in some data from the Jitter RNG that we have in the kernel
> already and that is used for the DRBG in case get_random_bytes has
> insufficient entropy? Yes, two kernel developers said that this RNG is
> useless, where in fact a lot of hardware and even crypto folks say that this
> approach has merits.

Almost anything has to be better than (1) silent failures, and (2)
draining the little entropy available when the generators are starting
and trying to become operational.

The [negative] use case for (2) is systemd. See, for example,
https://github.com/systemd/systemd/issues/4167.

Jeff


Re: get_random_bytes returns bad randomness before seeding is complete

2017-06-03 Thread Jeffrey Walton
On Sat, Jun 3, 2017 at 5:45 PM, Sandy Harris  wrote:
> ...
> Of course this will fail on systems with no high-res timer. Are there
> still some of those? It might be done in about 1000 times as long on a
> system that lacks the realtime library's nanosecond timer but has the
> Posix standard microsecond timer, implying a delay time in the
> milliseconds. Would that be acceptable in those cases?

A significant portion of the use cases should include mobile devices.
Device sales outnumbered desktop and server sales several years ago.

Many devices are sensor-rich. Even the low-end ones come with
accelerometers for gaming. A typical one has 3 or 4 sensors, and
higher-end ones have 7 or 8 sensors. An Evo 4G has 7 of them.

There is no shortage of entropy in many of the use cases. The thing
that is lacking seems to be taking advantage of it.

Jeff


Re: [PATCH] crypto: gf128mul - define gf128mul_x_ble in gf128mul.h

2017-03-30 Thread Jeffrey Walton
>> Also note that '(b & ((u64)1 << 63)) ? 0x87 : 0x00;' is actually getting
>> compiled as '((s64)b >> 63) & 0x87', which is branchless and therefore makes the
>> new version more efficient than one might expect:
>>
>> sar    $0x3f,%rax
>> and    $0x87,%eax
>>
>> It could even be written the branchless way explicitly, but it shouldn't matter.
>
> I think the definition using unsigned operations is more intuitive...
> Let's just leave the clever tricks up to the compiler :)

It may be a good idea to use the one that provides constant-timeness
to help avoid leaking information.
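
For reference, the two forms written out in plain C (a sketch; it
assumes, as the compilers in question do, that a signed right shift is
arithmetic):

    #include <stdint.h>

    /* Branchy form: naive codegen uses a data-dependent conditional. */
    static uint64_t xts_tweak_carry_branchy(uint64_t b)
    {
        return (b & ((uint64_t)1 << 63)) ? 0x87 : 0x00;
    }

    /* Branchless form: the arithmetic shift smears the top bit across
     * the word, and the mask then selects 0x87 or 0x00 with no branch. */
    static uint64_t xts_tweak_carry_branchless(uint64_t b)
    {
        return (uint64_t)((int64_t)b >> 63) & 0x87;
    }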

Jeff


Re: [ANNOUNCE] /dev/random - a new approach (code for 4.11-rc1)

2017-03-18 Thread Jeffrey Walton
>> > The design and implementation is driven by a set of goals described in [2]
>> > that the LRNG completely implements. Furthermore, [2] includes a
>> > comparison with RNG design suggestions such as SP800-90B, SP800-90C, and
>> > AIS20/31.
>>
>> A quick comment about SP800 and the hardware instructions... RDSEED is
>> 2 to 5 times slower than RDRAND on Intel hardware, depending on the
>> architecture and microarchitecture.
>
> I am not sure how this statement relates to the quote above. RDSEED is the
> CBC-MACed output of the flip-flop providing the raw noise.
>
> RDRAND is the output of the SP800-90A CTR DRBG that is seeded by the CBC-MAC
> that also feeds RDSEED. Thus, RDSEED is as fast as the noise source where
> RDRAND is a pure deterministic RNG that tries to be (re)seeded as often as
> possible.
>
> Both instructions are totally unrelated to the SP800-90A DRBG available to the
> Linux kernel.

SP800-90A requires an entropy source to bootstrap the Hash, HMAC and
CTR generators. That is, the Instantiate and Reseed functions need an
approved source of entropy. Both RDRAND and RDSEED are approved for
Intel chips. See SP800-90A, Section 8.6.5
(http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A.pdf).

Jeff


Re: [ANNOUNCE] /dev/random - a new approach (code for 4.11-rc1)

2017-03-18 Thread Jeffrey Walton
> The design and implementation is driven by a set of goals described in [2]
> that the LRNG completely implements. Furthermore, [2] includes a
> comparison with RNG design suggestions such as SP800-90B, SP800-90C, and
> AIS20/31.

A quick comment about SP800 and the hardware instructions... RDSEED is
2 to 5 times slower than RDRAND on Intel hardware, depending on the
architecture and microarchitecture. AMD's implementation of RDRAND is
orders of magnitude slower than Intel's. Testing on an Athlon 845 X4
(Bulldozer v4) @ 3.5 GHz shows it runs between 4100 and 4500 cycles
per byte. It works out to be about 1 MiB/s.

While the LRNG may reach a cryptographically acceptable seed level
much earlier than the existing /dev/random, it may not be early
enough. Some components, like systemd, will ask for random numbers and
truck-on even if they are not available. Systemd does not block or
wait if get_random_bytes fails to produce. In the bigger picture,
don't expect that software layered above will do the expected thing in
all cases.

Jeff


Re: [PATCH 0/2] crypto: arm64/ARM: NEON accelerated ChaCha20

2016-12-27 Thread Jeffrey Walton
> ChaCha20 is a stream cipher described in RFC 7539, and is intended to be
> an efficient software implementable 'standby cipher', in case AES cannot
> be used.

That's not quite correct.

The IETF changed the algorithm a bit (RFC 7539 uses a 96-bit nonce and a
32-bit block counter, where Bernstein's original uses a 64-bit nonce and
a 64-bit counter), and it's not compatible with Bernstein's ChaCha. They
probably should have differentiated the name to avoid this sort of
confusion.

You can find Bernstein's specification for ChaCha at
https://cr.yp.to/chacha.html, and the test vectors for Bernstein's
specification at
http://tools.ietf.org/html/draft-strombergson-chacha-test-vectors.

Jeff


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find any reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random numbers once, in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index ..93549e4e22c5
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,83 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2016 Jason A. Donenfeld . All Rights 
> Reserved.
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + *
> + * This implementation is specifically for SipHash2-4.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Test vectors taken from official reference source available at:
> + * https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +   0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +   0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +   0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +   0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +   0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +   0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +   0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +   0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +   0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +   0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +   0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +   0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +   0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +   0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +   0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +   0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +   0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +   0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +   0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +   0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +   0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +   0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +   { 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +   u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +   u8 in_unaligned[65];
> +   u8 i;
> +   int ret = 0;
> +
> +   for (i = 0; i < 64; ++i) {
> +   in[i] = i;
> +   in_unaligned[i + 1] = i;
> +   if (siphash(in, i, test_key) != test_vectors[i]) {
> +   pr_info("self-test aligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   if (siphash_unaligned(in_unaligned + 1, i, test_key) != 
> test_vectors[i]) {
> +   pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   }
> +   if (!ret)
> +   pr_info("self-tests: pass\n");
> +   return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld ");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a
byte array.

I think that means on little endian machines, values like element 0
must be byte reversed:

0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and the other test vectors) is returned here:

return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big endian machines produce expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that result needed to be
reversed before returning it to a caller.
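
A sketch of what I mean (my own illustration, not part of the patch):
serialize the u64 result little-endian so a byte-array consumer sees the
same bytes regardless of host endianness:

#include <stdint.h>

static void siphash_to_le_bytes(uint64_t h, uint8_t out[8])
{
    int i;

    /* 0x726fdb47dd0e0e31 -> 31,0e,0e,dd,47,db,6f,72 */
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(h >> (8 * i));
}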

Jeff


Fast Code and HAVE_EFFICIENT_UNALIGNED_ACCESS (was: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access)

2016-11-02 Thread Jeffrey Walton
On Wed, Nov 2, 2016 at 5:25 PM, Jason A. Donenfeld  wrote:
> These architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS:
>
> s390 arm arm64 powerpc x86 x86_64
>
> So, these will use the original old code.
>
> The architectures that will thus use the new code are:
>
> alpha arc avr32 blackfin c6x cris frv h7300 hexagon ia64 m32r m68k
> metag microblaze mips mn10300 nios2 openrisc parisc score sh sparc
> tile um unicore32 xtensa

What I have found in practice from helping maintain a security library
and running benchmarks until my eyes bled...

UNALIGNED_ACCESS is a kiss of death. It effectively prohibits -O3 and
above due to undefined behavior in C and problems with GCC
vectorization. In the bigger picture, it simply slows things down.

Once we moved away from UNALIGNED_ACCESS and started testing at -O3
and -O5, the benchmarks enjoyed non-trivial speedups on top of any
speedups we were trying to achieve with hand tuned assembly language
routines. Effectively, the best speedup was the sum of the C-code and
ASM gains; they were not disjoint, as they might appear.

The one wrinkle for UNALIGNED_ACCESS is Bernstein's compressed tables
(https://cr.yp.to/antiforgery/cachetiming-20050414.pdf).
UNALIGNED_ACCESS meets some security goals. The techniques from
Bernstein's paper apply equally well to AES, Camellia and other
table-driven implementations. Painting with a broad brush (and as far
as I know), the kernel is not observing the recommendations. My
apologies if I parsed things incorrectly.

Jeff


Re: [ANNOUNCE] libkcapi v0.12.0 released

2016-10-27 Thread Jeffrey Walton
>> > The Linux kernel exports a network interface of type AF_ALG to allow user
>> > space to utilize the kernel crypto API. libkcapi uses this network
>> > interface and exports an easy to use API so that a developer does not
>> > need to consider the low-level network interface handling.
...
>> any preprocessor macros to guard code paths in userland? What are the
>
> There are no special guards. If AF_ALG is available, all user space processes
> can use it.
>
>> preprocessor macros we can use to guard it?
>
> I am not entirely sure I understand the question.

See, for example,
https://github.com/openssl/openssl/blob/master/engines/afalg/e_afalg.c

The versions are kind of arbitrary because there's no easy way to tell
when the gear is available. If I recall from researching things, there
were two components needed for afalg, and they were supposedly
available back to later 2.x kernels.

Things should "just work" for 3.x and 4.x kernels. But if "too early"
a kernel is encountered, then users experience the spectrum from
compile problems to unexplained runtime errors.
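
One way to cope, rather than a version check (a sketch of my own, not
part of libkcapi): probe at run time, since socket(AF_ALG, ...) fails
with EAFNOSUPPORT when the kernel lacks the interface:

#include <unistd.h>
#include <sys/socket.h>

#ifndef AF_ALG
# define AF_ALG 38   /* stable value; older libc headers may not define it */
#endif

static int have_af_alg(void)
{
    int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

    if (fd < 0)
        return 0;    /* e.g. EAFNOSUPPORT: kernel built without CONFIG_CRYPTO_USER_API */
    close(fd);
    return 1;
}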

Jeff


Re: [ANNOUNCE] libkcapi v0.12.0 released

2016-10-26 Thread Jeffrey Walton
> The Linux kernel exports a network interface of type AF_ALG to allow user
> space to utilize the kernel crypto API. libkcapi uses this network interface
> and exports an easy to use API so that a developer does not need to consider
> the low-level network interface handling.
>
> The library does not implement any low level cipher algorithms. All consumer
> requests are sent to the kernel for processing. Results from the kernel crypto
> API are returned to the consumer via the library API.
>
> The kernel interface and therefore this library can be used by unprivileged
> processes.
>
> The library code archive also provides a drop-in replacement for the command
> line tools of sha*sum, fipscheck/fipshmac and sha512hmac.
>
> The source code and the documentation is available at [1].

That looks awesome Stephan.

How can user code reliably detect when the API is available? Are there
any preprocessor macros to guard code paths in userland? What are the
preprocessor macros we can use to guard it?

Jeff


Re: algif_aead: AIO broken with more than one iocb

2016-09-11 Thread Jeffrey Walton
> The AIO support for algif_aead is broken when submitting more than one iocb.
> The break happens in aead_recvmsg_async at the following code:
>

I think the kernel needs to take a half step back, and add the missing
self tests and test cases to be more proactive in detecting breaks
earlier. Speaking first hand, some of these breaks have existed for
months.

I don't take the position you can't break things. I believe you can't
make an omelet without breaking eggs; and if you're not breaking
something, then you're probably not getting anything done. The
engineering defect is not detecting the break.

Jeff


Entropy sources (was: /dev/random - a new approach)

2016-08-20 Thread Jeffrey Walton
On Fri, Aug 19, 2016 at 1:20 PM, H. Peter Anvin  wrote:
> On 08/18/16 22:56, Herbert Xu wrote:
>> On Thu, Aug 18, 2016 at 10:49:47PM -0400, Theodore Ts'o wrote:
>>>
>>> That really depends on the system.  We can't assume that people are
>>> using systems with a 100Hz clock interrupt.  More often than not
>>> people are using tickless kernels these days.  That's actually the
>>> problem with changing /dev/urandom to block until things are
>>> initialized.
>>
>> Couldn't we disable tickless until urandom has been seeded? In fact
>> perhaps we should accelerate the timer interrupt rate until it has
>> been seeded?
>>
>
> The biggest problem there is that the timer interrupt adds *no* entropy
> unless there is a source of asynchronicity in the system.  On PCs,
> traditionally the timer has been run from a completely different crystal
> (14.31818 MHz) than the CPU, which is the ideal situation, but if they
> are run off the same crystal and run in lockstep, there is very little
> if anything there.  On some systems, the timer may even *be* the only
> source of time, and the entropy truly is zero.

It seems like a networked computer should have an abundance of entropy
available from the network stack. Every common case I can come up with
includes a networked computer. If a handheld is outside of coverage,
then it probably does not have the same randomness demands because it
can't communicate (e.g., TCP sequence numbers, key agreement, etc).

In fact, there are at least two papers that use bits from the network stack:

* When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities
and Hedging Deployed Cryptography,
http://pages.cs.wisc.edu/~rist/papers/sslhedge.pdf
* When Virtual is Harder than Real: Security Challenges in Virtual
Machine Based Computing Environments,
http://www.usenix.org/legacy/event/hotos05/final_papers/full_papers/garfinkel/garfinkel.pdf

As IoT gains traction, the entropy available locally should increase
because these devices are chatty. I also expect gossip protocols to
play more of a role in the future. A network-based attacker cannot
possibly monitor every conversation, especially when devices pair and
form ad hoc networks. Nor will a network attacker usually see the
traffic on a local LAN segment serving headless servers.

When using network bits, it seems like the remaining problem is
extracting the entropy. I think Krawczyk (et al) have done a lot of
work in this area:

* Leftover Hash Lemma, Revisited, http://eprint.iacr.org/2011/088.pdf
* Cryptographic Extraction and Key Derivation: The HKDF Scheme,
http://eprint.iacr.org/2010/264.pdf
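
For illustration only (my own sketch using OpenSSL's one-shot HMAC(),
not something proposed for the kernel), the HKDF "extract" step from
RFC 5869 that condenses raw input keying material into a pseudorandom
key looks like this:

#include <openssl/evp.h>
#include <openssl/hmac.h>

/* HKDF-Extract: PRK = HMAC-SHA256(salt, IKM).
 * 'ikm' would be the raw network samples; 'salt' may be empty. */
static int hkdf_extract_sha256(const unsigned char *salt, size_t salt_len,
                               const unsigned char *ikm, size_t ikm_len,
                               unsigned char prk[32])
{
    unsigned int prk_len = 0;

    if (!HMAC(EVP_sha256(), salt, (int)salt_len, ikm, ikm_len, prk, &prk_len))
        return -1;
    return (int)prk_len;   /* 32 on success */
}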

Jeff


Re: [PATCH] Add Ingenic JZ4780 hardware RNG driver

2016-08-19 Thread Jeffrey Walton
On Wed, Aug 17, 2016 at 11:35 AM, PrasannaKumar Muralidharan
 wrote:
> This patch adds support for hardware random number generator present in
> JZ4780 SoC.
>
> Signed-off-by: PrasannaKumar Muralidharan 
> ---
>  ...
> +static int jz4780_rng_read(struct hwrng *rng, void *buf, size_t max, bool 
> wait)
> +{
> +   struct jz4780_rng *jz4780_rng = container_of(rng, struct jz4780_rng,
> +   rng);
> +   u32 *data = buf;
> +   *data = jz4780_rng_readl(jz4780_rng, REG_RNG_DATA);
> +   return 4;
> +}

My bad, I should have spotted this earlier

i686, x86_64 and some ARM will sometimes define a macro indicating
unaligned data access is allowed. For example, see
__ARM_FEATURE_UNALIGNED (cf.,
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0774f/chr1383660321827.html)
. MIPSEL does not define such a macro.

# MIPS ci20 creator with GCC 4.6
$ gcc -march=native -dM -E -

> +   u32 *data = buf;
> +   *data = jz4780_rng_readl(jz4780_rng, REG_RNG_DATA);

If GCC emits code that uses the MIPS unaligned load and store
instructions, then there's probably going to be a performance penalty.

Regardless of what the CPU tolerates, I believe unaligned data access
is undefined behavior in C/C++. I believe you should memcpy the value
into the buffer.
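
A sketch of the memcpy approach, reusing the names from the patch above
(linux/string.h supplies memcpy); this is only an illustration, not a
tested replacement:

static int jz4780_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
{
        struct jz4780_rng *jz4780_rng = container_of(rng, struct jz4780_rng,
                                                     rng);
        u32 data = jz4780_rng_readl(jz4780_rng, REG_RNG_DATA);

        /* copy through memcpy so the alignment of 'buf' does not matter */
        memcpy(buf, &data, sizeof(data));
        return sizeof(data);
}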

Jeff


Re: [PATCH] Add Ingenic JZ4780 hardware RNG driver

2016-08-19 Thread Jeffrey Walton
On Wed, Aug 17, 2016 at 11:35 AM, PrasannaKumar Muralidharan
 wrote:
> This patch adds support for hardware random number generator present in
> JZ4780 SoC.
>
> Signed-off-by: PrasannaKumar Muralidharan 
> ---
>  .../devicetree/bindings/rng/ingenic,jz4780-rng.txt |  12 +++
>  MAINTAINERS|   5 +
>  arch/mips/boot/dts/ingenic/jz4780.dtsi |   7 +-
>  drivers/char/hw_random/Kconfig |  14 +++
>  drivers/char/hw_random/Makefile|   1 +
>  drivers/char/hw_random/jz4780-rng.c| 105 
> +
>  6 files changed, 143 insertions(+), 1 deletion(-)
>  create mode 100644 
> Documentation/devicetree/bindings/rng/ingenic,jz4780-rng.txt
>  create mode 100644 drivers/char/hw_random/jz4780-rng.c
>
> diff --git a/Documentation/devicetree/bindings/rng/ingenic,jz4780-rng.txt 
> b/Documentation/devicetree/bindings/rng/ingenic,jz4780-rng.txt
> new file mode 100644
> index 000..03abf56
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/rng/ingenic,jz4780-rng.txt
> @@ -0,0 +1,12 @@
> +Ingenic jz4780 RNG driver
> +
> +Required properties:
> +- compatible : Should be "ingenic,jz4780-rng"
> +- reg : Specifies base physical address and size of the registers.
> +
> +Example:
> +
> +rng: rng@10D8 {
> +   compatible = "ingenic,jz4780-rng";
> +   reg = <0x10D8 0x8>;
> +};
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 08e9efe..c0c66eb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6002,6 +6002,11 @@ M:   Zubair Lutfullah Kakakhel 
> 
>  S: Maintained
>  F: drivers/dma/dma-jz4780.c
>
> +INGENIC JZ4780 HW RNG Driver
> +M: PrasannaKumar Muralidharan 
> +S: Maintained
> +F: drivers/char/hw_random/jz4780-rng.c
> +
>  INTEGRITY MEASUREMENT ARCHITECTURE (IMA)
>  M: Mimi Zohar 
>  M: Dmitry Kasatkin 
> diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi 
> b/arch/mips/boot/dts/ingenic/jz4780.dtsi
> index b868b42..f11d139 100644
> --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi
> +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi
> @@ -36,7 +36,7 @@
>
> cgu: jz4780-cgu@1000 {
> compatible = "ingenic,jz4780-cgu";
> -   reg = <0x1000 0x100>;
> +   reg = <0x1000 0xD8>;
>
> clocks = <&ext>, <&rtc>;
> clock-names = "ext", "rtc";
> @@ -44,6 +44,11 @@
> #clock-cells = <1>;
> };
>
> +   rng: jz4780-rng@10D8 {
> +   compatible = "ingenic,jz4780-rng";
> +   reg = <0x10D8 0x8>;
> +   };
> +
> uart0: serial@1003 {
> compatible = "ingenic,jz4780-uart";
> reg = <0x1003 0x100>;
> diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
> index 56ad5a59..c336fe8 100644
> --- a/drivers/char/hw_random/Kconfig
> +++ b/drivers/char/hw_random/Kconfig
> @@ -294,6 +294,20 @@ config HW_RANDOM_POWERNV
>
>   If unsure, say Y.
>
> +config HW_RANDOM_JZ4780
> +   tristate "JZ4780 HW random number generator support"
> +   depends on MACH_INGENIC
> +   depends on HAS_IOMEM
> +   default HW_RANDOM
> +   ---help---
> + This driver provides kernel-side support for the Random Number
> + Generator hardware found on JZ4780 SOCs.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called jz4780-rng.
> +
> + If unsure, say Y.
> +
>  config HW_RANDOM_EXYNOS
> tristate "EXYNOS HW random number generator support"
> depends on ARCH_EXYNOS || COMPILE_TEST
> diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
> index 04bb0b0..a155066 100644
> --- a/drivers/char/hw_random/Makefile
> +++ b/drivers/char/hw_random/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_HW_RANDOM_PSERIES) += pseries-rng.o
>  obj-$(CONFIG_HW_RANDOM_POWERNV) += powernv-rng.o
>  obj-$(CONFIG_HW_RANDOM_EXYNOS) += exynos-rng.o
>  obj-$(CONFIG_HW_RANDOM_HISI)   += hisi-rng.o
> +obj-$(CONFIG_HW_RANDOM_JZ4780) += jz4780-rng.o
>  obj-$(CONFIG_HW_RANDOM_TPM) += tpm-rng.o
>  obj-$(CONFIG_HW_RANDOM_BCM2835) += bcm2835-rng.o
>  obj-$(CONFIG_HW_RANDOM_IPROC_RNG200) += iproc-rng200.o
> diff --git a/drivers/char/hw_random/jz4780-rng.c 
> b/drivers/char/hw_random/jz4780-rng.c
> new file mode 100644
> index 000..c9d2cde
> --- /dev/null
> +++ b/drivers/char/hw_random/jz4780-rng.c
> @@ -0,0 +1,105 @@
> +/*
> + * jz4780-rng.c - Random Number Generator driver for J4780
> + *
> + * Copyright 2016 (C) PrasannaKumar Muralidharan 
> + *
> + * This file is licensed under  the terms of the GNU General Public
> + * License version 2. This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define REG_RNG_CTRL   0x0
> +#define REG_RNG_DATA   0x4
> +
> +struct jz4780_rn

JZ4780 RNG and entropy depletion

2016-08-14 Thread Jeffrey Walton
Hi Everyone,

I have a MIPSEL ci20 dev board for testing. The board has a hardware
based rng, but its suffering entropy depletion. I have Debian's
rng-tools package installed.

The board lacks /dev/hwrng. /dev/random blocks indefinitely after
draining the device. "Indefinitely" may not be accurate, but I killed
a program that waited over 4 hours for 16 bytes after draining
/dev/random.

The ci20's documentation is a bit scant, but it can be found at
http://mipscreator.imgtec.com/CI20/hardware/soc/JZ4780_PM.pdf. I'm not
sure what the output rate is, but it seems to be capable of one
machine word (4 bytes) every few milliseconds. Without a delay, I can
see values being shifted into the register mapped at 0x10DC.

I have a few questions:

  * Is there a driver for JZ4780 rng?
  * Is there a particular package for the driver that needs to be installed?
  * What causes/triggers /dev/hwrng to replenish /dev/random?

Thanks in advance.

Jeff


Data type for aio_buf under X32?

2016-08-11 Thread Jeffrey Walton
Hi Everyone,

My apologies for this question and my confusion.

When interfacing with the kernel crypto through AF_ALG, what is the
type of 'aio_buf' under X32?

I know X32 uses the ILP32 data model, so integers/longs/pointers are
32-bits (cf., http://www.unix.org/version2/whatsnew/lp64_wp.html). I
believe Glibc uses a 'void*' for 'aio_buf' (cf.,
http://man7.org/linux/man-pages/man7/aio.7.html). But I believe the
kernel's 'aio_buf' is a u64 under X32.

I'm asking due to a failure under X32 because GCC sign-extends the
pointer value when upsizing to 64-bits (cf., GCC manual 4.7 Arrays and
Pointers, 
http://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html).
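
A sketch of the workaround I have in mind (my own helper name; struct
iocb's aio_buf is a __u64 in linux/aio_abi.h): route the pointer
through uintptr_t so the conversion zero-extends instead of
sign-extending:

#include <stdint.h>
#include <linux/aio_abi.h>

static void set_aio_buf(struct iocb *cb, void *p)
{
    /* unsigned intermediate: zero-extends on ILP32 targets such as x32 */
    cb->aio_buf = (uint64_t)(uintptr_t)p;
}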

Thanks in advance.

Jeff


Re: AF_ALG broken?

2016-08-08 Thread Jeffrey Walton
> When trying to use the openssl AF_ALG module with 4.8-rc1 with imx
> caam, I get this:
>
> $ OPENSSL_CONF=/shared/crypto/openssl-imx.cnf strace openssl dgst -md5 
>  ...
> socket(PF_ALG, SOCK_SEQPACKET, 0)   = 3
> close(3)= 0
> socket(PF_ALG, SOCK_SEQPACKET, 0)   = 3
> bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0"}, 88) = 0
> accept(3, 0, NULL)  = 4
> fstat64(0, {st_mode=S_IFREG|0755, st_size=666864, ...}) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> 0xb6fab000
> read(0, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 
> 8192) = 8192
> send(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 
> 8192, MSG_MORE) = -1 ENOKEY (Required key not available)

As far as I know from testing on x86, it has never worked as expected.
I believe you have to use 'sendto' and 'recvfrom' because 'send' and
'recv' use default structures, and they configure the object
incorrectly.

Jeff


Re: [PATCH] DH support: add KDF handling support

2016-07-14 Thread Jeffrey Walton
> Note, as shared secrets potentially post-processed by a KDF usually are again
> used as key or data encryption keys, they need to be truncated/expanded to a
> specific length anyway. A KDF inherently provides the truncation support to
> any arbitrary length. Thus, I would think that the caller needs to provide
> that length but does not need to truncate the output itself.

As far as I know, there's no reduction in proof that a truncated hash
is as secure as the non-truncated one. One of the reasons to provide
the output length as a security parameter is to help avoid truncation
and related hash output attacks.

Also see Kelsey's work on the subject;
http://www.google.com/search?q=nist+kelsey+truncated+hash.

Jeff


Re: Unable to decrypt message

2016-06-04 Thread Jeffrey Walton
> I am trying to encrypt decrypt data over the wire. On the receiver
> side I have a pre-routing hook where I get reference to my encrypted
> data and apply decryption using the skcipher api's, however I am
> unable to get the same data back.
>
> My algo is same on both ends "cbc(aes)" and using CRYPTO_ALG_ASYNC ,
> key is also same (content and size).

Depends on a number of things... But in general, the first thing to do
is find the self tests and run them. Finding the self tests may not be
that easy. For example, the afalg async tests are at
http://github.com/tstruk/afalg_async_test.

Jeff


Re: (none)

2016-06-01 Thread Jeffrey Walton
On Wed, Jun 1, 2016 at 2:19 AM, Herbert Xu  wrote:
> On Wed, Jun 01, 2016 at 07:53:38AM +0200, Stephan Mueller wrote:
>>
>> I thought via-rng.c covers the VIA Padlock RNG?
>
> Indeed, you're quite right.  In that case Jeffrey was the via-rng
> driver loaded?

$ cat /proc/modules | egrep -i '(via|padlock|rng)'
padlock_sha 16384 0 - Live 0x
padlock_aes 16384 0 - Live 0x
via_cputemp 16384 0 - Live 0x
hwmon_vid 16384 1 via_cputemp, Live 0x
via_rng 16384 0 - Live 0x
i2c_viapro 16384 0 - Live 0x
pata_via 16384 0 - Live 0x
sata_via 16384 2 - Live 0x

And:

$ lsmod | egrep -i '(via|padlock|rng)'
padlock_sha16384  0
padlock_aes16384  0
via_cputemp16384  0
hwmon_vid  16384  1 via_cputemp
via_rng16384  0
i2c_viapro 16384  0
pata_via   16384  0
sata_via   16384  2

And:

$ dmesg | egrep -i '(via|padlock|rng)'
[0.124003] smpboot: CPU0: Centaur VIA C7-D Processor 1800MHz (fam:
06, model: 0d, stepping: 00)
[0.263914] pci :00:01.0: disabling DAC on VIA PCI bridge
[2.290795] agpgart: Detected VIA P4M900 chipset
[2.296875] agpgart-via :00:00.0: AGP aperture is 128M @ 0xf000
[2.934927] sata_via :00:0f.0: version 2.6
[2.935155] sata_via :00:0f.0: routed to hard irq line 6
[2.948457] scsi host0: sata_via
[2.967744] scsi host1: sata_via
[2.968167] pata_via :00:0f.1: version 0.3.4
[2.976090] scsi host2: pata_via
[2.982777] scsi host3: pata_via
[4.339291] systemd[1]: Set hostname to .
[   10.415938] VIA RNG detected
[   11.257974] hwmon_vid: Using 6-bit VID table for VIA C7-D CPU
[   12.100845] padlock_aes: Using VIA PadLock ACE for AES algorithm.
[   12.149586] padlock_sha: Using VIA PadLock ACE for SHA1/SHA256 algorithms.
[   12.633495] input: HDA VIA VT82xx Rear Mic as
/devices/pci:80/:80:01.0/sound/card0/input9
[   12.633720] input: HDA VIA VT82xx Line as
/devices/pci:80/:80:01.0/sound/card0/input10
[   12.633927] input: HDA VIA VT82xx Headphone Front as
/devices/pci:80/:80:01.0/sound/card0/input11


[no subject]

2016-05-31 Thread Jeffrey Walton
Please forgive my ignorance here...

I have a test system with a VIA C7-M processor and PM-400 chipset. This
is one of those Thin Client/Internet of Things processors and chipsets
I test security libraries on (like OpenSSL, Cryptlib and Crypto++).

The processor includes the Padlock extensions. Padlock is similar to
Intel's RDRAND, RDSEED and AES-NI, and it predates Intel's
instructions by about a decade.

The Padlock Security Engine can produce a stream of random numbers at
megabits per second, so I've been kind of surprised it has been
suffering entropy depletion. Here's what the audit trail looks like:

Testing operating system provided blocking random number generator...
FAILED:  it took 74 seconds to generate 5 bytes
passed:  5 generated bytes compressed to 7 bytes by DEFLATE

Above, the blocking RNG is drained. Then, 16 bytes are requested. It
appears to take over one minute to gather five bytes when effectively
an endless stream is available.

My question is, is this system expected to suffer entropy depletion
out of the box? Or are users expected to do something special so the
system does not fail?

Thanks in advance.

Jeff


Re: AES-NI: slower than aes-generic?

2016-05-27 Thread Jeffrey Walton
> If we implement something which happens to result in a 2 minute stall
> in boot times, the danger is that a clueless engineer at Sony, or LGE,
> or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
> telling anyone about what they did, and we might not notice right away
> that the fix was in fact catastrophically bad.

This is an non-trivial threat. +1 for recognizing it.

I know of one VM hypervisor used in US financial services that was effectively
doing "One thing you should not do is the following..." from
http://lwn.net/Articles/525459/.

Jeff


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Jeffrey Walton
> What I am wondering is that when encrypting 256 16 byte blocks, I get a speed
> of about 170 MB/s with the AES-NI driver. When using the aes-generic or aes-
> asm, I get up to 180 MB/s with all else being equal. Note, that figure
> includes a copy_to_user of the generated data.
>
> ...

Something sounds amiss.

AES-NI should be an order of magnitude faster than a generic
implementation. Can you verify AES-NI is actually using AES-NI, and
aes-generic is a software implementation?

Here are some OpenSSL numbers. EVP uses AES-NI when available.
Omitting -evp means it's software only (no hardware acceleration, like
AES-NI).

$ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes256 bytes   1024 bytes   8192 bytes
aes-128-cbc 626533.60k   669884.42k   680917.93k   682079.91k   684736.51k


$ openssl speed -elapsed aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes256 bytes   1024 bytes   8192 bytes
aes-128 cbc 106520.59k   114380.16k   116741.46k   117489.32k   117563.39k

Jeff


HWCAP_CRYPTO define for ARMv8?

2016-05-15 Thread Jeffrey Walton
Hi Everyone,

It appears defines like HWCAP_CRC32 fall under the purview of the
kernel. Confer, http://www.google.com/search?q="HWCAP_CRC32" (my
apologies if this is not the case).

We use getauxval(AT_HWCAP) and HWCAP_CRC32 for runtime detection of
processor support for CRC. However, I can't find a similar symbol in
the context of the crypto instructions. Confer,
http://www.google.com/search?q="HWCAP_CRYPTO".

My question is, what are the equivalent defines for Crypto features?
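
For comparison, the CRC32 probe we use today looks like the sketch
below. My guess (an assumption on my part, which is really the question
above) is that the crypto extensions are reported through individual
bits such as HWCAP_AES, HWCAP_PMULL, HWCAP_SHA1 and HWCAP_SHA2 rather
than a single HWCAP_CRYPTO:

#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>   /* assumed to provide HWCAP_CRC32, HWCAP_AES, ... on AArch64 */

int main(void)
{
    unsigned long caps = getauxval(AT_HWCAP);

    printf("crc32:%d aes:%d pmull:%d sha1:%d sha2:%d\n",
           !!(caps & HWCAP_CRC32), !!(caps & HWCAP_AES),
           !!(caps & HWCAP_PMULL), !!(caps & HWCAP_SHA1),
           !!(caps & HWCAP_SHA2));
    return 0;
}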

Thanks in advance.

*

Below is from a 64-bit LeMaker HiKey
(http://www.lemaker.org/product-hikey-index.html). It responds to
getauxval(AT_HWCAP) and HWCAP_CRC32.

$ cat /proc/cpuinfo
Processor: AArch64 Processor rev 3 (aarch64)
processor: 0
...
processor: 7
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer: 0x41
CPU architecture: AArch64
CPU variant: 0x0
CPU part: 0xd03
CPU revision: 3

Hardware: HiKey Development Board


Re: UB in general ... and linux/bitops.h in particular

2016-05-05 Thread Jeffrey Walton
>-- Perhaps the compiler guys could be persuaded to support
> the needed features explicitly, perhaps via a command-line
> option: -std=vanilla
> This should be a no-cost option as things stand today, but
> it helps to prevent nasty surprises in the future.

It looks like LLVM has the -rainbow option; see
http://blog.llvm.org/2016/04/undefined-behavior-is-magic.html :)

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 11:50 PM, Theodore Ts'o  wrote:
> ...
> But instead of arguing over what works and doesn't, let's just create
> the the test set and just try it on a wide range of compilers and
> architectures, hmmm?

What are the requirements? Here's a short list:

  * No undefined behavior
- important because the compiler writers use the C standard
  * Compiles to native "rotate IMMEDIATE" if the rotate amount is a
"constant expression" and the machine provides it
- translates to a native rotate instruction if available
- "rotate IMM" can be 3 times faster than "rotate REG"
- do any architectures *not* provide a rotate?
  * Compiles to native "rotate REGISTER" if the rotate is variable and
the machine provides it
- do any architectures *not* provide a rotate?
  * Constant time
- important to high-integrity code
- Non-security code paths probably don't care

Maybe the first thing to do is provide a different rotate for the
constant-time requirement when it's in effect?

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
>>> So you are actually saying outright that we should sacrifice *actual*
>>portability in favor of *theoretical* portability?  What kind of
>>twilight zone did we just step into?!
>>
>>I'm not sure what you mean. It will be well defined on all platforms.
>>Clang may not recognize the pattern, which means they could run
>>slower. GCC and ICC will be fine.
>>
>>Slower but correct code is what you have to live with until the Clang
>>dev's fix their compiler.
>>
>>Its kind of like what Dr. Jon Bentley said: "If it doesn't have to be
>>correct, I can make it as fast as you'd like it to be".
>
> The current code works on all compilers we care about.  The code you propose 
> does not; it doesn't work on anything but very recent versions of our 
> flagship target compiler, and pretty your own admission might even cause 
> security hazards in the kernel if compiled on clang.

I'm not sure how you're arriving at the conclusion the code does not work.

> That qualifies as insane in my book.

OK, thanks.

I see the kernel is providing IPSec, SSL/TLS, etc. You can make
SSL/TLS run faster by using aNULL and eNULL.

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin  wrote:
> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton  wrote:
>>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
>>> On 05/04/2016 02:42 PM, I wrote:
>>>
>>>> I find it very odd that the other seven functions were not
>>>> upgraded. I suggest the attached fix-others.diff would make
>>>> things more consistent.
>>>
>>> Here's a replacement patch.
>>> ...
>>
>>+1, commit it.
>>
>>Its good for three additional reasons. First, this change means the
>>kernel is teaching the next generation the correct way to do things.
>>Many developers aspire to be kernel hackers, and they sometimes use
>>the code from bitops.h. I've actually run across the code in
>>production during an audit where the developers cited bitops.h.
>>
>>Second, it preserves a "silent and dark" cockpit during analysis. That
>>is, when analysis is run, no findings are generated. Auditors and
>>security folks like quiet tools, much like the way pilots like their
>>cockpits (flashing lights and sounding buzzers usually means something
>>is wrong).
>>
>>Third, the pattern is recognized by the major compilers, so the kernel
>>should not have any trouble when porting any of the compilers. I often
>>use multiple compiler to tease out implementation defined behavior in
>>a effort to achieve greater portability. Here are the citations to
>>ensure the kernel is safe with the pattern:
>>
>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>
>>However, Clang may cause trouble because they don't want the
>>responsibility of recognizing the pattern:
>>
>> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>
>>Instead, they provide a defective rotate. The "defect" here is its
>>non-constant time due to the branch, so it may not be suitable for
>>high-integrity or high-assurance code like linux-crypto:
>>
>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>
>>Jeff
>
> So you are actually saying outright that we should sacrifice *actual* 
> portability in favor of *theoretical* portability?  What kind of twilight 
> zone did we just step into?!

I'm not sure what you mean. It will be well defined on all platforms.
Clang may not recognize the pattern, which means they could run
slower. GCC and ICC will be fine.

Slower but correct code is what you have to live with until the Clang
devs fix their compiler.

It's kind of like what Dr. Jon Bentley said: "If it doesn't have to be
correct, I can make it as fast as you'd like it to be".

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
> On 05/04/2016 02:42 PM, I wrote:
>
>> I find it very odd that the other seven functions were not
>> upgraded. I suggest the attached fix-others.diff would make
>> things more consistent.
>
> Here's a replacement patch.
> ...

+1, commit it.

Its good for three additional reasons. First, this change means the
kernel is teaching the next generation the correct way to do things.
Many developers aspire to be kernel hackers, and they sometimes use
the code from bitops.h. I've actually run across the code in
production during an audit where the developers cited bitops.h.

Second, it preserves a "silent and dark" cockpit during analysis. That
is, when analysis is run, no findings are generated. Auditors and
security folks like quiet tools, much like the way pilots like their
cockpits (flashing lights and sounding buzzers usually means something
is wrong).

Third, the pattern is recognized by the major compilers, so the kernel
should not have any trouble when porting to any of these compilers. I often
use multiple compilers to tease out implementation-defined behavior in
an effort to achieve greater portability. Here are the citations to
ensure the kernel is safe with the pattern:

  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
  * ICC: http://software.intel.com/en-us/forums/topic/580884

However, Clang may cause trouble because they don't want the
responsibility of recognizing the pattern:

 * https://llvm.org/bugs/show_bug.cgi?id=24226#c8

Instead, they provide a defective rotate. The "defect" here is that it
is not constant time due to the branch, so it may not be suitable for
high-integrity or high-assurance code like linux-crypto:

  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5

Jeff


Re: linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 7:06 PM, Andi Kleen  wrote:
> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>> >> Beware that shifting by an amount >= the number of bits in the
>> >> word remains Undefined Behavior.
>>
>> > This construct has been supported as a rotate since at least gcc2.
>>
>> How then should we understand the story told in commit d7e35dfa?
>> Is the story wrong?
>
> I don't think Linux runs on a system where it would make a difference
> (like a VAX), and also gcc always converts it before it could.
> Even UBSan should not complain because it runs after the conversion
> to ROTATE.
>
From what I understand, it's a limitation in the barrel shifter and the
way the shift bits are handled.

Linux runs on a great number of devices, so it's conceivable (likely?)
that a low-cost board would have hardware limitations not found in
modern desktops and servers, or a VAX.

Jeff


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 1:49 PM,   wrote:
> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
>> > +static inline u32 rotl32(u32 v, u8 n)
>> > +{
>> > +   return (v << n) | (v >> (sizeof(v) * 8 - n));
>> > +}
>>
>> That's undefined behavior when n=0.
>
> Sure, but it's never called with n = 0; I've double checked and the
> compiler seems to do the right thing with the above pattern as well.

> Hmm, it looks like there is a "standard" version rotate left and right
> defined in include/linux/bitops.h.  So I suspect it would make sense
> to use rol32 as defined in bitops.h --- and this is probably something

bitops.h could work in this case, but it's not an ideal solution. GCC
does not optimize the code below as expected under all use cases
because GCC does not recognize it as a rotate (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

return (v << n) | (v >> (sizeof(v) * 8 - n));

And outside of special cases like Salsa, ChaCha and BLAKE2, the code
provided in bitops.h suffers UB on arbitrary data. So I think care
needs to be taken when selecting functions from bitops.h.

> that we should do for the rest of crypto/*.c, where people seem to be
> defininig their own version of something like rotl32 (I copied the
> contents of crypto/chacha20_generic.c to lib/chacha20, so this pattern
> of defining one's own version of rol32 isn't new).

Yeah, I kind of thought there was some duplication going on.

But I think bitops.h should be fixed. Many folks don't realize the
lurking UB, and many folks don't realize it's not always optimized
well.

>> I think the portable way to do a rotate that avoids UB is the
>> following. GCC, Clang and ICC recognize the pattern, and emit a rotate
>> instruction.
>>
>> static const unsigned int MASK=31;
>> return (v<<n)|(v>>(-n&MASK));
>>
>> You should also avoid the following because its not constant time due
>> to the branch:
>>
>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>
>
> Where is this coming from?  I don't see this construct in the patch.

My bad... It was a general observation. I've seen folks try to correct
the UB by turning to something like that.

Jeff


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
>> + chacha20_block(&crng->state[0], out);
>> + if (crng->state[12] == 0)
>> + crng->state[13]++;
>
> state[12]++? Or why do you increment the nonce?

In Bernstein's Salsa and ChaCha, the counter is 64-bit. It appears
ChaCha-TLS uses a 32-bit counter, and the other 32 bits are given to
the nonce.

Maybe the first question to ask is, what ChaCha is the kernel
providing? If it's ChaCha-TLS, then the carry does not make a lot of
sense.

If the generator is limiting the amount of material under a given set
of security parameters (key and nonce), then the generator will likely
re-key itself long before the 256-GB induced wrap. In this case, it
does not matter which ChaCha the kernel is providing and the carry is
superfluous.
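
To make the layouts concrete (my own sketch, not taken from the patch):
djb's ChaCha treats state[12..13] as a 64-bit block counter with
state[14..15] as the nonce, while the RFC 7539 variant uses only
state[12] as the counter and state[13..15] as a 96-bit nonce. The
quoted carry only makes sense for the former:

#include <stdint.h>

/* Sketch: propagate the block-counter carry across two 32-bit words,
 * i.e. treat state[12..13] as a single 64-bit counter (djb variant). */
static void chacha_counter_increment(uint32_t state[16])
{
    if (++state[12] == 0)
        ++state[13];
}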

Jeff


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
> +static inline u32 rotl32(u32 v, u8 n)
> +{
> +   return (v << n) | (v >> (sizeof(v) * 8 - n));
> +}

That's undefined behavior when n=0.

I think the portable way to do a rotate that avoids UB is the
following. GCC, Clang and ICC recognize the pattern, and emit a rotate
instruction.

static const unsigned int MASK=31;
return (v<<n)|(v>>(-n&MASK));

You should also avoid the following because it's not constant time due
to the branch:

return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
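
For completeness, here is a self-contained version of the masking idiom
(my own sketch, not from the patch); masking n as well keeps it defined
even if a caller passes n >= 32, and n == 0 falls out naturally:

#include <stdint.h>

static inline uint32_t rotl32_portable(uint32_t v, unsigned int n)
{
    const unsigned int MASK = 31;

    n &= MASK;                              /* defined for any n */
    return (v << n) | (v >> (-n & MASK));   /* no branch, no UB */
}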

Jeff


Re: [PATCH 3/3] random: add interrupt callback to VMBus IRQ handler

2016-05-02 Thread Jeffrey Walton
On Mon, May 2, 2016 at 2:26 AM, Theodore Ts'o  wrote:
> From: Stephan Mueller 
>
> The Hyper-V Linux Integration Services use the VMBus implementation for
> communication with the Hypervisor. VMBus registers its own interrupt
> handler that completely bypasses the common Linux interrupt handling.
> This implies that the interrupt entropy collector is not triggered.
> ...

Stephan correctly identified the problem of virtualized environments
in his paper, but there do not appear to be any real defenses in
place for VM rollback attacks.

Perhaps the following will make interesting reading:

* When Virtual is Harder than Real: Security Challenges in Virtual
Machine Based Computing Environments,
https://www.usenix.org/legacy/event/hotos05/final_papers/full_papers/garfinkel/garfinkel.pdf

* When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities
and Hedging Deployed Cryptography,
http://pages.cs.wisc.edu/~rist/papers/sslhedge.pdf

Jeff


Re: [PATCH v3 2/3] crypto: rsa_helper - add raw integer parser actions

2016-04-08 Thread Jeffrey Walton
On Fri, Apr 8, 2016 at 12:55 PM, Stephan Mueller  wrote:
> Am Freitag, 8. April 2016, 12:54:10 schrieb Jeffrey Walton:
>
> Hi Jeffrey,
>
>> > +int rsa_check_key_length(unsigned int len)
>> > +{
>> > +   switch (len) {
>> > +   case 512:
>> > +   case 1024:
>> > +   case 1536:
>> > +   case 2048:
>> > +   case 3072:
>> > +   case 4096:
>> > +   return 0;
>> > +   }
>> > +
>> > +   return -EINVAL;
>> > +}
>>
>> That's an unusual restriction.
>>
>> > +   key->n_sz = vlen;
>> > +   /* In FIPS mode only allow key size 2K & 3K */
>> > +   if (fips_enabled && (key->n_sz != 256 && key->n_sz != 384)) {
>> > +   dev_err(ctx->dev, "RSA: key size not allowed in FIPS
>> > mode\n"); +   goto err;
>> > +   }
>>
>> That's an unusual restriction, too. As far as I know, FIPS does not
>> place that restriction.
>
> It does, see SP80-131A and the requirements on CAVS.

I believe the controlling document is SP800-56B. SP800-131 is just a
guide, and it digests the information from SP800-56B. For current FIPS
140 requirements (SP800-56B), RSA is a Finite Field (FF) system, and
the requirement is |N| >= 2048.

Also, I did not see the restriction listed in SP800-131A Rev 1. Cf.,
http://csrc.nist.gov/publications/drafts/800-131A/sp800-131a_r1_draft.pdf.

Jeff


Re: [PATCH v3 2/3] crypto: rsa_helper - add raw integer parser actions

2016-04-08 Thread Jeffrey Walton
> +int rsa_check_key_length(unsigned int len)
> +{
> +   switch (len) {
> +   case 512:
> +   case 1024:
> +   case 1536:
> +   case 2048:
> +   case 3072:
> +   case 4096:
> +   return 0;
> +   }
> +
> +   return -EINVAL;
> +}

That's an unusual restriction.

> +   key->n_sz = vlen;
> +   /* In FIPS mode only allow key size 2K & 3K */
> +   if (fips_enabled && (key->n_sz != 256 && key->n_sz != 384)) {
> +   dev_err(ctx->dev, "RSA: key size not allowed in FIPS mode\n");
> +   goto err;
> +   }

That's an unusual restriction, too. As far as I know, FIPS does not
place that restriction.

Jeff


Re: What are the requirements to create/open an AF_ALG socket type?

2016-04-03 Thread Jeffrey Walton
On Sun, Apr 3, 2016 at 4:42 PM, Jeffrey Walton  wrote:
> I'm testing userspace crypto code using AF_ALG domain socket. The call
> to 'socket(AF_ALG, SOCK_SEQPACKET, 0)' always fails with errno=2. The
> failure has been experienced on 3.8, 4.1, 4.2 and 4.4 kernels
> (provided by Debian, Fedora, Lubuntu and Ubuntu). I also experienced
> it on a Gentoo kernel, but I don't recall the kernel version. I've
> checked the kernel configs, and they all include
> "CONFIG_CRYPTO_USER_API={y|m}".
>
> When similar code is called from userland using the async crypto gear,
> then the call to socket usually succeeds. During async testing, I also
> see a dmesg about registering a socket family 38. The dmesg is not
> present during the non-async failures.
>
> I also checked the kernel crypto documentation at
> http://www.kernel.org/doc/Documentation/crypto/ and
> http://www.kernel.org/doc/htmldocs/crypto-API/User.html, but I don't
> see a requirement I might be missing. I also checked a couple of slide
> decks introducing the userspace crypto API, and I don't see what the
> presenters are doing differently. Finally, I checked the LVN example
> provided at http://lwn.net/Articles/410848/.
>
> If it matters, I usually disable IPv6 via a boot parameter since I
> don't use it in my environments. But I'm guessing it has nothing to do
> with the problem since the async gear works fine.
>
> What are the requirements to create/open an AF_ALG socket?
>
> Or maybe, what is missing so the call to socket succeeds?

Cancel...My apologies...

The call to bind() was failing after the socket was created.
Mis-identifying socket() was due to a copy/paste of the error logic.

The bind failure was due to .salg_type = "hmac" with .salg_name = "sha512".
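
For anyone who hits the same thing, a minimal working sketch (my own,
with a placeholder key for illustration): the socket type stays "hash"
and HMAC goes into salg_name as a template around the hash:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(void)
{
    struct sockaddr_alg sa;
    unsigned char key[16] = { 0 };   /* placeholder key */
    int tfm, op;

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");           /* type is "hash", not "hmac" */
    strcpy((char *)sa.salg_name, "hmac(sha512)");   /* keyed template wraps the hash */

    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0 || bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("socket/bind");
        return 1;
    }
    if (setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) < 0) {
        perror("ALG_SET_KEY");
        return 1;
    }
    op = accept(tfm, NULL, 0);
    /* write() the message to 'op', then read() back 64 bytes of HMAC-SHA512 */
    close(op);
    close(tfm);
    return 0;
}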

Jeff


What are the requirements to create/open an AF_ALG socket type?

2016-04-03 Thread Jeffrey Walton
I'm testing userspace crypto code using AF_ALG domain socket. The call
to 'socket(AF_ALG, SOCK_SEQPACKET, 0)' always fails with errno=2. The
failure has been experienced on 3.8, 4.1, 4.2 and 4.4 kernels
(provided by Debian, Fedora, Lubuntu and Ubuntu). I also experienced
it on a Gentoo kernel, but I don't recall the kernel version. I've
checked the kernel configs, and they all include
"CONFIG_CRYPTO_USER_API={y|m}".

When similar code is called from userland using the async crypto gear,
then the call to socket usually succeeds. During async testing, I also
see a dmesg about registering a socket family 38. The dmesg is not
present during the non-async failures.

I also checked the kernel crypto documentation at
http://www.kernel.org/doc/Documentation/crypto/ and
http://www.kernel.org/doc/htmldocs/crypto-API/User.html, but I don't
see a requirement I might be missing. I also checked a couple of slide
decks introducing the userspace crypto API, and I don't see what the
presenters are doing differently. Finally, I checked the LVN example
provided at http://lwn.net/Articles/410848/.

If it matters, I usually disable IPv6 via a boot parameter since I
don't use it in my environments. But I'm guessing it has nothing to do
with the problem since the async gear works fine.

What are the requirements to create/open an AF_ALG socket?

Or maybe, what is missing so the call to socket succeeds?

Thanks in advance.

**

#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(int argc, char* argv[])
{
    int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (s == -1) {
        fprintf(stderr, "Failed to open socket: %d\n", errno);
        goto cleanup;
    }
    ...
}


How to detect availability of asynchronous ciphers at runtime?

2016-03-26 Thread Jeffrey Walton
Hi Everyone,

Please forgive my ignorance here... I'm trying to detect the
availability of asynchronous cipher support at runtime. The back
story is that there are some feature tests going on based on hard-coded
kernel version numbers (namely, 4.1). I feel like there's probably a
better way to go about it.

It seems like 'socket(AF_ALG, ...)' is not enough since that only
detects availability of userland crypto support.

How do I detect the availability of asynchronous ciphers at runtime?

Thank you in advance.


Userland crypto api test cases and test programs?

2016-03-25 Thread Jeffrey Walton
Hi Everyone,

I've been doing some testing of OpenSSL's upcoming 1.1.0. OpenSSL
includes an Engine wrapper for the userland crypto exposed through the
kernel's AF_ALG socket domain.

The upcoming code experiences somewhat unexplained failures on
occasion. I think it's partly related to the asynchronous ciphers. For
example, trying to set the key for a skcipher results in EBUSY (device
or resource busy).

I'd like to examine the kernel's test cases and test programs to see
how things are intended to be operated. However, searching the
archives is not turning up much for past messages about it. For
example, 0 hits for "asynchronous socket tests"
(https://www.mail-archive.com/search?q=asynchronous+socket+tests&l=linux-crypto%40vger.kernel.org).

Where can I find the kernel's test cases and test programs used for
userland crypto api?

Thank you in advance.