RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>>  uint32_t c = key[0];
>>  uint32_t d = key[1];
>>  uint32_t a = 0x6c796765 ^ 0x736f6d65;
>>  uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

> I've not looked closely, but is that (in some sense) duplicating
> the key length?
> So you could set a = key[2] and b = key[3] and still have an
> working hash - albeit not exactly the one specified.

That's tempting, but not necessarily effective.  (A similar unsuccesful
idea can be found in discussions of "DES with independent round keys".
Or see the design discussion of Salsa20 and the constants in its input.)

You can increase the key size, but that might not increase the *security*
any.

The big issue is that there are a *lot* of square root attack in
cryptanalysis.  Because SipHash's state is twice the size of the key,
such an attack will have the same complexity as key exhaustion and need
not be considered.  To make a stronger security claim, you need to start
working through them all and show that they don't apply.

For SipHash in particular, an important property is asymmetry of the
internal state.  That's what duplicating the key with XORs guarantees.
If the two halves of the state end up identical, the mixing is much
weaker.

Now the probability of ending up in a "mirror state" is the square
root of the state size (1 in 2^64 for HalfSipHash's 128-bit state),
which is the same probability as guessing a key, so it's not a
problem that has to be considered when making a 64-bit security claim.

But if you want a higher security level, you have to think about
what can happen.

That said, I have been thinking very hard about

a = c ^ 0x48536970; /* 'HSip' */
d = key[2];

By guaranteeing that a and c are different, we get the desired
asymmetry, and the XOR of b and d is determined by the first word of
the message anyway, so this isn't weakening anything.

96 bits is far beyond the reach of any brute-force attack, and if a
more sophisticated 64-bit attack exists, it's at least out of the reach
of the script kiddies, and will almost certainly have a non-negligible
constant factor and more limits in when it can be applied.

> Is it worth using the 32bit hash for IP addresses on 64bit systems that
> can't do misaligned accessed?

Not a good idea.  To hash 64 bits of input:

* Full SipHash has to do two loads, a shift, an or, and two rounds of mixing.
* HalfSipHash has to do a load, two rounds, another load, and two more rounds.

In other words, in addition to being less secure, it's half the speed.  

Also, what systems are you thinking about?  x86, ARMv8, PowerPC, and
S390 (and ia64, if anyone cares) all handle unaligned loads.  MIPS has
efficient support.  Alpha and HPPA are for retrocomputing fans, not
people who care about performance.

So you're down to SPARC.  Which conveniently has the same maintainer as
the networking code, so I figure DaveM can take care of that himself. :-)


RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>>  uint32_t c = key[0];
>>  uint32_t d = key[1];
>>  uint32_t a = 0x6c796765 ^ 0x736f6d65;
>>  uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

> I've not looked closely, but is that (in some sense) duplicating
> the key length?
> So you could set a = key[2] and b = key[3] and still have an
> working hash - albeit not exactly the one specified.

That's tempting, but not necessarily effective.  (A similar unsuccesful
idea can be found in discussions of "DES with independent round keys".
Or see the design discussion of Salsa20 and the constants in its input.)

You can increase the key size, but that might not increase the *security*
any.

The big issue is that there are a *lot* of square root attack in
cryptanalysis.  Because SipHash's state is twice the size of the key,
such an attack will have the same complexity as key exhaustion and need
not be considered.  To make a stronger security claim, you need to start
working through them all and show that they don't apply.

For SipHash in particular, an important property is asymmetry of the
internal state.  That's what duplicating the key with XORs guarantees.
If the two halves of the state end up identical, the mixing is much
weaker.

Now the probability of ending up in a "mirror state" is the square
root of the state size (1 in 2^64 for HalfSipHash's 128-bit state),
which is the same probability as guessing a key, so it's not a
problem that has to be considered when making a 64-bit security claim.

But if you want a higher security level, you have to think about
what can happen.

That said, I have been thinking very hard about

a = c ^ 0x48536970; /* 'HSip' */
d = key[2];

By guaranteeing that a and c are different, we get the desired
asymmetry, and the XOR of b and d is determined by the first word of
the message anyway, so this isn't weakening anything.

96 bits is far beyond the reach of any brute-force attack, and if a
more sophisticated 64-bit attack exists, it's at least out of the reach
of the script kiddies, and will almost certainly have a non-negligible
constant factor and more limits in when it can be applied.

> Is it worth using the 32bit hash for IP addresses on 64bit systems that
> can't do misaligned accessed?

Not a good idea.  To hash 64 bits of input:

* Full SipHash has to do two loads, a shift, an or, and two rounds of mixing.
* HalfSipHash has to do a load, two rounds, another load, and two more rounds.

In other words, in addition to being less secure, it's half the speed.  

Also, what systems are you thinking about?  x86, ARMv8, PowerPC, and
S390 (and ia64, if anyone cares) all handle unaligned loads.  MIPS has
efficient support.  Alpha and HPPA are for retrocomputing fans, not
people who care about performance.

So you're down to SPARC.  Which conveniently has the same maintainer as
the networking code, so I figure DaveM can take care of that himself. :-)


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread Jason A. Donenfeld
Hi Ted,

On Sat, Dec 17, 2016 at 4:41 PM, Theodore Ts'o  wrote:
> On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
>> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
>> >>   get_random_long() that we all like.
>
> We don't have to find the most optimal solution right away; we can
> approach this incrementally, after all.

Thanks to your call for moderation. This is the impression I have too.
And with all the back and forth of these threads, I fear nothing will
get done. I'm going to collect the best ideas amongst all of these,
and try to get it merged. Then after that we can incrementally improve
on it.

David Miller -- would you merge something into your 4.11 tree? Seems
like you might be the guy for this, since the changes primarily affect
net/*.

Latest patches are here:
https://git.zx2c4.com/linux-dev/log/?h=siphash


> So long as we replace get_random_{long,int}() with something which is
> (a) strictly better in terms of security given today's use of MD5, and
> (b) which is strictly *faster* than the current construction on 32-bit
> and 64-bit systems, we can do that, and can try to make it be faster
> while maintaining some minimum level of security which is sufficient
> for all current users of get_random_{long,int}() and which can be
> clearly artificulated for future users of get_random_{long,int}().
>
> The main worry at this point I have is benchmarking siphash on a
> 32-bit system.  It may be that simply batching the chacha20 output so
> that we're using the urandom construction more efficiently is the
> better way to go, since that *does* meet the criteron of strictly more
> secure and strictly faster than the current MD5 solution.  I'm open to
> using siphash, but I want to see the the 32-bit numbers first.

Sure, I'll run some benchmarks and report back.

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.
>
> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.

The only acceptable usage of HalfSipHash is for hash table lookups
where top security isn't actually a goal. I'm still a bit queasy about
going with it, but George S has very aggressively been pursuing a
"save every last cycle" agenda, which makes sense given how
performance sensitive certain hash tables are, and so I suspect this
could be an okay compromise between performance and security. But,
only for hash tables. Certainly not for the RNG.

>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

The current implemention of get_random_int/long in my branch uses
128-bit key siphash, so the chances of compromising the key are pretty
much nil. The periodic rekeying is to protect against direct-ram
attacks or other kernel exploits -- a concern brought up by Andy.

With siphash, which takes a 128-bit key, if you got an RNG output once
every picosecond, I believe it would take approximately 10^19 years...

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread Jason A. Donenfeld
Hi Ted,

On Sat, Dec 17, 2016 at 4:41 PM, Theodore Ts'o  wrote:
> On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
>> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
>> >>   get_random_long() that we all like.
>
> We don't have to find the most optimal solution right away; we can
> approach this incrementally, after all.

Thanks to your call for moderation. This is the impression I have too.
And with all the back and forth of these threads, I fear nothing will
get done. I'm going to collect the best ideas amongst all of these,
and try to get it merged. Then after that we can incrementally improve
on it.

David Miller -- would you merge something into your 4.11 tree? Seems
like you might be the guy for this, since the changes primarily affect
net/*.

Latest patches are here:
https://git.zx2c4.com/linux-dev/log/?h=siphash


> So long as we replace get_random_{long,int}() with something which is
> (a) strictly better in terms of security given today's use of MD5, and
> (b) which is strictly *faster* than the current construction on 32-bit
> and 64-bit systems, we can do that, and can try to make it be faster
> while maintaining some minimum level of security which is sufficient
> for all current users of get_random_{long,int}() and which can be
> clearly artificulated for future users of get_random_{long,int}().
>
> The main worry at this point I have is benchmarking siphash on a
> 32-bit system.  It may be that simply batching the chacha20 output so
> that we're using the urandom construction more efficiently is the
> better way to go, since that *does* meet the criteron of strictly more
> secure and strictly faster than the current MD5 solution.  I'm open to
> using siphash, but I want to see the the 32-bit numbers first.

Sure, I'll run some benchmarks and report back.

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.
>
> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.

The only acceptable usage of HalfSipHash is for hash table lookups
where top security isn't actually a goal. I'm still a bit queasy about
going with it, but George S has very aggressively been pursuing a
"save every last cycle" agenda, which makes sense given how
performance sensitive certain hash tables are, and so I suspect this
could be an okay compromise between performance and security. But,
only for hash tables. Certainly not for the RNG.

>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

The current implemention of get_random_int/long in my branch uses
128-bit key siphash, so the chances of compromising the key are pretty
much nil. The periodic rekeying is to protect against direct-ram
attacks or other kernel exploits -- a concern brought up by Andy.

With siphash, which takes a 128-bit key, if you got an RNG output once
every picosecond, I believe it would take approximately 10^19 years...

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread Jason A. Donenfeld
On Sat, Dec 17, 2016 at 3:55 PM, Jeffrey Walton  wrote:
> It may be prudent to include the endian reversal in the test to ensure
> big endian machines produce expected results. Some closely related
> testing on an old Apple PowerMac G5 revealed that result needed to be
> reversed before returning it to a caller.

The function [1] returns a u64. Originally I had it returning a
__le64, but that was considered unnecessary by many prior reviewers on
the list. It returns an integer. If you want uniform bytes out of it,
then use the endian conversion function, the same as you would do with
any other type of integer.

Additionally, this function is *not* meant for af_alg or any of the
crypto/* code. It's very unlikely to find a use there.


> Forgive my ignorance... I did not find reading on using the primitive
> in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
> Aumasson's site does not seem to discuss the use case:

He's on this thread so I suppose he can speak up for himself. But in
my conversations with him, the primary take-away was, "seems okay to
me!". But please -- JP - correct me if I've misinterpreted.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread Jason A. Donenfeld
On Sat, Dec 17, 2016 at 3:55 PM, Jeffrey Walton  wrote:
> It may be prudent to include the endian reversal in the test to ensure
> big endian machines produce expected results. Some closely related
> testing on an old Apple PowerMac G5 revealed that result needed to be
> reversed before returning it to a caller.

The function [1] returns a u64. Originally I had it returning a
__le64, but that was considered unnecessary by many prior reviewers on
the list. It returns an integer. If you want uniform bytes out of it,
then use the endian conversion function, the same as you would do with
any other type of integer.

Additionally, this function is *not* meant for af_alg or any of the
crypto/* code. It's very unlikely to find a use there.


> Forgive my ignorance... I did not find reading on using the primitive
> in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
> Aumasson's site does not seem to discuss the use case:

He's on this thread so I suppose he can speak up for himself. But in
my conversations with him, the primary take-away was, "seems okay to
me!". But please -- JP - correct me if I've misinterpreted.


RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread David Laight
From: George Spelvin
> Sent: 17 December 2016 15:21
...
> uint32_t
> hsiphash24(char const *in, size_t len, uint32_t const key[2])
> {
>   uint32_t c = key[0];
>   uint32_t d = key[1];
>   uint32_t a = 0x6c796765 ^ 0x736f6d65;
>   uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

I've not looked closely, but is that (in some sense) duplicating
the key length?
So you could set a = key[2] and b = key[3] and still have an
working hash - albeit not exactly the one specified.

I'll add another comment here...
Is it worth using the 32bit hash for IP addresses on 64bit systems that
can't do misaligned accessed?

David



RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread David Laight
From: George Spelvin
> Sent: 17 December 2016 15:21
...
> uint32_t
> hsiphash24(char const *in, size_t len, uint32_t const key[2])
> {
>   uint32_t c = key[0];
>   uint32_t d = key[1];
>   uint32_t a = 0x6c796765 ^ 0x736f6d65;
>   uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

I've not looked closely, but is that (in some sense) duplicating
the key length?
So you could set a = key[2] and b = key[3] and still have an
working hash - albeit not exactly the one specified.

I'll add another comment here...
Is it worth using the 32bit hash for IP addresses on 64bit systems that
can't do misaligned accessed?

David



Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random-number once in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random-number once in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
> >>   get_random_long() that we all like.

We don't have to find the most optimal solution right away; we can
approach this incrementally, after all.

So long as we replace get_random_{long,int}() with something which is
(a) strictly better in terms of security given today's use of MD5, and
(b) which is strictly *faster* than the current construction on 32-bit
and 64-bit systems, we can do that, and can try to make it be faster
while maintaining some minimum level of security which is sufficient
for all current users of get_random_{long,int}() and which can be
clearly artificulated for future users of get_random_{long,int}().

The main worry at this point I have is benchmarking siphash on a
32-bit system.  It may be that simply batching the chacha20 output so
that we're using the urandom construction more efficiently is the
better way to go, since that *does* meet the criteron of strictly more
secure and strictly faster than the current MD5 solution.  I'm open to
using siphash, but I want to see the the 32-bit numbers first.

As far as half-siphash is concerned, it occurs to me that the main
problem will be those users who need to guarantee that output can't be
guessed over a long period of time.  For example, if you have a
long-running process, then the output needs to remain unguessable over
potentially months or years, or else you might be weakening the ASLR
protections.  If on the other hand, the hash table or the process will
be going away in a matter of seconds or minutes, the requirements with
respect to cryptographic strength go down significantly.

Now, maybe this doesn't matter that much if we can guarantee (or make
assumptions) that the attacker doesn't have unlimited access the
output stream of get_random_{long,int}(), or if it's being used in an
anti-DOS use case where it ultimately only needs to be harder than
alternate ways of attacking the system.

Rekeying every five minutes doesn't necessarily help the with respect
to ASLR, but it might reduce the amount of the output stream that
would be available to the attacker in order to be able to attack the
get_random_{long,int}() generator, and it also reduces the value of
doing that attack to only compromising the ASLR for those processes
started within that five minute window.

Cheers,

- Ted

P.S.  I'm using ASLR as an example use case, above; of course we will
need to make similar eximainations of the other uses of
get_random_{long,int}().

P.P.S.  We might also want to think about potentially defining
get_random_{long,int}() to be unambiguously strong, and then creating
a get_weak_random_{long,int}() which on platforms where performance
might be a consideration, it uses a weaker algorithm perhaps with some
kind of rekeying interval.


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
> >>   get_random_long() that we all like.

We don't have to find the most optimal solution right away; we can
approach this incrementally, after all.

So long as we replace get_random_{long,int}() with something which is
(a) strictly better in terms of security given today's use of MD5, and
(b) which is strictly *faster* than the current construction on 32-bit
and 64-bit systems, we can do that, and can try to make it be faster
while maintaining some minimum level of security which is sufficient
for all current users of get_random_{long,int}() and which can be
clearly artificulated for future users of get_random_{long,int}().

The main worry at this point I have is benchmarking siphash on a
32-bit system.  It may be that simply batching the chacha20 output so
that we're using the urandom construction more efficiently is the
better way to go, since that *does* meet the criteron of strictly more
secure and strictly faster than the current MD5 solution.  I'm open to
using siphash, but I want to see the the 32-bit numbers first.

As far as half-siphash is concerned, it occurs to me that the main
problem will be those users who need to guarantee that output can't be
guessed over a long period of time.  For example, if you have a
long-running process, then the output needs to remain unguessable over
potentially months or years, or else you might be weakening the ASLR
protections.  If on the other hand, the hash table or the process will
be going away in a matter of seconds or minutes, the requirements with
respect to cryptographic strength go down significantly.

Now, maybe this doesn't matter that much if we can guarantee (or make
assumptions) that the attacker doesn't have unlimited access the
output stream of get_random_{long,int}(), or if it's being used in an
anti-DOS use case where it ultimately only needs to be harder than
alternate ways of attacking the system.

Rekeying every five minutes doesn't necessarily help the with respect
to ASLR, but it might reduce the amount of the output stream that
would be available to the attacker in order to be able to attack the
get_random_{long,int}() generator, and it also reduces the value of
doing that attack to only compromising the ASLR for those processes
started within that five minute window.

Cheers,

- Ted

P.S.  I'm using ASLR as an example use case, above; of course we will
need to make similar eximainations of the other uses of
get_random_{long,int}().

P.P.S.  We might also want to think about potentially defining
get_random_{long,int}() to be unambiguously strong, and then creating
a get_weak_random_{long,int}() which on platforms where performance
might be a consideration, it uses a weaker algorithm perhaps with some
kind of rekeying interval.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles= 10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  3400 cpb=170. (sum=146a863e)
 SipHash: max=   8 cycles= 24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=  2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=  5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles= 80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include 
#include 

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
return *p;
}


#else
#include/* For rol64 */
#include 
#include 
#include 
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *  i386i386i386x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4106910291608 195 160 399 690
 * 1..8248323813851 410 360 9881659
 * 1..12   430341526207 690 61816422690
 * 1..16   61225931

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles= 10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  3400 cpb=170. (sum=146a863e)
 SipHash: max=   8 cycles= 24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=  2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=  5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles= 80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include 
#include 

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
return *p;
}


#else
#include/* For rol64 */
#include 
#include 
#include 
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *  i386i386i386x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4106910291608 195 160 399 690
 * 1..8248323813851 410 360 9881659
 * 1..12   430341526207 690 61816422690
 * 1..16   61225931

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index ..93549e4e22c5
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,83 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2016 Jason A. Donenfeld . All Rights 
> Reserved.
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + *
> + * This implementation is specifically for SipHash2-4.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Test vectors taken from official reference source available at:
> + * https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +   0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +   0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +   0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +   0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +   0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +   0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +   0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +   0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +   0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +   0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +   0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +   0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +   0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +   0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +   0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +   0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +   0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +   0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +   0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +   0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +   0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +   0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +   { 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +   u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +   u8 in_unaligned[65];
> +   u8 i;
> +   int ret = 0;
> +
> +   for (i = 0; i < 64; ++i) {
> +   in[i] = i;
> +   in_unaligned[i + 1] = i;
> +   if (siphash(in, i, test_key) != test_vectors[i]) {
> +   pr_info("self-test aligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   if (siphash_unaligned(in_unaligned + 1, i, test_key) != 
> test_vectors[i]) {
> +   pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   }
> +   if (!ret)
> +   pr_info("self-tests: pass\n");
> +   return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld ");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a
byte array.

I think that means on little endian machines, values like element 0
must be reversed byte reversed:

0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and other tv's) are returned here:

return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big endian machines produce expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that result needed to be
reversed before returning it to a caller.

Jeff


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index ..93549e4e22c5
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,83 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2016 Jason A. Donenfeld . All Rights 
> Reserved.
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + *
> + * This implementation is specifically for SipHash2-4.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Test vectors taken from official reference source available at:
> + * https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +   0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +   0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +   0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +   0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +   0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +   0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +   0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +   0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +   0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +   0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +   0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +   0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +   0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +   0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +   0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +   0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +   0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +   0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +   0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +   0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +   0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +   0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +   { 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +   u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +   u8 in_unaligned[65];
> +   u8 i;
> +   int ret = 0;
> +
> +   for (i = 0; i < 64; ++i) {
> +   in[i] = i;
> +   in_unaligned[i + 1] = i;
> +   if (siphash(in, i, test_key) != test_vectors[i]) {
> +   pr_info("self-test aligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   if (siphash_unaligned(in_unaligned + 1, i, test_key) != 
> test_vectors[i]) {
> +   pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   }
> +   if (!ret)
> +   pr_info("self-tests: pass\n");
> +   return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld ");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a
byte array.

I think that means on little endian machines, values like element 0
must be reversed byte reversed:

0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and other tv's) are returned here:

return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big endian machines produce expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that result needed to be
reversed before returning it to a caller.

Jeff


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago.

My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable.  Reducing I-cache impact speeds
up the calling code.

One thing I'd like to suggest you steal is the way it handles the
fetch of the final partial word.  It's a lot smaller and faster than
an 8-way case statement.


#include/* For rol64 */
#include 
#include 
#include 

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *  i386i386i386x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4106910291608 195 160 399 690
 * 1..8248323813851 410 360 9881659
 * 1..12   430341526207 690 61816422690
 * 1..16   612259318668 968 87623633786
 * 1..20   83488137   112451323118531625567
 * 1..24  10580   10327   139351657150440667635
 * 1..28  13211   12956   168032069187150289759
 * 1..32  15843   15572   19725247022606084   11932
 * 1..36  18864   18609   24259293426787566   14794
 * 1..1024  5890194 6130242 10264816 881933  881244 3617392 7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575;/* somepseu */
uint64_t b = 0x646f72616e646f6d;/* dorandom */
uint64_t c = 0x6c7967656e657261;/* lygenera */
uint64_t d = 0x7465646279746573;/* tedbytes */
uint64_t m = 0;
uint8_t padbyte = len;

/*
 * Mix in the 128-bit hash seed.  This is in a format convenient
 * to the ext3/ext4 code.  Please feel free to adapt the
 * */
if (seed) {
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m;
d ^= m;
m = seed[0] | (uint64_t)seed[1] << 32;
/* a ^= m; is done in loop below */
c ^= m;
}

/*
 * By using the same SipRound code for all iterations, we
 * save space, at the expense of some branch prediction.  But
 * branch prediction is hard because of variable length anyway.
 */
len = len/8 + 3;/* Now number of rounds to perform */
do {
a ^= m;

switch (--len) {
unsigned bytes;

default:/* Full words */
d ^= m = get_unaligned_le64(in);
in += 8;
break;
case 2: /* Final partial word */
/*
 * We'd like to do one 64-bit fetch rather than
 * mess around with bytes, but reading past the end
 * might hit a protection boundary.  Fortunately,
 * we know that protection boundaries are aligned,
 * so we can consider only three cases:
 * - The remainder occupies zero words
 * - The remainder fits into one word
 * - The remainder straddles two words
 */
bytes = padbyte & 7;

if (bytes == 0) {
m = 0;
} else {
unsigned offset = (unsigned)(uintptr_t)in & 7;

if (offset + bytes <= 8) {
m = le64_to_cpup((uint64_t const *)
(in - offset));
m >>= 8*offset;
} else {
m = get_unaligned_le64(in);
}
m &= ((uint64_t)1 << 

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago.

My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable.  Reducing I-cache impact speeds
up the calling code.

One thing I'd like to suggest you steal is the way it handles the
fetch of the final partial word.  It's a lot smaller and faster than
an 8-way case statement.


#include/* For rol64 */
#include 
#include 
#include 

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *  i386i386i386x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4106910291608 195 160 399 690
 * 1..8248323813851 410 360 9881659
 * 1..12   430341526207 690 61816422690
 * 1..16   612259318668 968 87623633786
 * 1..20   83488137   112451323118531625567
 * 1..24  10580   10327   139351657150440667635
 * 1..28  13211   12956   168032069187150289759
 * 1..32  15843   15572   19725247022606084   11932
 * 1..36  18864   18609   24259293426787566   14794
 * 1..1024  5890194 6130242 10264816 881933  881244 3617392 7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575;/* somepseu */
uint64_t b = 0x646f72616e646f6d;/* dorandom */
uint64_t c = 0x6c7967656e657261;/* lygenera */
uint64_t d = 0x7465646279746573;/* tedbytes */
uint64_t m = 0;
uint8_t padbyte = len;

/*
 * Mix in the 128-bit hash seed.  This is in a format convenient
 * to the ext3/ext4 code.  Please feel free to adapt the
 * */
if (seed) {
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m;
d ^= m;
m = seed[0] | (uint64_t)seed[1] << 32;
/* a ^= m; is done in loop below */
c ^= m;
}

/*
 * By using the same SipRound code for all iterations, we
 * save space, at the expense of some branch prediction.  But
 * branch prediction is hard because of variable length anyway.
 */
len = len/8 + 3;/* Now number of rounds to perform */
do {
a ^= m;

switch (--len) {
unsigned bytes;

default:/* Full words */
d ^= m = get_unaligned_le64(in);
in += 8;
break;
case 2: /* Final partial word */
/*
 * We'd like to do one 64-bit fetch rather than
 * mess around with bytes, but reading past the end
 * might hit a protection boundary.  Fortunately,
 * we know that protection boundaries are aligned,
 * so we can consider only three cases:
 * - The remainder occupies zero words
 * - The remainder fits into one word
 * - The remainder straddles two words
 */
bytes = padbyte & 7;

if (bytes == 0) {
m = 0;
} else {
unsigned offset = (unsigned)(uintptr_t)in & 7;

if (offset + bytes <= 8) {
m = le64_to_cpup((uint64_t const *)
(in - offset));
m >>= 8*offset;
} else {
m = get_unaligned_le64(in);
}
m &= ((uint64_t)1 << 

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch.

Do you think it should return "u32" (as you currently have it) or
"unsigned long"?  I thought the latter, since it doesn't cost any
more and makes more 

> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sure all rdrand-like
> instructions return native word size anyway.

Well, Intel's can return 16, 32 or 64 bits, and it makes a
small difference with reseed scheduling.

>> - Ted, Andy Lutorminski and I will try to figure out a construction of
>>   get_random_long() that we all like.

> And me, I hope... No need to make this exclusive.

Gaah, engage brain before fingers.  That was so obvious I didn't say
it, and the result came out sounding extremely rude.

A better (but longer) way to write it would be "I'm sorry that I, Ted,
and Andy are all arguing with you and each other about how to do this
and we can't finalize this part yet".


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch.

Do you think it should return "u32" (as you currently have it) or
"unsigned long"?  I thought the latter, since it doesn't cost any
more and makes more 

> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sure all rdrand-like
> instructions return native word size anyway.

Well, Intel's can return 16, 32 or 64 bits, and it makes a
small difference with reseed scheduling.

>> - Ted, Andy Lutorminski and I will try to figure out a construction of
>>   get_random_long() that we all like.

> And me, I hope... No need to make this exclusive.

Gaah, engage brain before fingers.  That was so obvious I didn't say
it, and the result came out sounding extremely rude.

A better (but longer) way to write it would be "I'm sorry that I, Ted,
and Andy are all arguing with you and each other about how to do this
and we can't finalize this part yet".


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Sat, Dec 17, 2016 at 12:44 AM, George Spelvin
 wrote:
> Ths advice I'd give now is:
> - Implement
> unsigned long hsiphash(const void *data, size_t len, const unsigned long 
> key[2])
>   .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
>   HalfSipHash on 32-bit.

I already did this. Check my branch.

> - Document when it may or may not be used carefully.

Good idea. I'll write up some extensive documentation about all of
this, detailing use cases and our various conclusions.

> - #define get_random_int (unsigned)get_random_long

That's a good idea, since ultimately the other just casts in the
return value. I wonder if this could also lead to a similar aliasing
with arch_get_random_int, since I'm pretty sure all rdrand-like
instructions return native word size anyway.

> - Ted, Andy Lutorminski and I will try to figure out a construction of
>   get_random_long() that we all like.

And me, I hope... No need to make this exclusive.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Sat, Dec 17, 2016 at 12:44 AM, George Spelvin
 wrote:
> Ths advice I'd give now is:
> - Implement
> unsigned long hsiphash(const void *data, size_t len, const unsigned long 
> key[2])
>   .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
>   HalfSipHash on 32-bit.

I already did this. Check my branch.

> - Document when it may or may not be used carefully.

Good idea. I'll write up some extensive documentation about all of
this, detailing use cases and our various conclusions.

> - #define get_random_int (unsigned)get_random_long

That's a good idea, since ultimately the other just casts in the
return value. I wonder if this could also lead to a similar aliasing
with arch_get_random_int, since I'm pretty sure all rdrand-like
instructions return native word size anyway.

> - Ted, Andy Lutorminski and I will try to figure out a construction of
>   get_random_long() that we all like.

And me, I hope... No need to make this exclusive.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.

Just to clarify, I was discussing the idea with Ted (who's in charge of
the whole thing, not me), not trying to make any sort of final decision
on the subject.  I need to look at the various users (46 non-trivial ones
for get_random_int, 15 for get_random_long) and see what their security
requirements actually are.

I'm also trying to see if HalfSipHash can be used in a way that gives
slightly more than 64 bits of effective security.

The problem is that the old MD5-based transform had unclear, but
obviously ample, security.  There were 64 bytes of global secret and
16 chaining bytes per CPU.  Adapting SipHash (even the full version)
takes more thinking.

An actual HalfSipHash-based equivalent to the existing code would be:

#define RANDOM_INT_WORDS (64 / sizeof(long))/* 16 or 8 */

static u32 random_int_secret[RANDOM_INT_WORDS]
cacheline_aligned __read_mostly;
static DEFINE_PER_CPU(unsigned long[4], get_random_int_hash)
__aligned(sizeof(unsigned long));

unsigned long get_random_long(void)
{
unsigned long *hash = get_cpu_var(get_random_int_hash);
unsigned long v0 = hash[0], v1 = hash[1], v2 = hash[2], v3 = hash[3];
int i;

/* This could be improved, but it's equivalent */
v0 += current->pid + jiffies + random_get_entropy();

for (i = 0; i < RANDOM_INT_WORDS; i++) {
v3 ^= random_int_secret[i];
HSIPROUND;
HSIPROUND;
v0 ^= random_int_secret[i];
}
/* To be equivalent, we *don't* finalize the transform */

hash[0] = v0; hash[1] = v1; hash[2] = v2; hash[3] = v3;
put_cpu_var(get_random_int_hash);

return v0 ^ v1 ^ v2 ^ v3;
}

I don't think there's a 2^64 attack on that.

But 64 bytes of global secret is ridiculous if the hash function
doesn't require that minimum block size.  It'll take some thinking.


Ths advice I'd give now is:
- Implement
unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
  .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
  HalfSipHash on 32-bit.
- Document when it may or may not be used carefully.
- #define get_random_int (unsigned)get_random_long
- Ted, Andy Lutorminski and I will try to figure out a construction of
  get_random_long() that we all like.


('scuse me for a few hours, I have some unrelated things I really *should*
be working on...)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.

Just to clarify, I was discussing the idea with Ted (who's in charge of
the whole thing, not me), not trying to make any sort of final decision
on the subject.  I need to look at the various users (46 non-trivial ones
for get_random_int, 15 for get_random_long) and see what their security
requirements actually are.

I'm also trying to see if HalfSipHash can be used in a way that gives
slightly more than 64 bits of effective security.

The problem is that the old MD5-based transform had unclear, but
obviously ample, security.  There were 64 bytes of global secret and
16 chaining bytes per CPU.  Adapting SipHash (even the full version)
takes more thinking.

An actual HalfSipHash-based equivalent to the existing code would be:

#define RANDOM_INT_WORDS (64 / sizeof(long))/* 16 or 8 */

static u32 random_int_secret[RANDOM_INT_WORDS]
cacheline_aligned __read_mostly;
static DEFINE_PER_CPU(unsigned long[4], get_random_int_hash)
__aligned(sizeof(unsigned long));

unsigned long get_random_long(void)
{
unsigned long *hash = get_cpu_var(get_random_int_hash);
unsigned long v0 = hash[0], v1 = hash[1], v2 = hash[2], v3 = hash[3];
int i;

/* This could be improved, but it's equivalent */
v0 += current->pid + jiffies + random_get_entropy();

for (i = 0; i < RANDOM_INT_WORDS; i++) {
v3 ^= random_int_secret[i];
HSIPROUND;
HSIPROUND;
v0 ^= random_int_secret[i];
}
/* To be equivalent, we *don't* finalize the transform */

hash[0] = v0; hash[1] = v1; hash[2] = v2; hash[3] = v3;
put_cpu_var(get_random_int_hash);

return v0 ^ v1 ^ v2 ^ v3;
}

I don't think there's a 2^64 attack on that.

But 64 bytes of global secret is ridiculous if the hash function
doesn't require that minimum block size.  It'll take some thinking.


Ths advice I'd give now is:
- Implement
unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
  .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
  HalfSipHash on 32-bit.
- Document when it may or may not be used carefully.
- #define get_random_int (unsigned)get_random_long
- Ted, Andy Lutorminski and I will try to figure out a construction of
  get_random_long() that we all like.


('scuse me for a few hours, I have some unrelated things I really *should*
be working on...)


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which mght be useful:

You could perhaps save two rounds in siphash_*u64.

The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.

If every use of a given key is of a fixed-size input, you don't need
a length suffix.  When the input is an even number of words, that can
save you two rounds.

This requires an audit of callers (e.g. you have to use different
keys for IPv4 and IPv6 ISNs), but can save time.

(This is crypto 101; search "MD-strengthening" or see the remark on
p. 101 on Damgaard's 1989 paper "A design principle for hash functions" at
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf
but I'm sure that Ted, Jean-Philippe, and/or DJB will confirm if you'd
like.)

Jason A. Donenfeld wrote:
> Oh, okay, that is exactly what I thought was going on. I just thought
> you were implying that jiffies could be moved inside the hash, which
> then confused my understanding of how things should be. In any case,
> thanks for the explanation.

No, the rekeying procedure is cleverer.

The thing is, all that matters is that the ISN increments fast enough,
but not wrap too soon.

It *is* permitted to change the random base, as long as it only
increases, and slower than the timestamp does.

So what you do is every few minutes, you increment the high 4 bits of the
random base and change the key used to generate the low 28 bits.

The base used for any particular host might change from 0x1000
to 0x2fff, or from 0x1fff to 0x2000, but either way, it's
increasing, and not too fast.

This has the downside that an attacker can see 4 bits of the base,
so only needs to send send 2^28 = 256 MB to flood the connection,
but the upside that the key used to generate the low bits changes
faster than it can be broken.


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which mght be useful:

You could perhaps save two rounds in siphash_*u64.

The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.

If every use of a given key is of a fixed-size input, you don't need
a length suffix.  When the input is an even number of words, that can
save you two rounds.

This requires an audit of callers (e.g. you have to use different
keys for IPv4 and IPv6 ISNs), but can save time.

(This is crypto 101; search "MD-strengthening" or see the remark on
p. 101 on Damgaard's 1989 paper "A design principle for hash functions" at
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf
but I'm sure that Ted, Jean-Philippe, and/or DJB will confirm if you'd
like.)

Jason A. Donenfeld wrote:
> Oh, okay, that is exactly what I thought was going on. I just thought
> you were implying that jiffies could be moved inside the hash, which
> then confused my understanding of how things should be. In any case,
> thanks for the explanation.

No, the rekeying procedure is cleverer.

The thing is, all that matters is that the ISN increments fast enough,
but not wrap too soon.

It *is* permitted to change the random base, as long as it only
increases, and slower than the timestamp does.

So what you do is every few minutes, you increment the high 4 bits of the
random base and change the key used to generate the low 28 bits.

The base used for any particular host might change from 0x1000
to 0x2fff, or from 0x1fff to 0x2000, but either way, it's
increasing, and not too fast.

This has the downside that an attacker can see 4 bits of the base,
so only needs to send send 2^28 = 256 MB to flood the connection,
but the upside that the key used to generate the low bits changes
faster than it can be broken.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 11:13 PM, George Spelvin
 wrote:
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.

64-bit security for an RNG is not reasonable even with rekeying. No no
no. Considering we already have a massive speed-up here with the
secure version, there's zero reason to start weakening the security
because we're trigger happy with our benchmarks. No no no.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 11:13 PM, George Spelvin
 wrote:
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.

64-bit security for an RNG is not reasonable even with rekeying. No no
no. Considering we already have a massive speed-up here with the
secure version, there's zero reason to start weakening the security
because we're trigger happy with our benchmarks. No no no.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 2:13 PM, George Spelvin
 wrote:
>> What should we do with get_random_int() and get_random_long()?  In
>> some cases it's being used in performance sensitive areas, and where
>> anti-DoS protection might be enough.  In others, maybe not so much.
>
> This is tricky.  The entire get_random_int() structure is an abuse of
> the hash function and will need to be thoroughly rethought to convert
> it to SipHash.  Remember, SipHash's security goals are very different
> from MD5, so there's no obvious way to do the conversion.
>
> (It's *documented* as "not cryptographically secure", but we know
> where that goes.)
>
>> If we rekeyed the secret used by get_random_int() and
>> get_random_long() frequently (say, every minute or every 5 minutes),
>> would that be sufficient for current and future users of these
>> interfaces?
>
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.
>
> The question is, the idea has recently been floated to make hsiphash =
> SipHash-1-3 on 64-bit machines.  Is *that* okay?
>
>
> The annoying thing about the currently proposed patch is that the *only*
> chaining is the returned value.  What I'd *like* to do is the same
> pattern as we do with md5, and remember v[0..3] between invocations.
> But there's no partial SipHash primitive; we only get one word back.
>
> Even
> *chaining += ret = siphash_3u64(...)
>
> would be an improvement.

This is almost exactly what I suggested in my email on the other
thread from a few seconds ago :)

--Andy


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 2:13 PM, George Spelvin
 wrote:
>> What should we do with get_random_int() and get_random_long()?  In
>> some cases it's being used in performance sensitive areas, and where
>> anti-DoS protection might be enough.  In others, maybe not so much.
>
> This is tricky.  The entire get_random_int() structure is an abuse of
> the hash function and will need to be thoroughly rethought to convert
> it to SipHash.  Remember, SipHash's security goals are very different
> from MD5, so there's no obvious way to do the conversion.
>
> (It's *documented* as "not cryptographically secure", but we know
> where that goes.)
>
>> If we rekeyed the secret used by get_random_int() and
>> get_random_long() frequently (say, every minute or every 5 minutes),
>> would that be sufficient for current and future users of these
>> interfaces?
>
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.
>
> The question is, the idea has recently been floated to make hsiphash =
> SipHash-1-3 on 64-bit machines.  Is *that* okay?
>
>
> The annoying thing about the currently proposed patch is that the *only*
> chaining is the returned value.  What I'd *like* to do is the same
> pattern as we do with md5, and remember v[0..3] between invocations.
> But there's no partial SipHash primitive; we only get one word back.
>
> Even
> *chaining += ret = siphash_3u64(...)
>
> would be an improvement.

This is almost exactly what I suggested in my email on the other
thread from a few seconds ago :)

--Andy


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.

This is tricky.  The entire get_random_int() structure is an abuse of
the hash function and will need to be thoroughly rethought to convert
it to SipHash.  Remember, SipHash's security goals are very different
from MD5, so there's no obvious way to do the conversion.

(It's *documented* as "not cryptographically secure", but we know
where that goes.)

> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

Remembering that on "real" machines it's full SipHash, then I'd say that
64-bit security + rekeying seems reasonable.

The question is, the idea has recently been floated to make hsiphash =
SipHash-1-3 on 64-bit machines.  Is *that* okay?


The annoying thing about the currently proposed patch is that the *only*
chaining is the returned value.  What I'd *like* to do is the same
pattern as we do with md5, and remember v[0..3] between invocations.
But there's no partial SipHash primitive; we only get one word back.

Even
*chaining += ret = siphash_3u64(...)

would be an improvement.

Although we could do something like

c0 = chaining[0];
chaining[0] = c1 = chaining[1];

ret = hsiphash(c0, c1, ...)

chaining[1] = c0 + ret;


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.

This is tricky.  The entire get_random_int() structure is an abuse of
the hash function and will need to be thoroughly rethought to convert
it to SipHash.  Remember, SipHash's security goals are very different
from MD5, so there's no obvious way to do the conversion.

(It's *documented* as "not cryptographically secure", but we know
where that goes.)

> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

Remembering that on "real" machines it's full SipHash, then I'd say that
64-bit security + rekeying seems reasonable.

The question is, the idea has recently been floated to make hsiphash =
SipHash-1-3 on 64-bit machines.  Is *that* okay?


The annoying thing about the currently proposed patch is that the *only*
chaining is the returned value.  What I'd *like* to do is the same
pattern as we do with md5, and remember v[0..3] between invocations.
But there's no partial SipHash primitive; we only get one word back.

Even
*chaining += ret = siphash_3u64(...)

would be an improvement.

Although we could do something like

c0 = chaining[0];
chaining[0] = c1 = chaining[1];

ret = hsiphash(c0, c1, ...)

chaining[1] = c0 + ret;


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 10:25 PM, George Spelvin
 wrote:
> But yes, the sequence number is supposed to be (random base) + (timestamp).
> In the old days before Canter & Siegel when the internet was a nice place,
> people just used a counter that started at boot time.
>
> But then someone observed that I can start a connection to host X,
> see the sequence number it gives back to me, and thereby learn the
> seauence number it's using on its connections to host Y.
>
> And I can use that to inject forged data into an X-to-Y connection,
> without ever seeing a single byte of the traffic!  (If I *can* observe
> the traffic, of course, none of this makes the slightest difference.)
>
> So the random base was made a keyed hash of the endpoint identifiers.
> (Practically only the hosts matter, but generally the ports are thrown
> in for good measure.)  That way, the ISN that host X sends to me
> tells me nothing about the ISN it's using to talk to host Y.  Now the
> only way to inject forged data into the X-to-Y connection is to
> send 2^32 bytes, which is a little less practical.

Oh, okay, that is exactly what I thought was going on. I just thought
you were implying that jiffies could be moved inside the hash, which
then confused my understanding of how things should be. In any case,
thanks for the explanation.

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 10:25 PM, George Spelvin
 wrote:
> But yes, the sequence number is supposed to be (random base) + (timestamp).
> In the old days before Canter & Siegel when the internet was a nice place,
> people just used a counter that started at boot time.
>
> But then someone observed that I can start a connection to host X,
> see the sequence number it gives back to me, and thereby learn the
> seauence number it's using on its connections to host Y.
>
> And I can use that to inject forged data into an X-to-Y connection,
> without ever seeing a single byte of the traffic!  (If I *can* observe
> the traffic, of course, none of this makes the slightest difference.)
>
> So the random base was made a keyed hash of the endpoint identifiers.
> (Practically only the hosts matter, but generally the ports are thrown
> in for good measure.)  That way, the ISN that host X sends to me
> tells me nothing about the ISN it's using to talk to host Y.  Now the
> only way to inject forged data into the X-to-Y connection is to
> send 2^32 bytes, which is a little less practical.

Oh, okay, that is exactly what I thought was going on. I just thought
you were implying that jiffies could be moved inside the hash, which
then confused my understanding of how things should be. In any case,
thanks for the explanation.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I haven't studied
> how it works. If that jiffies should be part of the siphash input and
> not added to the result, please tell me. Otherwise I'll keep things
> how they are to avoid breaking something that seems to be working.

Oh, geez, I didn't realize you didn't understand this code.

Full details at
https://en.wikipedia.org/wiki/TCP_sequence_prediction_attack

But yes, the sequence number is supposed to be (random base) + (timestamp).
In the old days before Canter & Siegel when the internet was a nice place,
people just used a counter that started at boot time.

But then someone observed that I can start a connection to host X,
see the sequence number it gives back to me, and thereby learn the
seauence number it's using on its connections to host Y.

And I can use that to inject forged data into an X-to-Y connection,
without ever seeing a single byte of the traffic!  (If I *can* observe
the traffic, of course, none of this makes the slightest difference.)

So the random base was made a keyed hash of the endpoint identifiers.
(Practically only the hosts matter, but generally the ports are thrown
in for good measure.)  That way, the ISN that host X sends to me
tells me nothing about the ISN it's using to talk to host Y.  Now the
only way to inject forged data into the X-to-Y connection is to
send 2^32 bytes, which is a little less practical.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I haven't studied
> how it works. If that jiffies should be part of the siphash input and
> not added to the result, please tell me. Otherwise I'll keep things
> how they are to avoid breaking something that seems to be working.

Oh, geez, I didn't realize you didn't understand this code.

Full details at
https://en.wikipedia.org/wiki/TCP_sequence_prediction_attack

But yes, the sequence number is supposed to be (random base) + (timestamp).
In the old days before Canter & Siegel when the internet was a nice place,
people just used a counter that started at boot time.

But then someone observed that I can start a connection to host X,
see the sequence number it gives back to me, and thereby learn the
seauence number it's using on its connections to host Y.

And I can use that to inject forged data into an X-to-Y connection,
without ever seeing a single byte of the traffic!  (If I *can* observe
the traffic, of course, none of this makes the slightest difference.)

So the random base was made a keyed hash of the endpoint identifiers.
(Practically only the hosts matter, but generally the ports are thrown
in for good measure.)  That way, the ISN that host X sends to me
tells me nothing about the ISN it's using to talk to host Y.  Now the
only way to inject forged data into the X-to-Y connection is to
send 2^32 bytes, which is a little less practical.


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Hannes Frederic Sowa
On Fri, Dec 16, 2016, at 22:01, Jason A. Donenfeld wrote:
> Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
> slowdown as siphash, so I expect the comparison to be more or less
> equal. There's another thing I really didn't like about your chacha20
> approach which is that it uses the /dev/urandom pool, which means
> various things need to kick in in the background to refill this.
> Additionally, having to refill the buffered chacha output every 32 or
> so longs isn't nice. These things together make for inconsistent and
> hard to understand general operating system performance, because
> get_random_long is called at every process startup for ASLR. So, in
> the end, I believe there's another reason for going with the siphash
> approach: deterministic performance.

*Hust*, so from where do you generate your key for siphash if called
early from ASLR?

Bye,
Hannes


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Hannes Frederic Sowa
On Fri, Dec 16, 2016, at 22:01, Jason A. Donenfeld wrote:
> Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
> slowdown as siphash, so I expect the comparison to be more or less
> equal. There's another thing I really didn't like about your chacha20
> approach which is that it uses the /dev/urandom pool, which means
> various things need to kick in in the background to refill this.
> Additionally, having to refill the buffered chacha output every 32 or
> so longs isn't nice. These things together make for inconsistent and
> hard to understand general operating system performance, because
> get_random_long is called at every process startup for ASLR. So, in
> the end, I believe there's another reason for going with the siphash
> approach: deterministic performance.

*Hust*, so from where do you generate your key for siphash if called
early from ASLR?

Bye,
Hannes


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Daniel,

On Fri, Dec 16, 2016 at 9:44 PM, Daniel Micay  wrote:
> On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
>>
>> That's about 3x of jhash speed (7 nsecs). So that might closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> Have you tested a lower round SipHash? Probably best to stick with the
> usual construction for non-DoS mitigation, but why not try SipHash 1-3,
> 1-2, etc. for DoS mitigation?
>
> Rust and Swift both went with SipHash 1-3 for hash tables.

Maybe not a bad idea.

SipHash2-4 for MD5 replacement, as we've done so far. This is when we
actually want things to be secure (and fast).

And then HalfSipHash1-3 for certain jhash replacements. This is for
when we're talking only about DoS or sort of just joking about
security, and want things to be very competitive with jhash. (Of
course for 64-bit we'd use SipHash1-3 instead of HalfSipHash for the
speedup.)

I need to think on this a bit more, but preliminarily, I guess this
would be maybe okay...

George, JP - what do you think?

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Daniel,

On Fri, Dec 16, 2016 at 9:44 PM, Daniel Micay  wrote:
> On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
>>
>> That's about 3x of jhash speed (7 nsecs). So that might closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> Have you tested a lower round SipHash? Probably best to stick with the
> usual construction for non-DoS mitigation, but why not try SipHash 1-3,
> 1-2, etc. for DoS mitigation?
>
> Rust and Swift both went with SipHash 1-3 for hash tables.

Maybe not a bad idea.

SipHash2-4 for MD5 replacement, as we've done so far. This is when we
actually want things to be secure (and fast).

And then HalfSipHash1-3 for certain jhash replacements. This is for
when we're talking only about DoS or sort of just joking about
security, and want things to be very competitive with jhash. (Of
course for 64-bit we'd use SipHash1-3 instead of HalfSipHash for the
speedup.)

I need to think on this a bit more, but preliminarily, I guess this
would be maybe okay...

George, JP - what do you think?

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Ted,

On Fri, Dec 16, 2016 at 9:43 PM, Theodore Ts'o  wrote:
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.
>
> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

get_random_int() and get_random_long() should quite clearly use
SipHash with its secure 128-bit key and not HalfSipHash with its
64-bit key. HalfSipHash is absolutely insufficient for this use case.
Remember, we're already an order of magnitude or more faster than
md5...

With regard to periodic rekeying... since the secret is 128-bits, I
believe this is likely sufficient for _not_ rekeying. There's also the
chaining variable, to tie together invocations of the function. If
you'd prefer, instead of the chaining variable, we could use some
siphash output to mutate the original key, but I don't think this
approach is actually better and might introduce vulnerabilities. In my
opinion chaining+128bitkey is sufficient. On the other hand, rekeying
every X minutes is 3 or 4 lines of code. If you want (just say so),
I'll add this to my next revision.

You asked about the security requirements of these functions. The
comment says they're not cryptographically secure. And right now with
MD5 they're not. So the expectations are pretty low. Moving to siphash
adds some cryptographic security, certainly. Moving to siphash plus
rekeying adds a bit more. Of course, on recent x86, RDRAND is used
instead, so the cryptographic strength then depends on the thickness
of your tinfoil hat. So probably we shouldn't change what we advertise
these functions provide, even though we're certainly improving them
performance-wise and security-wise.

> P.S.  I'll note that my performance figures when testing changes to
> get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
> figures were using a 64-bit x86 system?.  I haven't tried 32-bit ARM
> or smaller CPU's (e.g., mips, et. al.) that might be more likely to be
> used on IoT devices, but I'm worried about those too, of course.

Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
slowdown as siphash, so I expect the comparison to be more or less
equal. There's another thing I really didn't like about your chacha20
approach which is that it uses the /dev/urandom pool, which means
various things need to kick in in the background to refill this.
Additionally, having to refill the buffered chacha output every 32 or
so longs isn't nice. These things together make for inconsistent and
hard to understand general operating system performance, because
get_random_long is called at every process startup for ASLR. So, in
the end, I believe there's another reason for going with the siphash
approach: deterministic performance.

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Ted,

On Fri, Dec 16, 2016 at 9:43 PM, Theodore Ts'o  wrote:
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.
>
> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

get_random_int() and get_random_long() should quite clearly use
SipHash with its secure 128-bit key and not HalfSipHash with its
64-bit key. HalfSipHash is absolutely insufficient for this use case.
Remember, we're already an order of magnitude or more faster than
md5...

With regard to periodic rekeying... since the secret is 128-bits, I
believe this is likely sufficient for _not_ rekeying. There's also the
chaining variable, to tie together invocations of the function. If
you'd prefer, instead of the chaining variable, we could use some
siphash output to mutate the original key, but I don't think this
approach is actually better and might introduce vulnerabilities. In my
opinion chaining+128bitkey is sufficient. On the other hand, rekeying
every X minutes is 3 or 4 lines of code. If you want (just say so),
I'll add this to my next revision.

You asked about the security requirements of these functions. The
comment says they're not cryptographically secure. And right now with
MD5 they're not. So the expectations are pretty low. Moving to siphash
adds some cryptographic security, certainly. Moving to siphash plus
rekeying adds a bit more. Of course, on recent x86, RDRAND is used
instead, so the cryptographic strength then depends on the thickness
of your tinfoil hat. So probably we shouldn't change what we advertise
these functions provide, even though we're certainly improving them
performance-wise and security-wise.

> P.S.  I'll note that my performance figures when testing changes to
> get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
> figures were using a 64-bit x86 system?.  I haven't tried 32-bit ARM
> or smaller CPU's (e.g., mips, et. al.) that might be more likely to be
> used on IoT devices, but I'm worried about those too, of course.

Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
slowdown as siphash, so I expect the comparison to be more or less
equal. There's another thing I really didn't like about your chacha20
approach which is that it uses the /dev/urandom pool, which means
various things need to kick in in the background to refill this.
Additionally, having to refill the buffered chacha output every 32 or
so longs isn't nice. These things together make for inconsistent and
hard to understand general operating system performance, because
get_random_long is called at every process startup for ASLR. So, in
the end, I believe there's another reason for going with the siphash
approach: deterministic performance.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 12:41 PM, George Spelvin
 wrote:
> Tom Herbert wrote:
>> Tested this. Distribution and avalanche effect are still good. Speed
>> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
>> nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?
>
Sorry, that is over an IPv4 tuple. Intel(R) Xeon(R) CPU E5-2660 0 @
2.20GHz. Recoded the function I was using to look like more like 64
bit version and yes it is indeed slower.

> These are very odd results.  On a 64-bit machine, SipHash should be the
> same speed per round, and faster because it hashes more data per round.
> (Unless you're hitting some unexpected cache/decode effect due to REX
> prefixes.)
>
> On a 32-bit machine (other than ARM, where your results might make sense,
> or maybe if you're hashing large amounts of data), the difference should
> be larger.
>
> And yes, there is a *significant* security loss.  SipHash is 128 bits
> ("don't worry about it").  hsiphash is 64 bits, which is known breakable
> ("worry about it"), so we have to do a careful analysis of the cost of
> a successful attack.
>
> As mentioned in the e-mails that just flew by, hsiphash is intended
> *only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
> machines, it will be implemented as an alias for SipHash and the security
> concerns will Just Go Away.
>
> The place where hsiphash is expected to make a big difference is 32-bit
> x86.  If you only see 33% difference with "gcc -m32", I'm going to be
> very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 12:41 PM, George Spelvin
 wrote:
> Tom Herbert wrote:
>> Tested this. Distribution and avalanche effect are still good. Speed
>> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
>> nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?
>
Sorry, that is over an IPv4 tuple. Intel(R) Xeon(R) CPU E5-2660 0 @
2.20GHz. Recoded the function I was using to look like more like 64
bit version and yes it is indeed slower.

> These are very odd results.  On a 64-bit machine, SipHash should be the
> same speed per round, and faster because it hashes more data per round.
> (Unless you're hitting some unexpected cache/decode effect due to REX
> prefixes.)
>
> On a 32-bit machine (other than ARM, where your results might make sense,
> or maybe if you're hashing large amounts of data), the difference should
> be larger.
>
> And yes, there is a *significant* security loss.  SipHash is 128 bits
> ("don't worry about it").  hsiphash is 64 bits, which is known breakable
> ("worry about it"), so we have to do a careful analysis of the cost of
> a successful attack.
>
> As mentioned in the e-mails that just flew by, hsiphash is intended
> *only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
> machines, it will be implemented as an alias for SipHash and the security
> concerns will Just Go Away.
>
> The place where hsiphash is expected to make a big difference is 32-bit
> x86.  If you only see 33% difference with "gcc -m32", I'm going to be
> very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:17 PM, George Spelvin
 wrote:
> My (speaking enerally; I should walk through every hash table you've
> converted) opinion is that:
>
> - Hash tables, even network-facing ones, can all use hsiphash as long
>   as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
>   0, and the consequences of a successful attack is only more collisions
>   (timing).  While the attack is only 2x the cost (two hashes rather than
>   one to test a key), the knowledge of the collision is statistical,
>   especially for network attackers, which raises the cost of guessing
>   beyond an even more brute-force attack.
> - When the hash value directly visible (e.g. included in a network
>   packet), full SipHash should be the default.
> - Syncookies *could* use hsiphash, especially as there are
>   two keys in there.  Not sure if we need the performance.
> - For TCP ISNs, I'd prefer to use full SipHash.  I know this is
>   a very hot path, and if that's a performance bottleneck,
>   we can work harder on it.
>
> In particular, TCP ISNs *used* to rotate the key periodically,
> limiting the time available to an attacker to perform an
> attack before the secret goes stale and is useless.  commit
> 6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
> the key rotation.

While I generally agree with this analysis for the most part, I do
think we should use SipHash and not HalfSipHash for syncookies.
Although the security risk is lower than with sequence numbers, it
previously used full MD5 for this, which means performance is not
generally a bottleneck and we'll get massive speedups no matter what,
whether using SipHash or HalfSipHash. In addition, using SipHash means
that the 128-bit key gives a larger margin and can be safe longterm.
So, I think we should err on the side of caution and stick with
SipHash in all cases in which we're upgrading from MD5.

In other words, only current jhash users should be potentially
eligible for hsiphash.


> Current code uses a 64 ns tick for the ISN, so it counts 2^24 per second.
> (32 bits wraps every 4.6 minutes.)  A 4-bit counter and 28-bit hash
> (or even 3+29) would work as long as the key is regenerated no more
> than once per minute.  (Just using the 4.6-minute ISN wrap time is the
> obvious simple implementation.)
>
> (Of course, I defer to DaveM's judgement on all network-related issues.)

I saw that jiffies addition in there and was wondering what it was all
about. It's currently added _after_ the siphash input, not before, to
keep with how the old algorithm worked. I'm not sure if this is
correct or if there's something wrong with that, as I haven't studied
how it works. If that jiffies should be part of the siphash input and
not added to the result, please tell me. Otherwise I'll keep things
how they are to avoid breaking something that seems to be working.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:17 PM, George Spelvin
 wrote:
> My (speaking enerally; I should walk through every hash table you've
> converted) opinion is that:
>
> - Hash tables, even network-facing ones, can all use hsiphash as long
>   as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
>   0, and the consequences of a successful attack is only more collisions
>   (timing).  While the attack is only 2x the cost (two hashes rather than
>   one to test a key), the knowledge of the collision is statistical,
>   especially for network attackers, which raises the cost of guessing
>   beyond an even more brute-force attack.
> - When the hash value directly visible (e.g. included in a network
>   packet), full SipHash should be the default.
> - Syncookies *could* use hsiphash, especially as there are
>   two keys in there.  Not sure if we need the performance.
> - For TCP ISNs, I'd prefer to use full SipHash.  I know this is
>   a very hot path, and if that's a performance bottleneck,
>   we can work harder on it.
>
> In particular, TCP ISNs *used* to rotate the key periodically,
> limiting the time available to an attacker to perform an
> attack before the secret goes stale and is useless.  commit
> 6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
> the key rotation.

While I generally agree with this analysis for the most part, I do
think we should use SipHash and not HalfSipHash for syncookies.
Although the security risk is lower than with sequence numbers, it
previously used full MD5 for this, which means performance is not
generally a bottleneck and we'll get massive speedups no matter what,
whether using SipHash or HalfSipHash. In addition, using SipHash means
that the 128-bit key gives a larger margin and can be safe longterm.
So, I think we should err on the side of caution and stick with
SipHash in all cases in which we're upgrading from MD5.

In other words, only current jhash users should be potentially
eligible for hsiphash.


> Current code uses a 64 ns tick for the ISN, so it counts 2^24 per second.
> (32 bits wraps every 4.6 minutes.)  A 4-bit counter and 28-bit hash
> (or even 3+29) would work as long as the key is regenerated no more
> than once per minute.  (Just using the 4.6-minute ISN wrap time is the
> obvious simple implementation.)
>
> (Of course, I defer to DaveM's judgement on all network-related issues.)

I saw that jiffies addition in there and was wondering what it was all
about. It's currently added _after_ the siphash input, not before, to
keep with how the old algorithm worked. I'm not sure if this is
correct or if there's something wrong with that, as I haven't studied
how it works. If that jiffies should be part of the siphash input and
not added to the result, please tell me. Otherwise I'll keep things
how they are to avoid breaking something that seems to be working.


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Daniel Micay
On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
> 
> That's about 3x of jhash speed (7 nsecs). So that might closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

Have you tested a lower round SipHash? Probably best to stick with the
usual construction for non-DoS mitigation, but why not try SipHash 1-3,
1-2, etc. for DoS mitigation?

Rust and Swift both went with SipHash 1-3 for hash tables.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 03:17:39PM -0500, George Spelvin wrote:
> > That's a nice analysis. Might one conclude from that that hsiphash is
> > not useful for our purposes? Or does it still remain useful for
> > network facing code?
> 
> I think for attacks where the threat is a DoS, it's usable.  The point
> is you only have to raise the cost to equal that of a packet flood.
> (Just like in electronic warfare, the best you can possibly do is force
> the enemy to use broadband jamming.)
> 
> Hash collision attacks just aren't that powerful.  The original PoC
> was against an application that implemented a hard limit on hash chain
> length as a DoS defense, which the attack then exploited to turn it into
> a hard DoS.

What should we do with get_random_int() and get_random_long()?  In
some cases it's being used in performance sensitive areas, and where
anti-DoS protection might be enough.  In others, maybe not so much.

If we rekeyed the secret used by get_random_int() and
get_random_long() frequently (say, every minute or every 5 minutes),
would that be sufficient for current and future users of these
interfaces?

- Ted

P.S.  I'll note that my performance figures when testing changes to
get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
figures were using a 64-bit x86 system?.  I haven't tried 32-bit ARM
or smaller CPU's (e.g., mips, et. al.) that might be more likely to be
used on IoT devices, but I'm worried about those too, of course.



Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Daniel Micay
On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
> 
> That's about 3x of jhash speed (7 nsecs). So that might closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

Have you tested a lower round SipHash? Probably best to stick with the
usual construction for non-DoS mitigation, but why not try SipHash 1-3,
1-2, etc. for DoS mitigation?

Rust and Swift both went with SipHash 1-3 for hash tables.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 03:17:39PM -0500, George Spelvin wrote:
> > That's a nice analysis. Might one conclude from that that hsiphash is
> > not useful for our purposes? Or does it still remain useful for
> > network facing code?
> 
> I think for attacks where the threat is a DoS, it's usable.  The point
> is you only have to raise the cost to equal that of a packet flood.
> (Just like in electronic warfare, the best you can possibly do is force
> the enemy to use broadband jamming.)
> 
> Hash collision attacks just aren't that powerful.  The original PoC
> was against an application that implemented a hard limit on hash chain
> length as a DoS defense, which the attack then exploited to turn it into
> a hard DoS.

What should we do with get_random_int() and get_random_long()?  In
some cases it's being used in performance sensitive areas, and where
anti-DoS protection might be enough.  In others, maybe not so much.

If we rekeyed the secret used by get_random_int() and
get_random_long() frequently (say, every minute or every 5 minutes),
would that be sufficient for current and future users of these
interfaces?

- Ted

P.S.  I'll note that my performance figures when testing changes to
get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
figures were using a 64-bit x86 system?.  I haven't tried 32-bit ARM
or smaller CPU's (e.g., mips, et. al.) that might be more likely to be
used on IoT devices, but I'm worried about those too, of course.



Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:41 PM, George Spelvin
 wrote:
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

How that I've published my hsiphash implementation to my tree, it
should be possible to conduct the tests back to back with nearly
identical implementation strategies, to remove a potential source of
error.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:41 PM, George Spelvin
 wrote:
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

How that I've published my hsiphash implementation to my tree, it
should be possible to conduct the tests back to back with nearly
identical implementation strategies, to remove a potential source of
error.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

What are you testing on?  And what input size?  And does "33% improvement"
mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

These are very odd results.  On a 64-bit machine, SipHash should be the
same speed per round, and faster because it hashes more data per round.
(Unless you're hitting some unexpected cache/decode effect due to REX
prefixes.)

On a 32-bit machine (other than ARM, where your results might make sense,
or maybe if you're hashing large amounts of data), the difference should
be larger.

And yes, there is a *significant* security loss.  SipHash is 128 bits
("don't worry about it").  hsiphash is 64 bits, which is known breakable
("worry about it"), so we have to do a careful analysis of the cost of
a successful attack.

As mentioned in the e-mails that just flew by, hsiphash is intended
*only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
machines, it will be implemented as an alias for SipHash and the security
concerns will Just Go Away.

The place where hsiphash is expected to make a big difference is 32-bit
x86.  If you only see 33% difference with "gcc -m32", I'm going to be
very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

What are you testing on?  And what input size?  And does "33% improvement"
mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

These are very odd results.  On a 64-bit machine, SipHash should be the
same speed per round, and faster because it hashes more data per round.
(Unless you're hitting some unexpected cache/decode effect due to REX
prefixes.)

On a 32-bit machine (other than ARM, where your results might make sense,
or maybe if you're hashing large amounts of data), the difference should
be larger.

And yes, there is a *significant* security loss.  SipHash is 128 bits
("don't worry about it").  hsiphash is 64 bits, which is known breakable
("worry about it"), so we have to do a careful analysis of the cost of
a successful attack.

As mentioned in the e-mails that just flew by, hsiphash is intended
*only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
machines, it will be implemented as an alias for SipHash and the security
concerns will Just Go Away.

The place where hsiphash is expected to make a big difference is 32-bit
x86.  If you only see 33% difference with "gcc -m32", I'm going to be
very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi JP,

On Fri, Dec 16, 2016 at 2:22 PM, Jean-Philippe Aumasson
 wrote:
> It needs some basic security review, which I'll try do next week (check for
> security margin, optimality of rotation counts, etc.). But after a lot of
> experience with this kind of construction (BLAKE, SipHash, NORX), I'm
> confident it will be safe as it is.

I've implemented it in my siphash kernel branch:

https://git.zx2c4.com/linux-dev/log/?h=siphash

It's the commit that has "HalfSipHash" in the log message. As the
structure is nearly identical to SipHash, there wasn't a lot to
change, and so the same implementation strategy exists for each.

When you've finished your security review and feel good about it, some
test vectors using the same formula (key={0x03020100, 07060504},
input={0x0, 0x1, 0x2, 0x3...}, output=test_vectors) would be nice for
verification.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi JP,

On Fri, Dec 16, 2016 at 2:22 PM, Jean-Philippe Aumasson
 wrote:
> It needs some basic security review, which I'll try do next week (check for
> security margin, optimality of rotation counts, etc.). But after a lot of
> experience with this kind of construction (BLAKE, SipHash, NORX), I'm
> confident it will be safe as it is.

I've implemented it in my siphash kernel branch:

https://git.zx2c4.com/linux-dev/log/?h=siphash

It's the commit that has "HalfSipHash" in the log message. As the
structure is nearly identical to SipHash, there wasn't a lot to
change, and so the same implementation strategy exists for each.

When you've finished your security review and feel good about it, some
test vectors using the same formula (key={0x03020100, 07060504},
input={0x0, 0x1, 0x2, 0x3...}, output=test_vectors) would be nice for
verification.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always.  Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.

> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
> like this arrangement.

This is a basic assumption I make in the security analysis below:
on most machines, it's 128-bit-key SipHash everywhere and we can
consider security solved.

Our analysis *only* has to consider 32-bit machines.  My big concern
is home routers, with IoT appliances coming second.  The routers have
severe hardware cost constraints (so limited CPU power), but see a lot
of traffic and need to process (NAT) it.

> That's a nice analysis. Might one conclude from that that hsiphash is
> not useful for our purposes? Or does it still remain useful for
> network facing code?

I think for attacks where the threat is a DoS, it's usable.  The point
is you only have to raise the cost to equal that of a packet flood.
(Just like in electronic warfare, the best you can possibly do is force
the enemy to use broadband jamming.)

Hash collision attacks just aren't that powerful.  The original PoC
was against an application that implemented a hard limit on hash chain
length as a DoS defense, which the attack then exploited to turn it into
a hard DoS.

>> Let me consider your second example above, "secure against local users".
>> I should dig through your patchset and find the details, but what exactly
>> are the consequences of such an attack?  Hasn't a local user already
>> got much better ways to DoS the system?

> For example, an unpriv'd user putting lots of entries in one hash
> bucket for a shared resource that's used by root, like filesystems or
> other lookup tables. If he can cause root to use more of root's cpu
> schedule budget than otherwise in a directed way, then that's a bad
> DoS.

This issue was recently discussed when we redesigned the dcache hash.
Even a successful attack doesn't slow things down all *that* much.

Before you overkill every hash table in the kernel, think about whether
it's a bigger problem than the dcache.  (Hint: it's probably not.)
There's no point armor-plating the side door when the front door was
just upgraded from screen to wood.

>> These days, 32-bit CPUs are for embedded applications: network appliances,
>> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
>> Is this really a threat that needs to be defended against?

> I interpret this to indicate all the more reason to alias hsiphash to
> siphash on 64-bit, and then the problem space collapses in a clear
> way.

Yes, exactly.  

> Right. Hence the need for always using full siphash and not hsiphash
> for sequence numbers, per my earlier email to David.
>
>> I wish we could get away with 64-bit security, but given that the
>> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
>> it's just not enough.
>
> I take this comment to be relavent for the sequence number case.

Yes.

> For hashtables and hashtable flooding, is it still your opinion that
> we will benefit from hsiphash? Or is this final conclusion a rejection
> of hsiphash for that too? We're talking about two different use cases,
> and your email kind of interleaved both into your analysis, so I'm not
> certain so to precisely what your conclusion is for each use case. Can
> you clear up the ambiguity?

My (speaking enerally; I should walk through every hash table you've
converted) opinion is that:

- Hash tables, even network-facing ones, can all use hsiphash as long
  as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
  0, and the consequences of a successful attack is only more collisions
  (timing).  While the attack is only 2x the cost (two hashes rather than
  one to test a key), the knowledge of the collision is statistical,
  especially for network attackers, which raises the cost of guessing
  beyond an even more brute-force attack.
- When the hash value directly visible (e.g. included in a network
  packet), full SipHash should be the default.
- Syncookies *could* use hsiphash, especially as there are
  two keys in there.  Not sure if we need the performance.
- For TCP ISNs, I'd prefer to use full SipHash.  I know this is
  a very hot path, and if that's a performance bottleneck,
  we can work harder on it.

In particular, TCP ISNs *used* to rotate the key periodically,
limiting the time available to an attacker to perform an
attack before the secret goes stale and is useless.  commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
the key rotation.

If 2x hsiphash is faster than siphash, we could use a double-hashing
system like syncookies.  One 32-bit hash with a permanent key, summed
with a k-bit counter and a (32-k)-bit hash, where the key is rotated
(and the counter incremented) periodically.

The requirement is that the increment rate of the counter hash doesn't

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always.  Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.

> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
> like this arrangement.

This is a basic assumption I make in the security analysis below:
on most machines, it's 128-bit-key SipHash everywhere and we can
consider security solved.

Our analysis *only* has to consider 32-bit machines.  My big concern
is home routers, with IoT appliances coming second.  The routers have
severe hardware cost constraints (so limited CPU power), but see a lot
of traffic and need to process (NAT) it.

> That's a nice analysis. Might one conclude from that that hsiphash is
> not useful for our purposes? Or does it still remain useful for
> network facing code?

I think for attacks where the threat is a DoS, it's usable.  The point
is you only have to raise the cost to equal that of a packet flood.
(Just like in electronic warfare, the best you can possibly do is force
the enemy to use broadband jamming.)

Hash collision attacks just aren't that powerful.  The original PoC
was against an application that implemented a hard limit on hash chain
length as a DoS defense, which the attack then exploited to turn it into
a hard DoS.

>> Let me consider your second example above, "secure against local users".
>> I should dig through your patchset and find the details, but what exactly
>> are the consequences of such an attack?  Hasn't a local user already
>> got much better ways to DoS the system?

> For example, an unpriv'd user putting lots of entries in one hash
> bucket for a shared resource that's used by root, like filesystems or
> other lookup tables. If he can cause root to use more of root's cpu
> schedule budget than otherwise in a directed way, then that's a bad
> DoS.

This issue was recently discussed when we redesigned the dcache hash.
Even a successful attack doesn't slow things down all *that* much.

Before you overkill every hash table in the kernel, think about whether
it's a bigger problem than the dcache.  (Hint: it's probably not.)
There's no point armor-plating the side door when the front door was
just upgraded from screen to wood.

>> These days, 32-bit CPUs are for embedded applications: network appliances,
>> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
>> Is this really a threat that needs to be defended against?

> I interpret this to indicate all the more reason to alias hsiphash to
> siphash on 64-bit, and then the problem space collapses in a clear
> way.

Yes, exactly.  

> Right. Hence the need for always using full siphash and not hsiphash
> for sequence numbers, per my earlier email to David.
>
>> I wish we could get away with 64-bit security, but given that the
>> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
>> it's just not enough.
>
> I take this comment to be relavent for the sequence number case.

Yes.

> For hashtables and hashtable flooding, is it still your opinion that
> we will benefit from hsiphash? Or is this final conclusion a rejection
> of hsiphash for that too? We're talking about two different use cases,
> and your email kind of interleaved both into your analysis, so I'm not
> certain so to precisely what your conclusion is for each use case. Can
> you clear up the ambiguity?

My (speaking enerally; I should walk through every hash table you've
converted) opinion is that:

- Hash tables, even network-facing ones, can all use hsiphash as long
  as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
  0, and the consequences of a successful attack is only more collisions
  (timing).  While the attack is only 2x the cost (two hashes rather than
  one to test a key), the knowledge of the collision is statistical,
  especially for network attackers, which raises the cost of guessing
  beyond an even more brute-force attack.
- When the hash value directly visible (e.g. included in a network
  packet), full SipHash should be the default.
- Syncookies *could* use hsiphash, especially as there are
  two keys in there.  Not sure if we need the performance.
- For TCP ISNs, I'd prefer to use full SipHash.  I know this is
  a very hot path, and if that's a performance bottleneck,
  we can work harder on it.

In particular, TCP ISNs *used* to rotate the key periodically,
limiting the time available to an attacker to perform an
attack before the secret goes stale and is useless.  commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
the key rotation.

If 2x hsiphash is faster than siphash, we could use a double-hashing
system like syncookies.  One 32-bit hash with a permanent key, summed
with a k-bit counter and a (32-k)-bit hash, where the key is rotated
(and the counter incremented) periodically.

The requirement is that the increment rate of the counter hash doesn't

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 4:39 AM, Jason A. Donenfeld  wrote:
> Hey JP,
>
> On Fri, Dec 16, 2016 at 9:08 AM, Jean-Philippe Aumasson
>  wrote:
>> Here's a tentative HalfSipHash:
>> https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c
>>
>> Haven't computed the cycle count nor measured its speed.
>
Tested this. Distribution and avalanche effect are still good. Speed
wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
to a more palatable replacement for jhash. Do we lose any security
advantages with halfsiphash?

Tom

> This is incredible. Really. Wow!
>
> I'll integrate this into my patchset and will write up some
> documentation about when one should be used over the other.
>
> Thanks again. Quite exciting.
>
> Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 4:39 AM, Jason A. Donenfeld  wrote:
> Hey JP,
>
> On Fri, Dec 16, 2016 at 9:08 AM, Jean-Philippe Aumasson
>  wrote:
>> Here's a tentative HalfSipHash:
>> https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c
>>
>> Haven't computed the cycle count nor measured its speed.
>
Tested this. Distribution and avalanche effect are still good. Speed
wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
to a more palatable replacement for jhash. Do we lose any security
advantages with halfsiphash?

Tom

> This is incredible. Really. Wow!
>
> I'll integrate this into my patchset and will write up some
> documentation about when one should be used over the other.
>
> Thanks again. Quite exciting.
>
> Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 6:36 PM, George Spelvin
 wrote:
> A 128-bit output option was added to SipHash after the initial publication;
> this is just the equivalent in 32-bit.
> Personally, I'd put in a comment saying that "there's a 64-bit output
> variant that's not implemented" and punt until someone find a need.

That's a good way to think about it. Okay, I'll do precisely that.

> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
> should be used always.  Don't even compile the 32-bit code, to prevent
> anyone accidentally using it, and make hsiphash an alias for siphash.

Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
like this arrangement.


> Fortunately, the cost of brute-forcing hash functions can be fairly
> exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
> hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
> 2^63 hashes cost $152.
>
> Now, there are two factors that must be considered:
> - That's a very very "wholesale" rate.  That's assuming you're doing
>   large numbers of these and can put in the up-front effort designing
>   silicon ASICs to do the attack.
> - That's for a more difficult hash (double sha-256) than SipHash.
>   That's a constant fator, but a pretty significant one.  If the wholesale
>   assumption holds, that might bring the cost down another 6 or 7 bits,
>   to $1-2 per break.
>
> If you're not the NSA and limited to general-purpose silicon, let's
> assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem do to better
> than nVidia).  The bitcoin mining rate for those is about 700M/second,
> 29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
> to account for SipHash's high speed compared to two rounds of SHA-2.
> Call it 1000 GPU-days.
>
> It's very doable, but also very non-trivial.  The question is, wouldn't
> it be cheaper and easier just to do a brute-force flooding DDoS?
>
> (This is why I wish the key size could be tweaked up to 80 bits.
> That would take all these numbers out of the reasonable range.)

That's a nice analysis. Might one conclude from that that hsiphash is
not useful for our purposes? Or does it still remain useful for
network facing code?

> Let me consider your second example above, "secure against local users".
> I should dig through your patchset and find the details, but what exactly
> are the consequences of such an attack?  Hasn't a local user already
> got much better ways to DoS the system?

For example, an unpriv'd user putting lots of entries in one hash
bucket for a shared resource that's used by root, like filesystems or
other lookup tables. If he can cause root to use more of root's cpu
schedule budget than otherwise in a directed way, then that's a bad
DoS.

> The thing to remember is that we're worried only about the combination
> of a *new* Linux kernel (new build or under active maintenance) and a
> 32-bit host.  You'd be hard-pressed to find a *single* machine fitting
> that description which is hosting multiple users or VMs and is not 64-bit.
>
> These days, 32-bit CPUs are for embedded applications: network appliances,
> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
> Is this really a threat that needs to be defended against?

I interpret this to indicate all the more reason to alias hsiphash to
siphash on 64-bit, and then the problem space collapses in a clear
way.

> For your first case, network applications, the additional security
> is definitely attractive.  Syncookies are only a DoS, but sequence
> numbers are a real security issue; they can let you inject data into a
> TCP connection.
> With sequence numbers, large amounts (32 bits) the hash output is
> directly observable.

Right. Hence the need for always using full siphash and not hsiphash
for sequence numbers, per my earlier email to David.

>
> I wish we could get away with 64-bit security, but given that the
> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
> it's just not enough.

I take this comment to be relavent for the sequence number case.

For hashtables and hashtable flooding, is it still your opinion that
we will benefit from hsiphash? Or is this final conclusion a rejection
of hsiphash for that too? We're talking about two different use cases,
and your email kind of interleaved both into your analysis, so I'm not
certain so to precisely what your conclusion is for each use case. Can
you clear up the ambiguity?

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 6:36 PM, George Spelvin
 wrote:
> A 128-bit output option was added to SipHash after the initial publication;
> this is just the equivalent in 32-bit.
> Personally, I'd put in a comment saying that "there's a 64-bit output
> variant that's not implemented" and punt until someone find a need.

That's a good way to think about it. Okay, I'll do precisely that.

> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
> should be used always.  Don't even compile the 32-bit code, to prevent
> anyone accidentally using it, and make hsiphash an alias for siphash.

Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
like this arrangement.


> Fortunately, the cost of brute-forcing hash functions can be fairly
> exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
> hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
> 2^63 hashes cost $152.
>
> Now, there are two factors that must be considered:
> - That's a very very "wholesale" rate.  That's assuming you're doing
>   large numbers of these and can put in the up-front effort designing
>   silicon ASICs to do the attack.
> - That's for a more difficult hash (double sha-256) than SipHash.
>   That's a constant fator, but a pretty significant one.  If the wholesale
>   assumption holds, that might bring the cost down another 6 or 7 bits,
>   to $1-2 per break.
>
> If you're not the NSA and limited to general-purpose silicon, let's
> assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem do to better
> than nVidia).  The bitcoin mining rate for those is about 700M/second,
> 29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
> to account for SipHash's high speed compared to two rounds of SHA-2.
> Call it 1000 GPU-days.
>
> It's very doable, but also very non-trivial.  The question is, wouldn't
> it be cheaper and easier just to do a brute-force flooding DDoS?
>
> (This is why I wish the key size could be tweaked up to 80 bits.
> That would take all these numbers out of the reasonable range.)

That's a nice analysis. Might one conclude from that that hsiphash is
not useful for our purposes? Or does it still remain useful for
network facing code?

> Let me consider your second example above, "secure against local users".
> I should dig through your patchset and find the details, but what exactly
> are the consequences of such an attack?  Hasn't a local user already
> got much better ways to DoS the system?

For example, an unpriv'd user putting lots of entries in one hash
bucket for a shared resource that's used by root, like filesystems or
other lookup tables. If he can cause root to use more of root's cpu
schedule budget than otherwise in a directed way, then that's a bad
DoS.

> The thing to remember is that we're worried only about the combination
> of a *new* Linux kernel (new build or under active maintenance) and a
> 32-bit host.  You'd be hard-pressed to find a *single* machine fitting
> that description which is hosting multiple users or VMs and is not 64-bit.
>
> These days, 32-bit CPUs are for embedded applications: network appliances,
> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
> Is this really a threat that needs to be defended against?

I interpret this to indicate all the more reason to alias hsiphash to
siphash on 64-bit, and then the problem space collapses in a clear
way.

> For your first case, network applications, the additional security
> is definitely attractive.  Syncookies are only a DoS, but sequence
> numbers are a real security issue; they can let you inject data into a
> TCP connection.
> With sequence numbers, large amounts (32 bits) the hash output is
> directly observable.

Right. Hence the need for always using full siphash and not hsiphash
for sequence numbers, per my earlier email to David.

>
> I wish we could get away with 64-bit security, but given that the
> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
> it's just not enough.

I take this comment to be relavent for the sequence number case.

For hashtables and hashtable flooding, is it still your opinion that
we will benefit from hsiphash? Or is this final conclusion a rejection
of hsiphash for that too? We're talking about two different use cases,
and your email kind of interleaved both into your analysis, so I'm not
certain so to precisely what your conclusion is for each use case. Can
you clear up the ambiguity?

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.

A 128-bit output option was added to SipHash after the initial publication;
this is just the equivalent in 32-bit.

> - Is this a reasonable choice?

Yes.

> - Are there reasons why hsiphash with 64-bit output would be
>   reasonable? Or will we be fine sticking with 32-bit output only?

Personally, I'd put in a comment saying that "there's a 64-bit output
variant that's not implemented" and punt until someone find a need.

> With both hsiphash and siphash, the division of usage will probably become:
> - Use 64-bit output 128-bit key siphash for keyed RNG-like things,
>   such as syncookies and sequence numbers
> - Use 64-bit output 128-bit key siphash for hashtables that must
>   absolutely be secure to an extremely high bandwidth attacker, such as
>   userspace directly DoSing a kernel hashtable
> - Use 32-bit output 64-bit key hsiphash for quick hashtable functions
>   that still must be secure but do not require as large of a security
>   margin.

On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
should be used always.  Don't even compile the 32-bit code, to prevent
anyone accidentally using it, and make hsiphash an alias for siphash.

On a 32-bit machine, it's a much trickier case.  I'd be tempted to
use the 32-bit code always, but it needs examination.

Fortunately, the cost of brute-forcing hash functions can be fairly
exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
2^63 hashes cost $152.

Now, there are two factors that must be considered:
- That's a very very "wholesale" rate.  That's assuming you're doing
  large numbers of these and can put in the up-front effort designing
  silicon ASICs to do the attack.
- That's for a more difficult hash (double sha-256) than SipHash.
  That's a constant fator, but a pretty significant one.  If the wholesale
  assumption holds, that might bring the cost down another 6 or 7 bits,
  to $1-2 per break.

If you're not the NSA and limited to general-purpose silicon, let's
assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem do to better
than nVidia).  The bitcoin mining rate for those is about 700M/second,
29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
to account for SipHash's high speed compared to two rounds of SHA-2.
Call it 1000 GPU-days.

It's very doable, but also very non-trivial.  The question is, wouldn't
it be cheaper and easier just to do a brute-force flooding DDoS?

(This is why I wish the key size could be tweaked up to 80 bits.
That would take all these numbers out of the reasonable range.)


Let me consider your second example above, "secure against local users".
I should dig through your patchset and find the details, but what exactly
are the consequences of such an attack?  Hasn't a local user already
got much better ways to DoS the system?

The thing to remember is that we're worried only about the combination
of a *new* Linux kernel (new build or under active maintenance) and a
32-bit host.  You'd be hard-pressed to find a *single* machine fitting
that description which is hosting multiple users or VMs and is not 64-bit.

These days, 32-bit CPUs are for embedded applications: network appliances,
TVs, etc.  That means basically single-user.  Even phones are 64 bit.
Is this really a threat that needs to be defended against?


For your first case, network applications, the additional security
is definitely attractive.  Syncookies are only a DoS, but sequence
numbers are a real security issue; they can let you inject data into a
TCP connection.

Hash tables are much harder to attack.  The information you get back from
timing probes is statistical, and thus testing a key is more expensive.
With sequence numbers, large amounts (32 bits) the hash output is
directly observable.

I wish we could get away with 64-bit security, but given that the
modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
it's just not enough.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.

A 128-bit output option was added to SipHash after the initial publication;
this is just the equivalent in 32-bit.

> - Is this a reasonable choice?

Yes.

> - Are there reasons why hsiphash with 64-bit output would be
>   reasonable? Or will we be fine sticking with 32-bit output only?

Personally, I'd put in a comment saying that "there's a 64-bit output
variant that's not implemented" and punt until someone find a need.

> With both hsiphash and siphash, the division of usage will probably become:
> - Use 64-bit output 128-bit key siphash for keyed RNG-like things,
>   such as syncookies and sequence numbers
> - Use 64-bit output 128-bit key siphash for hashtables that must
>   absolutely be secure to an extremely high bandwidth attacker, such as
>   userspace directly DoSing a kernel hashtable
> - Use 32-bit output 64-bit key hsiphash for quick hashtable functions
>   that still must be secure but do not require as large of a security
>   margin.

On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
should be used always.  Don't even compile the 32-bit code, to prevent
anyone accidentally using it, and make hsiphash an alias for siphash.

On a 32-bit machine, it's a much trickier case.  I'd be tempted to
use the 32-bit code always, but it needs examination.

Fortunately, the cost of brute-forcing hash functions can be fairly
exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
2^63 hashes cost $152.

Now, there are two factors that must be considered:
- That's a very very "wholesale" rate.  That's assuming you're doing
  large numbers of these and can put in the up-front effort designing
  silicon ASICs to do the attack.
- That's for a more difficult hash (double sha-256) than SipHash.
  That's a constant fator, but a pretty significant one.  If the wholesale
  assumption holds, that might bring the cost down another 6 or 7 bits,
  to $1-2 per break.

If you're not the NSA and limited to general-purpose silicon, let's
assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem do to better
than nVidia).  The bitcoin mining rate for those is about 700M/second,
29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
to account for SipHash's high speed compared to two rounds of SHA-2.
Call it 1000 GPU-days.

It's very doable, but also very non-trivial.  The question is, wouldn't
it be cheaper and easier just to do a brute-force flooding DDoS?

(This is why I wish the key size could be tweaked up to 80 bits.
That would take all these numbers out of the reasonable range.)


Let me consider your second example above, "secure against local users".
I should dig through your patchset and find the details, but what exactly
are the consequences of such an attack?  Hasn't a local user already
got much better ways to DoS the system?

The thing to remember is that we're worried only about the combination
of a *new* Linux kernel (new build or under active maintenance) and a
32-bit host.  You'd be hard-pressed to find a *single* machine fitting
that description which is hosting multiple users or VMs and is not 64-bit.

These days, 32-bit CPUs are for embedded applications: network appliances,
TVs, etc.  That means basically single-user.  Even phones are 64 bit.
Is this really a threat that needs to be defended against?


For your first case, network applications, the additional security
is definitely attractive.  Syncookies are only a DoS, but sequence
numbers are a real security issue; they can let you inject data into a
TCP connection.

Hash tables are much harder to attack.  The information you get back from
timing probes is statistical, and thus testing a key is more expensive.
With sequence numbers, large amounts (32 bits) the hash output is
directly observable.

I wish we could get away with 64-bit security, but given that the
modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
it's just not enough.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi David,

On Fri, Dec 16, 2016 at 6:06 PM, David Laight  wrote:
> A 32bit hash would also remove all the issues about the alignment
> of IP addresses (etc) on 64bit systems.

The current replacements of md5_transform with siphash in the v6 patch
series will continue to use the original siphash, since the 128-bit
key is rather important for these kinds of secrets. Additionally,
64-bit siphash is already faster than the md5_transform that it
replaces. So the alignment concerns (now, non-issues; problems have
been solved, I believe) still remain.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi David,

On Fri, Dec 16, 2016 at 6:06 PM, David Laight  wrote:
> A 32bit hash would also remove all the issues about the alignment
> of IP addresses (etc) on 64bit systems.

The current replacements of md5_transform with siphash in the v6 patch
series will continue to use the original siphash, since the 128-bit
key is rather important for these kinds of secrets. Additionally,
64-bit siphash is already faster than the md5_transform that it
replaces. So the alignment concerns (now, non-issues; problems have
been solved, I believe) still remain.

Jason


RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread David Laight
From: George Spelvin
> Sent: 15 December 2016 23:29
> > If a halved version of SipHash can bring significant performance boost
> > (with 32b words instead of 64b words) with an acceptable security level
> > (64-bit enough?) then we may design such a version.
> 
> I was thinking if the key could be pushed to 80 bits, that would be nice,
> but honestly 64 bits is fine.  This is DoS protection, and while it's
> possible to brute-force a 64 bit secret, there are more effective (DDoS)
> attacks possible for the same cost.

A 32bit hash would also remove all the issues about the alignment
of IP addresses (etc) on 64bit systems.

> (I'd suggest a name of "HalfSipHash" to convey the reduced security
> effectively.)
> 
> > Regarding output size, are 64 bits sufficient?
> 
> As a replacement for jhash, 32 bits are sufficient.  It's for
> indexing an in-memory hash table on a 32-bit machine.

It is also worth remembering that if the intent is to generate
a hash table index (not a unique fingerprint) you will always
get collisions on the final value.
Randomness could still give overlong hash chains - which might
still need rehashing with a different key.

David




RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread David Laight
From: George Spelvin
> Sent: 15 December 2016 23:29
> > If a halved version of SipHash can bring significant performance boost
> > (with 32b words instead of 64b words) with an acceptable security level
> > (64-bit enough?) then we may design such a version.
> 
> I was thinking if the key could be pushed to 80 bits, that would be nice,
> but honestly 64 bits is fine.  This is DoS protection, and while it's
> possible to brute-force a 64 bit secret, there are more effective (DDoS)
> attacks possible for the same cost.

A 32bit hash would also remove all the issues about the alignment
of IP addresses (etc) on 64bit systems.

> (I'd suggest a name of "HalfSipHash" to convey the reduced security
> effectively.)
> 
> > Regarding output size, are 64 bits sufficient?
> 
> As a replacement for jhash, 32 bits are sufficient.  It's for
> indexing an in-memory hash table on a 32-bit machine.

It is also worth remembering that if the intent is to generate
a hash table index (not a unique fingerprint) you will always
get collisions on the final value.
Randomness could still give overlong hash chains - which might
still need rehashing with a different key.

David




Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi JP & George,

My function names:
- SipHash -> siphash
- HalfSipHash -> hsiphash

It appears that hsiphash can produce either 32-bit output or 64-bit
output, with the output length parameter as part of the hash algorithm
in there. When I code this for my kernel patchset, I very likely will
only implement one output length size. Right now I'm leaning toward
32-bit. Questions:

- Is this a reasonable choice?
- When hsiphash is desired due to its faster speed, are there any
circumstances in which producing a 64-bit output would actually be
useful? Namely, are there any hashtables that could benefit from a
64-bit functions?
- Are there reasons why hsiphash with 64-bit output would be
reasonable? Or will we be fine sticking with 32-bit output only?

With both hsiphash and siphash, the division of usage will probably become:
- Use 64-bit output 128-bit key siphash for keyed RNG-like things,
such as syncookies and sequence numbers
- Use 64-bit output 128-bit key siphash for hashtables that must
absolutely be secure to an extremely high bandwidth attacker, such as
userspace directly DoSing a kernel hashtable
- Use 32-bit output 64-bit key hsiphash for quick hashtable functions
that still must be secure but do not require as large of a security
margin

Sound good?

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi JP & George,

My function names:
- SipHash -> siphash
- HalfSipHash -> hsiphash

It appears that hsiphash can produce either 32-bit output or 64-bit
output, with the output length parameter as part of the hash algorithm
in there. When I code this for my kernel patchset, I very likely will
only implement one output length size. Right now I'm leaning toward
32-bit. Questions:

- Is this a reasonable choice?
- When hsiphash is desired due to its faster speed, are there any
circumstances in which producing a 64-bit output would actually be
useful? Namely, are there any hashtables that could benefit from a
64-bit functions?
- Are there reasons why hsiphash with 64-bit output would be
reasonable? Or will we be fine sticking with 32-bit output only?

With both hsiphash and siphash, the division of usage will probably become:
- Use 64-bit output 128-bit key siphash for keyed RNG-like things,
such as syncookies and sequence numbers
- Use 64-bit output 128-bit key siphash for hashtables that must
absolutely be secure to an extremely high bandwidth attacker, such as
userspace directly DoSing a kernel hashtable
- Use 32-bit output 64-bit key hsiphash for quick hashtable functions
that still must be secure but do not require as large of a security
margin

Sound good?

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hey JP,

On Fri, Dec 16, 2016 at 9:08 AM, Jean-Philippe Aumasson
 wrote:
> Here's a tentative HalfSipHash:
> https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c
>
> Haven't computed the cycle count nor measured its speed.

This is incredible. Really. Wow!

I'll integrate this into my patchset and will write up some
documentation about when one should be used over the other.

Thanks again. Quite exciting.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hey JP,

On Fri, Dec 16, 2016 at 9:08 AM, Jean-Philippe Aumasson
 wrote:
> Here's a tentative HalfSipHash:
> https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c
>
> Haven't computed the cycle count nor measured its speed.

This is incredible. Really. Wow!

I'll integrate this into my patchset and will write up some
documentation about when one should be used over the other.

Thanks again. Quite exciting.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
Jean-Philippe Aumasson wrote:
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

It would be fairly significant, a 2x speed benefit on a lot of 32-bit
machines.

First is the fact that a 64-bit SipHash round on a generic 32-bit machine
requires not twice as many instructions, but more than three.

Consider the core SipHash quarter-round operation:
a += b;
b = rotate_left(b, k)
b ^= a

The add and xor are equivalent between 32- and 64-bit rounds; twice the
instructions do twice the work.  (There's a dependency via the carry
bit between the two halves of the add, but it ends up not being on the
critical path even in a superscalar implementation.)

The problem is the rotates.  Although some particularly nice code is
possible on 32-bit ARM due to its support for shift-and-xor operations,
on a generic 32-bit CPU the rotate grows to 6 instructions with a 2-cycle
dependency chain (more in practice because barrel shifters are large and
even quad-issue CPUs can't do 4 shifts per cycle):

temp_lo = b_lo >> (32-k)
temp_hi = b_hi >> (32-k)
b_lo <<= k
b_hi <<= k
b_lo ^= temp_hi
b_hi ^= temp_lo

The resultant instruction counts and (assuming wide issue)
latencies are:

64-bit SipHash  "Half" SipHash
Inst.   Latency Inst.   Latency
 10  33  2  Quarter round
 40  6   12  4  Full round
 80 12   24  8  Two rounds
 82 13   26  9  Mix in one word
 82 13   52 18  Mix in 64 bits
166 26   61 18  Four round finalization + final XOR
248 39  113 36  Hash 64 bits
330 52  165 54  Hash 128 bits
412 65  217 72  Hash 192 bits

While the ideal latencies are actually better for the 64-bit algorithm,
that requires an unrealistic 6+-wide superscalar implementation that's
more than twice as wide as the 64-bit code requires (which is already
optimized for quad-issue).  For a 1- or 2-wide processor, the instruction
counts dominate, and not only does the 64-bit algorithm take 60% more
time to mix in the same number of bytes, but the finalization rounds
bring the ratio to 2:1 for small inputs.

(And I haven't included the possible savings if the input size is an odd
number of 32-bit words, such as networking applications which include
the source/dest port numbers.)


Notes on particular processors:
- x86 can do a 64-bit rotate in 3 instructions and 2 cycles using
  the SHLD/SHRD instructions instead:
movl%b_hi, %temp
shldl   $k, %b_lo, %b_hi
shldl   $k, %temp, %b_lo
  ... but as I mentioned the problem is registers.  SipHash needs 8 32-bit
  words plus at least one temporary, and 32-bit x86 has only 7 available.
  (And compilers can rarely manage to keep more than 6 of them busy.)
- 64-bit SipHash is particularly efficient on 32-bit ARM due to its
  support for shift-and-op instructions.  The 64-bit shift and following
  xor can be done in 4 instructions.  So the only benefit is from the
  reduced finalization.
- Double-width adds cost a little more on CPUs like MIPS and RISC-V without
  condition codes.
- Certain particularly crappy uClinux processors with slow shifts
  (68000, anyone?) really suffer from extra shifts.

One *weakly* requested feature: It might simplify some programming
interfaces if we could use the same key for multiple hash tables with a
1-word "tweak" (e.g. pointer to the hash table, so it could be assumed
non-zero if that helped) to make distinct functions.  That would let us
more safely use a global key for multiple small hash tables without the
need to add code to generate and store key material for each place that
an unkeyed hash is replaced.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
Jean-Philippe Aumasson wrote:
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

It would be fairly significant, a 2x speed benefit on a lot of 32-bit
machines.

First is the fact that a 64-bit SipHash round on a generic 32-bit machine
requires not twice as many instructions, but more than three.

Consider the core SipHash quarter-round operation:
a += b;
b = rotate_left(b, k)
b ^= a

The add and xor are equivalent between 32- and 64-bit rounds; twice the
instructions do twice the work.  (There's a dependency via the carry
bit between the two halves of the add, but it ends up not being on the
critical path even in a superscalar implementation.)

The problem is the rotates.  Although some particularly nice code is
possible on 32-bit ARM due to its support for shift-and-xor operations,
on a generic 32-bit CPU the rotate grows to 6 instructions with a 2-cycle
dependency chain (more in practice because barrel shifters are large and
even quad-issue CPUs can't do 4 shifts per cycle):

temp_lo = b_lo >> (32-k)
temp_hi = b_hi >> (32-k)
b_lo <<= k
b_hi <<= k
b_lo ^= temp_hi
b_hi ^= temp_lo

The resultant instruction counts and (assuming wide issue)
latencies are:

64-bit SipHash  "Half" SipHash
Inst.   Latency Inst.   Latency
 10  33  2  Quarter round
 40  6   12  4  Full round
 80 12   24  8  Two rounds
 82 13   26  9  Mix in one word
 82 13   52 18  Mix in 64 bits
166 26   61 18  Four round finalization + final XOR
248 39  113 36  Hash 64 bits
330 52  165 54  Hash 128 bits
412 65  217 72  Hash 192 bits

While the ideal latencies are actually better for the 64-bit algorithm,
that requires an unrealistic 6+-wide superscalar implementation that's
more than twice as wide as the 64-bit code requires (which is already
optimized for quad-issue).  For a 1- or 2-wide processor, the instruction
counts dominate, and not only does the 64-bit algorithm take 60% more
time to mix in the same number of bytes, but the finalization rounds
bring the ratio to 2:1 for small inputs.

(And I haven't included the possible savings if the input size is an odd
number of 32-bit words, such as networking applications which include
the source/dest port numbers.)


Notes on particular processors:
- x86 can do a 64-bit rotate in 3 instructions and 2 cycles using
  the SHLD/SHRD instructions instead:
movl%b_hi, %temp
shldl   $k, %b_lo, %b_hi
shldl   $k, %temp, %b_lo
  ... but as I mentioned the problem is registers.  SipHash needs 8 32-bit
  words plus at least one temporary, and 32-bit x86 has only 7 available.
  (And compilers can rarely manage to keep more than 6 of them busy.)
- 64-bit SipHash is particularly efficient on 32-bit ARM due to its
  support for shift-and-op instructions.  The 64-bit shift and following
  xor can be done in 4 instructions.  So the only benefit is from the
  reduced finalization.
- Double-width adds cost a little more on CPUs like MIPS and RISC-V without
  condition codes.
- Certain particularly crappy uClinux processors with slow shifts
  (68000, anyone?) really suffer from extra shifts.

One *weakly* requested feature: It might simplify some programming
interfaces if we could use the same key for multiple hash tables with a
1-word "tweak" (e.g. pointer to the hash table, so it could be assumed
non-zero if that helped) to make distinct functions.  That would let us
more safely use a global key for multiple small hash tables without the
need to add code to generate and store key material for each place that
an unkeyed hash is replaced.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread kbuild test robot
Hi Jason,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9 next-20161215]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/siphash-add-cryptographically-secure-PRF/20161216-092837
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   lib/siphash.c: In function 'siphash_unaligned':
>> lib/siphash.c:123:15: error: 'bytes' undeclared (first use in this function)
 case 1: b |= bytes[0];
  ^
   lib/siphash.c:123:15: note: each undeclared identifier is reported only once 
for each function it appears in

vim +/bytes +123 lib/siphash.c

   117  case 7: b |= ((u64)end[6]) << 48;
   118  case 6: b |= ((u64)end[5]) << 40;
   119  case 5: b |= ((u64)end[4]) << 32;
   120  case 4: b |= get_unaligned_le32(end); break;
   121  case 3: b |= ((u64)end[2]) << 16;
   122  case 2: b |= get_unaligned_le16(end); break;
 > 123  case 1: b |= bytes[0];
   124  }
   125  #endif
   126  v3 ^= b;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread kbuild test robot
Hi Jason,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9 next-20161215]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/siphash-add-cryptographically-secure-PRF/20161216-092837
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   lib/siphash.c: In function 'siphash_unaligned':
>> lib/siphash.c:123:15: error: 'bytes' undeclared (first use in this function)
 case 1: b |= bytes[0];
  ^
   lib/siphash.c:123:15: note: each undeclared identifier is reported only once 
for each function it appears in

vim +/bytes +123 lib/siphash.c

   117  case 7: b |= ((u64)end[6]) << 48;
   118  case 6: b |= ((u64)end[5]) << 40;
   119  case 5: b |= ((u64)end[4]) << 32;
   120  case 4: b |= get_unaligned_le32(end); break;
   121  case 3: b |= ((u64)end[2]) << 16;
   122  case 2: b |= get_unaligned_le16(end); break;
 > 123  case 1: b |= bytes[0];
   124  }
   125  #endif
   126  v3 ^= b;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

I was thinking if the key could be pushed to 80 bits, that would be nice,
but honestly 64 bits is fine.  This is DoS protection, and while it's
possible to brute-force a 64 bit secret, there are more effective (DDoS)
attacks possible for the same cost.

(I'd suggest a name of "HalfSipHash" to convey the reduced security
effectively.)

> Regarding output size, are 64 bits sufficient?

As a replacement for jhash, 32 bits are sufficient.  It's for
indexing an in-memory hash table on a 32-bit machine.


(When you're done thinking about this, as a matter of personal interest
I'd love a hash expert's opinion on
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a18da7a9c7886f1c7307f8d3f23f24318583f03
and
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8387ff2577eb9ed245df9a39947f66976c6bcd02
which is a non-cryptographic hash function of novel design that's
inspired by SipHash.)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

I was thinking if the key could be pushed to 80 bits, that would be nice,
but honestly 64 bits is fine.  This is DoS protection, and while it's
possible to brute-force a 64 bit secret, there are more effective (DDoS)
attacks possible for the same cost.

(I'd suggest a name of "HalfSipHash" to convey the reduced security
effectively.)

> Regarding output size, are 64 bits sufficient?

As a replacement for jhash, 32 bits are sufficient.  It's for
indexing an in-memory hash table on a 32-bit machine.


(When you're done thinking about this, as a matter of personal interest
I'd love a hash expert's opinion on
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a18da7a9c7886f1c7307f8d3f23f24318583f03
and
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8387ff2577eb9ed245df9a39947f66976c6bcd02
which is a non-cryptographic hash function of novel design that's
inspired by SipHash.)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash usage
> poses a real security risk.

To quantify that, jhash is 27 instructions per 12 bytes of input, with a
dependency path length of 13 instructions.  (24/12 in __jash_mix, plus
3/1 for adding the input to the state.) The final add + __jhash_final
is 24 instructions with a path length of 15, which is close enough for
this handwaving.  Call it 18n instructions and 8n cycles for 8n bytes.

SipHash (on a 64-bit machine) is 14 instructions with a dependency path
length of 4 *per round*.  Two rounds per 8 bytes, plus plus two adds
and one cycle per input word, plus four rounds to finish makes 30n+46
instructions and 9n+16 cycles for 8n bytes.

So *if* you have a 64-bit 4-way superscalar machine, it's not that much
slower once it gets going, but the four-round finalization is quite
noticeable for short inputs.

For typical kernel input lengths "within a factor of 2" is
probably more accurate than "a tiny bit".

You lose a factor of 2 if you machine is 2-way or non-superscalar,
and a second factor of 2 if it's a 32-bit machine.

I mention this because there are a lot of home routers and other netwoek
appliances running Linux on 32-bit ARM and MIPS processors.  For those,
it's a factor of *eight*, which is a lot more than "a tiny bit".

The real killer is if you don't have enough registers; SipHash performs
horribly on i386 because it uses more state than i386 has registers.

(If i386 performance is desired, you might ask Jean-Philippe for some
rotate constants for a 32-bit variant with 64 bits of key.  Note that
SipHash's security proof requires that key length + input length is
strictly less than the state size, so for a 4x32-bit variant, while
you could stretch the key length a little, you'd have a hard limit at
95 bits.)


A second point, the final XOR in SipHash is either a (very minor) design
mistake, or an opportunity for optimization, depending on how you look
at it.  Look at the end of the function:

>+  SIPROUND;
>+  SIPROUND;
>+  return (v0 ^ v1) ^ (v2 ^ v3);

Expanding that out, you get:
+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v0 ^ v1 ^ v2 ^ v3;

Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
two lines earlier, so the value of v0 doesn't matter after its XOR into
v1 on line one.

The final SIPROUND and return can then be optimized to

+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v3 = rol64(v3, 21);
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v1 ^ v2 ^ v3;

A 32-bit implementation could further tweak the 4 instructions of
v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;

gcc 6.2.1 -O3 compiles it to basically:
v1.low ^= v2.low;
v1.high ^= v2.high;
v1.low ^= v2.high;
v1.high ^= v2.low;
but it could be written as:
v2.low ^= v2.high;
v1.low ^= v2.low;
v1.high ^= v2.low;

Alternatively, if it's for private use only (key not shared with other
systems), a slightly stronger variant would "return v1 ^ v3;".
(The final swap of v2 is dead code, but a compiler can spot that easily.)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash usage
> poses a real security risk.

To quantify that, jhash is 27 instructions per 12 bytes of input, with a
dependency path length of 13 instructions.  (24/12 in __jash_mix, plus
3/1 for adding the input to the state.) The final add + __jhash_final
is 24 instructions with a path length of 15, which is close enough for
this handwaving.  Call it 18n instructions and 8n cycles for 8n bytes.

SipHash (on a 64-bit machine) is 14 instructions with a dependency path
length of 4 *per round*.  Two rounds per 8 bytes, plus plus two adds
and one cycle per input word, plus four rounds to finish makes 30n+46
instructions and 9n+16 cycles for 8n bytes.

So *if* you have a 64-bit 4-way superscalar machine, it's not that much
slower once it gets going, but the four-round finalization is quite
noticeable for short inputs.

For typical kernel input lengths "within a factor of 2" is
probably more accurate than "a tiny bit".

You lose a factor of 2 if you machine is 2-way or non-superscalar,
and a second factor of 2 if it's a 32-bit machine.

I mention this because there are a lot of home routers and other netwoek
appliances running Linux on 32-bit ARM and MIPS processors.  For those,
it's a factor of *eight*, which is a lot more than "a tiny bit".

The real killer is if you don't have enough registers; SipHash performs
horribly on i386 because it uses more state than i386 has registers.

(If i386 performance is desired, you might ask Jean-Philippe for some
rotate constants for a 32-bit variant with 64 bits of key.  Note that
SipHash's security proof requires that key length + input length is
strictly less than the state size, so for a 4x32-bit variant, while
you could stretch the key length a little, you'd have a hard limit at
95 bits.)


A second point, the final XOR in SipHash is either a (very minor) design
mistake, or an opportunity for optimization, depending on how you look
at it.  Look at the end of the function:

>+  SIPROUND;
>+  SIPROUND;
>+  return (v0 ^ v1) ^ (v2 ^ v3);

Expanding that out, you get:
+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v0 ^ v1 ^ v2 ^ v3;

Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
two lines earlier, so the value of v0 doesn't matter after its XOR into
v1 on line one.

The final SIPROUND and return can then be optimized to

+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v3 = rol64(v3, 21);
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v1 ^ v2 ^ v3;

A 32-bit implementation could further tweak the 4 instructions of
v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;

gcc 6.2.1 -O3 compiles it to basically:
v1.low ^= v2.low;
v1.high ^= v2.high;
v1.low ^= v2.high;
v1.high ^= v2.low;
but it could be written as:
v2.low ^= v2.high;
v1.low ^= v2.low;
v1.high ^= v2.low;

Alternatively, if it's for private use only (key not shared with other
systems), a slightly stronger variant would "return v1 ^ v3;".
(The final swap of v2 is dead code, but a compiler can spot that easily.)