[PATCH v3] ubsan: Avoid unnecessary 128-bit shifts

2019-04-04 Thread George Spelvin
adding inefficient code. Note that the shifts in (unsigned, and by a compile-time constant amount) are simpler and generated inline. Signed-off-by: George Spelvin Acked-By: Andrey Ryabinin Feedback-from: Rasmus Villemoes Cc: linux-s...@vger.kernel.org Cc: Heiko Carstens --- include/linux/bitops.h
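The technique the summary describes, as an illustrative sketch (helper name and shape are hypothetical, not the patch's code): when the value is known to fit in 64 bits, sign-extension can be done entirely with 64-bit shifts, which compile to single inlined instructions, instead of variable-amount shifts of a 128-bit type, which cost a multi-instruction sequence or a libgcc helper call on many targets.

typedef long long s64;
typedef unsigned long long u64;

/* Hypothetical: sign-extend the low @bits bits of @lo, 1 <= bits <= 64. */
static inline s64 sign_extend_bits(u64 lo, unsigned int bits)
{
	unsigned int shift = 64 - bits;

	/* Unsigned left shift, then arithmetic right shift: both inline. */
	return (s64)(lo << shift) >> shift;
}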

[PATCH v2] ubsan: Avoid unnecessary 128-bit shifts

2019-04-02 Thread George Spelvin
people from adding inefficient code. Note that the shifts in (unsigned, and by a compile-time constant amount) are simpler and generated inline. Signed-off-by: George Spelvin Acked-By: Andrey Ryabinin Cc: linux-s...@vger.kernel.org Cc: Heiko Carstens --- lib/ubsan.c | 4 ++-- 1 file

[PATCH] ubsan: Avoid unnecessary 128-bit shifts

2019-04-01 Thread George Spelvin
s handling. Option 1: Submit this for 5.2 and turn on INT128 for s390 in 5.3. Option 2: Let the arches cherry-pick this patch pre-5.2. My preference is for option 2, but that requires permission from ubsan's owner. Andrey? Signed-off-by: George Spelvin Cc: Andrey Ryabinin Cc: linux-s...@vger.ker

[PATCH 6/5] lib/list_sort: Fix GCC warning

2019-03-28 Thread George Spelvin
actually help GCC 8.3's code generation, so just omit it. Signed-off-by: George Spelvin Fixes: 820c81be5237 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS") Cc: Andrew Morton Cc: Stephen Rothwell --- lib/list_sort.c | 14 +- 1 file changed, 9 insertions(+), 5

Re: [RESEND PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-28 Thread George Spelvin
Thank you all for the build warning report. The warning, produced by gcc versions 4.9, 7.3, 8.1, and whatever version Stephen Rothwell is running, is: lib/list_sort.c:17:36: warning: __pure__ attribute ignored [-Wattributes] The relevant code is: 10: /* 11: * By declaring the compare function with
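A hedged reconstruction of the construct the warning points at (the preview cuts the quote off; the exact spelling may differ): a pure attribute attached to a function-pointer typedef, which these GCC versions discard with -Wattributes. The [PATCH 6/5] above resolves it by simply omitting the attribute.

struct list_head;

/* Reconstructed, hypothetical form of lib/list_sort.c line 17: */
typedef int __attribute__((pure)) (*cmp_func)(void *priv,
		struct list_head const *a, struct list_head const *b);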

Re: [RFC PATCH v2] random: add get_random_max() function

2019-03-28 Thread George Spelvin
By the way, I just noticed that my fallback get_random_max64() algorithm (if there's no __int128 type) is completely broken and will need rewriting. It would work if I rejected and regenerated the high half if the low half were out of range, but that's not what it does. The worst case is a range
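For context, the multiply-shift method get_random_max() is built on (Lemire's algorithm), shown as a hedged 32-bit sketch where the double-width product is cheap; get_random_u32() is the kernel's existing primitive, everything else here is illustrative rather than the patch's code:

#include <linux/random.h>

/* Return a uniform value in [0, range); assumes range != 0. */
static u32 random_max32(u32 range)
{
	u64 prod = (u64)get_random_u32() * range;

	if ((u32)prod < range) {		/* possibly in the biased zone */
		u32 thresh = -range % range;	/* 2^32 mod range */

		while ((u32)prod < thresh)
			prod = (u64)get_random_u32() * range;
	}
	return prod >> 32;
}

The broken fallback mentioned above is the 64-bit analogue of this, where forming the 128-bit product without __int128 is the hard part.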

Re: [RFC PATCH] random: add get_random_max() function

2019-03-24 Thread George Spelvin
P.S. The cited paper calls your algorithm the "OpenBSD algorithm" and has a bunch of benchmarks comparing it to others in Fisher-Yates shuffles of sizes 1e3..1e9. Including all overhead (base PRNG, shuffle), it's 3x slower for 32-bit operations and 8x slower for 64-bit up to arrays of size 1e6,

Re: [RFC PATCH] random: add get_random_max() function

2019-03-24 Thread George Spelvin
On Sun, 24 Mar 2019 at 21:47:50 +0100, Jason A. Donenfeld wrote: > I generally use a slightly simpler algorithm in various different projects: > > //[0, bound) > static unsigned long random_bounded(unsigned long bound) > { >unsigned long ret; >const unsigned long max_mod_bound =
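A hedged completion of the quoted routine (the preview stops mid-definition; this is the classic rejection-sampling shape the message is describing, not necessarily the exact original): compute 2^BITS_PER_LONG mod bound, reject raw values below it, and reduce the rest modulo bound, which makes every result exactly equally likely.

#include <linux/random.h>

/* [0, bound) */
static unsigned long random_bounded(unsigned long bound)
{
	unsigned long ret;
	/* 2^BITS_PER_LONG mod bound: raw values below this would bias
	 * the final reduction, so they are rejected. */
	const unsigned long max_mod_bound = -bound % bound;

	if (bound < 2)
		return 0;
	do
		ret = get_random_long();
	while (ret < max_mod_bound);
	return ret % bound;
}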

[RESEND PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-19 Thread George Spelvin
, there are a few extra predicted branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 767 -> 703 bytes (-64) Signed-off-by: George Spelvin Acked-by:
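A sketch of the dispatch being summarized (hedged: the names and sentinel encoding follow the merged lib/sort.c, but the bodies here are simplified): comparing the function pointer against known sentinel values lets the compiler inline each built-in branch, so the indirect call survives only for caller-supplied swap functions.

#include <stddef.h>

typedef unsigned int u32;
typedef unsigned long long u64;
typedef void (*swap_func_t)(void *a, void *b, int size);

/* Sentinel "pointers" standing for the built-in swaps (illustrative). */
#define SWAP_WORDS_64	((swap_func_t)0)
#define SWAP_WORDS_32	((swap_func_t)1)
#define SWAP_BYTES	((swap_func_t)2)

static void swap_words_64(void *a, void *b, size_t n)
{
	u64 *pa = a, *pb = b;

	for (n /= 8; n--; pa++, pb++) {
		u64 t = *pa; *pa = *pb; *pb = t;
	}
}

static void swap_words_32(void *a, void *b, size_t n)
{
	u32 *pa = a, *pb = b;

	for (n /= 4; n--; pa++, pb++) {
		u32 t = *pa; *pa = *pb; *pb = t;
	}
}

static void swap_bytes(void *a, void *b, size_t n)
{
	unsigned char *pa = a, *pb = b;

	while (n--) {
		unsigned char t = *pa; *pa++ = *pb; *pb++ = t;
	}
}

/*
 * The comparisons are against compile-time constants, so each built-in
 * branch inlines (no prologue/epilogue, no indirect call); only a
 * caller-supplied swap function still goes through the pointer.
 */
static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
{
	if (swap_func == SWAP_WORDS_64)
		swap_words_64(a, b, size);
	else if (swap_func == SWAP_WORDS_32)
		swap_words_32(a, b, size);
	else if (swap_func == SWAP_BYTES)
		swap_bytes(a, b, size);
	else
		swap_func(a, b, (int)size);
}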

[RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-19 Thread George Spelvin
CONFIG_TEST_SORT and CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last round of minor edits to quell checkpatch. I figure there will be at least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap functions more generic lib/sort: Use more efficient

[RESEND PATCH v2 1/5] lib/sort: Make swap functions more generic

2019-03-19 Thread George Spelvin
the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Andy Shevchenko Feedback-from: Rasmus Villemoes Feedbac

[RESEND PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-19 Thread George Spelvin
x86-64 code size 1086 -> 739 bytes (-347) (Yes, I see checkpatch complaining about no space after comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Rasmus Villemoes Feedback-from: Andy Shevchenk

[RESEND PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-19 Thread George Spelvin
citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q Signed-off-by: George Spelvin Acked-by

[RESEND PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-19 Thread George Spelvin
bytes (-118) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov --- lib/sort.c | 110 ++--- 1 file changed, 80 insertions(+), 30 deletions(-)

[PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-15 Thread George Spelvin
least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap functions more generic lib/sort: Use more efficient bottom-up heapsort variant lib/sort: Avoid indirect calls to built-in swap lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS lib/list_sort: O

[PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
x86-64 code size 1086 -> 739 bytes (-347) (Yes, I see checkpatch complaining about no space after comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Rasmus Villemoes Feedback-from: Andy Shevchenk

[PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-15 Thread George Spelvin
, there are a few extra predicted branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 767 -> 703 bytes (-64) Signed-off-by: George Spelvin Acked-by:

[PATCH v2 1/5] lib/sort: Make swap functions more generic

2019-03-15 Thread George Spelvin
the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Andy Shevchenko Feedback-from: Rasmus Villemoes Feedbac

[PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-15 Thread George Spelvin
bytes (-118) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov --- lib/sort.c | 110 ++--- 1 file changed, 80 insertions(+), 30 deletions(-)

[PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-15 Thread George Spelvin
citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q Signed-off-by: George Spelvin Acked-by

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
Indeed, thanks to everyone who commented. The extra conceptual complexity and reduced readability is Just Not Worth It. v2 (and final, as far as I'm concerned) follows.

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
On Fri, 15 Mar 2019 at 13:57:05 +0100, Geert Uytterhoeven wrote: > On Fri, Mar 15, 2019 at 11:23 AM George Spelvin wrote: >> On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote: >>> On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote: >>>> One question

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote: > On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote: >> On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: >>> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote: >>>> +

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: > On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote: >> +for (bit = 1; count & bit; bit <<= 1) { >> +cur = merge(priv, (cmp_func)cmp, pending, cur); >> +

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
>> swap_bytes / swap_4byte_words / swap_8byte_words >> swap_bytes / swap_ints / swap_longs >> swap_1 / swap_4 / swap_8 >> Pistols at dawn? On Thu, 14 Mar 2019 at 22:59:55 +0300, Andrey Abramov wrote: > Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the > most readable because we have

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:41:26 +0100, Geert Uytterhoeven wrote: > On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote: >> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote: >>>> How about one of: >>>> swap_bytes / swap_ints / swap_longs >>&g

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote: >> Although I'm thinking of: >> >> static bool __attribute_const__ >> is_aligned(const void *base, size_t size, unsigned char align) >> { >> unsigned char lsbits = (unsigned char)size; >> >> (void)base; >> #ifndef
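The quoted fragment, completed as a hedged sketch of where it ends up (kernel context assumed for bool/uintptr_t; the cut-off #ifndef is assumed to be the usual unaligned-access guard). The trick is that both the element size and, on strict-alignment machines, the base address must have clear low bits, or later elements would be misaligned even if the first one is fine.

static bool __attribute_const__
is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;

	(void)base;
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	lsbits |= (unsigned char)(uintptr_t)base;
#endif
	return (lsbits & (align - 1)) == 0;
}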

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: > On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote: >> +/* Do merges corresponding to set lsbits in count */ > >> +for (bit = 1; count & bit; bit <<= 1) { >> +
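For readers landing on the quoted loop: a self-contained sketch of the counting trick (hedged: the actual patch chains pending runs through the nodes' own link fields; this version uses a fixed array for clarity). Each trailing 1 bit of count is a pending run of that weight, so only equal-size runs are merged and the merge tree stays balanced.

#include <stddef.h>

struct node { struct node *next; int val; };

/* Merge two sorted lists; ties are taken from @a first. */
static struct node *merge(struct node *a, struct node *b)
{
	struct node head, *tail = &head;

	while (a && b) {
		if (a->val <= b->val) {
			tail->next = a; a = a->next;
		} else {
			tail->next = b; b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

static struct node *bottom_up_sort(struct node *list)
{
	struct node *pending[8 * sizeof(size_t)] = { NULL };
	size_t count = 0, bit, i;

	while (list) {
		struct node *cur = list;

		list = list->next;
		cur->next = NULL;
		/* Do merges corresponding to set lsbits in count. */
		for (bit = 1, i = 0; count & bit; bit <<= 1, i++) {
			cur = merge(pending[i], cur);
			pending[i] = NULL;
		}
		pending[i] = cur;	/* a run of 2^i nodes */
		count++;
	}
	/* Merge whatever runs remain, smallest first. */
	for (list = NULL, i = 0; i < 8 * sizeof(size_t); i++)
		if (pending[i])
			list = merge(pending[i], list);
	return list;
}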

Re: [PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-13 Thread George Spelvin
On Thu, 14 Mar 2019 at 00:28:16 +0100, Rasmus Villemoes wrote: > On 05/03/2019 06.58, George Spelvin wrote: >> This patch avoids badly unbalanced merges on unlucky input sizes. >> It slightly increases the code size, but saves an average of 0.2*n >> calls to cmp(). >>

Re: [PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-13 Thread George Spelvin
On Wed, 13 Mar 2019 at 23:29:40 +0100, Rasmus Villemoes wrote: > On 21/02/2019 09.21, George Spelvin wrote: >> +/** >> + * parent - given the offset of the child, find the offset of the parent. >> + * @i: the offset of the heap element whose parent is sought. No

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-13 Thread George Spelvin
Thank you for your thoughtful comments! On Wed, 13 Mar 2019 at 23:23:44 +0100, Rasmus Villemoes wrote: > On 21/02/2019 07.30, George Spelvin wrote: > + * @align: required aignment (typically 4 or 8) > > typo aLignment Thanks; fixed! >> + * Returns true if elements can be copie

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-10 Thread George Spelvin
Rasmus Villemoes wrote: > On 05/03/2019 04.06, George Spelvin wrote: >> + * (Actually, it is always called with @a being the element which was >> + * originally first, so it is not necessary to to distinguish the @a < @b >> + * and @a == @b cases; the return value

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-09 Thread George Spelvin
Andy Shevchenko wrote: > Shouldn't simple memcpy cover these case for both 32- and 64-bit > architectures? Speaking of replacing swap with copying via temporary buffers, one idea that did come to mind was avoiding swap for sufficiently small objects. Every sift-down is actually a circular
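What "every sift-down is actually a circular shift" looks like in code, as a hedged sketch (names hypothetical, not lib/sort.c): instead of swapping parent and child at each level (three moves per level), save the displaced element once, slide each node on the path up one slot, and store the saved element at the end (one move per level plus two).

#include <stddef.h>
#include <string.h>

/*
 * Rotate the elements at base[idx[0]], base[idx[1]], ... one step:
 * each element moves to the previous slot on the path and the first
 * lands in the last.  Assumes size <= sizeof(tmp); a real version
 * would fall back to pairwise swapping for larger elements.
 */
static void rotate_chain(char *base, size_t size,
			 const size_t *idx, size_t n)
{
	char tmp[64];
	size_t i;

	memcpy(tmp, base + idx[0] * size, size);
	for (i = 1; i < n; i++)
		memcpy(base + idx[i - 1] * size,
		       base + idx[i] * size, size);
	memcpy(base + idx[n - 1] * size, tmp, size);
}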

[PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-08 Thread George Spelvin
, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) x86-64 code size 872 -> 885 bytes (+8) Signed-off-by: George Spelvin --- lib

[PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-08 Thread George Spelvin
__attribute__((nonnull(2,3,4,5)))". Signed-off-by: George Spelvin --- include/linux/list_sort.h | 1 + lib/list_sort.c | 152 -- 2 files changed, 96 insertions(+), 57 deletions(-) diff --git a/include/linux/list_sort.h

[PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-08 Thread George Spelvin
citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q Signed-off-by: George S

[PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-08 Thread George Spelvin
see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin --- lib/sort.c | 102 +++-- 1 file changed, 75 insertions(+), 27 deletions(-) diff --git a/lib/sort.c b/lib

[PATCH 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-08 Thread George Spelvin
, there are a few additional (highly predictable) conditional branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 770 -> 709 bytes (-61) Signed-off-by:

[PATCH 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-08 Thread George Spelvin
be at least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap functions more generic lib/sort: Use more efficient bottom-up heapsort variant lib/sort: Avoid indirect calls to built-in swap lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS lib/l

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-28 Thread George Spelvin
> On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote: >> For example, two mix-backs of 64 bits gives you 65 bit security, not 128. >> (Because each mixback can be guessed and verified separately.) > Exactly, but the full reseed after running out of entropy is strong > en
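The arithmetic behind the quoted claim, spelled out: if each 64-bit mixback can be guessed and verified independently, an attacker pays 2^64 + 2^64 = 2^65 guesses for both, not the 2^64 * 2^64 = 2^128 a joint search would cost.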

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > On 24.12.2016 00:39, George Spelvin wrote: >> We just finished discussing why 8 bytes isn't enough. If you only >> feed back 8 bytes, an attacker who can do 2^64 computation can find it >> (by guessing and computing forward to verify t

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > In general this looks good, but bitbuffer needs to be protected from > concurrent access, thus needing at least some atomic instruction and > disabling of interrupts for the locking if done outside of > get_random_long. Thus I liked your previous approach more where

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
(Cc: list trimmed slightly as the topic is wandering a bit.) Hannes Frederic Sowa wrote: > On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote: >> Adding might_lock() annotations will improve coverage a lot. > > Might be hard to find the correct lock we take later down

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
Hannes Frederic Sowa wrote: > A lockdep test should still be done. ;) Adding might_lock() annotations will improve coverage a lot. > Yes, that does look nice indeed. Accounting for bits instead of bytes > shouldn't be a huge problem either. Maybe it gets a bit more verbose in > case you can't

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> I do tend to like Ted's version in which we use batched > get_random_bytes() output. If it's fast enough, it's simpler and lets > us get the full strength of a CSPRNG. With the ChaCha20 generator, that's fine, although note that this abandons anti-backtracking entirely. It also takes locks,

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> Having slept on this, I like it less. The problem is that a > backtracking attacker doesn't just learn H(random seed || entropy_0 || > secret || ...) -- they learn the internal state of the hash function > that generates that value. This probably breaks any attempt to apply > security

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> True, but it's called get_random_int(), and it seems like making it > stronger, especially if the performance cost is low to zero, is a good > thing. If it's cheap enough, I don't mind. But it's documented as being marginal-quality, used where speed is more important. In particular, it's

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread George Spelvin
Andy Lutomirski wrote: > I don't even think it needs that. This is just adding a > non-destructive final operation, right? It is, but the problem is that SipHash is intended for *small* inputs, so the standard implementations aren't broken into init/update/final functions. There's just one big

Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
> Plus the benchmark was bogus anyway, and when I built a more specific > harness -- actually comparing the TCP sequence number functions -- > SipHash was faster than MD5, even on register starved x86. So I think > we're fine and this chapter of the discussion can come to a close, in > order to

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
As a separate message, to disentangle the threads, I'd like to talk about get_random_long(). After some thinking, I still like the "state-preserving" construct that's equivalent to the current MD5 code. Yes, we could just do siphash(current_cpu || per_cpu_counter, global_key), but it's nice to

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Theodore Ts'o wrote: > On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote: >> SipHash annihilates the competition on 64-bit superscalar hardware. >> SipHash dominates the field on 64-bit in-order hardware. >> SipHash wins easily on 32-bit hardware *

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Eric Dumazet wrote: > Now I am quite confused. > > George said :
>> Cycles per byte on 1024 bytes of data:
>>                  Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4      38.9        8.3          5.8
>> HalfSipHash-2-4  12.7        4.5          3.2
>> MD5

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Linus wrote: >> How much does kernel_fpu_begin()/kernel_fpu_end() cost? > > It's now better than it used to be, but it's absolutely disastrous > still. We're talking easily many hundreds of cycles. Under some loads, > thousands. I think I've been thoroughly dissuaded, but just to clarify one

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Actually, DJB just made a very relevant suggestion. As I've mentioned, the 32-bit performance problems are an x86-specific problem. ARM does very well, and other processors aren't bad at all. SipHash fits very nicely (and runs very fast) in the MMX registers. They're 64 bits, and there are 8

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Eric Dumazet wrote: > On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>              Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4  38.9        8.3          5.8
>>

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.
Because on 32-bit x86, it's slower. Cycles per byte on 1024 bytes of data:
             Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4  38.9        8.3          5.8

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Theodore Ts'o wrote: > On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote: >> 1) Anything that requires actual long-term security will use >> SipHash2-4, with the 64-bit output and the 128-bit key. This includes >> things like TCP sequence numbers. This seems pretty uncontroversial

RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote: > From: George Spelvin ... >> uint32_t >> hsiphash24(char const *in, size_t len, uint32_t const key[2]) >> { >> uint32_t c = key[0]; >> uint32_t d = key[1]; >> uint32_t a = 0x6c796765 ^ 0x736f6d65; >> uint
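For reference next to the quoted 32-bit code: the HalfSipHash round such variants are built on, with the rotation constants from the reference implementation (the hsiphash24() above uses its own key/constant schedule, so this is background, not a completion of it).

typedef unsigned int u32;

static inline u32 rol32(u32 v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/* One HalfSipHash round over the 4 x 32-bit state v0..v3. */
#define HSIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = rol32(v1, 5);  v1 ^= v0; v0 = rol32(v0, 16);	\
	v2 += v3; v3 = rol32(v3, 8);  v3 ^= v2;				\
	v0 += v3; v3 = rol32(v3, 7);  v3 ^= v0;				\
	v2 += v1; v1 = rol32(v1, 13); v1 ^= v2; v2 = rol32(v2, 16);	\
} while (0)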

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar, here's my benchmark code. It just computes the hash of all n*(n+1)/2 possible non-empty substrings of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte". (The average length is (n+2)/3, c.f.
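The "(n+2)/3" average follows from summing the lengths of all n*(n+1)/2 substrings: sum over len = 1..n of len*(n+1-len) = n*(n+1)*(n+2)/6 bytes hashed in total, and dividing by the n*(n+1)/2 substrings gives (n+2)/3.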

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago. My target application was ext4 directory hashing, resulting in different implementation choices, although I still think that a rolled-up implementation like this is reasonable. Reducing I-cache impact speeds up the calling code. One

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch. Do you think it should return "u32" (as you currently have it) or "unsigned long"? I thought the latter, since it doesn't cost any more and makes more > I wonder if this could also lead to a similar aliasing > with arch_get_random_int, since I'm pretty

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no > no. Considering we already have a massive speed-up here with the > secure version, there's zero reason to start weakening the security > because we're trigger happy with our benchmarks. No no no. Just to clarify, I was

Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which might be useful: You could perhaps save two rounds in siphash_*u64. The final word with the length (called "b" in your implementation) only needs to be there if the input is variable-sized. If every use of a given key is of a fixed-size input, you don't need a length suffix.
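The round count behind "save two rounds", made explicit: SipHash-c-d runs c rounds per 8-byte input word plus d finalization rounds, so for SipHash-2-4 on a single u64 that's 2*2 + 4 = 8 rounds with the length word and 2*1 + 4 = 6 without it.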

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()? In > some cases it's being used in performance sensitive areas, and where > anti-DoS protection might be enough. In others, maybe not so much. This is tricky. The entire get_random_int() structure is an abuse of the hash function

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > I saw that jiffies addition in there and was wondering what it was all > about. It's currently added _after_ the siphash input, not before, to > keep with how the old algorithm worked. I'm not sure if this is > correct or if there's something wrong with that, as I

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote: > Tested this. Distribution and avalanche effect are still good. Speed > wise I see about a 33% improvement over siphash (20 nsecs/op versus 32 > nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer > to a more palatable replacement for jhash. Do we lose any

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and >> should be used always. Don't even compile the 32-bit code, to prevent >> anyone accidentally using it, and make hsiphash an alias for siphash. > Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I >

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit > output, with the output length parameter as part of the hash algorithm > in there. When I code this for my kernel patchset, I very likely will > only implement one output length size. Right now I'm leaning toward > 32-bit. A

RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > Isn't that equivalent to: > v0 = key[0]; > v1 = key[1]; > v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL); > v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL); (Pre-XORing key[] with the first two constants which, if
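The algebra of the pre-XOR trick, assuming the standard SipHash constants ("somepseu", "dorandom", "lygenera", "tedbytes"); the constant pairing in the quoted lines may differ, so treat this as a hedged restatement rather than a check of them.

typedef unsigned long long u64;

/* Standard initialization: four constant XORs per hash. */
static void siphash_init_std(u64 v[4], const u64 key[2])
{
	v[0] = 0x736f6d6570736575ULL ^ key[0];
	v[1] = 0x646f72616e646f6dULL ^ key[1];
	v[2] = 0x6c7967656e657261ULL ^ key[0];
	v[3] = 0x7465646279746573ULL ^ key[1];
}

/*
 * With the key pre-XORed once at key-setup time,
 *	pkey[0] = key[0] ^ 0x736f6d6570736575
 *	pkey[1] = key[1] ^ 0x646f72616e646f6d
 * per-hash initialization is two loads plus two XORs of constant deltas:
 */
static void siphash_init_prexored(u64 v[4], const u64 pkey[2])
{
	v[0] = pkey[0];
	v[1] = pkey[1];
	v[2] = pkey[0] ^ (0x736f6d6570736575ULL ^ 0x6c7967656e657261ULL);
	v[3] = pkey[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);
}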
