adding inefficient code. Note that the
shifts in (unsigned, and by a compile-time constant amount)
are simpler and generated inline.
Signed-off-by: George Spelvin
Acked-by: Andrey Ryabinin
Feedback-from: Rasmus Villemoes
Cc: linux-s...@vger.kernel.org
Cc: Heiko Carstens
---
include/linux/bitops.h
people from adding inefficient code. Note that the
shifts in (unsigned, and by a compile-time constant amount)
are simpler and generated inline.
Signed-off-by: George Spelvin
Acked-by: Andrey Ryabinin
Cc: linux-s...@vger.kernel.org
Cc: Heiko Carstens
---
lib/ubsan.c | 4 ++--
1 file
s handling.
Option 1: Submit this for 5.2 and turn on INT128 for s390 in 5.3.
Option 2: Let the arches cherry-pick this patch pre-5.2.
My preference is for option 2, but that requires permission from
ubsan's owner. Andrey?
Signed-off-by: George Spelvin
Cc: Andrey Ryabinin
Cc: linux-s...@vger.ker
actually help GCC 8.3's code generation, so just
omit it.
Signed-off-by: George Spelvin
Fixes: 820c81be5237 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS")
Cc: Andrew Morton
Cc: Stephen Rothwell
---
lib/list_sort.c | 14 +-
1 file changed, 9 insertions(+), 5
Thank you all for the build warning report.
The warning, produced by gcc versions 4.9, 7.3, 8.1, and whatever version
Stephen Rothwell is running, is:
lib/list_sort.c:17:36: warning: __pure__ attribute ignored [-Wattributes]
The relevant code is:
10: /*
11: * By declaring the compare function with
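The construct drawing the warning is the pure attribute attached to a
function-pointer type rather than to a function declaration; a minimal
reconstruction (not the exact lib/list_sort.c lines):

	/* Older GCCs ignore "pure" when it is applied to the
	 * pointed-to function type instead of to a function
	 * declaration, hence the -Wattributes warning above. */
	typedef int __attribute__((pure))
		(*cmp_func_t)(const void *a, const void *b);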
By the way, I just noticed that my fallback get_random_max64()
algorithm (if there's no __int128 type) is completely broken and
will need rewriting.
It would work if I rejected and regenerated the high half
if the low half were out of range, but that's not what it does.
The worst case is a range
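A corrected fallback along those lines would be (sketch only;
get_random_u32() is the real kernel primitive, while get_random_max32(),
uniform in [0, r) for r <= 2^32, is an assumed helper):

	#include <linux/random.h>	/* get_random_u32() */

	u32 get_random_max32(u32 range);	/* assumed: uniform in [0, range) */

	static u64 get_random_max64_sketch(u64 range)
	{
		/* ceil(range / 2^32): number of 2^32-sized blocks covering
		 * [0, range).  For brevity, this assumes range <=
		 * 0xffffffff00000000 so the block count fits in 32 bits. */
		u32 blocks = (u32)(range >> 32) + ((u32)range != 0);
		u64 val;

		do {
			val = ((u64)get_random_max32(blocks) << 32)
			      | get_random_u32();
		} while (val >= range);	/* rejects only in the partial top block */

		return val;
	}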
P.S. The cited paper calls your algorithm the "OpenBSD algorithm"
and has a bunch of benchmarks comparing it to others in Fisher-Yates
shuffles of sizes 1e3..1e9.
Including all overhead (base PRNG, shuffle), it's 3x slower for
32-bit operations and 8x slower for 64-bit up to arrays of size
1e6,
On Sun, 24 Mar 2019 at 21:47:50 +0100, Jason A. Donenfeld wrote:
> I generally use a slightly simpler algorithm in various different projects:
>
> //[0, bound)
> static unsigned long random_bounded(unsigned long bound)
> {
>         unsigned long ret;
>         const unsigned long max_mod_bound =
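For concreteness, the usual completion of this OpenBSD-style rejection
loop is (my reconstruction; random_integer() is an assumed uniform
unsigned long source, and the bound < 2 guard is hoisted above the
division so bound == 0 is safe):

	unsigned long random_integer(void);	/* assumed uniform source */

	/* [0, bound) */
	static unsigned long random_bounded(unsigned long bound)
	{
		unsigned long ret, max_mod_bound;

		if (bound < 2)
			return 0;
		/* 1 + ~bound == 2^BITS_PER_LONG - bound, so this is
		 * 2^BITS_PER_LONG mod bound: the number of low values
		 * that must be rejected so the final % is unbiased. */
		max_mod_bound = (1 + ~bound) % bound;

		do
			ret = random_integer();
		while (ret < max_mod_bound);

		return ret % bound;
	}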
, there are a few extra predicted branches on the code path.)
This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.
x86-64 code size 767 -> 703 bytes (-64)
Signed-off-by: George Spelvin
Acked-by:
CONFIG_TEST_SORT and
CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since
the last round of minor edits to quell checkpatch. I figure there
will be at least one round of comments and final testing.
George Spelvin (5):
lib/sort: Make swap functions more generic
lib/sort: Use more efficient bottom-up heapsort variant
lib/sort: Avoid indirect calls to built-in swap
lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS
lib/list_sort: O
the great tradition of bikeshedding, the names were
by far the most contentious issue during review of this patch series.
x86-64 code size 872 -> 886 bytes (+14)
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Feedback-from: Andy Shevchenko
Feedback-from: Rasmus Villemoes
Feedbac
x86-64 code size 1086 -> 739 bytes (-347)
(Yes, I see checkpatch complaining about no space after comma in
"__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.)
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
Feedback-from: Rasmus Villemoes
Feedback-from: Andy Shevchenk
.psu.edu/viewdoc/summary?doi=10.1.1.4.5380
Queue-Mergesort
Mordecai J. Golin, Robert Sedgewick
Information Processing Letters, 48(5):253--259, 10 December 1993
https://doi.org/10.1016/0020-0190(93)90088-q
https://sci-hub.tw/10.1016/0020-0190(93)90088-Q
Signed-off-by: George Spelvin
Acked-by
tes (-118)
(I see the checkpatch complaint about "else if (n -= size)".
The alternative is significantly uglier.)
Signed-off-by: George Spelvin
Acked-by: Andrey Abramov
---
lib/sort.c | 110 ++---
1 file changed, 80 insertions(+), 30 delet
Indeed, thanks to everyone who commented. The extra conceptual
complexity and reduced readability are Just Not Worth It.
v2 (and final, as far as I'm concerned) follows.
On Fri, 15 Mar 2019 at 13:57:05 +0100, Geert Uytterhoeven wrote:
> On Fri, Mar 15, 2019 at 11:23 AM George Spelvin wrote:
>> On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote:
>>> On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote:
>>>> One question
On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote:
> On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote:
>> On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote:
>>> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>>>> +
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote:
> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>> +	for (bit = 1; count & bit; bit <<= 1) {
>> +		cur = merge(priv, (cmp_func)cmp, pending, cur);
>> +
>> swap_bytes / swap_4byte_words / swap_8byte_words
>> swap_bytes / swap_ints / swap_longs
>> swap_1 / swap_4 / swap_8
>> Pistols at dawn?
On Thu, 14 Mar 2019 at 22:59:55 +0300, Andrey Abramov wrote:
> Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the
> most readable because we have
On Thu, 14 Mar 2019 at 11:41:26 +0100, Geert Uytterhoeven wrote:
> On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote:
>> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
>>>> How about one of:
>>>> swap_bytes / swap_ints / swap_longs
>>>
On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
>> Although I'm thinking of:
>>
>> static bool __attribute_const__
>> is_aligned(const void *base, size_t size, unsigned char align)
>> {
>>	unsigned char lsbits = (unsigned char)size;
>>
>>	(void)base;
>> #ifndef
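For context, the full shape this helper took in lib/sort.c (quoted from
memory; treat it as a sketch rather than the authoritative text):

	static bool __attribute_const__
	is_aligned(const void *base, size_t size, unsigned char align)
	{
		unsigned char lsbits = (unsigned char)size;

		(void)base;
	#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
		/* Without cheap unaligned access, the base pointer's
		 * low bits must be aligned as well as the size. */
		lsbits |= (unsigned char)(uintptr_t)base;
	#endif
		return (lsbits & (align - 1)) == 0;
	}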
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote:
> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>> +/* Do merges corresponding to set lsbits in count */
>
>> +	for (bit = 1; count & bit; bit <<= 1) {
>> +
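For readers joining mid-thread: count is the number of elements consumed
so far, and the loop merges exactly like a binary increment, one merge
per trailing 1 bit of count, so the pending runs always have distinct
power-of-two sizes.  The bookkeeping in isolation (run_t and
merge_runs() are assumed stand-ins for list_sort's node plumbing):

	#include <stddef.h>

	typedef struct { void *head; } run_t;	/* stand-in: a sorted run */

	run_t merge_runs(run_t a, run_t b);	/* assumed merge of two runs */

	static void add_element(run_t *stack, unsigned int *depth,
				run_t elem, size_t count)
	{
		size_t bit;

		/* Do merges corresponding to set lsbits in count: the
		 * new run absorbs pending runs of size 1, 2, 4, ...,
		 * exactly where incrementing count would carry. */
		for (bit = 1; count & bit; bit <<= 1)
			elem = merge_runs(stack[--*depth], elem);

		stack[(*depth)++] = elem;	/* push the merged run */
	}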
On Thu, 14 Mar 2019 at 00:28:16 +0100, Rasmus Villemoes wrote:
> On 05/03/2019 06.58, George Spelvin wrote:
>> This patch avoids badly unbalanced merges on unlucky input sizes.
>> It slightly increases the code size, but saves an average of 0.2*n
>> calls to cmp().
>>
On Wed, 13 Mar 2019 at 23:29:40 +0100, Rasmus Villemoes wrote:
> On 21/02/2019 09.21, George Spelvin wrote:
>> +/**
>> + * parent - given the offset of the child, find the offset of the parent.
>> + * @i: the offset of the heap element whose parent is sought. No
Thank you for your thoughtful comments!
On Wed, 13 Mar 2019 at 23:23:44 +0100, Rasmus Villemoes wrote:
> On 21/02/2019 07.30, George Spelvin wrote:
> + * @align: required aignment (typically 4 or 8)
>
> typo aLignment
Thanks; fixed!
>> + * Returns true if elements can be copie
Rasmus Villemoes wrote:
> On 05/03/2019 04.06, George Spelvin wrote:
>> + * (Actually, it is always called with @a being the element which was
>> + * originally first, so it is not necessary to distinguish the @a < @b
>> + * and @a == @b cases; the return value
Andy Shevchenko wrote:
> Shouldn't simple memcpy cover these cases for both 32- and 64-bit
> architectures?
Speaking of replacing swap with copying via temporary buffers, one
idea that did come to mind was avoiding swap for sufficiently small
objects.
Every sift-down is actually a circular
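The idea, sketched generically (this is not lib/sort.c itself): hold the
displaced root in a temporary, do one move per level, and drop the
temporary in at the end, roughly k+2 moves for k levels instead of the
3k moves that k element swaps would cost.

	#include <stddef.h>

	/* Max-heap sift-down as a rotation rather than repeated swaps. */
	static void sift_down(int *heap, size_t n, size_t root)
	{
		int tmp = heap[root];
		size_t parent = root, child;

		while ((child = 2 * parent + 1) < n) {
			if (child + 1 < n && heap[child + 1] > heap[child])
				child++;		/* pick the larger child */
			if (heap[child] <= tmp)
				break;
			heap[parent] = heap[child];	/* shift child up */
			parent = child;
		}
		heap[parent] = tmp;	/* drop the held-out element in place */
	}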
, very few users of sort() sort pointers (or pointer-sized
objects); most sort structures containing at least two words.
(E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte
struct acpi_fan_fps.)
x86-64 code size 872 -> 885 bytes (+13)
Signed-off-by: George Spelvin
---
lib
attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.)
Signed-off-by: George Spelvin
---
include/linux/list_sort.h | 1 +
lib/list_sort.c | 152 --
2 files changed, 96 insertions(+), 57 deletions(-)
diff --git a/include/linux/list_sort.h
see the checkpatch complaint about "else if (n -= size)".
The alternative is significantly uglier.)
Signed-off-by: George Spelvin
---
lib/sort.c | 102 +++--
1 file changed, 75 insertions(+), 27 deletions(-)
diff --git a/lib/sort.c b/lib
, there are a few additional (highly predictable) conditional
branches on the code path.)
This actually *shrinks* the x86-64 code, because it inlines the various
swap functions inside do_swap, eliding function prologues & epilogues.
x86-64 code size 770 -> 709 bytes (-61)
Signed-off-by:
d.
> On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
>> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
>> (Because each mixback can be guessed and verified separately.)
> Exactly, but the full reseed after running out of entropy is strong
> en
Hannes Frederic Sowa wrote:
> On 24.12.2016 00:39, George Spelvin wrote:
>> We just finished discussing why 8 bytes isn't enough. If you only
>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>> (by guessing and computing forward to verify t
Hannes Frederic Sowa wrote:
> In general this looks good, but bitbuffer needs to be protected from
> concurrent access, thus needing at least some atomic instruction and
> disabling of interrupts for the locking if done outside of
> get_random_long. Thus I liked your previous approach more where
(Cc: list trimmed slightly as the topic is wandering a bit.)
Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>> Adding might_lock() annotations will improve coverage a lot.
>
> Might be hard to find the correct lock we take later down
Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)
Adding might_lock() annotations will improve coverage a lot.
> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't
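Concretely, the annotation tells lockdep on the fast path that the slow
path may take the lock, so coverage doesn't depend on the batch actually
running out during the test.  An illustration (every name below is
assumed, not from any posted patch):

	#include <linux/lockdep.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(batched_entropy_lock);	/* assumed name */

	static void record_lock_dependency(void)
	{
		/* Teach lockdep the dependency even when the slow
		 * path below is never hit during the test run. */
		might_lock(&batched_entropy_lock);
		/* ... fast path that only rarely takes the lock ... */
	}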
> I do tend to like Ted's version in which we use batched
> get_random_bytes() output. If it's fast enough, it's simpler and lets
> us get the full strength of a CSPRNG.
With the ChaCha20 generator, that's fine, although note that this abandons
anti-backtracking entirely.
It also takes locks,
> Having slept on this, I like it less. The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value. This probably breaks any attempt to apply
> security
> True, but it's called get_random_int(), and it seems like making it
> stronger, especially if the performance cost is low to zero, is a good
> thing.
If it's cheap enough, I don't mind. But it's documented as being
marginal-quality, used where speed is more important.
In particular, it's
Andy Lutomirski wrote:
> I don't even think it needs that. This is just adding a
> non-destructive final operation, right?
It is, but the problem is that SipHash is intended for *small* inputs,
so the standard implementations aren't broken into init/update/final
functions.
There's just one big
> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to
As a separate message, to disentangle the threads, I'd like to
talk about get_random_long().
After some thinking, I still like the "state-preserving" construct
that's equivalent to the current MD5 code. Yes, we could just do
siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
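To make the alternative concrete, the stateless version would look
roughly like this (a sketch using the later siphash_1u64() kernel API;
every name below is assumed, and the per-cpu counter is assumed to stay
under 2^48 so folding in the CPU number cannot collide):

	#include <linux/siphash.h>
	#include <linux/percpu.h>
	#include <linux/smp.h>

	static siphash_key_t rng_key;	/* assumed: seeded once at boot */
	static DEFINE_PER_CPU(u64, rng_counter);

	static unsigned long get_random_long_sketch(void)
	{
		/* Fold in the CPU number so two CPUs at the same
		 * count never hash identical inputs. */
		u64 in = this_cpu_inc_return(rng_counter)
			 | ((u64)raw_smp_processor_id() << 48);

		return (unsigned long)siphash_1u64(in, &rng_key);
	}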
Theodore Ts'o wrote:
> On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote:
>> SipHash annihilates the competition on 64-bit superscalar hardware.
>> SipHash dominates the field on 64-bit in-order hardware.
>> SipHash wins easily on 32-bit hardware *
Eric Dumazet wrote:
> Now I am quite confused.
>
> George said :
>> Cycles per byte on 1024 bytes of data:
>>                  Pentium 4  Core 2 Duo  Ivy Bridge
>> SipHash-2-4           38.9         8.3         5.8
>> HalfSipHash-2-4       12.7         4.5         3.2
>> MD5
Linus wrote:
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.
I think I've been thoroughly dissuaded, but just to clarify one
Actually, DJB just made a very relevant suggestion.
As I've mentioned, the 32-bit performance problems are an x86-specific
problem. ARM does very well, and other processors aren't bad at all.
SipHash fits very nicely (and runs very fast) in the MMX registers.
They're 64 bits, and there are 8
Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                  Pentium 4  Core 2 Duo  Ivy Bridge
>> SipHash-2-4           38.9         8.3         5.8
>>
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.
Because on 32-bit x86, it's slower.
Cycles per byte on 1024 bytes of data:
                 Pentium 4  Core 2 Duo  Ivy Bridge
SipHash-2-4           38.9         8.3         5.8
Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial
David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>>	uint32_t c = key[0];
>>	uint32_t d = key[1];
>>	uint32_t a = 0x6c796765 ^ 0x736f6d65;
>>	uint
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.
It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte".
(The average length is (n+2)/3, c.f.
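(Filling in the arithmetic behind that (n+2)/3: the total bytes hashed
are sum_{l=1..n} l*(n+1-l) = (n+1)*n(n+1)/2 - n(n+1)(2n+1)/6 =
n(n+1)(n+2)/6, and dividing by the n(n+1)/2 substrings gives an average
length of (n+2)/3.)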
BTW, here's some SipHash code I wrote for Linux a while ago.
My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable. Reducing I-cache impact speeds
up the calling code.
One
> I already did this. Check my branch.
Do you think it should return "u32" (as you currently have it) or
"unsigned long"? I thought the latter, since it doesn't cost any
more and makes more
> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.
Just to clarify, I was
An idea I had which might be useful:
You could perhaps save two rounds in siphash_*u64.
The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.
If every use of a given key is of a fixed-size input, you don't need
a length suffix.
> What should we do with get_random_int() and get_random_long()? In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough. In others, maybe not so much.
This is tricky. The entire get_random_int() structure is an abuse of
the hash function
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might be closer
> to a more palatable replacement for jhash. Do we lose any
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always. Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.
> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
>
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.
A
Jason A. Donenfeld wrote:
> Isn't that equivalent to:
> v0 = key[0];
> v1 = key[1];
> v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
> v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);
(Pre-XORing key[] with the first two constants which, if