Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hannes Frederic Sowa wrote:
> We call extract_crng when we run out of batched entropy and reseed. How
> often we call down to extract_crng depends on how much entropy we
> extracted by calls to get_random_int/long, so the number of calls into
> those functions matter.
>
> In extract_crng we have a timer which reseeds every 300s the CPRNG and
> either uses completely new entropy from the CRNG or calls down into the
> CPRNG while also doing backtracking protection (which feeds chacha's
> block size / 2 back into chacha, if I read the code correctly, thus
> 1024 bits, which should be enough).

In the current code, _extract_crng checks whether more than 300 s have
elapsed since it was last reseeded, and if so, reseeds with fresh entropy.
In addition, on every read (or get_random_bytes), if the request leaves
enough random bits in the last ChaCha block, it feeds back 256 bits (the
ChaCha block size is 16*32 = 512 bits) for anti-backtracking.

If the last read happened not to fit under that limit (size % 512 > 256),
*and* there are no calls for RNG output for a long time, there is no upper
limit on how long the old ChaCha key can hang around.

> On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
>> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
>> (Because each mixback can be guessed and verified separately.)
>
> Exactly, but the full reseed after running out of entropy is strong
> enough to not be defeated by your argumentation. Neither the reseed
> from the CRNG.

Yes, I was just reacting to your original statement:

> couldn't we simply use 8 bytes of the 64 byte
> return block to feed it directly back into the state chacha?

It's not the idea that's bad, just the proposed quantity.

>> If you want that, I have a pile of patches to prandom I really
>> should push upstream. Shall I refresh them and send them to you?
>
> I would like to have a look at them in the new year, certainly! I can
> also take care about the core prandom patches, but don't know if I have
> time to submit the others to the different subsystems.
>
> Maybe, if David would be okay with that, we can submit all patches
> through his tree, as he is also the dedicated maintainer for prandom.

Amazing, thank you very much! They're just minor cleanups, nothing too
exciting. I'll put it in the queue to make sure they're up to date.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
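[For concreteness, the block/feedback arithmetic described above can be sketched as follows. This is an illustration of the quoted numbers only, not the kernel's actual code; the function and macro names are made up.]

```c
#include <stdbool.h>

#define CHACHA_BLOCK_BITS 512	/* 16 x 32-bit words per ChaCha block */
#define FEEDBACK_BITS     256	/* one ChaCha key's worth of feedback */

/*
 * Illustration only: after a request of `bits` output bits, does the
 * final ChaCha block still hold FEEDBACK_BITS unused bits that could
 * be fed back as the next key?  The "size % 512 > 256" case in the
 * text is exactly the failing case here.
 */
static bool leftover_can_rekey(unsigned int bits)
{
	unsigned int used_in_last_block = bits % CHACHA_BLOCK_BITS;

	return CHACHA_BLOCK_BITS - used_in_last_block >= FEEDBACK_BITS;
}
```

A 128-bit request leaves 384 bits in its block (enough to rekey); a 300-bit request leaves only 212 (not enough), which is the window the paragraph above worries about.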
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hello,

On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
>> On 24.12.2016 00:39, George Spelvin wrote:
>>> We just finished discussing why 8 bytes isn't enough. If you only
>>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>>> (by guessing and computing forward to verify the guess) and recover the
>>> previous state. You need to feed back at least as much output as your
>>> security target. For /dev/urandom's ChaCha20, that's 256 bits.
>>
>> I followed the discussion but it appeared to me that this has the
>> additional constraint of time until the next reseeding event happens,
>> which is 300s (under the assumption that calls to get_random_int happen
>> regularly, which I expect right now). After that the existing reseeding
>> mechanism will ensure enough backtracking protection. The number of
>> bytes can easily be increased here, given that reseeding was shown to be
>> quite fast already and we produce enough output. But I am not sure if
>> this is a bit overengineered in the end?
>
> I'm not following your description of how the time-based and call-based
> mechanisms interact, but for any mix-back, you should either do enough
> or none at all. (Also called "catastrophic reseeding".)

We call extract_crng when we run out of batched entropy and reseed. How
often we call down to extract_crng depends on how much entropy we
extracted by calls to get_random_int/long, so the number of calls into
those functions matter.

In extract_crng we have a timer which reseeds every 300s the CPRNG and
either uses completely new entropy from the CRNG or calls down into the
CPRNG while also doing backtracking protection (which feeds chacha's
block size / 2 back into chacha, if I read the code correctly, thus
1024 bits, which should be enough).

> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
> (Because each mixback can be guessed and verified separately.)

Exactly, but the full reseed after running out of entropy is strong
enough to not be defeated by your argumentation. Neither the reseed
from the CRNG.

>> Also agreed. Given your solution below to prandom_u32, I do think it
>> might also work without the seqlock now.
>
> It's not technically a seqlock; in particular the reader doesn't
> spin. But the write side, and general logic is so similar it's
> a good mental model.
>
> Basically, assume a 64-byte buffer. The reader has gone through
> 32 bytes of it, and has 32 left, and when he reads another 8 bytes,
> has to distinguish three cases:
>
> 1) No update; we read the old bytes and there are now 32 - 8 = 24 bytes
>    left.
>
> 2) Update completed while we weren't looking. There are now new
>    bytes in the buffer, and we took 8 leaving 64 - 8 = 56.
>
> 3) Update in progress at the time of the read. We don't know if we
>    are seeing old bytes or new bytes, so we have to assume the worst
>    and not proceed unless 32 >= 8, but assume at the end there are
>    64 - 8 = 56 new bytes left.
>
>> I wouldn't have added a disable irqs, but given that I really like your
>> proposal, I would take it in my todo branch and submit it when net-next
>> window opens.
>
> If you want that, I have a pile of patches to prandom I really
> should push upstream. Shall I refresh them and send them to you?

I would like to have a look at them in the new year, certainly! I can
also take care about the core prandom patches, but don't know if I have
time to submit the others to the different subsystems.

Maybe, if David would be okay with that, we can submit all patches
through his tree, as he is also the dedicated maintainer for prandom.

> [... patch descriptions ...]

Thanks,
Hannes
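[As a sanity check on the arithmetic in the three quoted cases, here is a toy model of the accounting — purely illustrative, with names of my own choosing; it models the bookkeeping, not the concurrency.]

```c
enum update { UPD_NONE, UPD_COMPLETED, UPD_IN_PROGRESS };

/*
 * Toy model of the three cases above: a 64-byte batch, `left` bytes
 * still unread, and a reader taking `n` bytes.  Returns the number of
 * bytes remaining afterward, or -1 if the reader cannot safely proceed.
 */
static int bytes_left_after_read(int left, int n, enum update u)
{
	switch (u) {
	case UPD_NONE:		/* case 1: consume from the old batch */
		return left - n;
	case UPD_COMPLETED:	/* case 2: fresh batch; we took n of 64 */
		return 64 - n;
	case UPD_IN_PROGRESS:	/* case 3: assume the worst to proceed */
		return left >= n ? 64 - n : -1;
	}
	return -1;
}
```

With `left = 32` and `n = 8` this reproduces the numbers in the text: 24, 56, and 56 respectively.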
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hannes Frederic Sowa wrote:
> On 24.12.2016 00:39, George Spelvin wrote:
>> We just finished discussing why 8 bytes isn't enough. If you only
>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>> (by guessing and computing forward to verify the guess) and recover the
>> previous state. You need to feed back at least as much output as your
>> security target. For /dev/urandom's ChaCha20, that's 256 bits.
>
> I followed the discussion but it appeared to me that this has the
> additional constraint of time until the next reseeding event happens,
> which is 300s (under the assumption that calls to get_random_int happen
> regularly, which I expect right now). After that the existing reseeding
> mechanism will ensure enough backtracking protection. The number of
> bytes can easily be increased here, given that reseeding was shown to be
> quite fast already and we produce enough output. But I am not sure if
> this is a bit overengineered in the end?

I'm not following your description of how the time-based and call-based
mechanisms interact, but for any mix-back, you should either do enough
or none at all. (Also called "catastrophic reseeding".)

For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
(Because each mixback can be guessed and verified separately.)

> Also agreed. Given your solution below to prandom_u32, I do think it
> might also work without the seqlock now.

It's not technically a seqlock; in particular the reader doesn't
spin. But the write side, and general logic is so similar it's
a good mental model.

Basically, assume a 64-byte buffer. The reader has gone through
32 bytes of it, and has 32 left, and when he reads another 8 bytes,
has to distinguish three cases:

1) No update; we read the old bytes and there are now 32 - 8 = 24 bytes
   left.

2) Update completed while we weren't looking. There are now new
   bytes in the buffer, and we took 8 leaving 64 - 8 = 56.

3) Update in progress at the time of the read. We don't know if we
   are seeing old bytes or new bytes, so we have to assume the worst
   and not proceed unless 32 >= 8, but assume at the end there are
   64 - 8 = 56 new bytes left.

> I wouldn't have added a disable irqs, but given that I really like your
> proposal, I would take it in my todo branch and submit it when net-next
> window opens.

If you want that, I have a pile of patches to prandom I really
should push upstream. Shall I refresh them and send them to you?

commit 4cf1b3d9f4fbccc29ffc2fe4ca4ff52ea77253f1
Author: George Spelvin
Date:   Mon Aug 31 00:05:00 2015 -0400

    net: Use prandom_u32_max()

    It's slightly more efficient than "prandom_u32() % range".

    The net/802 code was already efficient, but prandom_u32_max() is
    simpler.

    In net/batman-adv/bat_iv_ogm.c, batadv_iv_ogm_fwd_send_time() got
    changed from picking a random number of milliseconds and converting
    to jiffies to picking a random number of jiffies, since the number
    of milliseconds (and thus the conversion to jiffies) is a
    compile-time constant. The equivalent code in
    batadv_iv_ogm_emit_send_time was not changed, because the number of
    milliseconds is variable.

    In net/ipv6/ip6_flowlabel.c, ip6_flowlabel had htonl(prandom_u32()),
    which is silly. Just cast to __be32 without permuting the bits.

    net/sched/sch_netem.c got adjusted to only need one call to
    prandom_u32 instead of 2. (Assuming skb_headlen can't exceed
    512 MiB, which is hopefully safe for some time yet.)

    Signed-off-by: George Spelvin

commit 9c8fb80e1fd2be42c35cab1af27187d600fd85e3
Author: George Spelvin
Date:   Sat May 24 15:20:47 2014 -0400

    mm/swapfile.c: Use prandom_u32_max()

    It's slightly more efficient than "prandom_u32() % range".

    Signed-off-by: George Spelvin

commit 2743eb01e5c5958fd88ae78d19c5fea772d4b117
Author: George Spelvin
Date:   Sat May 24 15:19:53 2014 -0400

    lib: Use prandom_u32_max()

    It's slightly more efficient than "prandom_u32() % range".

    Signed-off-by: George Spelvin

commit 6a5e91bf395060a3351bfe5efc40ac20ffba2c1b
Author: George Spelvin
Date:   Sat May 24 15:18:50 2014 -0400

    fs/xfs: Use prandom_u32_max()

    It's slightly more efficient than "prandom_u32() % range".

    Also changed the last argument of xfs_error_test() from
    "unsigned long" to "unsigned", since the code never did support
    values > 2^32, and the largest value ever passed is 100.

    The code could be improved even further by passing in 2^32/rf
    rather than rf, but I'll leave that to some XFS developers.

    Signed-off-by: George Spelvin

commit 6f6d485d9179ca6ec4e30caa06ade0e0c6931810
Author: George Spelvin
Date:   Sat May 24 15:00:17 2014 -0400

    fs/ubifs: Use prandom_u32_max()

    It's slightly more efficient than "prandom_u32() % range".

    In fs/ubifs/debug.c, the chance() function got rewritten entirely
    to take advantage o
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hi,

On 24.12.2016 00:39, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
>> In general this looks good, but bitbuffer needs to be protected from
>> concurrent access, thus needing at least some atomic instruction and
>> disabling of interrupts for the locking if done outside of
>> get_random_long. Thus I liked your previous approach more where you
>> simply embed this in the already necessary get_random_long and aliased
>> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.
>
> It's meant to be part of the same approach, and I didn't include locking
> because that's a requirement for *any* solution, and isn't specific
> to the part I was trying to illustrate.
>
> (As for re-using the name "get_random_long", that was just so
> I didn't have to explain it. Call it "get_random_long_internal"
> if you like.)
>
> Possible locking implementations include:
> 1) Use the same locking as applies to get_random_long_internal(), or
> 2) Make bitbuffer a per-CPU variable (note that we currently have 128
>    bits of per-CPU state in get_random_int_hash[]), and this is all a
>    fast-path to bypass heavier locking in get_random_long_internal().

I understood that this is definitely a solvable problem.

>>> But, I just realized I've been overlooking something glaringly obvious...
>>> there's no reason you can't compute multiple blocks in advance.
>>
>> In the current code on the cost of per cpu allocations thus memory.
>
> Yes, but on 4-core machines it's still not much, and 4096-core
> behemoths have RAM to burn.
>
>> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
>> return block to feed it directly back into the state chacha? So we pass
>> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
>> state. This would make the window max shorter than the anti
>> backtracking protection right now from 300s to 14 get_random_int calls.
>> Not sure if this is worth it.
>
> We just finished discussing why 8 bytes isn't enough. If you only
> feed back 8 bytes, an attacker who can do 2^64 computation can find it
> (by guessing and computing forward to verify the guess) and recover the
> previous state. You need to feed back at least as much output as your
> security target. For /dev/urandom's ChaCha20, that's 256 bits.

I followed the discussion but it appeared to me that this has the
additional constraint of time until the next reseeding event happens,
which is 300s (under the assumption that calls to get_random_int happen
regularly, which I expect right now). After that the existing reseeding
mechanism will ensure enough backtracking protection. The number of
bytes can easily be increased here, given that reseeding was shown to be
quite fast already and we produce enough output. But I am not sure if
this is a bit overengineered in the end?

>>> For example, suppose we gave each CPU a small pool to minimize locking.
>>> When one runs out and calls the global refill, it could refill *all*
>>> of the CPU pools. (You don't even need locking; there may be a race to
>>> determine *which* random numbers the reader sees, but they're equally
>>> good either way.)
>
>> Yes, but still disabled interrupts, otherwise the same state could be
>> used twice on the same CPU. Uff, I think we have this problem in
>> prandom_u32.
>
> There are some locking gotchas, but it is doable lock-free.
>
> Basically, it's a seqlock. The writer increments it once (to an odd
> number) before starting to overwrite the buffer, and a second time (to
> an even number) after. "Before" and "after" mean smp_wmb().
>
> The reader can use this to figure out how much of the data in the buffer
> is safely fresh. The full sequence of checks is a bit intricate,
> but straightforward.
>
> I didn't discuss the locking because I'm confident it's solvable,
> not because I wasn't aware it has to be solved.

Also agreed. Given your solution below to prandom_u32, I do think it
might also work without the seqlock now.
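[A userspace sketch of the seqlock-like scheme quoted above, using C11 atomics in place of the kernel's smp_wmb(). This is my own simplification — the real reader logic is subtler, as George notes, and a strict-C data race on the buffer itself is tolerated here where the kernel would use READ_ONCE-style accesses.]

```c
#include <stdatomic.h>

static _Atomic unsigned int seq;	/* even = stable, odd = refill running */
static unsigned long batch[8];

/* Writer: bump to odd, overwrite the batch, bump back to even. */
static void refill(const unsigned long *fresh)
{
	atomic_fetch_add_explicit(&seq, 1, memory_order_acq_rel);
	for (int i = 0; i < 8; i++)
		batch[i] = fresh[i];
	atomic_fetch_add_explicit(&seq, 1, memory_order_acq_rel);
}

/* Reader: returns 1 and fills *out if the word came from a stable batch. */
static int read_word(int i, unsigned long *out)
{
	unsigned int s1 = atomic_load_explicit(&seq, memory_order_acquire);

	if (s1 & 1)
		return 0;		/* refill in progress; don't trust it */
	*out = batch[i];
	unsigned int s2 = atomic_load_explicit(&seq, memory_order_acquire);

	return s1 == s2;		/* stale if a refill intervened */
}
```

Note the reader never spins, matching the point above: on failure it can simply fall back to the slow path instead of retrying.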
> As for prandom_u32(), what's the problem? Are you worried that
> get_cpu_var disables preemption but not interrupts, and so an
> ISR might return the same value as process-level code?

Yes, exactly those were my thoughts.

> First of all, that's not a problem because prandom_u32() doesn't
> have security guarantees. Occasionally returning a duplicate number
> is okay.
>
> Second, if you do care, that could be trivially fixed by throwing
> a barrier() in the middle of the code. (Patch appended; S-o-B
> if anyone wants it.)

I wouldn't have added a disable irqs, but given that I really like your
proposal, I would take it in my todo branch and submit it when net-next
window opens.

> diff --git a/lib/random32.c b/lib/random32.c
> index c750216d..6bee4a36 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> [...]

Thanks,
Hannes
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hannes Frederic Sowa wrote:
> In general this looks good, but bitbuffer needs to be protected from
> concurrent access, thus needing at least some atomic instruction and
> disabling of interrupts for the locking if done outside of
> get_random_long. Thus I liked your previous approach more where you
> simply embed this in the already necessary get_random_long and aliased
> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

It's meant to be part of the same approach, and I didn't include locking
because that's a requirement for *any* solution, and isn't specific
to the part I was trying to illustrate.

(As for re-using the name "get_random_long", that was just so
I didn't have to explain it. Call it "get_random_long_internal"
if you like.)

Possible locking implementations include:
1) Use the same locking as applies to get_random_long_internal(), or
2) Make bitbuffer a per-CPU variable (note that we currently have 128
   bits of per-CPU state in get_random_int_hash[]), and this is all a
   fast-path to bypass heavier locking in get_random_long_internal().

>> But, I just realized I've been overlooking something glaringly obvious...
>> there's no reason you can't compute multiple blocks in advance.
>
> In the current code on the cost of per cpu allocations thus memory.

Yes, but on 4-core machines it's still not much, and 4096-core
behemoths have RAM to burn.

> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
> return block to feed it directly back into the state chacha? So we pass
> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
> state. This would make the window max shorter than the anti
> backtracking protection right now from 300s to 14 get_random_int calls.
> Not sure if this is worth it.

We just finished discussing why 8 bytes isn't enough. If you only
feed back 8 bytes, an attacker who can do 2^64 computation can find it
(by guessing and computing forward to verify the guess) and recover the
previous state. You need to feed back at least as much output as your
security target. For /dev/urandom's ChaCha20, that's 256 bits.

>> For example, suppose we gave each CPU a small pool to minimize locking.
>> When one runs out and calls the global refill, it could refill *all*
>> of the CPU pools. (You don't even need locking; there may be a race to
>> determine *which* random numbers the reader sees, but they're equally
>> good either way.)
>
> Yes, but still disabled interrupts, otherwise the same state could be
> used twice on the same CPU. Uff, I think we have this problem in
> prandom_u32.

There are some locking gotchas, but it is doable lock-free.

Basically, it's a seqlock. The writer increments it once (to an odd
number) before starting to overwrite the buffer, and a second time (to
an even number) after. "Before" and "after" mean smp_wmb().

The reader can use this to figure out how much of the data in the buffer
is safely fresh. The full sequence of checks is a bit intricate,
but straightforward.

I didn't discuss the locking because I'm confident it's solvable,
not because I wasn't aware it has to be solved.

As for prandom_u32(), what's the problem? Are you worried that
get_cpu_var disables preemption but not interrupts, and so an
ISR might return the same value as process-level code?

First of all, that's not a problem because prandom_u32() doesn't
have security guarantees. Occasionally returning a duplicate number
is okay.

Second, if you do care, that could be trivially fixed by throwing
a barrier() in the middle of the code. (Patch appended; S-o-B
if anyone wants it.)

diff --git a/lib/random32.c b/lib/random32.c
index c750216d..6bee4a36 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -55,16 +55,29 @@ static DEFINE_PER_CPU(struct rnd_state, net_rand_state);
  *
  * This is used for pseudo-randomness with no outside seeding.
  * For more random results, use prandom_u32().
+ *
+ * The barrier() is to allow prandom_u32() to be called from interrupt
+ * context without locking.  An interrupt will run to completion,
+ * updating all four state variables.  The barrier() ensures that
+ * the interrupted code will compute a different result.  Either it
+ * will have written s1 and s2 (so the interrupt will start with
+ * the updated values), or it will use the values of s3 and s4
+ * updated by the interrupt.
+ *
+ * (The same logic applies recursively to nested interrupts, trap
+ * handlers, and NMIs.)
  */
 u32 prandom_u32_state(struct rnd_state *state)
 {
+	register u32 x;
 #define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
-	state->s1 = TAUSWORTHE(state->s1, 6, 13, 4294967294U, 18U);
-	state->s2 = TAUSWORTHE(state->s2, 2, 27, 4294967288U, 2U);
-	state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U, 7U);
-	state->s4 = TAUSWORTHE(state->s4, 3, 12, 4294967168U, 13U);
+	x = state->s1 = TAUSWORTHE(state->s1, 6, 13, ~
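[Since the patch is cut off above, here is a self-contained userspace sketch of what the patched function plausibly looks like. The Tausworthe constants and the final XOR combination are those of lib/random32.c; the accumulation into `x` around the barrier is my reconstruction of the patch's intent, and barrier() is reduced to a compiler barrier — treat this as an illustration, not the exact patch.]

```c
#include <stdint.h>

struct rnd_state { uint32_t s1, s2, s3, s4; };

/* Compiler-only barrier, standing in for the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" : : : "memory")

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

static uint32_t prandom_u32_state(struct rnd_state *state)
{
	uint32_t x;

	x  = state->s1 = TAUSWORTHE(state->s1, 6, 13, 4294967294U, 18U);
	x ^= state->s2 = TAUSWORTHE(state->s2, 2, 27, 4294967288U, 2U);
	barrier();	/* an interrupt here still yields a distinct result */
	x ^= state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U, 7U);
	x ^= state->s4 = TAUSWORTHE(state->s4, 3, 12, 4294967168U, 13U);
	return x;
}
```

The point of the barrier is exactly as the comment in the patch says: an interrupt taken between the two halves updates all four words, so the interrupted computation cannot reproduce the interrupt's result.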
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hi,

On Fri, 2016-12-23 at 13:26 -0500, George Spelvin wrote:
> (Cc: list trimmed slightly as the topic is wandering a bit.)
>
> Hannes Frederic Sowa wrote:
>> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>>> Adding might_lock() annotations will improve coverage a lot.
>>
>> Might be hard to find the correct lock we take later down the code
>> path, but if that is possible, certainly.
>
> The point of might_lock() is that you don't have to. You find the
> worst case (most global) lock that the code *might* take if all the
> buffer-empty conditions are true, and tell lockdep "assume this lock is
> taken all the time".

Yes, of course. Probably indicating input_pool's and primary_crng's
locks are good to indicate here, but on NUMA other locks might be taken.

>> Hannes Frederic Sowa wrote:
>>> Yes, that does look nice indeed. Accounting for bits instead of bytes
>>> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
>>> case you can't satisfy a request with one batched entropy block and have
>>> to consume randomness from two.
>
> For example, here's a simple bit-buffer implementation that wraps around
> a get_random_long. The bitbuffer is of the form "1xxx...x", where the
> x bits are valid, and the position of the msbit indicates how many bits
> are valid.
>
> extern unsigned long get_random_long();
> static unsigned long bitbuffer = 1;	/* Holds 0..BITS_PER_LONG-1 bits */
>
> unsigned long get_random_bits(unsigned char bits)
> {
> 	/* We handle bits == BITS_PER_LONG, and not bits == 0 */
> 	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
> 	unsigned long val;
>
> 	if (bitbuffer > mask) {
> 		/* Request can be satisfied out of the bit buffer */
> 		val = bitbuffer;
> 		bitbuffer >>= bits;
> 	} else {
> 		/*
> 		 * Not enough bits, but enough room in bitbuffer for the
> 		 * leftovers.  avail < bits, so avail + 64 <= bits + 63.
> 		 */
> 		val = get_random_long();
> 		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
> 			  | val >> 1 >> (bits-1);
> 	}
> 	return val & mask;
> }

In general this looks good, but bitbuffer needs to be protected from
concurrent access, thus needing at least some atomic instruction and
disabling of interrupts for the locking if done outside of
get_random_long. Thus I liked your previous approach more where you
simply embed this in the already necessary get_random_long and aliased
get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

>> When we hit the chacha20 without doing a reseed we only mutate the
>> state of chacha, but being an invertible function in its own, a
>> proposal would be to mix parts of the chacha20 output back into the
>> state, which, as a result, would cause slowdown because we couldn't
>> propagate the complete output of the cipher back to the caller (looking
>> at the function _extract_crng).
>
> Basically, yes. Half of the output goes to rekeying itself.

Okay, thanks and understood. :)

> But, I just realized I've been overlooking something glaringly obvious...
> there's no reason you can't compute multiple blocks in advance.

In the current code on the cost of per cpu allocations thus memory.

> The standard assumption in antibacktracking is that you'll *notice* the
> state capture and stop trusting the random numbers afterward; you just
> want the output *before* to be secure. In other words, cops busting
> down the door can't find the session key used in the message you just
> sent.

Yep, analogous to forward secrecy and future secrecy (self healing) is
provided by catastrophic reseeding from /dev/random.

In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
return block to feed it directly back into the state chacha? So we pass
on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
state. This would make the window max shorter than the anti
backtracking protection right now from 300s to 14 get_random_int calls.
Not sure if this is worth it.

> So you can compute and store random numbers ahead of need.
>
> This can amortize the antibacktracking as much as you'd like.
>
> For example, suppose we gave each CPU a small pool to minimize locking.
> When one runs out and calls the global refill, it could refill *all*
> of the CPU pools. (You don't even need locking; there may be a race to
> determine *which* random numbers the reader sees, but they're equally
> good either way.)

Yes, but still disabled interrupts, otherwise the same state could be
used twice on the same CPU. Uff, I think we have this problem in
prandom_u32.

>> Or are you referring that the anti-backtrack protection should happen
>> in every call from get_random_int?
>
> If you ask for anti-backtracking without qualification, that's the
> goal, since you don't know how long will elapse until the next call.
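[The per-CPU-pool idea being discussed could look roughly like this. This is a single-threaded userspace toy with a stub generator; NCPUS, the pool size, and all names are made up for illustration, and the real thing would need the interrupt/preemption care discussed above.]

```c
#include <stdint.h>

#define NCPUS      4
#define POOL_WORDS 8

static uint64_t pool[NCPUS][POOL_WORDS];
static int      avail[NCPUS];

/* Stub standing in for the expensive, globally locked generator. */
static uint64_t slow_random_word(void)
{
	static uint64_t x = 0x9e3779b97f4a7c15ULL;

	x ^= x << 13; x ^= x >> 7; x ^= x << 17;	/* xorshift64 */
	return x;
}

/* One caller hitting an empty pool refills *all* CPUs' pools. */
static void refill_all_pools(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		for (int i = 0; i < POOL_WORDS; i++)
			pool[cpu][i] = slow_random_word();
		avail[cpu] = POOL_WORDS;
	}
}

static uint64_t get_random_word_on(int cpu)
{
	if (avail[cpu] == 0)
		refill_all_pools();	/* amortizes locking and mix-back */
	return pool[cpu][--avail[cpu]];
}
```

One slow-path call tops up every CPU, so the per-call anti-backtracking cost is paid once per NCPUS * POOL_WORDS outputs rather than once per output.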
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
(Cc: list trimmed slightly as the topic is wandering a bit.)

Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>> Adding might_lock() annotations will improve coverage a lot.
>
> Might be hard to find the correct lock we take later down the code
> path, but if that is possible, certainly.

The point of might_lock() is that you don't have to. You find the
worst case (most global) lock that the code *might* take if all the
buffer-empty conditions are true, and tell lockdep "assume this lock is
taken all the time".

>> Hannes Frederic Sowa wrote:
>>> Yes, that does look nice indeed. Accounting for bits instead of bytes
>>> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
>>> case you can't satisfy a request with one batched entropy block and have
>>> to consume randomness from two.

For example, here's a simple bit-buffer implementation that wraps around
a get_random_long. The bitbuffer is of the form "1xxx...x", where the
x bits are valid, and the position of the msbit indicates how many bits
are valid.

extern unsigned long get_random_long();
static unsigned long bitbuffer = 1;	/* Holds 0..BITS_PER_LONG-1 bits */

unsigned long get_random_bits(unsigned char bits)
{
	/* We handle bits == BITS_PER_LONG, and not bits == 0 */
	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
	unsigned long val;

	if (bitbuffer > mask) {
		/* Request can be satisfied out of the bit buffer */
		val = bitbuffer;
		bitbuffer >>= bits;
	} else {
		/*
		 * Not enough bits, but enough room in bitbuffer for the
		 * leftovers.  avail < bits, so avail + 64 <= bits + 63.
		 */
		val = get_random_long();
		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
			  | val >> 1 >> (bits-1);
	}
	return val & mask;
}

> When we hit the chacha20 without doing a reseed we only mutate the
> state of chacha, but being an invertible function in its own, a
> proposal would be to mix parts of the chacha20 output back into the
> state, which, as a result, would cause slowdown because we couldn't
> propagate the complete output of the cipher back to the caller (looking
> at the function _extract_crng).

Basically, yes. Half of the output goes to rekeying itself.

But, I just realized I've been overlooking something glaringly obvious...
there's no reason you can't compute multiple blocks in advance.

The standard assumption in antibacktracking is that you'll *notice* the
state capture and stop trusting the random numbers afterward; you just
want the output *before* to be secure. In other words, cops busting
down the door can't find the session key used in the message you just
sent.

So you can compute and store random numbers ahead of need.

This can amortize the antibacktracking as much as you'd like.

For example, suppose we gave each CPU a small pool to minimize locking.
When one runs out and calls the global refill, it could refill *all*
of the CPU pools. (You don't even need locking; there may be a race to
determine *which* random numbers the reader sees, but they're equally
good either way.)

> Or are you referring that the anti-backtrack protection should happen
> in every call from get_random_int?

If you ask for anti-backtracking without qualification, that's the
goal, since you don't know how long will elapse until the next call.
It's like fsync(). There are lots of more limited forms of "keep my
data safe in case of a crash", but the most basic one is "if we lost
power the very instant the call returned, the data would be safe."
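[To see the sentinel-bit bookkeeping in action, the function above can be exercised with a deterministic stub in place of the real get_random_long — test scaffolding only, and it assumes a 64-bit unsigned long.]

```c
#define BITS_PER_LONG 64	/* sketch assumes 64-bit unsigned long */

/* Deterministic stub standing in for the real generator. */
static unsigned long get_random_long(void)
{
	static unsigned long ctr;

	return ctr += 0x0123456789abcdefUL;
}

static unsigned long bitbuffer = 1;	/* sentinel bit only: empty */

unsigned long get_random_bits(unsigned char bits)
{
	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
	unsigned long val;

	if (bitbuffer > mask) {
		/* Enough buffered bits: the sentinel sits above the mask */
		val = bitbuffer;
		bitbuffer >>= bits;
	} else {
		val = get_random_long();
		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
			  | val >> 1 >> (bits - 1);
	}
	return val & mask;
}
```

The first 8-bit request consumes a fresh long and banks its top 56 bits under the sentinel; the next two requests are served from the buffer, peeling off successive bytes of the same long.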
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
>> A lockdep test should still be done. ;)
>
> Adding might_lock() annotations will improve coverage a lot.

Might be hard to find the correct lock we take later down the code
path, but if that is possible, certainly.

>> Yes, that does look nice indeed. Accounting for bits instead of bytes
>> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
>> case you can't satisfy a request with one batched entropy block and have
>> to consume randomness from two.
>
> The bit granularity is also for the callers' convenience, so they don't
> have to mask again. Whether get_random_bits rounds up to byte boundaries
> internally or not is something else.
>
> When the current batch runs low, I was actually thinking of throwing
> away the remaining bits and computing a new batch of 512. But it's
> whatever works best at implementation time.
>
>>>> It could only mix the output back in every two calls, in which case
>>>> you can backtrack up to one call but you need to do 2^128 work to
>>>> backtrack farther. But yes, this is getting excessively complicated.
>>
>>> No, if you're willing to accept limited backtrack, this is a perfectly
>>> acceptable solution, and not too complicated. You could do it phase-less
>>> if you like; store the previous output, then after generating the new
>>> one, mix in both. Then overwrite the previous output. (But doing two
>>> rounds of a crypto primitive to avoid one conditional jump is stupid,
>>> so forget that.)
>
>> Can you quickly explain why we lose the backtracking capability?
>
> Sure. An RNG is (state[i], output[i]) = f(state[i-1]). The goal of
> backtracking is to compute output[i], or better yet state[i-1], given
> state[i].
>
> For example, consider an OFB or CTR mode generator. The state is a key
> and an IV, and you encrypt the IV with the key to produce output, then
> either replace the IV with the output, or increment it. Either way,
> since you still have the key, you can invert the transformation and
> recover the previous IV.
>
> The standard way around this is to use the Davies-Meyer construction:
>
>	IV[i] = IV[i-1] + E(IV[i-1], key)
>
> This is the standard way to make a non-invertible random function
> out of an invertible random permutation.
>
> From the sum, there's no easy way to find the ciphertext *or* the
> plaintext that was encrypted. Assuming the encryption is secure,
> the only way to reverse it is brute force: guess IV[i-1] and run the
> operation forward to see if the resultant IV[i] matches.
>
> There are a variety of ways to organize this computation, since the
> guess gives you both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1],
> including running E forward, backward, or starting from both ends to
> see if you meet in the middle.
>
> The way you add the encryption output to the IV is not very important.
> It can be addition, xor, or some more complex invertible transformation.
> In the case of SipHash, the "encryption" output is smaller than the
> input, so we have to get a bit more creative, but it's still basically
> the same thing.
>
> The problem is that the output which is combined with the IV is too small.
> With only 64 bits, trying all possible values is practical. (The world's
> Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
> times per second.)
>
> By basically doing two iterations at once and mixing in 128 bits of
> output, the guessing attack is rendered impractical. The only downside
> is that you need to remember and store one result between when it's
> computed and last used. This is part of the state, so an attacker can
> find output[i-1], but not anything farther back.

Thanks a lot for the explanation!
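[A toy numeric illustration of the quoted Davies-Meyer feed-forward. E here is a trivially invertible placeholder permutation (a rotate and XOR with a constant), nothing like a real cipher; the point is only that the permutation alone is reversible while the feed-forward sum is not.]

```c
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }
static uint64_t rotr64(uint64_t x, int r) { return (x >> r) | (x << (64 - r)); }

#define K 0x9e3779b97f4a7c15ULL	/* stand-in for the keyed cipher */

/* An invertible "encryption": anyone holding K can run it backward. */
static uint64_t E(uint64_t iv)    { return rotl64(iv, 13) ^ K; }
static uint64_t E_inv(uint64_t y) { return rotr64(y ^ K, 13); }

/*
 * Davies-Meyer feed-forward: IV[i] = IV[i-1] + E(IV[i-1]).
 * Even knowing K, recovering IV[i-1] from IV[i] means guessing it
 * and checking, because the sum hides both the "plaintext" and the
 * "ciphertext" that produced it.
 */
static uint64_t dm_step(uint64_t iv) { return iv + E(iv); }
```

With a real cipher and a 256-bit state, that guess-and-check is the 2^64-style brute force discussed earlier in the thread, which is exactly why the fed-back quantity has to match the security target.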
> > ChaCha as a block cipher gives a "perfect" permutation from the output > > of either the CRNG or the CPRNG, which actually itself has backtracking > > protection. > > I'm not quite understanding. The /dev/random implementation uses some > of the ChaCha output as a new ChaCha key (that's another way to mix output > back into the state) to prevent backtracking. But this slows it down, and > again if you want to be efficient, you're generating and storing large batches > of entropy and storing it in the RNG state. I was actually referring to the anti-backtrack protection in /dev/random and also /dev/urandom, from where we reseed every 300 seconds and if our batched entropy runs low with Ted's/Jason's current patch for get_random_int. As far as I can understand it, backtracking is not a problem in case of a reseed event inside extract_crng. When we hit the chacha20 without doing a reseed we only mutate the state of chacha, but being an invertible function in its own right, a proposal would be to mix parts of the chacha20 output back into the state, which, as a result, would cause slowdown because we couldn't propagate the complete output of the cipher back
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hannes Frederic Sowa wrote: > A lockdep test should still be done. ;) Adding might_lock() annotations will improve coverage a lot. > Yes, that does look nice indeed. Accounting for bits instead of bytes > shouldn't be a huge problem either. Maybe it gets a bit more verbose in > case you can't satisfy a request with one batched entropy block and have > to consume randomness from two. The bit granularity is also for the callers' convenience, so they don't have to mask again. Whether get_random_bits rounds up to byte boundaries internally or not is something else. When the current batch runs low, I was actually thinking of throwing away the remaining bits and computing a new batch of 512. But it's whatever works best at implementation time. >>> It could only mix the output back in every two calls, in which case >>> you can backtrack up to one call but you need to do 2^128 work to >>> backtrack farther. But yes, this is getting excessively complicated. > >> No, if you're willing to accept limited backtrack, this is a perfectly >> acceptable solution, and not too complicated. You could do it phase-less >> if you like; store the previous output, then after generating the new >> one, mix in both. Then overwrite the previous output. (But doing two >> rounds of a crypto primitive to avoid one conditional jump is stupid, >> so forget that.) > Can you quickly explain why we lose the backtracking capability? Sure. An RNG is (state[i], output[i]) = f(state[i-1]). The goal of backtracking is to compute output[i], or better yet state[i-1], given state[i]. For example, consider an OFB or CTR mode generator. The state is a key and an IV, and you encrypt the IV with the key to produce output, then either replace the IV with the output, or increment it. Either way, since you still have the key, you can invert the transformation and recover the previous IV. 
The standard way around this is to use the Davies-Meyer construction: IV[i] = IV[i-1] + E(IV[i-1], key) This is the standard way to make a non-invertible random function out of an invertible random permutation. From the sum, there's no easy way to find the ciphertext *or* the plaintext that was encrypted. Assuming the encryption is secure, the only way to reverse it is brute force: guess IV[i-1] and run the operation forward to see if the resultant IV[i] matches. There are a variety of ways to organize this computation, since the guess gives you both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including running E forward, backward, or starting from both ends to see if you meet in the middle. The way you add the encryption output to the IV is not very important. It can be addition, xor, or some more complex invertible transformation. In the case of SipHash, the "encryption" output is smaller than the input, so we have to get a bit more creative, but it's still basically the same thing. The problem is that the output which is combined with the IV is too small. With only 64 bits, trying all possible values is practical. (The world's Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64 times per second.) By basically doing two iterations at once and mixing in 128 bits of output, the guessing attack is rendered impractical. The only downside is that you need to remember and store one result between when it's computed and last used. This is part of the state, so an attacker can find output[i-1], but not anything farther back. > ChaCha as a block cipher gives a "perfect" permutation from the output > of either the CRNG or the CPRNG, which actually itself has backtracking > protection. I'm not quite understanding. The /dev/random implementation uses some of the ChaCha output as a new ChaCha key (that's another way to mix output back into the state) to prevent backtracking. 
But this slows it down, and again if you want to be efficient, you're generating and storing large batches of entropy and storing it in the RNG state. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
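The Davies-Meyer point above is easy to see at toy scale. This is my own illustrative sketch (not kernel code), using a 16-bit permutation so the brute-force step runs instantly: the raw "encryption" E can be run backwards given the key, but the feed-forward sum IV[i] = IV[i-1] + E(IV[i-1], key) can only be reversed by guessing IV[i-1] and checking.

```c
#include <stdint.h>

static uint16_t rotl16(uint16_t x, int r)
{
    return (uint16_t)((x << r) | (x >> (16 - r)));
}

/* Toy invertible "encryption": an OFB/CTR-style generator built on
 * this leaks its previous IV to anyone who holds the key. */
static uint16_t E(uint16_t x, uint16_t key)
{
    x ^= key;
    x = rotl16(x, 5);
    x = (uint16_t)(x + key);
    x = rotl16(x, 9);
    return x;
}

static uint16_t E_inv(uint16_t y, uint16_t key)  /* runs E backwards */
{
    y = rotl16(y, 7);            /* undo rotl 9 */
    y = (uint16_t)(y - key);
    y = rotl16(y, 11);           /* undo rotl 5 */
    y ^= key;
    return y;
}

/* Davies-Meyer feed-forward: IV[i] = IV[i-1] + E(IV[i-1], key).
 * Recovering IV[i-1] now means guess-and-check over the IV space. */
static uint16_t dm_step(uint16_t iv, uint16_t key)
{
    return (uint16_t)(iv + E(iv, key));
}

/* Brute-force "backtracking": count preimages of a DM output. */
static int dm_preimages(uint16_t out, uint16_t key, uint16_t *first)
{
    int n = 0;
    for (uint32_t g = 0; g < 0x10000; g++) {
        if (dm_step((uint16_t)g, key) == out) {
            if (n == 0)
                *first = (uint16_t)g;
            n++;
        }
    }
    return n;
}
```

At 16 bits the search over 2^16 candidates is instantaneous; at the real 64 bits it is the "world's Bitcoin miners" scale of effort cited above, which is why the mix-back width matters.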
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On 22.12.2016 22:11, George Spelvin wrote: >> I do tend to like Ted's version in which we use batched >> get_random_bytes() output. If it's fast enough, it's simpler and lets >> us get the full strength of a CSPRNG. > > With the ChaCha20 generator, that's fine, although note that this abandons > anti-backtracking entirely. > > It also takes locks, something the previous get_random_int code > path avoided. Do we need to audit the call sites to ensure that's safe? We have spin_lock_irq* locks on the way. Of course they can hurt when contended. The situation should be the same as with get_random_bytes, callable from every possible situation in the kernel without fear, so this should also work for get_random_int. A lockdep test should still be done. ;) > And there is the issue that the existing callers assume that there's a > fixed cost per word. A good half of get_random_long calls are followed by > "& ~PAGE_MASK" to extract the low 12 bits. Or "& ((1ul << mmap_rnd_bits) > - 1)" to extract the low 28. If we have a buffer we're going to have to > pay to refill, it would be nice to use less than 8 bytes to satisfy those. > > But that can be a followup patch. I'm thinking > > unsigned long get_random_bits(unsigned bits) > E.g. get_random_bits(PAGE_SHIFT), > get_random_bits(mmap_rnd_bits), > u32 imm_rnd = get_random_bits(32) > > unsigned get_random_mod(unsigned modulus) > E.g. get_random_mod(hole) & ~(alignment - 1); > get_random_mod(port_scan_backoff) > (Although probably drivers/s390/scsi/zfcp_fc.c should be changed > to prandom.) > > with, until the audit is completed: > #define get_random_int() get_random_bits(32) > #define get_random_long() get_random_bits(BITS_PER_LONG) Yes, that does look nice indeed. Accounting for bits instead of bytes shouldn't be a huge problem either. Maybe it gets a bit more verbose in case you can't satisfy a request with one batched entropy block and have to consume randomness from two. 
>> It could only mix the output back in every two calls, in which case >> you can backtrack up to one call but you need to do 2^128 work to >> backtrack farther. But yes, this is getting excessively complicated. > > No, if you're willing to accept limited backtrack, this is a perfectly > acceptable solution, and not too complicated. You could do it phase-less > if you like; store the previous output, then after generating the new > one, mix in both. Then overwrite the previous output. (But doing two > rounds of a crypto primitive to avoid one conditional jump is stupid, > so forget that.) Can you quickly explain why we lose the backtracking capability? ChaCha as a block cipher gives a "perfect" permutation from the output of either the CRNG or the CPRNG, which actually itself has backtracking protection. Thanks for explaining, Hannes
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> I do tend to like Ted's version in which we use batched > get_random_bytes() output. If it's fast enough, it's simpler and lets > us get the full strength of a CSPRNG. With the ChaCha20 generator, that's fine, although note that this abandons anti-backtracking entirely. It also takes locks, something the previous get_random_int code path avoided. Do we need to audit the call sites to ensure that's safe? And there is the issue that the existing callers assume that there's a fixed cost per word. A good half of get_random_long calls are followed by "& ~PAGE_MASK" to extract the low 12 bits. Or "& ((1ul << mmap_rnd_bits) - 1)" to extract the low 28. If we have a buffer we're going to have to pay to refill, it would be nice to use less than 8 bytes to satisfy those. But that can be a followup patch. I'm thinking unsigned long get_random_bits(unsigned bits) E.g. get_random_bits(PAGE_SHIFT), get_random_bits(mmap_rnd_bits), u32 imm_rnd = get_random_bits(32) unsigned get_random_mod(unsigned modulus) E.g. get_random_mod(hole) & ~(alignment - 1); get_random_mod(port_scan_backoff) (Although probably drivers/s390/scsi/zfcp_fc.c should be changed to prandom.) with, until the audit is completed: #define get_random_int() get_random_bits(32) #define get_random_long() get_random_bits(BITS_PER_LONG) > It could only mix the output back in every two calls, in which case > you can backtrack up to one call but you need to do 2^128 work to > backtrack farther. But yes, this is getting excessively complicated. No, if you're willing to accept limited backtrack, this is a perfectly acceptable solution, and not too complicated. You could do it phase-less if you like; store the previous output, then after generating the new one, mix in both. Then overwrite the previous output. (But doing two rounds of a crypto primitive to avoid one conditional jump is stupid, so forget that.) >> Hmm, interesting. Although, for ASLR, we could use get_random_bytes() >> directly and be done with it. 
>> It won't be a bottleneck. Isn't that what you already suggested? I don't mind fewer primitives; I got a bit fixated on "Replace MD5 with SipHash". It's just the locking that I want to check isn't a problem.
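A user-space sketch of what the proposed bit-granular front end might look like. Everything here is an assumption for illustration: the batch size, the deterministic `refill()` stand-in for the CSPRNG, and the absence of locking are placeholders, not the eventual kernel code. The "two batched entropy words" case Hannes mentions shows up as the loop straddling a word boundary.

```c
#include <stdint.h>

#define BATCH_WORDS 8

static uint64_t batch[BATCH_WORDS];
static unsigned bits_left;              /* unconsumed bits in the batch */

/* Stand-in for a CSPRNG refill (extract_crng in the kernel); made
 * deterministic here purely so the sketch is testable. */
static void refill(void)
{
    for (int i = 0; i < BATCH_WORDS; i++)
        batch[i] = 0x9E3779B97F4A7C15ull * (uint64_t)(i + 1);
    bits_left = BATCH_WORDS * 64;
}

/* Hand out exactly `bits` random bits (1..64), already masked, so
 * callers no longer need "& ~PAGE_MASK" and friends.  A request that
 * straddles a word boundary is satisfied from two words. */
uint64_t get_random_bits(unsigned bits)
{
    uint64_t r = 0;
    unsigned got = 0;

    while (got < bits) {
        if (bits_left == 0)
            refill();

        unsigned pos  = BATCH_WORDS * 64 - bits_left;  /* bit offset */
        unsigned off  = pos % 64;
        unsigned take = 64 - off;

        if (take > bits - got)
            take = bits - got;

        uint64_t chunk = batch[pos / 64] >> off;
        if (take < 64)
            chunk &= (1ull << take) - 1;

        r |= chunk << got;
        got += take;
        bits_left -= take;
    }
    return r;
}
```

A call like `get_random_bits(PAGE_SHIFT)` then consumes only 12 bits of the batch instead of a full 8 bytes, which is the whole point of the interface.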
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin wrote: >> Having slept on this, I like it less. The problem is that a >> backtracking attacker doesn't just learn H(random seed || entropy_0 || >> secret || ...) -- they learn the internal state of the hash function >> that generates that value. This probably breaks any attempt to apply >> security properties of the hash function. For example, the internal >> state could easily contain a whole bunch of prior outputs in it >> verbatim. > > The problem is, anti-backtracking is in severe tension with your desire > to use unmodified SipHash. > > First of all, I'd like to repeat that it isn't a design goal of the current > generator and isn't necessary. Agreed. > Now, the main point: it's not likely to be solvable. > > The problem with unmodified SipHash is that it has only 64 bits of > output. No mix-back mechanism can get around the fundamental problem > that that's too small to prevent a brute-force guessing attack. You need > wider mix-back. And getting more output from unmodified SipHash requires > more finalization rounds, which is expensive. It could only mix the output back in every two calls, in which case you can backtrack up to one call but you need to do 2^128 work to backtrack farther. But yes, this is getting excessively complicated. > Finally, your discomfort about an attacker learning the internal state... > if an attacker knows the key and the input, they can construct the > internal state. Yes, we could discard the internal state and construct > a fresh one on the next call to get_random_int, but what are you going > to key it with? What are you going to feed it? What keeps *that* > internal state any more secret from an attacker than one that's explicitly > stored? I do tend to like Ted's version in which we use batched get_random_bytes() output. If it's fast enough, it's simpler and lets us get the full strength of a CSPRNG. 
(Aside: some day I want to move all that code from drivers/ to lib/ and teach it to be buildable in userspace, too, so it's easy to play with it, feed it test vectors, confirm that it's equivalent to a reference implementation, write up the actual design and try to get real cryptographers to analyze it, etc.) > For example, clearly stating the concern over starting new processes > with predictable layout, and the limits on the fresh entropy supply, > has made me realize that there *is* a possible source: each exec() > is passed 128 bits from get_random_bytes in the AT_RANDOM element of > its auxv. Since get_random_int() accesses "current" anyway, we could > store some seed material there rather than using "pid". While this is > not fresh for each call to get_random_int, it *is* fresh for each new > address space whose layout is being randomized. Hmm, interesting. Although, for ASLR, we could use get_random_bytes() directly and be done with it. It won't be a bottleneck.
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> Having slept on this, I like it less. The problem is that a > backtracking attacker doesn't just learn H(random seed || entropy_0 || > secret || ...) -- they learn the internal state of the hash function > that generates that value. This probably breaks any attempt to apply > security properties of the hash function. For example, the internal > state could easily contain a whole bunch of prior outputs in it > verbatim. The problem is, anti-backtracking is in severe tension with your desire to use unmodified SipHash. First of all, I'd like to repeat that it isn't a design goal of the current generator and isn't necessary. The current generator just returns hash[0] from the MD5 state, then leaves the state stored. The fact that it conceals earlier outputs is an accident of the Davies-Meyer structure of md5_transform. It isn't necessary, because every secret generated is stored unencrypted for as long as it's of value. A few values are used for retransmit backoffs and random MAC addresses. Those are revealed to the world as soon as they're used. Most values are used for ASLR. These addresses are of interest to an attacker trying to mount a buffer-overflow attack, but that only lasts as long as the process is running to receive buffers. After the process exits, knowledge of its layout is worthless. And this is stored as long as it's running in kernel-accessible data, so storing a *second* copy in less conveniently kernel-accessible data (the RNG state) doesn't make the situation any worse. In addition to the above, if you're assuming a state capture, then since we have (for necessary efficiency reasons) a negligible amount of fresh entropy, an attacker has the secret key and can predict *future* outputs very easily. Given that information, an attacker doesn't need to learn the layout of vulnerable server X. If they have a buffer overflow, they can crash the current instance and wait for a fresh image to be started (with a known address space) to launch their attack against. 
Kernel state capture attacks are a very unlikely attack, mostly because it's a narrow target a hair's breadth away from the much more interesting outright kernel compromise (attacker gains write access as well as read) which renders all this fancy cryptanalysis moot. Now, the main point: it's not likely to be solvable. The problem with unmodified SipHash is that it has only 64 bits of output. No mix-back mechanism can get around the fundamental problem that that's too small to prevent a brute-force guessing attack. You need wider mix-back. And getting more output from unmodified SipHash requires more finalization rounds, which is expensive. (Aside: 64 bits does have the advantage that it can't be brute-forced on the attacked machine. It must be exfiltrated to the attacker, and the solution returned to the attack code. But doing this is getting easier all the time.) Adding anti-backtracking to SipHash is trivial: just add a Davies-Meyer structure around its internal state. Remember the internal state before hashing in the entropy and secret, generate the output, then add the previous and final states together for storage. This is a standard textbook construction, very cheap, and doesn't touch the compression function which is the target of analysis and attacks, but it requires poking around in SipHash's internal state. (And people who read the textbooks without understanding them will get upset because the textbooks all talk about using this construction with block ciphers, and SipHash's compression function is not advertised as a block cipher.) Alternative designs exist; you could extract additional output from earlier rounds of SipHash, using the duplex sponge construction you mentioned earlier. That output would be used for mixback purposes *only*, so wouldn't affect the security proof of the "primary" output. But this is also getting creative with SipHash's internals. 
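The Davies-Meyer wrapper described above is cheap to sketch. This is my own illustration (not a patch), using the standard SipHash round function: snapshot the four state words, absorb as usual, then add the snapshot back in, so the stored state no longer determines its predecessor.

```c
#include <stdint.h>

#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

struct sip_state { uint64_t v0, v1, v2, v3; };

/* The standard SIPROUND. */
static void sipround(struct sip_state *s)
{
    s->v0 += s->v1; s->v1 = ROTL(s->v1, 13); s->v1 ^= s->v0; s->v0 = ROTL(s->v0, 32);
    s->v2 += s->v3; s->v3 = ROTL(s->v3, 16); s->v3 ^= s->v2;
    s->v0 += s->v3; s->v3 = ROTL(s->v3, 21); s->v3 ^= s->v0;
    s->v2 += s->v1; s->v1 = ROTL(s->v1, 17); s->v1 ^= s->v2; s->v2 = ROTL(s->v2, 32);
}

/* Plain SipHash-style absorption of one message word (c = 2 rounds).
 * On its own this is invertible: given the state and m, each round
 * can be run backwards. */
static void absorb(struct sip_state *s, uint64_t m)
{
    s->v3 ^= m;
    sipround(s);
    sipround(s);
    s->v0 ^= m;
}

/* Davies-Meyer wrapper: remember the state before absorbing, then add
 * the previous and final states together for storage.  Undoing the
 * addition requires guessing the prior state. */
static void absorb_dm(struct sip_state *s, uint64_t m)
{
    struct sip_state prev = *s;

    absorb(s, m);
    s->v0 += prev.v0;
    s->v1 += prev.v1;
    s->v2 += prev.v2;
    s->v3 += prev.v3;
}
```

Note that the feed-forward touches only the stored state words; the compression function itself, the target of SipHash's published analysis, is unchanged.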
Now, you could use a completely *different* cryptographic primitive to enforce one-way-ness, and apply SipHash as a strong output transform, but that doesn't feel like good design, and is probably more expensive. Finally, your discomfort about an attacker learning the internal state... if an attacker knows the key and the input, they can construct the internal state. Yes, we could discard the internal state and construct a fresh one on the next call to get_random_int, but what are you going to key it with? What are you going to feed it? What keeps *that* internal state any more secret from an attacker than one that's explicitly stored? Keeping the internal state around is a caching optimization, that's all. *If* you're assuming a state capture, the only thing secret from the attacker is any fresh entropy collected between the time of capture and the time of generation. Due to mandatory efficiency requirements, this is very small. I really think you're wishing for the impossible here. A final note: although I'm disagreeing with you,
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Wed, Dec 21, 2016 at 6:07 PM, Andy Lutomirski wrote: > On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin > wrote: >> As a separate message, to disentangle the threads, I'd like to >> talk about get_random_long(). >> >> After some thinking, I still like the "state-preserving" construct >> that's equivalent to the current MD5 code. Yes, we could just do >> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to >> preserve a bit more. >> >> It requires library support from the SipHash code to return the full >> SipHash state, but I hope that's a fair thing to ask for. > > I don't even think it needs that. This is just adding a > non-destructive final operation, right? > >> >> Here's my current straw man design for comment. It's very similar to >> the current MD5-based design, but feeds all the seed material in the >> "correct" way, as opposed to XORing directly into the MD5 state. >> >> * Each CPU has a (Half)SipHash state vector, >> "unsigned long get_random_int_hash[4]". Unlike the current >> MD5 code, we take care to initialize it to an asymmetric state. >> >> * There's a global 256-bit random_int_secret (which we could >> reseed periodically). >> >> To generate a random number: >> * If get_random_int_hash is all-zero, seed it with a fresh half-sized >> SipHash key and the appropriate XOR constants. >> * Generate three words of random_get_entropy(), jiffies, and current->pid. >> (This is arbitrary seed material, copied from the current code.) >> * Crank through that with (Half)SipHash-1-0. >> * Crank through the random_int_secret with (Half)SipHash-1-0. >> * Return v1 ^ v3. > > Just to clarify, if we replace SipHash with a black box, I think this > effectively means, where "entropy" is random_get_entropy() || jiffies > || current->pid: > > The first call returns H(random seed || entropy_0 || secret). The > second call returns H(random seed || entropy_0 || secret || entropy_1 > || secret). Etc. Having slept on this, I like it less. 
The problem is that a backtracking attacker doesn't just learn H(random seed || entropy_0 || secret || ...) -- they learn the internal state of the hash function that generates that value. This probably breaks any attempt to apply security properties of the hash function. For example, the internal state could easily contain a whole bunch of prior outputs in it verbatim. --Andy
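The equivalence Andy states above (call i returns the hash of the whole transcript so far) can be made concrete with any iterated hash. Below is a toy sketch of my own, with a stand-in one-word compression function rather than SipHash itself: keeping the state across calls produces exactly the same outputs as re-hashing the growing transcript from scratch each time. (A real SipHash version would finalize a copy of the state per call; the toy elides that step.)

```c
#include <stdint.h>
#include <stddef.h>

/* Toy one-word compression; a stand-in for the real primitive. */
static uint64_t mix(uint64_t h, uint64_t m)
{
    h ^= m;
    h *= 0x2545F4914F6CDD1Dull;
    h ^= h >> 32;
    return h;
}

/* From-scratch: H(seed || words[0] || ... || words[n-1]). */
static uint64_t hash_all(uint64_t seed, const uint64_t *words, size_t n)
{
    uint64_t h = seed;
    for (size_t i = 0; i < n; i++)
        h = mix(h, words[i]);
    return h;
}

/* State-preserving generator: call i absorbs entropy_i and the
 * secret, then emits the running state as output. */
struct gen { uint64_t h; };    /* h starts at the random seed */

static uint64_t gen_next(struct gen *g, uint64_t entropy, uint64_t secret)
{
    g->h = mix(g->h, entropy);
    g->h = mix(g->h, secret);
    return g->h;
}
```

The equivalence is what lets the black-box security analysis carry over; Andy's objection is that the *stored* `g->h` is exactly the internal state a memory dump reveals.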
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> True, but it's called get_random_int(), and it seems like making it > stronger, especially if the performance cost is low to zero, is a good > thing. If it's cheap enough, I don't mind. But it's documented as being marginal-quality, used where speed is more important. In particular, it's *not* used for key material; only values that matter only as long as they are in use. While they're in use, they can't be concealed from an attacker with kernel access, and when they're done being used, they're worthless. >> If you want anti-backtracking, though, it's easy to add. What we >> hash is: >> >> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ... >> >> You mix the output word right back in to the (unfinalized) state after >> generating it. This is still equivalent to unmodified black-box SipHash, >> you're just using a (conceptually independent) SipHash invocation to >> produce some of its input. > Ah, cute. This could probably be sped up by doing something like: > > entropy_0 || secret || output_0 ^ entropy_1 || secret || ... I'm disinclined to do that because that requires deferring the mixing until the *next* time you generate something. Storing the value you don't want revealed by a state capture defeats the purpose. I'd rather mix it in immediately, so you have anti-backtracking from the moment of creation. Also, I don't think it's particularly "cute" or clever; mixing output back in is the standard way all antibacktracking is accomplished. It's how the Davies-Meyer hash construction makes a reversible function one-way. (There is a second way to do it by throwing away state, but that's expensive in seed entropy.) > It's a little weak because the output is only 64 bits, so you could > plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the > old entropy is guessable. I suspect there are sneaky ways around it > like using output_n-1 ^ output_n-2 or similar. I'll sleep on it. Ah, yes, I see. 
Given the final state, you guess the output word, go back one round, then forward the finalization rounds. Is the output equal to the guessed output? You'll find the true value, plus a Poisson-distributed number of false positives with mean 1 - 2^-64, i.e. about one. (Since you have 2^64-1 chances at something with probability 1 in 2^64.) And this can be iterated as often as you like to get earlier output words, as long as you can guess the entropy. *That's* the part that hurts; you'd like something that peters out. You could use the double-length-output SipHash variant (which requires a second set of finalization rounds) and mix more output back, but that's expensive. The challenge is coming up with more unpredictable data to mix in than one invocation of SipHash returns. And without storing previous output anywhere, because that is exactly wrong. A running sum or xor or whatever of the outputs doesn't help, because once you've guessed the last output, you can backtrack that for no additional effort. State capture is incredibly difficult, our application doesn't require resistance anyway... unless you can think of something cheap, I think we can just live with this. >> I'd *like* to persuade you that skipping the padding byte wouldn't >> invalidate any security proofs, because it's true and would simplify >> the code. But if you want 100% stock, I'm willing to cater to that. > I lean toward stock in the absence of a particularly good reason. At > the very least I'd want to read that paper carefully. Er... adding the length is standard Merkle-Damgaard strengthening. Why you do this is described in the original papers by Merkle and Damgaard. The lay summary is at https://en.wikipedia.org/wiki/Merkle-Damgard_construction The original sources are: http://www.merkle.com/papers/Thesis1979.pdf http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf Merkle describes the construction; Damgaard proves it secure. 
Basically, appending the length is required to handle variable-length input if the input is not itself self-delimiting. The proof of security is theorem 3.1 in the latter. (The first, more detailed explanation involves the use of an extra bit, which the second then explains how to do without.) In particular, see the top of page 420, which notes that the security proof only requires encoding *how much padding is added* in the final block, not the overall length of the message, and the second remark on p. 421 which notes that no such suffix is required if it's not necessary to distinguish messages with different numbers of trailing null bytes. The rules are alluded to in the "Choice of padding rule" part of the "Rationale" section of the SipHash paper (p. 7), but the description is very brief because it assumes the reader has the background. That's why they say "We could have chosen a slightly simpler padding rule, such as appending a 80 byte followed by zeroes." The thing is, if the amount of the last block that is used is fixed (within the domain of a particular key), you don't need
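The role of the length suffix is easy to demonstrate at toy scale. This sketch is my own (not SipHash): a hash that zero-pads the last block and encodes nothing about the length cannot distinguish messages that differ only in trailing null bytes, exactly the ambiguity Damgaard's remark is about; absorbing the length as a final word restores the distinction.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy one-word compression function (invertible in m for fixed h). */
static uint64_t mix(uint64_t h, uint64_t m)
{
    h ^= m;
    h *= 0x9E3779B97F4A7C15ull;
    h ^= h >> 29;
    return h;
}

/* Zero-pad to 8-byte blocks; optionally absorb the length at the end
 * (Merkle-Damgaard strengthening). */
static uint64_t toyhash(const void *msg, size_t len, int with_len)
{
    const uint8_t *p = msg;
    size_t total = len;
    uint64_t h = 0x1234567890ABCDEFull;

    while (len > 0) {
        uint64_t block = 0;
        size_t n = len < 8 ? len : 8;
        memcpy(&block, p, n);        /* remaining bytes implicitly zero */
        h = mix(h, block);
        p += n;
        len -= n;
    }
    if (with_len)
        h = mix(h, (uint64_t)total); /* encode the length (or padding) */
    return h;
}
```

Without the length word, "x" and "x\0" zero-pad to the same block and collide for every key; with it, they hash differently. This also shows George's caveat: if all messages under one key have the same length, the suffix distinguishes nothing and can be dropped.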
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Wed, Dec 21, 2016 at 9:01 PM, George Spelvin wrote: > Andy Lutomirski wrote: >> I don't even think it needs that. This is just adding a >> non-destructive final operation, right? > > It is, but the problem is that SipHash is intended for *small* inputs, > so the standard implementations aren't broken into init/update/final > functions. > > There's just one big function that keeps the state variables in > registers and never stores them anywhere. > > If we *had* init/update/final functions, then it would be trivial. > >> Just to clarify, if we replace SipHash with a black box, I think this >> effectively means, where "entropy" is random_get_entropy() || jiffies >> || current->pid: > >> The first call returns H(random seed || entropy_0 || secret). The >> second call returns H(random seed || entropy_0 || secret || entropy_1 >> || secret). Etc. > > Basically, yes. I was skipping the padding byte and keying the > finalization rounds on the grounds of "can't hurt and might help", > but we could do it a more standard way. > >> If not, then I have a fairly strong preference to keep whatever >> construction we come up with consistent with something that could >> actually happen with invocations of unmodified SipHash -- then all the >> security analysis on SipHash goes through. > > Okay. I don't think it makes a difference, but it's not a *big* waste > of time. If we have finalization rounds, we can reduce the secret > to 128 bits. > > If we include the padding byte, we can do one of two things: > 1) Make the secret 184 bits, to fill up the final partial word as > much as possible, or > 2) Make the entropy 1 byte smaller and conceptually misalign the > secret. What we'd actually do is remove the last byte of > the secret and include it in the entropy words, but that's > just a rotation of the secret between storage and hashing. > > Also, I assume you'd like SipHash-2-4, since you want to rely > on a security analysis. 
I haven't looked, but I assume that the analysis at least thought about reduced rounds, so maybe other variants are okay. >> The one thing I don't like is >> that I don't see how to prove that you can't run it backwards if you >> manage to acquire a memory dump. In fact, I think that there exist, at >> least in theory, hash functions that are secure in the random oracle >> model but that *can* be run backwards given the full state. From >> memory, SHA-3 has exactly that property, and it would be a bit sad for >> a CSPRNG to be reversible. > > Er... get_random_int() is specifically *not* designed to be resistant > to state capture, and I didn't try. Remember, what it's used for > is ASLR, what we're worried about is someone learning the layouts > of still-running processes, and if you get a memory dump, you have > the memory layout! True, but it's called get_random_int(), and it seems like making it stronger, especially if the performance cost is low to zero, is a good thing. > > If you want anti-backtracking, though, it's easy to add. What we > hash is: > > entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ... > > You mix the output word right back in to the (unfinalized) state after > generating it. This is still equivalent to unmodified black-box SipHash, > you're just using a (conceptually independent) SipHash invocation to > produce some of its input. Ah, cute. This could probably be sped up by doing something like: entropy_0 || secret || output_0 ^ entropy_1 || secret || ... It's a little weak because the output is only 64 bits, so you could plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the old entropy is guessable. I suspect there are sneaky ways around it like using output_n-1 ^ output_n-2 or similar. I'll sleep on it. > > The only remaining issues are: > 1) How many rounds, and > 2) May we use HalfSipHash? I haven't looked closely enough to have a real opinion here. 
I don't know what the security margin is believed to be. > > I'd *like* to persuade you that skipping the padding byte wouldn't > invalidate any security proofs, because it's true and would simplify > the code. But if you want 100% stock, I'm willing to cater to that. I lean toward stock in the absence of a particularly good reason. At the very least I'd want to read that paper carefully. > > Ted, what do you think? -- Andy Lutomirski AMA Capital Management, LLC
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Andy Lutomirski wrote: > I don't even think it needs that. This is just adding a > non-destructive final operation, right? It is, but the problem is that SipHash is intended for *small* inputs, so the standard implementations aren't broken into init/update/final functions. There's just one big function that keeps the state variables in registers and never stores them anywhere. If we *had* init/update/final functions, then it would be trivial. > Just to clarify, if we replace SipHash with a black box, I think this > effectively means, where "entropy" is random_get_entropy() || jiffies > || current->pid: > The first call returns H(random seed || entropy_0 || secret). The > second call returns H(random seed || entropy_0 || secret || entropy_1 > || secret). Etc. Basically, yes. I was skipping the padding byte and keying the finalization rounds on the grounds of "can't hurt and might help", but we could do it a more standard way. > If not, then I have a fairly strong preference to keep whatever > construction we come up with consistent with something that could > actually happen with invocations of unmodified SipHash -- then all the > security analysis on SipHash goes through. Okay. I don't think it makes a difference, but it's not a *big* waste of time. If we have finalization rounds, we can reduce the secret to 128 bits. If we include the padding byte, we can do one of two things: 1) Make the secret 184 bits, to fill up the final partial word as much as possible, or 2) Make the entropy 1 byte smaller and conceptually misalign the secret. What we'd actually do is remove the last byte of the secret and include it in the entropy words, but that's just a rotation of the secret between storage and hashing. Also, I assume you'd like SipHash-2-4, since you want to rely on a security analysis. (Regarding the padding byte, getting it right might be annoying to do exactly. 
All of the security analysis depends *only* on its low 3 bits
indicating how much of the final block is used.  As it says in the
SipHash paper, they included 8 bits just because it was easy.  But if
you want it exact, it's just one more byte of state.)

> The one thing I don't like is that I don't see how to prove that you
> can't run it backwards if you manage to acquire a memory dump.  In
> fact, I think that there exist, at least in theory, hash functions
> that are secure in the random oracle model but that *can* be run
> backwards given the full state.  From memory, SHA-3 has exactly that
> property, and it would be a bit sad for a CSPRNG to be reversible.

Er... get_random_int() is specifically *not* designed to be resistant
to state capture, and I didn't try.  Remember, what it's used for is
ASLR; what we're worried about is someone learning the layouts of
still-running processes, and if you get a memory dump, you have the
memory layout!

If you want anti-backtracking, though, it's easy to add.  What we
hash is:

entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...

You mix the output word right back in to the (unfinalized) state after
generating it.  This is still equivalent to unmodified black-box
SipHash; you're just using a (conceptually independent) SipHash
invocation to produce some of its input.  Each output is produced by
copying the state, padding & finalizing after the secret.

In fact, to make our lives easier, let's define the secret to end with
a counter byte that happens to be equal to the padding byte.  The
input stream will be:

Previous output: 8 (or 4 for HalfSipHash) bytes
Entropy: 15 bytes (8 bytes timer, 4 bytes jiffies, 3 bytes pid)
Secret: 16 bytes
Counter: 1 byte
...repeat...

> We could also periodically mix in a big (128-bit?) chunk of fresh
> urandom output to keep the bad guys guessing.

Simpler and faster to just update the global master secret.  The state
is per-CPU, so mixing in would have to be repeated per CPU.
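To illustrate the shape of the feedback construction (a sketch only:
hashlib's BLAKE2b stands in as a runnable "black box" in place of
SipHash, the secret and entropy values are hypothetical, and the
counter byte is a simplified model of the padding byte; the real thing
would be C inside the kernel using actual SipHash rounds):

```python
import hashlib

SECRET = b"\x2a" * 16      # hypothetical 128-bit per-boot secret
OUT_LEN = 8                # 8 bytes for SipHash, 4 for HalfSipHash

# Running, never-finalized absorber state.  BLAKE2b is used here only
# so the sketch runs; the point is the construction, not the primitive.
_state = hashlib.blake2b(digest_size=OUT_LEN)
_total = 0                 # bytes absorbed so far, for the counter byte

def get_random_long(entropy: bytes) -> bytes:
    """One call: absorb entropy, secret, counter; finalize a *copy*
    of the running state; mix the output straight back in."""
    global _total
    assert len(entropy) == 15          # 8B timer || 4B jiffies || 3B pid
    for chunk in (entropy, SECRET):
        _state.update(chunk)
        _total += len(chunk)
    _total += 1
    _state.update(bytes([_total & 0xFF]))  # counter byte, modeling the
                                           # SipHash padding byte
    out = _state.copy().digest()       # finalize a copy for the output;
                                       # the running state lives on
    _state.update(out)                 # feed the output back in:
    _total += len(out)                 # anti-backtracking
    return out
```

Because each output is mixed back into the unfinalized state before the
next call's entropy, the absorbed stream is exactly
entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
as described above, and each call is a finalization of that one
ever-growing black-box hash input.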
With these changes, I'm satisfied that it's secure, cheap, has a
sufficiently wide state size, *and* all standard SipHash analysis
applies.  The only remaining issues are:
1) How many rounds, and
2) May we use HalfSipHash?

I'd *like* to persuade you that skipping the padding byte wouldn't
invalidate any security proofs, because it's true and would simplify
the code.  But if you want 100% stock, I'm willing to cater to that.

Ted, what do you think?
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.

This is not a good idea.  If I understand correctly, the idea here is
to just keep around SipHash's internal state variables and chain them
over to the next call, sort of like how md5_transform with the current
code works on the same scratch space.

There has been no security analysis in the literature on this use of
the primitive, and I have no confidence that this is a secure use of
the function.  Unless somebody can point me toward a paper I missed or
a comment from a real cryptographer about the specifics of SipHash, I
think I'm right to admonish against this dangerous road.

Let's talk about constructions.  And let's only decide on a
construction that we're actually equipped to analyze.  Let's
definitely not talk about making our own primitives, or retrofitting a
nice primitive's internals into our own Frankenstein.

Alternatively, if I'm wrong, please send me an eprint/arXiv link to a
paper that discusses this use of SipHash.