Re: Patch: cprng_fast performance - please review.

2014-04-23 Thread David Laight
On Fri, Apr 18, 2014 at 02:41:07PM -0400, Thor Lancelot Simon wrote: > > Of the few systems which do have instructions that accellerate AES, on > the most common implementation -- x86 -- we cannot use the instructions > in question in the kernel because they use CPU state we do not save/ > restore

Re: Patch: cprng_fast performance - please review.

2014-04-23 Thread Mindaugas Rasiukevicius
Thor Lancelot Simon wrote: > On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: > > > > Attached is a patch which makes cprng_fast per-CPU and lockless. *IT > > IS NOT WELL TESTED YET (I haven't even run test vectors) AND IS ONLY > > FOR REVIEW.* > > New diff, with some missin

cprng_fast and interrupts [was Re: Patch: cprng_fast performance - please review.]

2014-04-22 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 21:50:46 -0400 From: Thor Lancelot Simon > Are there actually any callers of cprng_fast at IPL_HIGH? Are there > actually any legitimate random decisions to be made at IPL_HIGH? I'm > sceptical. What do you get if you cross an elephant and a rhinocerous

Re: Patch: cprng_fast performance - please review.

2014-04-19 Thread darkstar
> I would still suggest Salsa20 or ChaCha. My measurements with naive C > code suggest that, if you buffer the output for short outputs, these > take on average 40-50 Ivy Bridge cycles per request. (If you don't > buffer the output, it's 300 cycles.) Long requests get ~4 cpb. In > contrast, lib

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
Miscellaneous notes -- I'm doing the benchmarking we discussed and pondering whether it is right to switch from hc-128 to salsa20, separately. On Thu, Apr 17, 2014 at 09:33:28PM +, Taylor R Campbell wrote: > > > +void > > +hc128_init(hc128_state_t *state, const uint8_t *key, const uint8_t *iv

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Joerg Sonnenberger
On Fri, Apr 18, 2014 at 05:05:37PM -0400, Thor Lancelot Simon wrote: > On Fri, Apr 18, 2014 at 05:00:50PM -0400, Thor Lancelot Simon wrote: > > > > Unfortunately, the virtual machines on this laptop that I use for most > > NetBSD development don't expose the AES-NI instructions to guests, even > >

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
tls@ wrote: > > Note the caller of this hc128_init() is: > > > > I'm afraid "9KB stack on rekeying" is fatal on most ports. > > Well, the cipher should hardly ever get rekeyed. The rekeying > intervals could be considerably larger; I did not want to > increase them too much compared to the old

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Matt Thomas
On Apr 18, 2014, at 11:23 AM, Markku-Juhani Olavi Saarinen wrote: > It has been there on all new systems purchased in some last 3 years, > so I would *guess* that it would be > 50% of systems fielded out > there. Not everything is x86 based.

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 11:04:04PM +0200, Markku-Juhani Olavi Saarinen wrote: > Hi, > > Just one last thought: it will be on all *future* systems ? No, it won't, unless you have some funny definition of "system" that excludes anything that's not a high-end x86 implementation from one particular m

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 05:00:50PM -0400, Thor Lancelot Simon wrote: > > Unfortunately, the virtual machines on this laptop that I use for most > NetBSD development don't expose the AES-NI instructions to guests, even > when doing hardware assisted virtualization. Not RDRAND neither, for So, sin

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
Hi, Just one last thought: it will be on all *future* systems ? Cheers, - markku Dr. Markku-Juhani O. Saarinen US +1 (424) 666 2713 On Fri, Apr 18, 2014 at 11:00 PM, Thor Lancelot Simon wrote: > On Fri, Apr 18, 2014 at 09:54:09PM +0100, Roland C. Dowdeswell wrote: >> On Fri, Apr 18, 2014 at

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 09:54:09PM +0100, Roland C. Dowdeswell wrote: > On Fri, Apr 18, 2014 at 08:23:11PM +0200, Markku-Juhani Olavi Saarinen wrote: > > > > > Agreed. AES is worse if you don't have AES-NI. > > > > It has been there on all new systems purchased in some last 3 years, > > so I woul

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Roland C. Dowdeswell
On Fri, Apr 18, 2014 at 08:23:11PM +0200, Markku-Juhani Olavi Saarinen wrote: > > Agreed. AES is worse if you don't have AES-NI. > > It has been there on all new systems purchased in some last 3 years, > so I would *guess* that it would be > 50% of systems fielded out > there. It hasn't been the

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 08:33:20PM +0200, Thomas Klausner wrote: > On Fri, Apr 18, 2014 at 01:39:18PM -0400, Thor Lancelot Simon wrote: > > How do you count to 9K? I see: > > > > 2K for p > > 2K for q > > 1280 bytes for w > > Are you talking about this w? > + uint32_t w[1280],

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 06:11:39PM +, Taylor R Campbell wrote: > > The majority of systems certainly don't have AES-NI. Only some recent > Intel CPUs do, and we can't use it in the kernel anyway. Right: plenty of systems accellerate AES, but in the wide world of systems that are not all x86

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thomas Klausner
On Fri, Apr 18, 2014 at 01:39:18PM -0400, Thor Lancelot Simon wrote: > How do you count to 9K? I see: > > 2K for p > 2K for q > 1280 bytes for w Are you talking about this w? + uint32_t w[1280], *p = state->p, *q = state->q; This looks like 1280x4 bytes to me. Thomas

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
On Fri, Apr 18, 2014 at 8:11 PM, Taylor R Campbell wrote: >Date: Fri, 18 Apr 2014 19:58:06 +0200 >From: Markku-Juhani Olavi Saarinen > >If you want to get rid of RC4, use AES in CTR mode. It is standard, >compact, clean, and really fast solution. May sound boring, but gives >m

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 19:58:06 +0200 From: Markku-Juhani Olavi Saarinen If you want to get rid of RC4, use AES in CTR mode. It is standard, compact, clean, and really fast solution. May sound boring, but gives me a feel of solid security engineering. We use that for /dev/u?random

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
Hi, If you want to get rid of RC4, use AES in CTR mode. It is standard, compact, clean, and really fast solution. May sound boring, but gives me a feel of solid security engineering. Note that majority of systems now have the AES-NI instructions which speed up AES implementations by an order of m

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 12:38:38 -0400 From: Thor Lancelot Simon 3) If the algorithm's use of state-dependent array indices presents a real weakness in practice, why aren't there any published results on this and why was it chosen as, and

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:27:45PM +0900, Izumi Tsutsui wrote: > > Note the caller of this hc128_init() is: > > > > +static void > > > +cprng_fast_randrekey(cprng_fast_ctx_t *ctx) > > > +{ > > > + uint8_t key[16], iv[16]; > > > + hc128_state_t tempstate; > > > + int s; > > > + > > > + int have_in

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 04:26:19PM +, Taylor R Campbell wrote: > > Closer inspection of HC-128 reveals that it uses secret-dependent > array indices[*], so for that reason alone I don't think we should > adopt it. It also has a very large state, which is going to hurt the > cache on big system

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:20:24PM +0900, Izumi Tsutsui wrote: > > LITTLE_ENDIAN != x86 > > This should simply be le32dec(9) otherwise > it will cause unaligned trap on arm and mips etc. I believe the input is -- though declared as uint8_t -- required to always be alinged (see the comment on the

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Closer inspection of HC-128 reveals that it uses secret-dependent array indices[*], so for that reason alone I don't think we should adopt it. It also has a very large state, which is going to hurt the cache on big systems and hurt the stack on little systems. So, could you please split your chan

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:27:45PM +0900, Izumi Tsutsui wrote: > > Note the caller of this hc128_init() is: > > I'm afraid "9KB stack on rekeying" is fatal on most ports. Well, the cipher should hardly ever get rekeyed. The rekeying intervals could be considerably larger; I did not want to incr

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
campbell+netbsd-tech-kern@ wrote: > > +void > > +hc128_init(hc128_state_t *state, const uint8_t *key, const uint8_t *iv) > > +{ > > + unsigned int i; > > + uint32_t w[1280], *p = state->p, *q = state->q; > > 5 KB on the stack is a lot! Granted, this is a leaf routine which in > our case will

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
tls@ wrote: > @@ -160,6 +160,7 @@ include "crypto/cast128/files.cast128" > --- /dev/null 1 Jan 1970 00:00:00 - > +++ crypto/hc128/hc128.c 17 Apr 2014 03:17:18 - : > +static inline uint32_t > +pack_littleendian(const uint8_t *v) > +{ > +#ifdef LITTLE_ENDIAN > + return *((const uin

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Thor Lancelot Simon
Thank you for looking at this! On Thu, Apr 17, 2014 at 09:33:28PM +, Taylor R Campbell wrote: > > And the only performance constraint is that its single-threaded > performance should not be worse than the existing arc4random-based > cprng_fast. This, I don't agree with, unless we're going to

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Taylor R Campbell
Repeating what I said privately for the public record: The changes to cprng.h expose a lot of guts of the new cprng_fast implementation. I don't see justification for this. As I understand it, the goals of rewriting cprng_fast are: 1. Replace the unsafe arc4random by a safer algorithm. 2. Impro

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Thor Lancelot Simon
On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: > > Attached is a patch which makes cprng_fast per-CPU and lockless. *IT IS NOT > WELL TESTED YET (I haven't even run test vectors) AND IS ONLY FOR REVIEW.* New diff, with some missing files and incorporating some more comments

Re: Patch: cprng_fast performance - please review.

2014-04-16 Thread Thor Lancelot Simon
On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: > > Attached is a patch which makes cprng_fast per-CPU and lockless. *IT IS NOT > WELL TESTED YET (I haven't even run test vectors) AND IS ONLY FOR REVIEW.* > The diff is against the head of the tls-earlyentropy branch. Taylor

Patch: cprng_fast performance - please review.

2014-04-16 Thread Thor Lancelot Simon
On Wed, Apr 09, 2014 at 05:16:55PM +0100, Mindaugas Rasiukevicius wrote: > > Perhaps I missed it, but were the benchmark results published? Also, how > about cprng_fast32/64()? If the current mechanism is indeed faster, then > I am glad that you have made an improvement. To quote http://mail-in