Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)
On Wed, May 04, 2016 at 06:38:46AM -0700, Tadeusz Struk wrote:
> Hi David
> On 05/04/2016 02:01 AM, David Howells wrote:
> > Do you want to push this via Herbert's tree?
>
> Yes, I think Herbert has some more patches queued for rc-7.
> Let me also send a proper one with the signed-off tag.
>
> ---8<---
> Subject: crypto: rsa - select crypto mgr dependency
>
> The pkcs1pad template needs CRYPTO_MANAGER, so it needs
> to be explicitly selected by CRYPTO_RSA.
>
> Reported-by: Jamie Heilman
> Signed-off-by: Tadeusz Struk

Applied.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
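For context, the fix is a one-line Kconfig change. A sketch of what the amended crypto/Kconfig entry plausibly looks like after the patch; the surrounding select lines are reconstructed from memory of the RSA option, not quoted from this thread:

```kconfig
config CRYPTO_RSA
	tristate "RSA algorithm"
	select CRYPTO_AKCIPHER
	select CRYPTO_MANAGER
	select MPILIB
	select ASN1
	help
	  Generic implementation of the RSA public key algorithm.
```

Without the added select, a configuration that enables CRYPTO_RSA but nothing else that pulls in CRYPTO_MANAGER cannot instantiate the pkcs1pad template, which is what broke in-kernel X.509 certificate loading in the bisected regression.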
Re: [PATCH v3] crypto: Add a flag allowing the self-tests to be disabled at runtime.
On Tue, May 03, 2016 at 10:00:17AM +0100, Richard W.M. Jones wrote:
> Running self-tests for a short-lived KVM VM takes 28ms on my laptop.
> This commit adds a flag 'cryptomgr.notests' which allows them to be
> disabled.
>
> However if fips=1 as well, we ignore this flag as FIPS mode mandates
> that the self-tests are run.
>
> Signed-off-by: Richard W.M. Jones

Applied.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: better patch for linux/bitops.h
On 05/04/16 21:03, Jeffrey Walton wrote:
> On Wed, May 4, 2016 at 11:50 PM, Theodore Ts'o wrote:
>> ...
>> But instead of arguing over what works and doesn't, let's just create
>> the test set and just try it on a wide range of compilers and
>> architectures, hmmm?
>
> What are the requirements? Here's a short list:
>
> * No undefined behavior
>   - important because the compiler writers use the C standard
> * Compiles to a native "rotate IMMEDIATE" if the rotate amount is a
>   "constant expression" and the machine provides it
>   - translates to a native rotate instruction if available
>   - "rotate IMM" can be 3 times faster than "rotate REG"
>   - do any architectures *not* provide a rotate?
> * Compiles to a native "rotate REGISTER" if the rotate amount is
>   variable and the machine provides it
>   - do any architectures *not* provide a rotate?
> * Constant time
>   - important to high-integrity code
>   - non-security code paths probably don't care
>
> Maybe the first thing to do is provide different rotates for the
> constant-time requirement when it's in effect?

The disagreement here is the priority between these points. In my very
strong opinion, "no undefined behavior" per the C standard is way less
important than the others; what matters is what gcc and the other
compilers we care about actually do. The kernel relies on various kinds
of C-standard-undefined behavior *all over the place*; for one thing,
sizeof(void *) == sizeof(size_t) == sizeof(unsigned long)!! But these
are well-defined in the subcontext we care about.

(And no, not all architectures provide a rotate instruction.)

-hpa
Re: [PATCH 0/3 v3] Key-agreement Protocol Primitives (KPP) API
On Tue, May 03, 2016 at 12:44:00PM +0100, Salvatore Benedetto wrote:
> Hi Herb,
>
> the following patchset introduces a new API for abstracting
> key-agreement protocols such as DH and ECDH. It provides the
> primitives required for implementing the protocol, thus the name KPP
> (Key-agreement Protocol Primitives).
>
> Regards,
> Salvatore
>
> Changes from v1:
> * Change check in dh_check_params_length based on Stephan's review
>
> Changes from v2:
> * Add support for ECDH (curve P192 and P256). I reused the ecc module
>   already present in net/bluetooth and extended it in order to select
>   different curves at runtime. Code for P192 was taken from tinycrypt.

Are you going to post the patches that use this as well?

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: better patch for linux/bitops.h
On Wed, May 4, 2016 at 11:50 PM, Theodore Ts'o wrote:
> ...
> But instead of arguing over what works and doesn't, let's just create
> the test set and just try it on a wide range of compilers and
> architectures, hmmm?

What are the requirements? Here's a short list:

* No undefined behavior
  - important because the compiler writers use the C standard
* Compiles to a native "rotate IMMEDIATE" if the rotate amount is a
  "constant expression" and the machine provides it
  - translates to a native rotate instruction if available
  - "rotate IMM" can be 3 times faster than "rotate REG"
  - do any architectures *not* provide a rotate?
* Compiles to a native "rotate REGISTER" if the rotate amount is
  variable and the machine provides it
  - do any architectures *not* provide a rotate?
* Constant time
  - important to high-integrity code
  - non-security code paths probably don't care

Maybe the first thing to do is provide different rotates for the
constant-time requirement when it's in effect?

Jeff
Re: better patch for linux/bitops.h
Instead of arguing over who's "sane" or "insane", can we come up with an
agreed-upon set of tests, and a set of compilers and compiler versions
for which these tests must achieve at least *working* code? Bonus points
if they achieve optimal code, but what's important is that for a wide
range of GCC versions (from ancient RHEL distributions to bleeding-edge
gcc 5.x compilers) this *must* work.

From my perspective, clang and ICC producing correct code is a very nice
to have, but most shops I know of don't yet assume that clang produces
code that is trustworthy for production systems, although it *is* great
as far as generating compiler warnings to find potential bugs.

But instead of arguing over what works and doesn't, let's just create
the test set and just try it on a wide range of compilers and
architectures, hmmm?

- Ted
Re: better patch for linux/bitops.h
>>> So you are actually saying outright that we should sacrifice *actual*
>>> portability in favor of *theoretical* portability? What kind of
>>> twilight zone did we just step into?!
>>
>> I'm not sure what you mean. It will be well defined on all platforms.
>> Clang may not recognize the pattern, which means it could run slower.
>> GCC and ICC will be fine.
>>
>> Slower but correct code is what you have to live with until the Clang
>> devs fix their compiler.
>>
>> It's kind of like what Dr. Jon Bentley said: "If it doesn't have to be
>> correct, I can make it as fast as you'd like it to be".
>
> The current code works on all compilers we care about. The code you
> propose does not; it doesn't work on anything but very recent versions
> of our flagship target compiler, and by your own admission might even
> cause security hazards in the kernel if compiled on clang.

I'm not sure how you're arriving at the conclusion that the code does
not work.

> That qualifies as insane in my book.

OK, thanks. I see the kernel is providing IPSec, SSL/TLS, etc. You can
make SSL/TLS run faster by using aNULL and eNULL.

Jeff
Re: better patch for linux/bitops.h
On May 4, 2016 7:54:12 PM PDT, Jeffrey Walton wrote:
> On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin wrote:
>> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton wrote:
>>> On Wed, May 4, 2016 at 5:52 PM, John Denker wrote:
>>>> On 05/04/2016 02:42 PM, I wrote:
>>>>> I find it very odd that the other seven functions were not
>>>>> upgraded. I suggest the attached fix-others.diff would make
>>>>> things more consistent.
>>>>
>>>> Here's a replacement patch.
>>>> ...
>>>
>>> +1, commit it.
>>>
>>> It's good for three additional reasons. First, this change means the
>>> kernel is teaching the next generation the correct way to do things.
>>> Many developers aspire to be kernel hackers, and they sometimes use
>>> the code from bitops.h. I've actually run across the code in
>>> production during an audit where the developers cited bitops.h.
>>>
>>> Second, it preserves a "silent and dark" cockpit during analysis.
>>> That is, when analysis is run, no findings are generated. Auditors
>>> and security folks like quiet tools, much like the way pilots like
>>> their cockpits (flashing lights and sounding buzzers usually means
>>> something is wrong).
>>>
>>> Third, the pattern is recognized by the major compilers, so the
>>> kernel should not have any trouble when porting to any of the
>>> compilers. I often use multiple compilers to tease out
>>> implementation-defined behavior in an effort to achieve greater
>>> portability. Here are the citations to ensure the kernel is safe
>>> with the pattern:
>>>
>>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>>
>>> However, Clang may cause trouble because they don't want the
>>> responsibility of recognizing the pattern:
>>>
>>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>>
>>> Instead, they provide a defective rotate. The "defect" here is that
>>> it's non-constant time due to the branch, so it may not be suitable
>>> for high-integrity or high-assurance code like linux-crypto:
>>>
>>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>>
>>> Jeff
>>
>> So you are actually saying outright that we should sacrifice *actual*
>> portability in favor of *theoretical* portability? What kind of
>> twilight zone did we just step into?!
>
> I'm not sure what you mean. It will be well defined on all platforms.
> Clang may not recognize the pattern, which means it could run slower.
> GCC and ICC will be fine.
>
> Slower but correct code is what you have to live with until the Clang
> devs fix their compiler.
>
> It's kind of like what Dr. Jon Bentley said: "If it doesn't have to be
> correct, I can make it as fast as you'd like it to be".
>
> Jeff

The current code works on all compilers we care about. The code you
propose does not; it doesn't work on anything but very recent versions
of our flagship target compiler, and by your own admission might even
cause security hazards in the kernel if compiled on clang.

That qualifies as insane in my book.

The current code is de facto portable with the intended outcome; the one
you propose is not.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
Re: better patch for linux/bitops.h
On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin wrote:
> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton wrote:
>> On Wed, May 4, 2016 at 5:52 PM, John Denker wrote:
>>> On 05/04/2016 02:42 PM, I wrote:
>>>> I find it very odd that the other seven functions were not
>>>> upgraded. I suggest the attached fix-others.diff would make
>>>> things more consistent.
>>>
>>> Here's a replacement patch.
>>> ...
>>
>> +1, commit it.
>>
>> It's good for three additional reasons. First, this change means the
>> kernel is teaching the next generation the correct way to do things.
>> Many developers aspire to be kernel hackers, and they sometimes use
>> the code from bitops.h. I've actually run across the code in
>> production during an audit where the developers cited bitops.h.
>>
>> Second, it preserves a "silent and dark" cockpit during analysis.
>> That is, when analysis is run, no findings are generated. Auditors
>> and security folks like quiet tools, much like the way pilots like
>> their cockpits (flashing lights and sounding buzzers usually means
>> something is wrong).
>>
>> Third, the pattern is recognized by the major compilers, so the
>> kernel should not have any trouble when porting to any of the
>> compilers. I often use multiple compilers to tease out
>> implementation-defined behavior in an effort to achieve greater
>> portability. Here are the citations to ensure the kernel is safe
>> with the pattern:
>>
>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>
>> However, Clang may cause trouble because they don't want the
>> responsibility of recognizing the pattern:
>>
>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>
>> Instead, they provide a defective rotate. The "defect" here is that
>> it's non-constant time due to the branch, so it may not be suitable
>> for high-integrity or high-assurance code like linux-crypto:
>>
>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>
>> Jeff
>
> So you are actually saying outright that we should sacrifice *actual*
> portability in favor of *theoretical* portability? What kind of
> twilight zone did we just step into?!

I'm not sure what you mean. It will be well defined on all platforms.
Clang may not recognize the pattern, which means it could run slower.
GCC and ICC will be fine.

Slower but correct code is what you have to live with until the Clang
devs fix their compiler.

It's kind of like what Dr. Jon Bentley said: "If it doesn't have to be
correct, I can make it as fast as you'd like it to be".

Jeff
Re: better patch for linux/bitops.h
On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton wrote:
> On Wed, May 4, 2016 at 5:52 PM, John Denker wrote:
>> On 05/04/2016 02:42 PM, I wrote:
>>
>>> I find it very odd that the other seven functions were not
>>> upgraded. I suggest the attached fix-others.diff would make
>>> things more consistent.
>>
>> Here's a replacement patch.
>> ...
>
> +1, commit it.
>
> It's good for three additional reasons. First, this change means the
> kernel is teaching the next generation the correct way to do things.
> Many developers aspire to be kernel hackers, and they sometimes use
> the code from bitops.h. I've actually run across the code in
> production during an audit where the developers cited bitops.h.
>
> Second, it preserves a "silent and dark" cockpit during analysis. That
> is, when analysis is run, no findings are generated. Auditors and
> security folks like quiet tools, much like the way pilots like their
> cockpits (flashing lights and sounding buzzers usually means something
> is wrong).
>
> Third, the pattern is recognized by the major compilers, so the kernel
> should not have any trouble when porting to any of the compilers. I
> often use multiple compilers to tease out implementation-defined
> behavior in an effort to achieve greater portability. Here are the
> citations to ensure the kernel is safe with the pattern:
>
>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>
> However, Clang may cause trouble because they don't want the
> responsibility of recognizing the pattern:
>
>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>
> Instead, they provide a defective rotate. The "defect" here is that
> it's non-constant time due to the branch, so it may not be suitable
> for high-integrity or high-assurance code like linux-crypto:
>
>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>
> Jeff

So you are actually saying outright that we should sacrifice *actual*
portability in favor of *theoretical* portability? What kind of
twilight zone did we just step into?!
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
Re: better patch for linux/bitops.h
On Wed, May 4, 2016 at 5:52 PM, John Denker wrote:
> On 05/04/2016 02:42 PM, I wrote:
>
>> I find it very odd that the other seven functions were not
>> upgraded. I suggest the attached fix-others.diff would make
>> things more consistent.
>
> Here's a replacement patch.
> ...

+1, commit it.

It's good for three additional reasons. First, this change means the
kernel is teaching the next generation the correct way to do things.
Many developers aspire to be kernel hackers, and they sometimes use the
code from bitops.h. I've actually run across the code in production
during an audit where the developers cited bitops.h.

Second, it preserves a "silent and dark" cockpit during analysis. That
is, when analysis is run, no findings are generated. Auditors and
security folks like quiet tools, much like the way pilots like their
cockpits (flashing lights and sounding buzzers usually means something
is wrong).

Third, the pattern is recognized by the major compilers, so the kernel
should not have any trouble when porting to any of the compilers. I
often use multiple compilers to tease out implementation-defined
behavior in an effort to achieve greater portability. Here are the
citations to ensure the kernel is safe with the pattern:

 * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
 * ICC: http://software.intel.com/en-us/forums/topic/580884

However, Clang may cause trouble because they don't want the
responsibility of recognizing the pattern:

 * https://llvm.org/bugs/show_bug.cgi?id=24226#c8

Instead, they provide a defective rotate. The "defect" here is that
it's non-constant time due to the branch, so it may not be suitable for
high-integrity or high-assurance code like linux-crypto:

 * https://llvm.org/bugs/show_bug.cgi?id=24226#c5

Jeff
Re: linux/bitops.h
On May 4, 2016 6:20:32 PM PDT, Jeffrey Walton wrote:
> On Wed, May 4, 2016 at 7:06 PM, Andi Kleen wrote:
>> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>>>> Beware that shifting by an amount >= the number of bits in the
>>>>> word remains Undefined Behavior.
>>>
>>>> This construct has been supported as a rotate since at least gcc2.
>>>
>>> How then should we understand the story told in commit d7e35dfa?
>>> Is the story wrong?
>>
>> I don't think Linux runs on a system where it would make a difference
>> (like a VAX), and also gcc always converts it before it could.
>> Even UBSan should not complain because it runs after the conversion
>> to ROTATE.
>
> From what I understand, it's a limitation in the barrel shifter and
> the way the shift bits are handled.
>
> Linux runs on a great number of devices, so it's conceivable (likely?)
> a low-cost board would have hardware limitations that are not found in
> modern desktops and servers or a VAX.
>
> Jeff

This is a compiler feature, though. And if icc or clang don't support
the idiom they would never have compiled a working kernel.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
Re: linux/bitops.h
On Wed, May 4, 2016 at 7:06 PM, Andi Kleen wrote:
> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>>> Beware that shifting by an amount >= the number of bits in the
>>>> word remains Undefined Behavior.
>>
>>> This construct has been supported as a rotate since at least gcc2.
>>
>> How then should we understand the story told in commit d7e35dfa?
>> Is the story wrong?
>
> I don't think Linux runs on a system where it would make a difference
> (like a VAX), and also gcc always converts it before it could.
> Even UBSan should not complain because it runs after the conversion
> to ROTATE.

From what I understand, it's a limitation in the barrel shifter and the
way the shift bits are handled.

Linux runs on a great number of devices, so it's conceivable (likely?)
a low-cost board would have hardware limitations that are not found in
modern desktops and servers or a VAX.

Jeff
Re: [PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG
On Wed, May 04, 2016 at 10:28:24PM +0200, Stephan Mueller wrote:
> > +out:
> > +	spin_unlock_irqrestore(&primary_crng.lock, flags);
> > +	return ret;
>
> Where did you add the memzero_explicit of tmp?

Oops, sorry, somehow that change got lost in the patch updates. Fixed
now.

- Ted
Re: linux/bitops.h
On Wed, May 4, 2016 at 5:30 PM, H. Peter Anvin wrote:
>
> Yes. d7e35dfa is baloney IMNSHO. All it does is produce worse code,
> and the description even says so.
>
> As I said, gcc has treated the former code as idiomatic since gcc 2,
> so that support is beyond ancient.

Well, we're *trying* to get clang supported, so the argument that gcc
has always supported it and compiles correct code for it is not
necessarily the whole story yet.

The problem with "32 - shift" is that it really could generate garbage
in situations where shift ends up being a constant zero.

That said, the fact that the other cases weren't changed
(rol64/ror64/ror32) does make that argument less interesting. Unless
there was some particular code that actively ended up using
"rol32(..0)" but not the other cases. (And yes, rol32(x,0) is obviously
nonsense, but it could easily happen with inline functions or macros
that end up generating that.)

Note that there may be 8 "rol/ror" functions, but the 16-bit and 8-bit
ones don't have the undefined semantics. So there are only four that
had _that_ problem, although I agree that changing just one out of four
is wrong.

Linus
Re: linux/bitops.h
On 05/04/16 15:06, John Denker wrote:
> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>> Beware that shifting by an amount >= the number of bits in the
>>> word remains Undefined Behavior.
>
>> This construct has been supported as a rotate since at least gcc2.
>
> How then should we understand the story told in commit d7e35dfa?
> Is the story wrong?
>
> At the very least, something inconsistent is going on. There are
> 8 functions. Why did d7e35dfa change one of them but not the other 7?

Yes. d7e35dfa is baloney IMNSHO. All it does is produce worse code, and
the description even says so.

As I said, gcc has treated the former code as idiomatic since gcc 2, so
that support is beyond ancient.

-hpa
Re: linux/bitops.h
On 05/04/2016 04:06 PM, Andi Kleen wrote:
> gcc always converts it before it could [make a difference].

At the moment, current versions of gcc treat the idiomatic ror/rol code
as something they support ... but older versions do not, and future
versions may not. The gcc guys have made it very clear that they
reserve the right to do absolutely anything they want in a UB
situation.
 -- What is true as of today might not be "always" true.
 -- What is true at one level of optimization might not be true at
    another.
 -- The consequences can be highly nonlocal and counterintuitive.

For example, in the case of:
    rslt = word << (32 - N);
    ...
    ...
    if (!N) { ... }
the compiler could assume that N is necessarily nonzero, and many lines
later it could optimize out the whole if-block. So, even if the "<<"
operator gives the right result, there could be ghastly failures
elsewhere. It might work for some people but not others.

> So it's unlikely to be a pressing issue.

Sometimes issues that are not urgently "pressing" ought to be dealt
with in a systematic way. There are serious people who think that
avoiding UB is a necessity, if you want the code to be reliable and
maintainable.

I renew the question: Why did commit d7e35dfa upgrade one of the 8
functions but not the other 7?
 -- I could understand 0 of 8, or 8 of 8.
 -- In contrast, I'm having a hard time understanding why 7 of the 8
    use the idiomatic expression while the 8th does not.
Re: linux/bitops.h
On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>> Beware that shifting by an amount >= the number of bits in the
>>> word remains Undefined Behavior.
>
>> This construct has been supported as a rotate since at least gcc2.
>
> How then should we understand the story told in commit d7e35dfa?
> Is the story wrong?

I don't think Linux runs on a system where it would make a difference
(like a VAX), and also gcc always converts it before it could.
Even UBSan should not complain because it runs after the conversion
to ROTATE.

So it's unlikely to be a pressing issue.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH 1/7] asm-generic/io.h: add io{read,write}64 accessors
On Wednesday 04 May 2016 20:16:19 Horia Geantă wrote:
> @@ -625,6 +645,16 @@ static inline u32 ioread32be(const volatile void __iomem *addr)
>  }
>  #endif
>
> +#ifdef CONFIG_64BIT
> +#ifndef ioread64be
> +#define ioread64be ioread64be
> +static inline u64 ioread64be(const volatile void __iomem *addr)
> +{
> +	return __be64_to_cpu(__raw_readq(addr));
> +}
> +#endif
> +#endif /* CONFIG_64BIT */
> +
>  #ifndef iowrite16be
>  #define iowrite16be iowrite16be
>  static inline void iowrite16be(u16 value, void volatile __iomem *addr)
> @@ -641,6 +671,16 @@ static inline void iowrite32be(u32 value, volatile void __iomem *addr)
>  }
>  #endif
>
> +#ifdef CONFIG_64BIT
> +#ifndef iowrite64be
> +#define iowrite64be iowrite64be
> +static inline void iowrite64be(u64 value, volatile void __iomem *addr)
> +{
> +	__raw_writeq(__cpu_to_be64(value), addr);
> +}
> +#endif
> +#endif /* CONFIG_64BIT */
> +

I just noticed that these two are both a bit wrong, but they copy the
mistake that already exists in the 16 and 32 bit versions: if an
architecture overrides readq/writeq to have barriers but does not
override ioread64be/iowrite64be, this will lack the barriers and behave
differently from the little-endian version.

I think the only affected architecture is ARC, since ARM and ARM64 both
override the big-endian accessors to have the correct barriers, and all
others don't use barriers at all.

Maybe you can add a patch before this one to replace the 16/32-bit
accessors with ones that do a

static inline void iowrite32be(u32 value, volatile void __iomem *addr)
{
	writel(swab32(value), addr);
}

This will lead to a double swap on architectures that don't override
it, but it will work correctly on all architectures without them having
to override the big-endian accessors.

Aside from that, the patch looks fine.
	Arnd
Re: linux/bitops.h
On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>> Beware that shifting by an amount >= the number of bits in the
>> word remains Undefined Behavior.

> This construct has been supported as a rotate since at least gcc2.

How then should we understand the story told in commit d7e35dfa?
Is the story wrong?

At the very least, something inconsistent is going on. There are
8 functions. Why did d7e35dfa change one of them but not the other 7?
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On May 4, 2016 2:42:53 PM PDT, John Denker wrote:
> On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
>
>> it doesn't hit the
>> UB case which Jeffrey was concerned about.
>
> That should be good enough for present purposes.
>
> However, in the interests of long-term maintainability, I
> would suggest sticking in a comment or assertion saying
> that ror32(,shift) is never called with shift=0. This
> can be removed if/when bitops.h is upgraded.
>
> There is a track record of compilers doing Bad Things in
> response to UB code, including some very counterintuitive
> Bad Things.
>
> On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
>>> If bitops.h doesn't do the right thing, we need to
>>> fix bitops.h.
>
> Most of the ror and rol functions in linux/bitops.h
> should be considered unsafe, as currently implemented.
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/bitops.h?id=04974df8049fc4240d22759a91e035082ccd18b4#n103
>
> I don't see anything in the file that suggests any limits
> on the range of the second argument. So at best it is an
> undocumented trap for the unwary. This has demonstrably
> been a problem in the past. The explanation in the attached
> fix-rol32.diff makes amusing reading.
>
> Of the eight functions
>    ror64, rol64, ror32, rol32, ror16, rol16, ror8, and rol8,
> only one of them can handle shifting by zero, namely rol32.
> It was upgraded on Thu Dec 3 22:04:01 2015; see the attached
> fix-rol32.diff.
>
> I find it very odd that the other seven functions were not
> upgraded. I suggest the attached fix-others.diff would make
> things more consistent.
>
> Beware that shifting by an amount >= the number of bits in the
> word remains Undefined Behavior. This should be either documented
> or fixed. It could be fixed easily enough.

This construct has been supported as a rotate since at least gcc2.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
Re: better patch for linux/bitops.h
On 05/04/2016 02:42 PM, I wrote:
> I find it very odd that the other seven functions were not
> upgraded. I suggest the attached fix-others.diff would make
> things more consistent.

Here's a replacement patch. Same idea, less brain damage. Sorry for the
confusion.

commit ba83b16d8430ee6104aa1feeed4ff7a82b02747a
Author: John Denker
Date:   Wed May 4 13:55:51 2016 -0700

    Make ror64, rol64, ror32, ror16, rol16, ror8, and rol8
    consistent with rol32 in their handling of shifting by a zero
    amount.

    Same overall rationale as in d7e35dfa, just more consistently
    applied.

    Beware that shifting by an amount >= the number of bits in the
    word remains Undefined Behavior.  This should be either
    documented or fixed.  It could be fixed easily enough.

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index defeaac..90f389b 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -87,7 +87,7 @@ static __always_inline unsigned long hweight_long(unsigned long w)
  */
 static inline __u64 rol64(__u64 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (64 - shift));
+	return (word << shift) | (word >> ((-shift) & 63));
 }
 
 /**
@@ -97,7 +97,7 @@ static inline __u64 rol64(__u64 word, unsigned int shift)
  */
 static inline __u64 ror64(__u64 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (64 - shift));
+	return (word >> shift) | (word << ((-shift) & 63));
 }
 
 /**
@@ -117,7 +117,7 @@ static inline __u32 rol32(__u32 word, unsigned int shift)
  */
 static inline __u32 ror32(__u32 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (32 - shift));
+	return (word >> shift) | (word << ((-shift) & 31));
 }
 
 /**
@@ -127,7 +127,7 @@ static inline __u32 ror32(__u32 word, unsigned int shift)
  */
 static inline __u16 rol16(__u16 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (16 - shift));
+	return (word << shift) | (word >> ((-shift) & 15));
 }
 
 /**
@@ -137,7 +137,7 @@ static inline __u16 rol16(__u16 word, unsigned int shift)
  */
 static inline __u16 ror16(__u16 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (16 - shift));
+	return (word >> shift) | (word << ((-shift) & 15));
 }
 
 /**
@@ -147,7 +147,7 @@ static inline __u16 ror16(__u16 word, unsigned int shift)
  */
 static inline __u8 rol8(__u8 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (8 - shift));
+	return (word << shift) | (word >> ((-shift) & 7));
 }
 
 /**
@@ -157,7 +157,7 @@ static inline __u8 rol8(__u8 word, unsigned int shift)
  */
 static inline __u8 ror8(__u8 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (8 - shift));
+	return (word >> shift) | (word << ((-shift) & 7));
 }
 
 /**
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
> it doesn't hit the
> UB case which Jeffrey was concerned about.

That should be good enough for present purposes.

However, in the interests of long-term maintainability, I would suggest
sticking in a comment or assertion saying that ror32(,shift) is never
called with shift=0.  This can be removed if/when bitops.h is upgraded.

There is a track record of compilers doing Bad Things in response to UB
code, including some very counterintuitive Bad Things.

On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
>>
>> If bitops.h doesn't do the right thing, we need to
>> fix bitops.h.

Most of the ror and rol functions in linux/bitops.h should be considered
unsafe, as currently implemented.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/bitops.h?id=04974df8049fc4240d22759a91e035082ccd18b4#n103

I don't see anything in the file that suggests any limits on the range
of the second argument.  So at best it is an undocumented trap for the
unwary.  This has demonstrably been a problem in the past.  The
explanation in the attached fix-rol32.diff makes amusing reading.

Of the eight functions ror64, rol64, ror32, rol32, ror16, rol16, ror8,
and rol8, only one of them can handle shifting by zero, namely rol32.
It was upgraded on Thu Dec 3 22:04:01 2015; see the attached
fix-rol32.diff.

I find it very odd that the other seven functions were not upgraded.
I suggest the attached fix-others.diff would make things more
consistent.

Beware that shifting by an amount >= the number of bits in the word
remains Undefined Behavior.  This should be either documented or fixed.
It could be fixed easily enough.

commit d7e35dfa2531b53618b9e6edcd8752ce988ac555
Author: Sasha Levin
Date:   Thu Dec 3 22:04:01 2015 -0500

    bitops.h: correctly handle rol32 with 0 byte shift

    ROL on a 32 bit integer with a shift of 32 or more is undefined
    and the result is arch-dependent.
Avoid this by handling the trivial case of roling by 0 correctly. The trivial solution of checking if shift is 0 breaks gcc's detection of this code as a ROL instruction, which is unacceptable. This bug was reported and fixed in GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157): The standard rotate idiom, (x << n) | (x >> (32 - n)) is recognized by gcc (for concreteness, I discuss only the case that x is an uint32_t here). However, this is portable C only for n in the range 0 < n < 32. For n == 0, we get x >> 32 which gives undefined behaviour according to the C standard (6.5.7, Bitwise shift operators). To portably support n == 0, one has to write the rotate as something like (x << n) | (x >> ((-n) & 31)) And this is apparently not recognized by gcc. Note that this is broken on older GCCs and will result in slower ROL. Acked-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Linus Torvalds diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 2b8ed12..defeaac 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -107,7 +107,7 @@ static inline __u64 ror64(__u64 word, unsigned int shift) */ static inline __u32 rol32(__u32 word, unsigned int shift) { - return (word << shift) | (word >> (32 - shift)); + return (word << shift) | (word >> ((-shift) & 31)); } /** commit 03b97eeb5401ede1e1d7b6fbf6a9575db8d0efa6 Author: John Denker Date: Wed May 4 13:55:51 2016 -0700 Make ror64, rol64, ror32, ror16, rol16, ror8, and rol8 consistent with rol32 in their handling of shifting by a zero amount. Same overall rationale as in d7e35dfa, just more consistently applied. Beware that shifting by an amount >= the number of bits in the word remains Undefined Behavior. This should be either documented or fixed. It could be fixed easily enough. 
diff --git a/include/linux/bitops.h b/include/linux/bitops.h index defeaac..97096b4 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -87,7 +87,7 @@ static __always_inline unsigned long hweight_long(unsigned long w) */ static inline __u64 rol64(__u64 word, unsigned int shift) { - return (word << shift) | (word >> (64 - shift)); + return (word << shift) | (word >> ((-shift) & 64)); } /** @@ -97,7 +97,7 @@ static inline __u64 rol64(__u64 word, unsigned int shift) */ static inline __u64 ror64(__u64 word, unsigned int shift) { - return (word >> shift) | (word << (64 - shift)); + return (word >> shift) | (word << ((-shift) & 64)); } /** @@ -117,7 +117,7 @@ static inline __u32 rol32(__u32 word, unsigned int shift) */ static inline __u32 ror32(__u32 word, unsigned int shift) { - return (word >> shift) | (word << (32 - shift)); + return (word >> shift) | (word << ((-shift) & 31)); } /** @@ -127,7 +127,7 @@ static inline __u32 ror32(__u32 word, unsigned int shift) */ sta
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On 05/04/2016 12:07 PM, ty...@thunk.org wrote: > > If we are all agreed that what is in bitops.h is considered valid, > then we can start converting people over to using the version defined > in bitops.h, and if there is some compiler issue we need to work > around, at least we only need to put the workaround in a single header > file. > Yes, please. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [crypto / sparc64] cryptomgr_test OOPS
From: Anatoly Pugachev Date: Wed, 4 May 2016 22:10:47 +0300 > Here's a quick diff related to crypto for debian kernel configs: > > $ diff -u /boot/config-4.5.0-2-sparc64-smp /boot/config-4.6.0-rc5-sparc64-smp > @@ -5299,10 +5380,9 @@ > CONFIG_CRYPTO_RNG=m > CONFIG_CRYPTO_RNG2=y > CONFIG_CRYPTO_RNG_DEFAULT=m > -CONFIG_CRYPTO_PCOMP=m > -CONFIG_CRYPTO_PCOMP2=y > CONFIG_CRYPTO_AKCIPHER2=y > -# CONFIG_CRYPTO_RSA is not set > +CONFIG_CRYPTO_AKCIPHER=y > +CONFIG_CRYPTO_RSA=y > CONFIG_CRYPTO_MANAGER=y > CONFIG_CRYPTO_MANAGER2=y > # CONFIG_CRYPTO_USER is not set > > so, I need to compile 4.5.x kernel (or even older kernels) with > CRYPTO_RSA=y and test how it would act. Well yes, that would be an absolutely critical datapoint. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG
Am Mittwoch, 4. Mai 2016, 15:25:48 schrieb Theodore Ts'o: Hi Theodore, > The CRNG is faster, and we don't pretend to track entropy usage in the > CRNG any more. > > Signed-off-by: Theodore Ts'o > --- > crypto/chacha20_generic.c | 61 -- > drivers/char/random.c | 283 > +++--- include/crypto/chacha20.h | > 1 + > lib/Makefile | 2 +- > lib/chacha20.c| 79 + > 5 files changed, 295 insertions(+), 131 deletions(-) > create mode 100644 lib/chacha20.c > > diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c > index da9c899..1cab831 100644 > --- a/crypto/chacha20_generic.c > +++ b/crypto/chacha20_generic.c > @@ -15,72 +15,11 @@ > #include > #include > > -static inline u32 rotl32(u32 v, u8 n) > -{ > - return (v << n) | (v >> (sizeof(v) * 8 - n)); > -} > - > static inline u32 le32_to_cpuvp(const void *p) > { > return le32_to_cpup(p); > } > > -static void chacha20_block(u32 *state, void *stream) > -{ > - u32 x[16], *out = stream; > - int i; > - > - for (i = 0; i < ARRAY_SIZE(x); i++) > - x[i] = state[i]; > - > - for (i = 0; i < 20; i += 2) { > - x[0] += x[4];x[12] = rotl32(x[12] ^ x[0], 16); > - x[1] += x[5];x[13] = rotl32(x[13] ^ x[1], 16); > - x[2] += x[6];x[14] = rotl32(x[14] ^ x[2], 16); > - x[3] += x[7];x[15] = rotl32(x[15] ^ x[3], 16); > - > - x[8] += x[12]; x[4] = rotl32(x[4] ^ x[8], 12); > - x[9] += x[13]; x[5] = rotl32(x[5] ^ x[9], 12); > - x[10] += x[14]; x[6] = rotl32(x[6] ^ x[10], 12); > - x[11] += x[15]; x[7] = rotl32(x[7] ^ x[11], 12); > - > - x[0] += x[4];x[12] = rotl32(x[12] ^ x[0], 8); > - x[1] += x[5];x[13] = rotl32(x[13] ^ x[1], 8); > - x[2] += x[6];x[14] = rotl32(x[14] ^ x[2], 8); > - x[3] += x[7];x[15] = rotl32(x[15] ^ x[3], 8); > - > - x[8] += x[12]; x[4] = rotl32(x[4] ^ x[8], 7); > - x[9] += x[13]; x[5] = rotl32(x[5] ^ x[9], 7); > - x[10] += x[14]; x[6] = rotl32(x[6] ^ x[10], 7); > - x[11] += x[15]; x[7] = rotl32(x[7] ^ x[11], 7); > - > - x[0] += x[5];x[15] = rotl32(x[15] ^ x[0], 16); > - x[1] += x[6];x[12] = rotl32(x[12] ^ x[1], 16); > - x[2] 
+= x[7];x[13] = rotl32(x[13] ^ x[2], 16); > - x[3] += x[4];x[14] = rotl32(x[14] ^ x[3], 16); > - > - x[10] += x[15]; x[5] = rotl32(x[5] ^ x[10], 12); > - x[11] += x[12]; x[6] = rotl32(x[6] ^ x[11], 12); > - x[8] += x[13]; x[7] = rotl32(x[7] ^ x[8], 12); > - x[9] += x[14]; x[4] = rotl32(x[4] ^ x[9], 12); > - > - x[0] += x[5];x[15] = rotl32(x[15] ^ x[0], 8); > - x[1] += x[6];x[12] = rotl32(x[12] ^ x[1], 8); > - x[2] += x[7];x[13] = rotl32(x[13] ^ x[2], 8); > - x[3] += x[4];x[14] = rotl32(x[14] ^ x[3], 8); > - > - x[10] += x[15]; x[5] = rotl32(x[5] ^ x[10], 7); > - x[11] += x[12]; x[6] = rotl32(x[6] ^ x[11], 7); > - x[8] += x[13]; x[7] = rotl32(x[7] ^ x[8], 7); > - x[9] += x[14]; x[4] = rotl32(x[4] ^ x[9], 7); > - } > - > - for (i = 0; i < ARRAY_SIZE(x); i++) > - out[i] = cpu_to_le32(x[i] + state[i]); > - > - state[12]++; > -} > - > static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src, >unsigned int bytes) > { > diff --git a/drivers/char/random.c b/drivers/char/random.c > index b583e53..91d5c2a 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -260,6 +260,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -412,6 +413,18 @@ static struct fasync_struct *fasync; > static DEFINE_SPINLOCK(random_ready_list_lock); > static LIST_HEAD(random_ready_list); > > +/* > + * crng_init = 0 --> Uninitialized > + * 2 --> Initialized > + * 3 --> Initialized from input_pool > + * > + * crng_init is protected by primary_crng->lock, and only increases > + * its value (from 0->1->2->3). > + */ > +static int crng_init = 0; > +#define crng_ready() (likely(crng_init >= 2)) > +static void process_random_ready_list(void); > + > /** > * > * OS independent entropy store. 
Here are the functions which handle > @@ -441,10 +454,13 @@ struct entropy_store { > __u8 last_data[EXTRACT_SIZE]; > }; > > +static ssize_t extract_entropy(struct entropy_store *r, void *buf, > +size_t nbytes, int min, int rsvd); > + > +static int crng_reseed(struct entropy_store *r); > static void push_to_pool(struct work_struct *work);
Re: [crypto / sparc64] cryptomgr_test OOPS
Hi Anatoly, On 05/04/2016 12:10 PM, Anatoly Pugachev wrote: > we're using 4.5.2 debian kernel here without this problem. I'm not > sure I would be able to bisect, never did it before, but I could > try... On 4.5.2 could you try "modprobe rsa" -- TS -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] random: add interrupt callback to VMBus IRQ handler
From: Stephan Mueller The Hyper-V Linux Integration Services use the VMBus implementation for communication with the Hypervisor. VMBus registers its own interrupt handler that completely bypasses the common Linux interrupt handling. This implies that the interrupt entropy collector is not triggered. This patch adds the interrupt entropy collection callback into the VMBus interrupt handler function. Signed-off-by: Stephan Mueller Signed-off-by: Stephan Mueller Signed-off-by: Theodore Ts'o --- drivers/char/random.c | 1 + drivers/hv/vmbus_drv.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/char/random.c b/drivers/char/random.c index b970db6..897c75e 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1134,6 +1134,7 @@ void add_interrupt_randomness(int irq, int irq_flags) /* award one bit for the contents of the fast pool */ credit_entropy_bits(r, credit + 1); } +EXPORT_SYMBOL_GPL(add_interrupt_randomness); #ifdef CONFIG_BLOCK void add_disk_randomness(struct gendisk *disk) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 64713ff..9af61bb 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -41,6 +41,7 @@ #include #include #include +#include #include "hyperv_vmbus.h" static struct acpi_device *hv_acpi_dev; @@ -801,6 +802,8 @@ static void vmbus_isr(void) else tasklet_schedule(hv_context.msg_dpc[cpu]); } + + add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0); } -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] random: make /dev/urandom scalable for silly userspace programs
On a system with a 4 socket (NUMA) system where a large number of application threads were all trying to read from /dev/urandom, this can result in the system spending 80% of its time contending on the global urandom spinlock. The application should have used its own PRNG, but let's try to help it from running, lemming-like, straight over the locking cliff. Reported-by: Andi Kleen Signed-off-by: Theodore Ts'o --- drivers/char/random.c | 67 +++ 1 file changed, 62 insertions(+), 5 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 91d5c2a..b970db6 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -749,6 +749,17 @@ struct crng_state primary_crng = { }; static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait); +#ifdef CONFIG_NUMA +/* + * Hack to deal with crazy userspace progams when they are all trying + * to access /dev/urandom in parallel. The programs are almost + * certainly doing something terribly wrong, but we'll work around + * their brain damage. + */ +static struct crng_state **crng_node_pool __read_mostly; +#endif + + static void _initialize_crng(struct crng_state *crng) { int i; @@ -764,11 +775,13 @@ static void _initialize_crng(struct crng_state *crng) crng->init_time = jiffies - CRNG_RESEED_INTERVAL; } +#ifdef CONFIG_NUMA static void initialize_crng(struct crng_state *crng) { _initialize_crng(crng); spin_lock_init(&crng->lock); } +#endif static int crng_fast_load(__u32 pool[4]) { @@ -823,19 +836,23 @@ out: return ret; } +static inline void maybe_reseed_primary_crng(void) +{ + if (crng_init > 2 && + time_after(jiffies, primary_crng.init_time + CRNG_RESEED_INTERVAL)) + crng_reseed(&input_pool); +} + static inline void crng_wait_ready(void) { wait_event_interruptible(crng_init_wait, crng_ready()); } -static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]) +static void _extract_crng(struct crng_state *crng, + __u8 out[CHACHA20_BLOCK_SIZE]) { unsigned long v, flags; - struct crng_state *crng = &primary_crng; - if (crng_init > 
2 && - time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)) - crng_reseed(&input_pool); spin_lock_irqsave(&crng->lock, flags); if (arch_get_random_long(&v)) crng->state[14] ^= v; @@ -845,6 +862,30 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]) spin_unlock_irqrestore(&crng->lock, flags); } +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]) +{ +#ifndef CONFIG_NUMA + maybe_reseed_primary_crng(); + _extract_crng(&primary_crng, out); +#else + int node_id = numa_node_id(); + struct crng_state *crng = crng_node_pool[node_id]; + + if (time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)) { + unsigned long flags; + + maybe_reseed_primary_crng(); + _extract_crng(&primary_crng, out); + spin_lock_irqsave(&crng->lock, flags); + memcpy(&crng->state[4], out, CHACHA20_KEY_SIZE); + crng->state[15] = node_id; + crng->init_time = jiffies; + spin_unlock_irqrestore(&crng->lock, flags); + } + _extract_crng(crng, out); +#endif +} + static ssize_t extract_crng_user(void __user *buf, size_t nbytes) { ssize_t ret = 0, i; @@ -1549,6 +1590,22 @@ static void init_std_data(struct entropy_store *r) */ static int rand_initialize(void) { +#ifdef CONFIG_NUMA + int i; + int num_nodes = num_possible_nodes(); + struct crng_state *crng; + + crng_node_pool = kmalloc(num_nodes * sizeof(void *), +GFP_KERNEL|__GFP_NOFAIL); + + for (i=0; i < num_nodes; i++) { + crng = kmalloc(sizeof(struct crng_state), + GFP_KERNEL | __GFP_NOFAIL); + initialize_crng(crng); + crng_node_pool[i] = crng; + + } +#endif init_std_data(&input_pool); init_std_data(&blocking_pool); _initialize_crng(&primary_crng); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] random: add backtracking protection to the CRNG
Signed-off-by: Theodore Ts'o --- drivers/char/random.c | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 897c75e..028d085 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -886,6 +886,34 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]) #endif } +/* + * Use the leftover bytes from the CRNG block output (if there is + * enough) to mutate the CRNG key to provide backtracking protection. + */ +static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int unused) +{ +#ifdef CONFIG_NUMA + struct crng_state *crng = crng_node_pool[numa_node_id()]; +#else + struct crng_state *crng = &primary_crng; +#endif + unsigned long flags; + __u32 *s, *d; + int i; + + unused = round_up(unused, sizeof(__u32)); + if (unused + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) { + extract_crng(tmp); + unused = 0; + } + spin_lock_irqsave(&crng->lock, flags); + s = (__u32 *) &tmp[unused]; + d = &crng->state[4]; + for (i=0; i < 8; i++) + *d++ ^= *s++; + spin_unlock_irqrestore(&crng->lock, flags); +} + static ssize_t extract_crng_user(void __user *buf, size_t nbytes) { ssize_t ret = 0, i; @@ -913,6 +941,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t nbytes) buf += i; ret += i; } + crng_backtrack_protect(tmp, i); /* Wipe data just written to memory */ memzero_explicit(tmp, sizeof(tmp)); @@ -1457,8 +1486,10 @@ void get_random_bytes(void *buf, int nbytes) if (nbytes > 0) { extract_crng(tmp); memcpy(buf, tmp, nbytes); - memzero_explicit(tmp, nbytes); - } + crng_backtrack_protect(tmp, nbytes); + } else + crng_backtrack_protect(tmp, 0); + memzero_explicit(tmp, sizeof(tmp)); } EXPORT_SYMBOL(get_random_bytes); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG
The CRNG is faster, and we don't pretend to track entropy usage in the CRNG any more. Signed-off-by: Theodore Ts'o --- crypto/chacha20_generic.c | 61 -- drivers/char/random.c | 283 +++--- include/crypto/chacha20.h | 1 + lib/Makefile | 2 +- lib/chacha20.c| 79 + 5 files changed, 295 insertions(+), 131 deletions(-) create mode 100644 lib/chacha20.c diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c index da9c899..1cab831 100644 --- a/crypto/chacha20_generic.c +++ b/crypto/chacha20_generic.c @@ -15,72 +15,11 @@ #include #include -static inline u32 rotl32(u32 v, u8 n) -{ - return (v << n) | (v >> (sizeof(v) * 8 - n)); -} - static inline u32 le32_to_cpuvp(const void *p) { return le32_to_cpup(p); } -static void chacha20_block(u32 *state, void *stream) -{ - u32 x[16], *out = stream; - int i; - - for (i = 0; i < ARRAY_SIZE(x); i++) - x[i] = state[i]; - - for (i = 0; i < 20; i += 2) { - x[0] += x[4];x[12] = rotl32(x[12] ^ x[0], 16); - x[1] += x[5];x[13] = rotl32(x[13] ^ x[1], 16); - x[2] += x[6];x[14] = rotl32(x[14] ^ x[2], 16); - x[3] += x[7];x[15] = rotl32(x[15] ^ x[3], 16); - - x[8] += x[12]; x[4] = rotl32(x[4] ^ x[8], 12); - x[9] += x[13]; x[5] = rotl32(x[5] ^ x[9], 12); - x[10] += x[14]; x[6] = rotl32(x[6] ^ x[10], 12); - x[11] += x[15]; x[7] = rotl32(x[7] ^ x[11], 12); - - x[0] += x[4];x[12] = rotl32(x[12] ^ x[0], 8); - x[1] += x[5];x[13] = rotl32(x[13] ^ x[1], 8); - x[2] += x[6];x[14] = rotl32(x[14] ^ x[2], 8); - x[3] += x[7];x[15] = rotl32(x[15] ^ x[3], 8); - - x[8] += x[12]; x[4] = rotl32(x[4] ^ x[8], 7); - x[9] += x[13]; x[5] = rotl32(x[5] ^ x[9], 7); - x[10] += x[14]; x[6] = rotl32(x[6] ^ x[10], 7); - x[11] += x[15]; x[7] = rotl32(x[7] ^ x[11], 7); - - x[0] += x[5];x[15] = rotl32(x[15] ^ x[0], 16); - x[1] += x[6];x[12] = rotl32(x[12] ^ x[1], 16); - x[2] += x[7];x[13] = rotl32(x[13] ^ x[2], 16); - x[3] += x[4];x[14] = rotl32(x[14] ^ x[3], 16); - - x[10] += x[15]; x[5] = rotl32(x[5] ^ x[10], 12); - x[11] += x[12]; x[6] = rotl32(x[6] ^ x[11], 12); - 
x[8] += x[13]; x[7] = rotl32(x[7] ^ x[8], 12); - x[9] += x[14]; x[4] = rotl32(x[4] ^ x[9], 12); - - x[0] += x[5];x[15] = rotl32(x[15] ^ x[0], 8); - x[1] += x[6];x[12] = rotl32(x[12] ^ x[1], 8); - x[2] += x[7];x[13] = rotl32(x[13] ^ x[2], 8); - x[3] += x[4];x[14] = rotl32(x[14] ^ x[3], 8); - - x[10] += x[15]; x[5] = rotl32(x[5] ^ x[10], 7); - x[11] += x[12]; x[6] = rotl32(x[6] ^ x[11], 7); - x[8] += x[13]; x[7] = rotl32(x[7] ^ x[8], 7); - x[9] += x[14]; x[4] = rotl32(x[4] ^ x[9], 7); - } - - for (i = 0; i < ARRAY_SIZE(x); i++) - out[i] = cpu_to_le32(x[i] + state[i]); - - state[12]++; -} - static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src, unsigned int bytes) { diff --git a/drivers/char/random.c b/drivers/char/random.c index b583e53..91d5c2a 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -260,6 +260,7 @@ #include #include #include +#include #include #include @@ -412,6 +413,18 @@ static struct fasync_struct *fasync; static DEFINE_SPINLOCK(random_ready_list_lock); static LIST_HEAD(random_ready_list); +/* + * crng_init = 0 --> Uninitialized + * 2 --> Initialized + * 3 --> Initialized from input_pool + * + * crng_init is protected by primary_crng->lock, and only increases + * its value (from 0->1->2->3). + */ +static int crng_init = 0; +#define crng_ready() (likely(crng_init >= 2)) +static void process_random_ready_list(void); + /** * * OS independent entropy store. Here are the functions which handle @@ -441,10 +454,13 @@ struct entropy_store { __u8 last_data[EXTRACT_SIZE]; }; +static ssize_t extract_entropy(struct entropy_store *r, void *buf, + size_t nbytes, int min, int rsvd); + +static int crng_reseed(struct entropy_store *r); static void push_to_pool(struct work_struct *work); static __u32 input_pool_data[INPUT_POOL_WORDS]; static __u32 blocking_pool_data[OUTPUT_POOL_WORDS]; -static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS]; static struct entropy_store input_pool = { .poolinfo = &poolinfo
[PATCH -v2 0/4] random: replace urandom pool with a CRNG
By using a CRNG to replace the urandom pool, we address a number of
complaints which Stephan Mueller has been concerned about.  We now use a
much more aggressive interrupt sampling system to quickly initialize a
CRNG which gets used in place of the original non-blocking pool.  This
tends to get initialized *very* quickly (before the devices are finished
being probed.)

Like Stephan's proposal, this assumes that we can get a bit of entropy
per interrupt, which may be problematic on some architectures.  So after
we do this quick-and-dirty initialization, we then fall back to the
slower, more conservative interrupt sampling system to fill the input
pool, and we will do a catastrophic reseeding once we get 128 bits using
the slower but more conservative system, and every five minutes
afterwards, if possible.

In addition, on NUMA systems we make the CRNG state per-NUMA socket, to
address the NUMA locking contention problem which Andi Kleen has been
complaining about.  I'm not entirely sure this will work on the crazy
big SGI systems, but they are rare.  Whether they are rarer than abusive
userspace programs that are continuously pounding /dev/urandom is
unclear.  If necessary we can make a config option to turn off the
per-NUMA socket hack if it proves to be problematic.

Stephan Mueller (1):
  random: add interrupt callback to VMBus IRQ handler

Theodore Ts'o (3):
  random: replace non-blocking pool with a Chacha20-based CRNG
  random: make /dev/urandom scalable for silly userspace programs
  random: add backtracking protection to the CRNG

 crypto/chacha20_generic.c |  61
 drivers/char/random.c     | 372 +-
 drivers/hv/vmbus_drv.c    |   3 +
 include/crypto/chacha20.h |   1 +
 lib/Makefile              |   2 +-
 lib/chacha20.c            |  79 ++
 6 files changed, 387 insertions(+), 131 deletions(-)
 create mode 100644 lib/chacha20.c

-- 
2.5.0
-- 
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [crypto / sparc64] cryptomgr_test OOPS
On Tue, May 3, 2016 at 7:33 PM, David Miller wrote: > From: Anatoly Pugachev > Date: Tue, 3 May 2016 16:54:18 +0300 > >> I have git kernel OOPS (4.6.0-rc6) on sparc64. This OOPS does not >> happen, if I set the following kernel option: >> >> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y >> >> Can someone please look at >> https://bugzilla.kernel.org/show_bug.cgi?id=117551 ? > > The lib/mpi/ code hasn't been touched in a long time, so I can only see that > it might > be the crypto/rsa.c changes that happened this release. > > Can you possibly bisect this or at least tell us that 4.5 doesn't have this > problem? David, we're using 4.5.2 debian kernel here without this problem. I'm not sure I would be able to bisect, never did it before, but I could try... Here's a quick diff related to crypto for debian kernel configs: $ diff -u /boot/config-4.5.0-2-sparc64-smp /boot/config-4.6.0-rc5-sparc64-smp @@ -5299,10 +5380,9 @@ CONFIG_CRYPTO_RNG=m CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_RNG_DEFAULT=m -CONFIG_CRYPTO_PCOMP=m -CONFIG_CRYPTO_PCOMP2=y CONFIG_CRYPTO_AKCIPHER2=y -# CONFIG_CRYPTO_RSA is not set +CONFIG_CRYPTO_AKCIPHER=y +CONFIG_CRYPTO_RSA=y CONFIG_CRYPTO_MANAGER=y CONFIG_CRYPTO_MANAGER2=y # CONFIG_CRYPTO_USER is not set so, I need to compile 4.5.x kernel (or even older kernels) with CRYPTO_RSA=y and test how it would act. And if it would not crash, I could try to bisect. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
> 
> We don't care about UB, we care about gcc, and to a lesser extent
> LLVM and ICC.  If bitops.h doesn't do the right thing, we need to
> fix bitops.h.

I'm going to suggest that we treat the ro[rl]{32,64}() question as
separable from the /dev/random case.  I've sanity checked gcc 5.3.1 and
it does the right thing given the small number of constant arguments
given to rotl32() in lib/chacha20.c, and it doesn't hit the UB case
which Jeffrey was concerned about.

This is also code that was previously in crypto/chacha20_generic.c, and
so if there is a bug with some obscure compiler not properly compiling
it down to a rotate instruction, (a) no one is paying me to make sure
the kernel code compiles well on ICC, and (b) it's not scalable to have
each kernel developer try to deal with the vagaries of compilers.

So from my perspective, the only interesting results for me is (a)
using what had been used before with crypto/chacha20_generic.c, or (b)
reusing what is in bitops.h and letting it be someone else's problem if
some obscure compiler isn't happy with what is in bitops.h

If we are all agreed that what is in bitops.h is considered valid, then
we can start converting people over to using the version defined in
bitops.h, and if there is some compiler issue we need to work around,
at least we only need to put the workaround in a single header file.

Cheers,

- Ted
-- 
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On May 4, 2016 11:22:25 AM PDT, Jeffrey Walton wrote: >On Wed, May 4, 2016 at 1:49 PM, wrote: >> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote: >>> > +static inline u32 rotl32(u32 v, u8 n) >>> > +{ >>> > + return (v << n) | (v >> (sizeof(v) * 8 - n)); >>> > +} >>> >>> That's undefined behavior when n=0. >> >> Sure, but it's never called with n = 0; I've double checked and the >> compiler seems to do the right thing with the above pattern as well. > >> Hmm, it looks like there is a "standard" version rotate left and >right >> defined in include/linux/bitops.h. So I suspect it would make sense >> to use rol32 as defined in bitops.h --- and this is probably >something > >bitops.h could work in this case, but its not an ideal solution. GCC >does not optimize the code below as expected under all use cases >because GCC does not recognize it as a rotate (see >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157): > >return (v << n) | (v >> (sizeof(v) * 8 - n)); > >And outside of special cases like Salsa, ChaCha and BLAKE2, the code >provided in bitops.h suffers UB on arbitrary data. So I think care >needs to be taken when selecting functions from bitops.h. > >> that we should do for the rest of crypto/*.c, where people seem to be >> defininig their own version of something like rotl32 (I copied the >> contents of crypto/chacha20_generic.c to lib/chacha20, so this >pattern >> of defining one's own version of rol32 isn't new). > >Yeah, I kind of thought there was some duplication going on. > >But I think bitops.h should be fixed. Many folks don't realize the >lurking UB, and many folks don't realize its not always optimized >well. > >>> I think the portable way to do a rotate that avoids UB is the >>> following. GCC, Clang and ICC recognize the pattern, and emit a >rotate >>> instruction. 
>>>
>>> static const unsigned int MASK=31;
>>> return (v<<n)|(v>>(-n&MASK));
>>>
>>> You should also avoid the following because it's not constant time due
>>> to the branch:
>>>
>>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>>
>>
>> Where is this coming from?  I don't see this construct in the patch.
>
>My bad... It was a general observation. I've seen folks try to correct
>the UB by turning to something like that.
>
>Jeff

We don't care about UB, we care about gcc, and to a lesser extent LLVM
and ICC.  If bitops.h doesn't do the right thing, we need to fix
bitops.h.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
-- 
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On Wed, May 4, 2016 at 1:49 PM, wrote:
> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
>> > +static inline u32 rotl32(u32 v, u8 n)
>> > +{
>> > +	return (v << n) | (v >> (sizeof(v) * 8 - n));
>> > +}
>>
>> That's undefined behavior when n=0.
>
> Sure, but it's never called with n = 0; I've double checked and the
> compiler seems to do the right thing with the above pattern as well.
>
> Hmm, it looks like there is a "standard" version of rotate left and right
> defined in include/linux/bitops.h.  So I suspect it would make sense
> to use rol32 as defined in bitops.h --- and this is probably something

bitops.h could work in this case, but it's not an ideal solution. GCC
does not optimize the code below as expected under all use cases
because GCC does not recognize it as a rotate (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

    return (v << n) | (v >> (sizeof(v) * 8 - n));

And outside of special cases like Salsa, ChaCha and BLAKE2, the code
provided in bitops.h suffers UB on arbitrary data. So I think care
needs to be taken when selecting functions from bitops.h.

> that we should do for the rest of crypto/*.c, where people seem to be
> defining their own version of something like rotl32 (I copied the
> contents of crypto/chacha20_generic.c to lib/chacha20, so this pattern
> of defining one's own version of rol32 isn't new).

Yeah, I kind of thought there was some duplication going on.

But I think bitops.h should be fixed. Many folks don't realize the
lurking UB, and many folks don't realize it's not always optimized
well.

>> I think the portable way to do a rotate that avoids UB is the
>> following. GCC, Clang and ICC recognize the pattern, and emit a rotate
>> instruction.
>>
>> static const unsigned int MASK=31;
>> return (v<<n)|(v>>(-n&MASK));
>>
>> You should also avoid the following because it's not constant time due
>> to the branch:
>>
>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>
>
> Where is this coming from? 
I don't see this construct in the patch. My bad... It was a general observation. I've seen folks try to correct the UB by turning to something like that. Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote: > > +static inline u32 rotl32(u32 v, u8 n) > > +{ > > + return (v << n) | (v >> (sizeof(v) * 8 - n)); > > +} > > That's undefined behavior when n=0. Sure, but it's never called with n = 0; I've double checked and the compiler seems to do the right thing with the above pattern as well. Hmm, it looks like there is a "standard" version of rotate left and right defined in include/linux/bitops.h. So I suspect it would make sense to use rol32 as defined in bitops.h --- and this is probably something that we should do for the rest of crypto/*.c, where people seem to be defining their own version of something like rotl32 (I copied the contents of crypto/chacha20_generic.c to lib/chacha20, so this pattern of defining one's own version of rol32 isn't new). > I think the portable way to do a rotate that avoids UB is the > following. GCC, Clang and ICC recognize the pattern, and emit a rotate > instruction. > > static const unsigned int MASK=31; > return (v<<n)|(v>>(-n&MASK)); > > You should also avoid the following because it's not constant time due > to the branch: > > return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n)); > Where is this coming from? I don't see this construct in the patch. - Ted
[PATCH 1/7] asm-generic/io.h: add io{read,write}64 accessors
This will allow device drivers to consistently use io{read,write}XX also for 64-bit accesses. Signed-off-by: Horia Geantă --- include/asm-generic/io.h| 63 + include/asm-generic/iomap.h | 8 ++ 2 files changed, 71 insertions(+) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index eed3bbe88c8a..0e2c73bb6d56 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -585,6 +585,16 @@ static inline u32 ioread32(const volatile void __iomem *addr) } #endif +#ifdef CONFIG_64BIT +#ifndef ioread64 +#define ioread64 ioread64 +static inline u64 ioread64(const volatile void __iomem *addr) +{ + return readq(addr); +} +#endif +#endif /* CONFIG_64BIT */ + #ifndef iowrite8 #define iowrite8 iowrite8 static inline void iowrite8(u8 value, volatile void __iomem *addr) @@ -609,6 +619,16 @@ static inline void iowrite32(u32 value, volatile void __iomem *addr) } #endif +#ifdef CONFIG_64BIT +#ifndef iowrite64 +#define iowrite64 iowrite64 +static inline void iowrite64(u64 value, volatile void __iomem *addr) +{ + writeq(value, addr); +} +#endif +#endif /* CONFIG_64BIT */ + #ifndef ioread16be #define ioread16be ioread16be static inline u16 ioread16be(const volatile void __iomem *addr) @@ -625,6 +645,16 @@ static inline u32 ioread32be(const volatile void __iomem *addr) } #endif +#ifdef CONFIG_64BIT +#ifndef ioread64be +#define ioread64be ioread64be +static inline u64 ioread64be(const volatile void __iomem *addr) +{ + return __be64_to_cpu(__raw_readq(addr)); +} +#endif +#endif /* CONFIG_64BIT */ + #ifndef iowrite16be #define iowrite16be iowrite16be static inline void iowrite16be(u16 value, void volatile __iomem *addr) @@ -641,6 +671,16 @@ static inline void iowrite32be(u32 value, volatile void __iomem *addr) } #endif +#ifdef CONFIG_64BIT +#ifndef iowrite64be +#define iowrite64be iowrite64be +static inline void iowrite64be(u64 value, volatile void __iomem *addr) +{ + __raw_writeq(__cpu_to_be64(value), addr); +} +#endif +#endif /* CONFIG_64BIT */ + #ifndef 
ioread8_rep #define ioread8_rep ioread8_rep static inline void ioread8_rep(const volatile void __iomem *addr, void *buffer, @@ -668,6 +708,17 @@ static inline void ioread32_rep(const volatile void __iomem *addr, } #endif +#ifdef CONFIG_64BIT +#ifndef ioread64_rep +#define ioread64_rep ioread64_rep +static inline void ioread64_rep(const volatile void __iomem *addr, + void *buffer, unsigned int count) +{ + readsq(addr, buffer, count); +} +#endif +#endif /* CONFIG_64BIT */ + #ifndef iowrite8_rep #define iowrite8_rep iowrite8_rep static inline void iowrite8_rep(volatile void __iomem *addr, @@ -697,6 +748,18 @@ static inline void iowrite32_rep(volatile void __iomem *addr, writesl(addr, buffer, count); } #endif + +#ifdef CONFIG_64BIT +#ifndef iowrite64_rep +#define iowrite64_rep iowrite64_rep +static inline void iowrite64_rep(volatile void __iomem *addr, +const void *buffer, +unsigned int count) +{ + writesq(addr, buffer, count); +} +#endif +#endif /* CONFIG_64BIT */ #endif /* CONFIG_GENERIC_IOMAP */ #ifdef __KERNEL__ diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index d8f8622fa044..650fede33c25 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -30,12 +30,20 @@ extern unsigned int ioread16(void __iomem *); extern unsigned int ioread16be(void __iomem *); extern unsigned int ioread32(void __iomem *); extern unsigned int ioread32be(void __iomem *); +#ifdef CONFIG_64BIT +extern u64 ioread64(void __iomem *); +extern u64 ioread64be(void __iomem *); +#endif extern void iowrite8(u8, void __iomem *); extern void iowrite16(u16, void __iomem *); extern void iowrite16be(u16, void __iomem *); extern void iowrite32(u32, void __iomem *); extern void iowrite32be(u32, void __iomem *); +#ifdef CONFIG_64BIT +extern void iowrite64(u64, void __iomem *); +extern void iowrite64be(u64, void __iomem *); +#endif /* * "string" versions of the above. 
Note that they -- 2.4.4
[PATCH 7/7] arm64: dts: ls1043a: add crypto node
LS1043A has a SEC v5.4 security engine. For now don't add rtic or sec_mon subnodes, since these features haven't been tested yet. Signed-off-by: Horia Geantă --- arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts | 4 +++ arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi| 43 +++ 2 files changed, 47 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts index ce235577e90f..9b5b75a4f02a 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts @@ -49,6 +49,10 @@ / { model = "LS1043A RDB Board"; + + aliases { + crypto = &crypto; + }; }; &i2c0 { diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi index be72bf5b58b5..529c198494d5 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi @@ -159,6 +159,49 @@ big-endian; }; + crypto: crypto@170 { + compatible = "fsl,sec-v5.4", "fsl,sec-v5.0", +"fsl,sec-v4.0"; + fsl,sec-era = <3>; + #address-cells = <1>; + #size-cells = <1>; + ranges = <0x0 0x00 0x170 0x10>; + reg = <0x00 0x170 0x0 0x10>; + interrupts = <0 75 0x4>; + + sec_jr0: jr@1 { + compatible = "fsl,sec-v5.4-job-ring", +"fsl,sec-v5.0-job-ring", +"fsl,sec-v4.0-job-ring"; + reg= <0x1 0x1>; + interrupts = <0 71 0x4>; + }; + + sec_jr1: jr@2 { + compatible = "fsl,sec-v5.4-job-ring", +"fsl,sec-v5.0-job-ring", +"fsl,sec-v4.0-job-ring"; + reg= <0x2 0x1>; + interrupts = <0 72 0x4>; + }; + + sec_jr2: jr@3 { + compatible = "fsl,sec-v5.4-job-ring", +"fsl,sec-v5.0-job-ring", +"fsl,sec-v4.0-job-ring"; + reg= <0x3 0x1>; + interrupts = <0 73 0x4>; + }; + + sec_jr3: jr@4 { + compatible = "fsl,sec-v5.4-job-ring", +"fsl,sec-v5.0-job-ring", +"fsl,sec-v4.0-job-ring"; + reg= <0x4 0x1>; + interrupts = <0 74 0x4>; + }; + }; + dcfg: dcfg@1ee { compatible = "fsl,ls1043a-dcfg", "syscon"; reg = <0x0 0x1ee 0x0 0x1>; -- 2.4.4
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
On Tue, May 03, 2016 at 10:50:25AM +0200, Stephan Mueller wrote: > > +/* > > + * crng_init = 0 --> Uninitialized > > + * 2 --> Initialized > > + * 3 --> Initialized from input_pool > > + */ > > +static int crng_init = 0; > > shouldn't that be an atomic_t ? The crng_init variable only gets incremented (it progresses from 0->1->2->3) and it's protected by the primary_crng->lock spinlock. There are a few places where we read it without first taking the lock, but that's where we depend on it getting incremented and where if we race with another CPU which has just bumped the crng_init status, it's not critical. (Or we can take the lock and then recheck the crng_init if we really need to be sure.) > > +static void _initialize_crng(struct crng_state *crng) > > +{ > > + int i; > > + unsigned long rv; > > Why do you use unsigned long here? I thought the state[i] is unsigned int. Because it gets filled in via arch_get_random_seed_long(&rv) and arch_get_random_long(&rv). If that means we get 64 bits and we only use 32 bits, that's OK. > Would it make sense to add the ChaCha20 self test vectors from RFC7539 here > to > test that the ChaCha20 works? We're not doing that for any of the other ciphers and hashes. We didn't do that for SHA1, and the chacha20 code where I took this from didn't check for this as well. What threat are you most worried about? We don't care about chacha20 being exactly chacha20, so long as it's cryptographically strong. In fact I think I removed a potential host order byteswap in the set key operation specifically because we don't care about interop. If this is required for FIPS mode, we can add that later. I was primarily trying to keep the patch small to be easier for people to look at it, so I've deliberately left off bells and whistles that aren't strictly needed to show that the new approach is sound.
> >> + if (crng_init++ >= 2) >> + wake_up_interruptible(&crng_init_wait); >> >> Don't we have a race here with the crng_init < 3 check in crng_reseed > >> considering multi-core systems? >No, because we are holding the primary_crng->lock spinlock. I'll add >a comment explaining the locking protections which is intended for >crng_init where we declare it. > > >> > + if (num < 16 || num > 32) { >> > + WARN_ON(1); >> > + pr_err("crng_reseed: num is %d?!?\n", num); >> > + } >> > + num_words = (num + 3) / 4; >> > + for (i = 0; i < num_words; i++) >> > + primary_crng.state[i+4] ^= tmp[i]; >> > + primary_crng.init_time = jiffies; >> > + if (crng_init < 3) { >> >> Shouldn't that one be if (crng_init < 3 && num >= 16) ? >I'll just change the above WARN_ON test to be: > > BUG_ON(num < 16 || num > 32); > >This really can't happen, and I had it as a WARN_ON with a printk for >debugging purposes in case I was wrong about how the code works. > >> > + crng_init = 3; >> > + process_random_ready_list(); >> > + wake_up_interruptible(&crng_init_wait); >> > + pr_notice("random: crng_init 3\n"); >> >> Would it make sense to be more descriptive here to allow readers of >dmesg to >> understand the output? > >Yes, what we're currently printing during the initialization: > >random: crng_init 1 >random: crng_init 2 >random: crng_init 3 > >was again mostly for debugging purposes. What I'm thinking about >doing is changing crng_init 2 and 3 messages to be: > >random: crng fast init done >random: crng conservative init done > >> > + } >> > + ret = 1; >> > +out: >> > + spin_unlock_irqrestore(&primary_crng.lock, flags); >> >> memzero_explicit of tmp? > >Good point, I've added a call to memzero_explicit(). 
> > +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]) > > +{ > > + unsigned long v, flags; > > + struct crng_state *crng = &primary_crng; > > + > > + if (crng_init > 2 && > > + time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)) > > + crng_reseed(&input_pool); > > + spin_lock_irqsave(&crng->lock, flags); > > + if (arch_get_random_long(&v)) > > + crng->state[14] ^= v; > > + chacha20_block(&crng->state[0], out); > > + if (crng->state[12] == 0) > > + crng->state[13]++; > What is the purpose to only cover the 2nd 32 bit value of the nonce with > arch_get_random? > > state[12]++? Or why do you increment the nonce? In Chacha20, its output is a funcrtion of the state array, where state[0..3] is a constant specified by the Chacha20 definition, state[4..11] is the Key, and state[12..15] is the IV. The chacha20_block() function increments state[12] each time it is called, so state[12] is being used as the block counter. The increment of state[13] is used to make state[13] to be the upper 32-bits of the block counter. We *should* be reseeding more often than 2**32 calls to chacha20_block(), and the increment is just to be safe in case something goes wronng and we're not reseeding. We're using crng[14] to be contain the output of RDRAND, so this is where
[PATCH 0/7] crypto: caam - add support for LS1043A SoC
Hi, [Patches 1-3 add io{read,write}64[be] accessors (generic, arm64, ppc64), such that CAAM's accessors in regs.h are simplified a bit. Patch 7 adds crypto node for LS1043A platform. Let me know if it's ok to go with these through the cryptodev-2.6 tree.] This is a follow-up on the following RFC patch set: crypto: caam - Revamp I/O accessors https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg15878.html There are platforms such as LS1043A (or LS1012A) where core endianness does not match CAAM/SEC endianness (LE vs. BE). Add support in caam driver for these cases. Current patch set detects device endianness at runtime (as opposed to compile-time endianness), in order to support multiplatform kernels. Detection of device endianness is not device-tree based. Instead, the SSTA ("SEC STAtus") register has a property such that by reading it in any endianness and masking it properly, it's possible to deduce device endianness. The performance drop due to the runtime detection is < 1.0%. (An alternative implementation using function pointers has been tried, but led to a bigger performance drop.) 
Thanks, Horia Cristian Stoica (1): crypto: caam - fix offset field in hw sg entries Horia Geantă (6): asm-generic/io.h: add io{read,write}64 accessors arm64: add io{read,write}64be accessors powerpc: add io{read,write}64 accessors crypto: caam - handle core endianness != caam endianness crypto: caam - add ARCH_LAYERSCAPE to supported architectures arm64: dts: ls1043a: add crypto node arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts | 4 + arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi| 43 ++ arch/arm64/include/asm/io.h | 4 +- arch/powerpc/kernel/iomap.c | 24 drivers/crypto/caam/Kconfig | 6 +- drivers/crypto/caam/caamhash.c| 5 +- drivers/crypto/caam/ctrl.c| 125 +++--- drivers/crypto/caam/desc.h| 9 +- drivers/crypto/caam/desc_constr.h | 44 --- drivers/crypto/caam/jr.c | 22 ++-- drivers/crypto/caam/pdb.h | 137 +++- drivers/crypto/caam/regs.h| 151 +++--- drivers/crypto/caam/sg_sw_sec4.h | 17 +-- include/asm-generic/io.h | 63 + include/asm-generic/iomap.h | 8 ++ 15 files changed, 490 insertions(+), 172 deletions(-) -- 2.4.4
[PATCH 6/7] crypto: caam - add ARCH_LAYERSCAPE to supported architectures
This basically adds support for the ls1043a platform. Signed-off-by: Horia Geantă --- drivers/crypto/caam/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig index d2c2909a4020..ff54c42e6e51 100644 --- a/drivers/crypto/caam/Kconfig +++ b/drivers/crypto/caam/Kconfig @@ -1,6 +1,6 @@ config CRYPTO_DEV_FSL_CAAM tristate "Freescale CAAM-Multicore driver backend" - depends on FSL_SOC || ARCH_MXC + depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE help Enables the driver module for Freescale's Cryptographic Accelerator and Assurance Module (CAAM), also known as the SEC version 4 (SEC4). -- 2.4.4
[PATCH 5/7] crypto: caam - handle core endianness != caam endianness
There are SoCs like LS1043A where CAAM endianness (BE) does not match the default endianness of the core (LE). Moreover, there are requirements for the driver to handle cases like CPU_BIG_ENDIAN=y on ARM-based SoCs. This requires a complete rewrite of the I/O accessors. PPC-specific accessors - {in,out}_{le,be}XX - are replaced with generic ones - io{read,write}[be]XX. Endianness is detected dynamically (at runtime) to allow for multiplatform kernels, e.g. running the same kernel image on LS1043A (BE CAAM) and LS2080A (LE CAAM) armv8-based SoCs. While here: debugfs entries need to take into consideration the endianness of the core when displaying data. Add the necessary glue code so the entries remain the same, but they are properly read, regardless of the core and/or SEC endianness. Note: pdb.h fixes only what is currently being used (IPsec). Signed-off-by: Horia Geantă Signed-off-by: Alex Porosanu --- drivers/crypto/caam/Kconfig | 4 - drivers/crypto/caam/caamhash.c| 5 +- drivers/crypto/caam/ctrl.c| 125 +++ drivers/crypto/caam/desc.h| 7 +- drivers/crypto/caam/desc_constr.h | 44 +++ drivers/crypto/caam/jr.c | 22 +++--- drivers/crypto/caam/pdb.h | 137 ++ drivers/crypto/caam/regs.h| 151 +- drivers/crypto/caam/sg_sw_sec4.h | 11 +-- 9 files changed, 340 insertions(+), 166 deletions(-) diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig index 5652a53415dc..d2c2909a4020 100644 --- a/drivers/crypto/caam/Kconfig +++ b/drivers/crypto/caam/Kconfig @@ -116,10 +116,6 @@ config CRYPTO_DEV_FSL_CAAM_IMX def_bool SOC_IMX6 || SOC_IMX7D depends on CRYPTO_DEV_FSL_CAAM -config CRYPTO_DEV_FSL_CAAM_LE - def_bool CRYPTO_DEV_FSL_CAAM_IMX || SOC_LS1021A - depends on CRYPTO_DEV_FSL_CAAM - config CRYPTO_DEV_FSL_CAAM_DEBUG bool "Enable debug output in CAAM driver" depends on CRYPTO_DEV_FSL_CAAM diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c index 5845d4a08797..f1ecc8df8d41 100644 --- a/drivers/crypto/caam/caamhash.c +++ 
b/drivers/crypto/caam/caamhash.c @@ -847,7 +847,7 @@ static int ahash_update_ctx(struct ahash_request *req) *next_buflen, 0); } else { (edesc->sec4_sg + sec4_sg_src_index - 1)->len |= - SEC4_SG_LEN_FIN; + cpu_to_caam32(SEC4_SG_LEN_FIN); } state->current_buf = !state->current_buf; @@ -949,7 +949,8 @@ static int ahash_final_ctx(struct ahash_request *req) state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1, buf, state->buf_dma, buflen, last_buflen); - (edesc->sec4_sg + sec4_sg_src_index - 1)->len |= SEC4_SG_LEN_FIN; + (edesc->sec4_sg + sec4_sg_src_index - 1)->len |= + cpu_to_caam32(SEC4_SG_LEN_FIN); edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg, sec4_sg_bytes, DMA_TO_DEVICE); diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c index 44d30b45f3cc..1c8e764872ae 100644 --- a/drivers/crypto/caam/ctrl.c +++ b/drivers/crypto/caam/ctrl.c @@ -15,6 +15,9 @@ #include "desc_constr.h" #include "error.h" +bool caam_little_end; +EXPORT_SYMBOL(caam_little_end); + /* * i.MX targets tend to have clock control subsystems that can * enable/disable clocking to our device. 
@@ -106,7 +109,7 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc, if (ctrlpriv->virt_en == 1) { - setbits32(&ctrl->deco_rsr, DECORSR_JR0); + clrsetbits_32(&ctrl->deco_rsr, 0, DECORSR_JR0); while (!(rd_reg32(&ctrl->deco_rsr) & DECORSR_VALID) && --timeout) @@ -115,7 +118,7 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc, timeout = 10; } - setbits32(&ctrl->deco_rq, DECORR_RQD0ENABLE); + clrsetbits_32(&ctrl->deco_rq, 0, DECORR_RQD0ENABLE); while (!(rd_reg32(&ctrl->deco_rq) & DECORR_DEN0) && --timeout) @@ -123,12 +126,12 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc, if (!timeout) { dev_err(ctrldev, "failed to acquire DECO 0\n"); - clrbits32(&ctrl->deco_rq, DECORR_RQD0ENABLE); + clrsetbits_32(&ctrl->deco_rq, DECORR_RQD0ENABLE, 0); return -ENODEV; } for (i = 0; i < desc_len(desc); i++) - wr_reg32(&deco->descbuf[i], *(desc + i)); + wr_reg32(&deco->descbuf[i], c
[PATCH 4/7] crypto: caam - fix offset field in hw sg entries
From: Cristian Stoica The offset field is 13 bits wide; make sure we don't overwrite more than that in the caam hardware scatter gather structure. Signed-off-by: Cristian Stoica Signed-off-by: Horia Geantă --- drivers/crypto/caam/desc.h | 2 +- drivers/crypto/caam/sg_sw_sec4.h | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/caam/desc.h b/drivers/crypto/caam/desc.h index 1e93c6af2275..fe30ff69088c 100644 --- a/drivers/crypto/caam/desc.h +++ b/drivers/crypto/caam/desc.h @@ -20,7 +20,7 @@ #define SEC4_SG_BPID_MASK 0x00ff #define SEC4_SG_BPID_SHIFT 16 #define SEC4_SG_LEN_MASK 0x3fff /* Excludes EXT and FINAL */ -#define SEC4_SG_OFFS_MASK 0x1fff +#define SEC4_SG_OFFSET_MASK0x1fff struct sec4_sg_entry { #ifdef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX diff --git a/drivers/crypto/caam/sg_sw_sec4.h b/drivers/crypto/caam/sg_sw_sec4.h index 12ec6616e89d..2311341b7356 100644 --- a/drivers/crypto/caam/sg_sw_sec4.h +++ b/drivers/crypto/caam/sg_sw_sec4.h @@ -11,12 +11,12 @@ struct sec4_sg_entry; * convert single dma address to h/w link table format */ static inline void dma_to_sec4_sg_one(struct sec4_sg_entry *sec4_sg_ptr, - dma_addr_t dma, u32 len, u32 offset) + dma_addr_t dma, u32 len, u16 offset) { sec4_sg_ptr->ptr = dma; sec4_sg_ptr->len = len; sec4_sg_ptr->buf_pool_id = 0; - sec4_sg_ptr->offset = offset; + sec4_sg_ptr->offset = offset & SEC4_SG_OFFSET_MASK; #ifdef DEBUG print_hex_dump(KERN_ERR, "sec4_sg_ptr@: ", DUMP_PREFIX_ADDRESS, 16, 4, sec4_sg_ptr, @@ -30,7 +30,7 @@ static inline void dma_to_sec4_sg_one(struct sec4_sg_entry *sec4_sg_ptr, */ static inline struct sec4_sg_entry * sg_to_sec4_sg(struct scatterlist *sg, int sg_count, - struct sec4_sg_entry *sec4_sg_ptr, u32 offset) + struct sec4_sg_entry *sec4_sg_ptr, u16 offset) { while (sg_count) { dma_to_sec4_sg_one(sec4_sg_ptr, sg_dma_address(sg), @@ -48,7 +48,7 @@ sg_to_sec4_sg(struct scatterlist *sg, int sg_count, */ static inline void sg_to_sec4_sg_last(struct scatterlist *sg, int sg_count, struct 
sec4_sg_entry *sec4_sg_ptr, - u32 offset) + u16 offset) { sec4_sg_ptr = sg_to_sec4_sg(sg, sg_count, sec4_sg_ptr, offset); sec4_sg_ptr->len |= SEC4_SG_LEN_FIN; -- 2.4.4
[PATCH 3/7] powerpc: add io{read,write}64 accessors
This will allow device drivers to consistently use io{read,write}XX also for 64-bit accesses. Signed-off-by: Horia Geantă --- arch/powerpc/kernel/iomap.c | 24 1 file changed, 24 insertions(+) diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c index 12e48d56f771..3963f0b68d52 100644 --- a/arch/powerpc/kernel/iomap.c +++ b/arch/powerpc/kernel/iomap.c @@ -38,6 +38,18 @@ EXPORT_SYMBOL(ioread16); EXPORT_SYMBOL(ioread16be); EXPORT_SYMBOL(ioread32); EXPORT_SYMBOL(ioread32be); +#ifdef __powerpc64__ +u64 ioread64(void __iomem *addr) +{ + return readq(addr); +} +u64 ioread64be(void __iomem *addr) +{ + return readq_be(addr); +} +EXPORT_SYMBOL(ioread64); +EXPORT_SYMBOL(ioread64be); +#endif /* __powerpc64__ */ void iowrite8(u8 val, void __iomem *addr) { @@ -64,6 +76,18 @@ EXPORT_SYMBOL(iowrite16); EXPORT_SYMBOL(iowrite16be); EXPORT_SYMBOL(iowrite32); EXPORT_SYMBOL(iowrite32be); +#ifdef __powerpc64__ +void iowrite64(u64 val, void __iomem *addr) +{ + writeq(val, addr); +} +void iowrite64be(u64 val, void __iomem *addr) +{ + writeq_be(val, addr); +} +EXPORT_SYMBOL(iowrite64); +EXPORT_SYMBOL(iowrite64be); +#endif /* __powerpc64__ */ /* * These are the "repeat read/write" functions. Note the -- 2.4.4
[PATCH 2/7] arm64: add io{read,write}64be accessors
This will allow device drivers to consistently use io{read,write}XXbe also for 64-bit accesses. Signed-off-by: Alex Porosanu Signed-off-by: Horia Geantă --- arch/arm64/include/asm/io.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index 44be1e03ed65..9b6e408cfa51 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -174,13 +174,15 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size); #define iounmap __iounmap /* - * io{read,write}{16,32}be() macros + * io{read,write}{16,32,64}be() macros */ #define ioread16be(p) ({ __u16 __v = be16_to_cpu((__force __be16)__raw_readw(p)); __iormb(); __v; }) #define ioread32be(p) ({ __u32 __v = be32_to_cpu((__force __be32)__raw_readl(p)); __iormb(); __v; }) +#define ioread64be(p) ({ __u64 __v = be64_to_cpu((__force __be64)__raw_readq(p)); __iormb(); __v; }) #define iowrite16be(v,p) ({ __iowmb(); __raw_writew((__force __u16)cpu_to_be16(v), p); }) #define iowrite32be(v,p) ({ __iowmb(); __raw_writel((__force __u32)cpu_to_be32(v), p); }) +#define iowrite64be(v,p) ({ __iowmb(); __raw_writeq((__force __u64)cpu_to_be64(v), p); }) /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem -- 2.4.4
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
>> + chacha20_block(&crng->state[0], out); >> + if (crng->state[12] == 0) >> + crng->state[13]++; > > state[12]++? Or why do you increment the nonce? In Bernstein's Salsa and ChaCha, the counter is 64-bit. It appears ChaCha-TLS uses a 32-bit counter, and the other 32 bits are given to the nonce. Maybe the first question to ask is, what ChaCha is the kernel providing? If it's ChaCha-TLS, then the carry does not make a lot of sense. If the generator is limiting the amount of material under a given set of security parameters (key and nonce), then the generator will likely re-key itself long before the 256-GB induced wrap. In this case, it does not matter which ChaCha the kernel is providing and the carry is superfluous. Jeff
Re: [crypto / sparc64] cryptomgr_test OOPS
On Wed, May 4, 2016 at 7:07 AM, Herbert Xu wrote: > David Miller wrote: >> From: Anatoly Pugachev >> Date: Tue, 3 May 2016 16:54:18 +0300 >> >>> I have git kernel OOPS (4.6.0-rc6) on sparc64. This OOPS does not >>> happen, if I set the following kernel option: >>> >>> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y >>> >>> Can someone please look at >>> https://bugzilla.kernel.org/show_bug.cgi?id=117551 ? >> >> The lib/mpi/ code hasn't been touched in a long time, so I can only see that >> it might >> be the crypto/rsa.c changes that happened this release. >> >> Can you possibly bisect this or at least tell us that 4.5 doesn't have this >> problem? > > It would also be helpful to test the latest cryptodev tree which > has a large number of MPI fixes from Nicolai Stange. Herbert, just tested cryptodev ( http://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git ) kernel, same OOPS, but kernel version is 4.6.0-rc2+ . kernel OOPS message - https://paste.fedoraproject.org/362554/23732641/
Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG
> +static inline u32 rotl32(u32 v, u8 n)
> +{
> +	return (v << n) | (v >> (sizeof(v) * 8 - n));
> +}

That's undefined behavior when n=0. I think the portable way to do a
rotate that avoids UB is the following. GCC, Clang and ICC recognize
the pattern, and emit a rotate instruction:

  static const unsigned int MASK = 31;
  return (v << (n & MASK)) | (v >> (-n & MASK));

You should also avoid the following because it's not constant time due
to the branch:

  return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));

Jeff
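Putting the masking idiom above into a self-contained helper (a sketch in plain C; the kernel uses its own u32 typedef):

```c
#include <stdint.h>

/* Branch-free 32-bit left rotate with no undefined behavior.  Masking
 * the rotate amount with 31 keeps both shift counts in [0, 31], so
 * n == 0 is well defined: (-0 & 31) == 0 and v >> 0 is just v.
 * GCC, Clang and ICC recognize this pattern and emit a single rotate
 * instruction on targets that have one. */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	const unsigned int mask = 31;

	n &= mask;
	return (v << n) | (v >> (-n & mask));
}
```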
Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)
Hi David

On 05/04/2016 02:01 AM, David Howells wrote:
> Do you want to push this via Herbert's tree?

Yes, I think Herbert has some more patches queued for rc-7.
Let me also send a proper one with the signed-off tag.

---8<---
Subject: crypto: rsa - select crypto mgr dependency

The pkcs1pad template needs CRYPTO_MANAGER so it needs
to be explicitly selected by CRYPTO_RSA.

Reported-by: Jamie Heilman
Signed-off-by: Tadeusz Struk
---
 crypto/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 93a1fdc..1d33beb 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -96,6 +96,7 @@ config CRYPTO_AKCIPHER
 config CRYPTO_RSA
 	tristate "RSA algorithm"
 	select CRYPTO_AKCIPHER
+	select CRYPTO_MANAGER
 	select MPILIB
 	select ASN1
 	help
--
TS
Re: [PATCH v2] crypto: hash - Fix page length clamping in hash walk
On (05/04/16 12:08), Steffen Klassert wrote:
> > > Sowmini, could you please doublecheck with your test setup?
> >
> > Don't bother, my patch was crap. Here's one that's more likely
> > to work:
>
> This one is much better, works here :)

The patch seems to work, but the caveat is that the original bug
(iperf segv) was not always reproducible on demand - it depended on
memory and other config. But it looks like Steffen has a reliable way
to reproduce the original problem, so let's go with this patch for now.

--Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (05/04/16 10:32), Herbert Xu wrote:
>
> Please base it on cryptodev.

Looks like this got fixed in cryptodev by commit cece762f6f3c
("lib/mpi: mpi_write_sgl(): fix out-of-bounds stack access")

Thanks,
--Sowmini
Re: [PATCH v2] crypto: hash - Fix page length clamping in hash walk
On Wed, May 04, 2016 at 05:52:56PM +0800, Herbert Xu wrote:
> On Wed, May 04, 2016 at 11:34:20AM +0200, Steffen Klassert wrote:
> >
> > Hmm, the 'sleeping while atomic' because of not unmapping
> > the page goes away, but now I see a lot of IPsec ICV fails
> > on the receive side. I'll try to find out what's going on.
> >
> > Sowmini, could you please doublecheck with your test setup?
>
> Don't bother, my patch was crap. Here's one that's more likely
> to work:

This one is much better, works here :)
[PATCH v2] crypto: hash - Fix page length clamping in hash walk
On Wed, May 04, 2016 at 11:34:20AM +0200, Steffen Klassert wrote:
>
> Hmm, the 'sleeping while atomic' because of not unmapping
> the page goes away, but now I see a lot of IPsec ICV fails
> on the receive side. I'll try to find out what's going on.
>
> Sowmini, could you please doublecheck with your test setup?

Don't bother, my patch was crap. Here's one that's more likely
to work:

---8<---
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE. This patch fixes it by adjusting
walk->pg and walk->offset when this happens.

Cc:
Reported-by: Steffen Klassert
Signed-off-by: Herbert Xu

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 5fc1f17..3887a98 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -69,8 +69,9 @@ static int hash_walk_new_entry(struct crypto_hash_walk *walk)
 	struct scatterlist *sg;
 
 	sg = walk->sg;
-	walk->pg = sg_page(sg);
 	walk->offset = sg->offset;
+	walk->pg = sg_page(walk->sg) + (walk->offset >> PAGE_SHIFT);
+	walk->offset = offset_in_page(walk->offset);
 	walk->entrylen = sg->length;
 
 	if (walk->entrylen > walk->total)
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
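The arithmetic behind the fix above can be sketched outside the kernel. A minimal userspace illustration (hypothetical helper names, not kernel code) of how an offset larger than PAGE_SIZE splits into a page step and an in-page remainder:

```c
/* Illustrative sketch of the offset clamping in the patch above: when
 * a scatterlist entry's offset reaches into a later page of a compound
 * page, step the page index forward by the whole pages covered and
 * keep only the in-page part of the offset (what offset_in_page()
 * computes in the kernel). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static void clamp_offset(unsigned int offset,
			 unsigned int *page_index,
			 unsigned int *in_page)
{
	*page_index = offset >> PAGE_SHIFT;	/* whole pages to skip */
	*in_page = offset & (PAGE_SIZE - 1);	/* remainder in the page */
}
```

With a 4 KiB page, an offset of 5000 lands one page in, 904 bytes past that page's start; before the fix, walk->offset could be >= PAGE_SIZE, which broke the per-page mapping logic.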
Re: crypto: hash - Fix page length clamping in hash walk
On Tue, May 03, 2016 at 05:55:31PM +0800, Herbert Xu wrote:
> On Thu, Apr 28, 2016 at 10:27:43AM +0200, Steffen Klassert wrote:
> >
> > The problem was that if offset (in a superpage) equals
> > PAGE_SIZE in hash_walk_next(), nbytes becomes zero. So
> > we map the page, but we don't hash and unmap because we
> > exit the loop in shash_ahash_update() in this case.
>
> I see. Does this patch help?

Hmm, the 'sleeping while atomic' because of not unmapping
the page goes away, but now I see a lot of IPsec ICV fails
on the receive side. I'll try to find out what's going on.

Sowmini, could you please doublecheck with your test setup?
Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)
Tadeusz Struk wrote:
> I think the problem is that the pkcs1pad template needs
> CRYPTO_MANAGER, but your configuration doesn't enable CRYPTO_MANAGER.
> Could you try this please:
>
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 93a1fdc..1d33beb 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -96,6 +96,7 @@ config CRYPTO_AKCIPHER
>  config CRYPTO_RSA
>  	tristate "RSA algorithm"
>  	select CRYPTO_AKCIPHER
> +	select CRYPTO_MANAGER
>  	select MPILIB
>  	select ASN1
>  	help

Do you want to push this via Herbert's tree?

David