from:"H. Peter Anvin"

Re: [RFC 20/22] x86/relocs: Add option to generate 64-bit relocations

2017-07-19 Thread H. Peter Anvin

,"Paul E . McKenney" ,Andrew 
Morton ,Christopher Li ,Dou 
Liyang ,Masahiro Yamada 
,Daniel Borkmann ,Markus 
Trippelsdorf ,Peter Foley ,Steven 
Rostedt ,Tim Chen ,Ard 
Biesheuvel ,Catalin Marinas 
,Matthew Wilcox ,Michal Hocko 
,Rob Landley ,Jiri Kosina 
,"H . J . Lu" ,Paul Bolle 
,Baoquan He ,Daniel Micay 
,the arch/x86 maintainers 
,linux-crypto@vger.kernel.org,LKML 
,xen-de...@lists.xenproject.org,kvm list 
,Linux PM list
,linux-arch 
,linux-spa...@vger.kernel.org,Kernel Hardening 

From: h...@zytor.com
Message-ID: <0ef6faaa-a99c-4f0d-9e4a-ad25e9395...@zytor.com>

On July 19, 2017 4:25:56 PM PDT, Thomas Garnier  wrote:
>On Wed, Jul 19, 2017 at 4:08 PM, H. Peter Anvin  wrote:
>> On 07/19/17 15:47, Thomas Garnier wrote:
>>> On Wed, Jul 19, 2017 at 3:33 PM, H. Peter Anvin 
>wrote:
>>>> On 07/18/17 15:33, Thomas Garnier wrote:
>>>>> The x86 relocation tool generates a list of 32-bit signed
>integers. There
>>>>> was no need to use 64-bit integers because all addresses where
>above the 2G
>>>>> top of the memory.
>>>>>
>>>>> This change add a large-reloc option to generate 64-bit unsigned
>integers.
>>>>> It can be used when the kernel plan to go below the top 2G and
>32-bit
>>>>> integers are not enough.
>>>>
>>>> Why on Earth?  This would only be necessary if the *kernel itself*
>was
>>>> more than 2G, which isn't going to happen for the forseeable
>future.
>>>
>>> Because the relocation integer is an absolute address, not an offset
>>> in the binary. Next iteration, I can try using a 32-bit offset for
>>> everyone.
>>
>> It is an absolute address *as the kernel was originally linked*, for
>> obvious reasons.
>
>Sure when the kernel was just above 0x8000, it doesn't
>work when it goes down to 0x. That's why using an
>offset might make more sense in general.
>
>>
>> -hpa
>>

What is the motivation for changing the pre linked address at all?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [RFC 06/22] kvm: Adapt assembly for PIE support

2017-07-19 Thread H. Peter Anvin

,Chris Metcalf ,"Paul E . 
McKenney" ,Andrew Morton 
,Christopher Li ,Dou Liyang 
,Masahiro Yamada 
,Daniel Borkmann ,Markus 
Trippelsdorf ,Peter Foley ,Steven 
Rostedt ,Tim Chen ,Catalin 
Marinas ,Matthew Wilcox 
,Michal Hocko ,Rob Landley 
,Jiri Kosina ,"H . J . Lu" 
,Paul Bolle ,Baoquan He 
,Daniel Micay ,the arch/x86 maintainers 
,"linux-crypto@vger.kernel.org" 
,Linux Kernel Mailing List 
,xen-de...@lists.xenproject.org,kvm list
,linux-pm ,linux-arch 
,Linux-Sparse ,Kernel 
Hardening 
From: h...@zytor.com
Message-ID: <83ba7600-bc8d-4c91-812c-dd2a0bf44...@zytor.com>

On July 19, 2017 3:58:07 PM PDT, Ard Biesheuvel  
wrote:
>On 19 July 2017 at 23:27, H. Peter Anvin  wrote:
>> On 07/19/17 08:40, Thomas Garnier wrote:
>>>>
>>>> This doesn't look right.  It's accessing a per-cpu variable.  The
>>>> per-cpu section is an absolute, zero-based section and not subject
>to
>>>> relocation.
>>>
>>> PIE does not respect the zero-based section, it tries to have
>>> everything relative. Patch 16/22 also adapt per-cpu to work with PIE
>>> (while keeping the zero absolute design by default).
>>>
>>
>> This is silly.  The right thing is for PIE is to be explicitly
>absolute,
>> without (%rip).  The use of (%rip) memory references for percpu is
>just
>> an optimization.
>>
>
>Sadly, there is an issue in binutils that may prevent us from doing
>this as cleanly as we would want.
>
>For historical reasons, bfd.ld emits special symbols like
>__GLOBAL_OFFSET_TABLE__ as absolute symbols with a section index of
>SHN_ABS, even though it is quite obvious that they are relative like
>any other symbol that points into the image. Unfortunately, this means
>that binutils needs to emit R_X86_64_RELATIVE relocations even for
>SHN_ABS symbols, which means we lose the ability to use both absolute
>and relocatable symbols in the same PIE image (unless the reloc tool
>can filter them out)
>
>More info here:
>https://sourceware.org/bugzilla/show_bug.cgi?id=19818

The reloc tool already has the ability to filter symbols.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [PATCH 5/7] x86, boot, LLVM: Use regparm=0 for memcpy and memset

2017-06-22 Thread H. Peter Anvin

On 03/17/17 05:08, Peter Zijlstra wrote:
> On Thu, Mar 16, 2017 at 05:15:18PM -0700, Michael Davidson wrote:
>> Use the standard regparm=0 calling convention for memcpy and
>> memset when building with clang.
>>
>> This is a work around for a long standing clang bug
>> (see https://llvm.org/bugs/show_bug.cgi?id=3997) where
>> clang always uses the standard regparm=0 calling convention
>> for any implcit calls to memcpy and memset that it generates
>> (eg for structure assignments and initialization) even if an
>> alternate calling convention such as regparm=3 has been specified.
> 
> Seriously, fix LLVM already.
> 

Yes, this is a real stinker, in no small part because once clang is
fixed to DTRT then this is actually broken...

-hpa

Re: [PATCH 3/7] x86, LLVM: suppress clang warnings about unaligned accesses

2017-04-13 Thread H. Peter Anvin

On 04/13/17 16:14, Matthias Kaehlcke wrote:
> El Mon, Apr 03, 2017 at 04:01:58PM -0700 Matthias Kaehlcke ha dit:
> 
>> El Fri, Mar 17, 2017 at 04:50:19PM -0700 h...@zytor.com ha dit:
>>
>>> On March 16, 2017 5:15:16 PM PDT, Michael Davidson  wrote:
 Suppress clang warnings about potential unaliged accesses
 to members in packed structs. This gets rid of almost 10,000
 warnings about accesses to the ring 0 stack pointer in the TSS.

 Signed-off-by: Michael Davidson 
 ---
 arch/x86/Makefile | 5 +
 1 file changed, 5 insertions(+)

 diff --git a/arch/x86/Makefile b/arch/x86/Makefile
 index 894a8d18bf97..7f21703c475d 100644
 --- a/arch/x86/Makefile
 +++ b/arch/x86/Makefile
 @@ -128,6 +128,11 @@ endif
 KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif

 +ifeq ($(cc-name),clang)
 +# Suppress clang warnings about potential unaligned accesses.
 +KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
 +endif
 +
 ifdef CONFIG_X86_X32
x32_ld_ok := $(call try-run,\
/bin/echo -e '1: .quad 1b' | \
>>>
>>> Why conditional on clang?
>>
>> My understanding is that this warning is clang specific, it is not
>> listed on https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
> 
> Actually this warning affects other platforms besides x86
> (e.g. arm64), I'll submit a patch that disables the warning on all
> platforms.
> 

Drop the ifeq ($(cc-name),clang).

You should assume that if you have to add one of those ifeq's then you
are doing something fundamentally wrong.

-hpa

Re: [PATCH 2/7] Makefile, x86, LLVM: disable unsupported optimization flags

2017-03-17 Thread H. Peter Anvin

On 03/17/17 14:32, H. Peter Anvin wrote:
> 
> NAK.  Fix your compiler, or use a wrapper script or something.  It is
> absolutely *not* acceptable to disable this since future versions of
> clang *should* support that.
> 
> That being said, it might make sense to look for a key pattern like
> "(un|not )supported" on stderr the try-run macro.  Is there really no
> -Wno- or -Werror= option to turn off this craziness?
> 

Well, guess what... I found it myself.

-W{no-,error=}ignored-optimization-argument

Either variant will make this sane.

-hpa

Re: [PATCH 2/7] Makefile, x86, LLVM: disable unsupported optimization flags

2017-03-17 Thread H. Peter Anvin

On 03/16/17 17:15, Michael Davidson wrote:
> Unfortunately, while clang generates a warning about these flags
> being unsupported it still exits with a status of 0 so we have
> to explicitly disable them instead of just using a cc-option check.
> 
> Signed-off-by: Michael Davidson 
> ---
>  Makefile  | 2 ++
>  arch/x86/Makefile | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index b21fd0ca2946..5e97e5fc1eea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -629,7 +629,9 @@ ARCH_AFLAGS :=
>  ARCH_CFLAGS :=
>  include arch/$(SRCARCH)/Makefile
>  
> +ifneq ($(cc-name),clang)
>  KBUILD_CFLAGS+= $(call cc-option,-fno-delete-null-pointer-checks,)
> +endif
>  KBUILD_CFLAGS+= $(call cc-disable-warning,frame-address,)
>  
>  ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 2d449337a360..894a8d18bf97 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -87,11 +87,13 @@ else
>  KBUILD_AFLAGS += -m64
>  KBUILD_CFLAGS += -m64
>  
> +ifneq ($(cc-name),clang)
>  # Align jump targets to 1 byte, not the default 16 bytes:
>  KBUILD_CFLAGS += -falign-jumps=1
>  
>  # Pack loops tightly as well:
>  KBUILD_CFLAGS += -falign-loops=1
> +endif
>  
>  # Don't autogenerate traditional x87 instructions
>  KBUILD_CFLAGS += $(call cc-option,-mno-80387)
> 

NAK.  Fix your compiler, or use a wrapper script or something.  It is
absolutely *not* acceptable to disable this since future versions of
clang *should* support that.

That being said, it might make sense to look for a key pattern like
"(un|not )supported" on stderr the try-run macro.  Is there really no
-Wno- or -Werror= option to turn off this craziness?

-hpa

Re: [PATCH] x86/crypto: fix %progbits -> @progbits

2017-01-25 Thread H. Peter Anvin

On 01/19/17 13:28, Denys Vlasenko wrote:
> %progbits form is used on ARM (where @ is a comment char).
> 
> x86 consistently uses @progbits everywhere else.

However, it looks like %progbits works on all architectures (at least
include/linux/init.h seems to imply so.)  Perhaps a tree-wide
replacement the other way would make more sense.

Personally I would also like to see these parameters macroized, to keep
someone from getting them wrong, just like we have __INIT, __INITRODATA
etc already, we should just have plain __TEXT __DATA __BSS...

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Entropy sources

2016-08-25 Thread H. Peter Anvin

On 08/25/16 16:35, Sandy Harris wrote:
> On Thu, Aug 25, 2016 at 5:30 PM, H. Peter Anvin  wrote:
> 
>> The network stack is a good source of entropy, *once it is online*.
>> However, the most serious case is while the machine is still booting,
>> when the network will not have enabled yet.
>>
>> -hpa
> 
> One possible solution is at:
> https://github.com/sandy-harris/maxwell
> 
> A small (< 700 lines) daemon that gets entropy from timer imprecision
> and variations in time for arithmetic (cache misses, interrupts, etc.)
> and pumps it into /dev/random. Make it the first userspace program
> started and all should be covered. Directory above includes a PDF doc
> with detailed rationale and some discussion of alternate solutions.
> 
> Of course if you are dealing with a system-on-a-chip or low-end
> embedded CPU & the timer is really inadequate, this will not work
> well. Conceivably well enough, but we could not know that without
> detailed analysis for each chip in question.
> 

A lot of this is exactly the same thing /dev/random already does in
kernel space.  I have already in the past expressed skepticism toward
this approach, because a lot of the analysis done has simply been bogus.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Entropy sources (was: /dev/random - a new approach)

2016-08-25 Thread H. Peter Anvin

On 08/20/16 22:37, Jeffrey Walton wrote:
>>
>> The biggest problem there is that the timer interrupt adds *no* entropy
>> unless there is a source of asynchronicity in the system.  On PCs,
>> traditionally the timer has been run from a completely different crystal
>> (14.31818 MHz) than the CPU, which is the ideal situation, but if they
>> are run off the same crystal and run in lockstep, there is very little
>> if anything there.  On some systems, the timer may even *be* the only
>> source of time, and the entropy truly is zero.
> 
> It seems like a networked computer should have an abundance on entropy
> available from the network stack. Every common case I can come up with
> includes a networked computer. If a handheld is outside of coverage,
> then it probably does not have the randomness demands because it can't
> communicate (i.e., TCP sequence numbers, key agreement, etc).
> 
> In fact, there are at least two papers that use bits from the network stack:
> 

The network stack is a good source of entropy, *once it is online*.
However, the most serious case is while the machine is still booting,
when the network will not have enabled yet.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 0/5] /dev/random - a new approach

2016-08-19 Thread H. Peter Anvin

On 08/18/16 22:56, Herbert Xu wrote:
> On Thu, Aug 18, 2016 at 10:49:47PM -0400, Theodore Ts'o wrote:
>>
>> That really depends on the system.  We can't assume that people are
>> using systems with a 100Hz clock interrupt.  More often than not
>> people are using tickless kernels these days.  That's actually the
>> problem with changing /dev/urandom to block until things are
>> initialized.
> 
> Couldn't we disable tickless until urandom has been seeded? In fact
> perhaps we should accelerate the timer interrupt rate until it has
> been seeded?
> 

The biggest problem there is that the timer interrupt adds *no* entropy
unless there is a source of asynchronicity in the system.  On PCs,
traditionally the timer has been run from a completely different crystal
(14.31818 MHz) than the CPU, which is the ideal situation, but if they
are run off the same crystal and run in lockstep, there is very little
if anything there.  On some systems, the timer may even *be* the only
source of time, and the entropy truly is zero.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 0/5] /dev/random - a new approach

2016-08-16 Thread H. Peter Anvin

On 08/16/16 15:28, H. Peter Anvin wrote:
> On 08/15/16 22:45, Stephan Mueller wrote:
>> Am Montag, 15. August 2016, 13:42:54 CEST schrieb H. Peter Anvin:
>>
>> Hi H,
>>
>>> On 08/11/16 05:24, Stephan Mueller wrote:
>>>> * prevent fast noise sources from dominating slow noise sources
>>>>
>>>>   in case of /dev/random
>>>
>>> Can someone please explain if and why this is actually desirable, and if
>>> this assessment has been passed to someone who has actual experience
>>> with cryptography at the professional level?
>>
>> There are two motivations for that:
>>
>> - the current /dev/random is compliant to NTG.1 from AIS 20/31 which 
>> requires 
>> (in brief words) that entropy comes from auditible noise sources. Currently 
>> in 
>> my LRNG only RDRAND is a fast noise source which is not auditible (and it is 
>> designed to cause a VM exit making it even harder to assess it). To make the 
>> LRNG to comply with NTG.1, RDRAND can provide entropy but must not become 
>> the 
>> sole entropy provider which is the case now with that change.
>>
>> - the current /dev/random implementation follows the same concept with the 
>> exception of 3.15 and 3.16 where RDRAND was not rate-limited. In later 
>> versions, this was changed.
>>
> 
> I'm not saying it should be *sole*.  I am questioning the value in
> limiting it, as it seems to me that it could only ever produce a worse
> result.
> 

Also, it would be great to actually get a definition for "auditable".  A
quantum white noise source which exceeds the sampling bandwidth is an
ideal RNG; how do you "audit" that?  If what you are doing is looking
for imperfections, those imperfections can be trivially emulated.  If
what you mean is an audit on the chip or circuit level, that would
require some mechanism to know that all items were built identically
without deviation, which may be possible for intelligence agencies or
the military who have full control of their supply chain, but for anyone
else that is most likely an impossible task.  How many people are going
to crack the case and look at even a discrete transistor circuit, and
how many of *those* are going to be able to discern if that circuit is
subject to RF capture, or its output even used?

I have been trying to figure out how to reasonably solve this problem
for a long time now, and it is not just a problem for RDSEED (RDRAND is
a slightly different beast.)  The only reason RDSEED exposes the problem
particularly harshly is because it is extremely high bandwidth compared
to other noise sources and it is architecturally integrated into the
CPU, but the same would apply to an external noise generator connected
via PCIe, for example.

Incidentally, I am hoping -- and this is a personal statement and
nothing official from Intel -- that at some future date RDRAND (not
RDSEED) will be fast enough that it can completely replace even
prandom_u32(), which I really hope can be non-controversial as
prandom_u32() isn't cryptographically strong in the first place.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 0/5] /dev/random - a new approach

2016-08-16 Thread H. Peter Anvin

On 08/15/16 22:45, Stephan Mueller wrote:
> Am Montag, 15. August 2016, 13:42:54 CEST schrieb H. Peter Anvin:
> 
> Hi H,
> 
>> On 08/11/16 05:24, Stephan Mueller wrote:
>>> * prevent fast noise sources from dominating slow noise sources
>>>
>>>   in case of /dev/random
>>
>> Can someone please explain if and why this is actually desirable, and if
>> this assessment has been passed to someone who has actual experience
>> with cryptography at the professional level?
> 
> There are two motivations for that:
> 
> - the current /dev/random is compliant to NTG.1 from AIS 20/31 which requires 
> (in brief words) that entropy comes from auditible noise sources. Currently 
> in 
> my LRNG only RDRAND is a fast noise source which is not auditible (and it is 
> designed to cause a VM exit making it even harder to assess it). To make the 
> LRNG to comply with NTG.1, RDRAND can provide entropy but must not become the 
> sole entropy provider which is the case now with that change.
> 
> - the current /dev/random implementation follows the same concept with the 
> exception of 3.15 and 3.16 where RDRAND was not rate-limited. In later 
> versions, this was changed.
> 

I'm not saying it should be *sole*.  I am questioning the value in
limiting it, as it seems to me that it could only ever produce a worse
result.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 0/5] /dev/random - a new approach

2016-08-15 Thread H. Peter Anvin

On 08/11/16 05:24, Stephan Mueller wrote:
> * prevent fast noise sources from dominating slow noise sources
>   in case of /dev/random

Can someone please explain if and why this is actually desirable, and if
this assessment has been passed to someone who has actual experience
with cryptography at the professional level?

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] crypto: sha256-mb - cleanup a || vs | typo

2016-06-29 Thread H. Peter Anvin

On 06/29/16 07:42, Dan Carpenter wrote:
> || and | behave basically the same here but || is intended.  It causes a
> static checker warning to mix up bitwise and logical operations.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/arch/x86/crypto/sha256-mb/sha256_mb.c 
> b/arch/x86/crypto/sha256-mb/sha256_mb.c
> index c9d5dcc..4ec895a 100644
> --- a/arch/x86/crypto/sha256-mb/sha256_mb.c
> +++ b/arch/x86/crypto/sha256-mb/sha256_mb.c
> @@ -299,7 +299,7 @@ static struct sha256_hash_ctx 
> *sha256_ctx_mgr_submit(struct sha256_ctx_mgr *mgr,
>* Or if the user's buffer contains less than a whole block,
>* append as much as possible to the extra block.
>*/
> - if ((ctx->partial_block_buffer_length) | (len < SHA256_BLOCK_SIZE)) {
> + if ((ctx->partial_block_buffer_length) || (len < SHA256_BLOCK_SIZE)) {
>   /* Compute how many bytes to copy from user buffer into
>* extra block
>*/
> 

As far as I know the | was an intentional optimization, so you may way
to look at the generated code.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/7] random: replace non-blocking pool with a Chacha20-based CRNG

2016-06-20 Thread H. Peter Anvin

On 06/20/16 08:49, Stephan Mueller wrote:
> Am Montag, 20. Juni 2016, 11:01:47 schrieb Theodore Ts'o:
> 
> Hi Theodore,
> 
>>
>> So simply doing chacha20 encryption in a tight loop in the kernel
>> might not be a good proxy for what would actually happen in real life
>> when someone calls getrandom(2).  (Another good question to ask is
>> when someone might be needing to generate millions of 256-bit session
>> keys per second, when the D-H setup, even if you were using ECCDH,
>> would be largely dominating the time for the connection setup anyway.)
> 
> Is speed everything we should care about? What about:
> 
> - offloading of crypto operation from the CPU
> 

This sounds like a speed operation (and very unlikely to be a win given
the usage).

> - potentially additional security features a hardware cipher may provide like 
> cache coloring attack resistance?

How about burning that bridge when and if we get to it?  It sounds very
hypothetical.

I guess I could add in some comments here about how a lot of these
problems can be eliminated by offloading an entire DRNG into hardware,
but I don't think it is productive.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: linux/bitops.h

2016-05-06 Thread H. Peter Anvin

On May 6, 2016 1:07:13 PM PDT, Sasha Levin  wrote:
>On 05/04/2016 08:30 PM, H. Peter Anvin wrote:
>> On 05/04/16 15:06, John Denker wrote:
>>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>>>> Beware that shifting by an amount >= the number of bits in the
>>>>> word remains Undefined Behavior.
>>>
>>>> This construct has been supported as a rotate since at least gcc2.
>>>
>>> How then should we understand the story told in commit d7e35dfa?
>>> Is the story wrong?
>>>
>>> At the very least, something inconsistent is going on.  There
>>> are 8 functions.  Why did d7e35dfa change one of them but
>>> not the other 7?
>> 
>> Yes. d7e35dfa is baloney IMNSHO.  All it does is produce worse code,
>and the description even says so.
>
>No, the description says that it produces worse code for *really
>really* ancient
>GCC versions.
>
>> As I said, gcc has treated the former code as idiomatic since gcc 2,
>so that support is beyond ancient.
>
>Because something works in a specific way on one compiler isn't a
>reason to
>ignore this noncompliance with the standard.
>
>
>Thanks,
>Sasha

When the compiler in question is our flagship target and our reference 
compiler, then yes, it matters.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: linux/bitops.h

2016-05-06 Thread H. Peter Anvin

On May 6, 2016 1:07:13 PM PDT, Sasha Levin  wrote:
>On 05/04/2016 08:30 PM, H. Peter Anvin wrote:
>> On 05/04/16 15:06, John Denker wrote:
>>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>>>> Beware that shifting by an amount >= the number of bits in the
>>>>> word remains Undefined Behavior.
>>>
>>>> This construct has been supported as a rotate since at least gcc2.
>>>
>>> How then should we understand the story told in commit d7e35dfa?
>>> Is the story wrong?
>>>
>>> At the very least, something inconsistent is going on.  There
>>> are 8 functions.  Why did d7e35dfa change one of them but
>>> not the other 7?
>> 
>> Yes. d7e35dfa is baloney IMNSHO.  All it does is produce worse code,
>and the description even says so.
>
>No, the description says that it produces worse code for *really
>really* ancient
>GCC versions.
>
>> As I said, gcc has treated the former code as idiomatic since gcc 2,
>so that support is beyond ancient.
>
>Because something works in a specific way on one compiler isn't a
>reason to
>ignore this noncompliance with the standard.
>
>
>Thanks,
>Sasha

4.6.2 is not "really, really ancient."
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-05 Thread H. Peter Anvin

On 05/05/2016 03:18 PM, ty...@mit.edu wrote:
> 
> So this is why I tend to take a much more pragmatic viewpoint on
> things.  Sure, it makes sense to pay attention to what the C standard
> writers are trying to do to us; but if we need to suppress certain
> optimizations to write sane kernel code --- I'm ok with that.  And
> this is why using a trust-but-verify on a specific set of compilers
> and ranges of compiler versions is a really good idea
> 

For the record, the "portable" construct has apparently only been
supported since gcc 4.6.3.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-05 Thread H. Peter Anvin

On May 5, 2016 3:18:09 PM PDT, ty...@mit.edu wrote:
>On Thu, May 05, 2016 at 05:34:50PM -0400, Sandy Harris wrote:
>> 
>> I completely fail to see why tests or compiler versions should be
>> part of the discussion. The C standard says the behaviour in
>> certain cases is undefined, so a standard-compliant compiler
>> can generate more-or-less any code there.
>> 
>
>> As long as any of portability, reliability or security are among our
>> goals, any code that can give undefined behaviour should be
>> considered problematic.
>
>Because compilers have been known not necessarily to obey the specs,
>and/or interpret the specs in way that not everyone agrees with.  It's
>also the case that we are *already* disabling certain C optimizations
>which are technically allowed by the spec, but which kernel
>programmers consider insane (e.g., strict aliasing).
>
>And of course, memzero_explicit() which crypto people understand is
>necessary, is something that technically compilers are allowed to
>optimize according to the spec.  So trying to write secure kernel code
>which will work on arbitrary compilers may well be impossible.
>
>And which is also why people have said (mostly in jest), "A
>sufficiently advanced compiler is indistinguishable from an
>adversary."  (I assume people will agree that optimizing away a memset
>needed to clear secrets from memory would be considered adversarial,
>at the very least!)
>
>So this is why I tend to take a much more pragmatic viewpoint on
>things.  Sure, it makes sense to pay attention to what the C standard
>writers are trying to do to us; but if we need to suppress certain
>optimizations to write sane kernel code --- I'm ok with that.  And
>this is why using a trust-but-verify on a specific set of compilers
>and ranges of compiler versions is a really good idea
>
>- Ted

I have filed a gcc bug to have the preexisting rotate idiom officially 
documented as a GNU C extension.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70967
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-05 Thread H. Peter Anvin

On 05/05/2016 03:18 PM, ty...@mit.edu wrote:
> On Thu, May 05, 2016 at 05:34:50PM -0400, Sandy Harris wrote:
>>
>> I completely fail to see why tests or compiler versions should be
>> part of the discussion. The C standard says the behaviour in
>> certain cases is undefined, so a standard-compliant compiler
>> can generate more-or-less any code there.
>>
> 
>> As long as any of portability, reliability or security are among our
>> goals, any code that can give undefined behaviour should be
>> considered problematic.
> 
> Because compilers have been known not necessarily to obey the specs,
> and/or interpret the specs in way that not everyone agrees with.  It's
> also the case that we are *already* disabling certain C optimizations
> which are technically allowed by the spec, but which kernel
> programmers consider insane (e.g., strict aliasing).
> 
> And of course, memzero_explicit() which crypto people understand is
> necessary, is something that technically compilers are allowed to
> optimize according to the spec.  So trying to write secure kernel code
> which will work on arbitrary compilers may well be impossible.
> 
> And which is also why people have said (mostly in jest), "A
> sufficiently advanced compiler is indistinguishable from an
> adversary."  (I assume people will agree that optimizing away a memset
> needed to clear secrets from memory would be considered adversarial,
> at the very least!)
> 
> So this is why I tend to take a much more pragmatic viewpoint on
> things.  Sure, it makes sense to pay attention to what the C standard
> writers are trying to do to us; but if we need to suppress certain
> optimizations to write sane kernel code --- I'm ok with that.  And
> this is why using a trust-but-verify on a specific set of compilers
> and ranges of compiler versions is a really good idea
> 

In theory, theory and practice should agree, but in practice, practice
is what counts.  I fully agree we should get rid of UD behavior where
doing so is practical, but not at the cost of breaking real-life
compilers, expecially not gcc, and to a lesser but still very real
extent icc and clang.

I would also agree that we should push the gcc developers to add to the
manual C-standard-UD behavior which are well-defined under the
gnu89/gnu99/gnu11 C dialects.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-04 Thread H. Peter Anvin


On 05/04/16 21:03, Jeffrey Walton wrote:

On Wed, May 4, 2016 at 11:50 PM, Theodore Ts'o  wrote:

...
But instead of arguing over what works and doesn't, let's just create
the the test set and just try it on a wide range of compilers and
architectures, hmmm?


What are the requirements? Here's a short list:

   * No undefined behavior
 - important because the compiler writers use the C standard
   * Compiles to native "rotate IMMEDIATE" if the rotate amount is a
"constant expression" and the machine provides it
 - translates to a native rotate instruction if available
 - "rotate IMM" can be 3 times faster than "rotate REG"
 - do any architectures *not* provide a rotate?
   * Compiles to native "rotate REGISTER" if the rotate is variable and
the machine provides it
 - do any architectures *not* provide a rotate?
   * Constant time
 - important to high-integrity code
 - Non-security code paths probably don't care

Maybe the first thing to do is provide a different rotates for the
constant-time requirement when its in effect?



The disagreement here is the priority between these points.  In my very 
strong opinion, "no undefined behavior" per the C standard is way less 
important than the others; what matters is what gcc and the other 
compilers we care about do.  The kernel relies on various versions of 
C-standard-undefined behavior *all over the place*; for one thing 
sizeof(void *) == sizeof(size_t) == sizeof(unsigned long)!! but they are 
well-defined in the subcontext we care about.


(And no, not all architectures provide a rotate instruction.)

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 7:54:12 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin  wrote:
>> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton 
>wrote:
>>>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
>>>> On 05/04/2016 02:42 PM, I wrote:
>>>>
>>>>> I find it very odd that the other seven functions were not
>>>>> upgraded. I suggest the attached fix-others.diff would make
>>>>> things more consistent.
>>>>
>>>> Here's a replacement patch.
>>>> ...
>>>
>>>+1, commit it.
>>>
>>>Its good for three additional reasons. First, this change means the
>>>kernel is teaching the next generation the correct way to do things.
>>>Many developers aspire to be kernel hackers, and they sometimes use
>>>the code from bitops.h. I've actually run across the code in
>>>production during an audit where the developers cited bitops.h.
>>>
>>>Second, it preserves a "silent and dark" cockpit during analysis.
>That
>>>is, when analysis is run, no findings are generated. Auditors and
>>>security folks like quiet tools, much like the way pilots like their
>>>cockpits (flashing lights and sounding buzzers usually means
>something
>>>is wrong).
>>>
>>>Third, the pattern is recognized by the major compilers, so the
>kernel
>>>should not have any trouble when porting any of the compilers. I
>often
>>>use multiple compiler to tease out implementation defined behavior in
>>>a effort to achieve greater portability. Here are the citations to
>>>ensure the kernel is safe with the pattern:
>>>
>>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>>
>>>However, Clang may cause trouble because they don't want the
>>>responsibility of recognizing the pattern:
>>>
>>> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>>
>>>Instead, they provide a defective rotate. The "defect" here is its
>>>non-constant time due to the branch, so it may not be suitable for
>>>high-integrity or high-assurance code like linux-crypto:
>>>
>>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>>
>>>Jeff
>>
>> So you are actually saying outright that we should sacrifice *actual*
>portability in favor of *theoretical* portability?  What kind of
>twilight zone did we just step into?!
>
>I'm not sure what you mean. It will be well defined on all platforms.
>Clang may not recognize the pattern, which means they could run
>slower. GCC and ICC will be fine.
>
>Slower but correct code is what you have to live with until the Clang
>dev's fix their compiler.
>
>Its kind of like what Dr. Jon Bentley said: "If it doesn't have to be
>correct, I can make it as fast as you'd like it to be".
>
>Jeff

The current code works on all compilers we care about.  The code you propose 
does not; it doesn't work on anything but very recent versions of our flagship 
target compiler, and pretty your own admission might even cause security 
hazards in the kernel if compiled on clang.

That qualifies as insane in my book.  The church code is de facto portable with 
the intended outcome, the one you propose is not.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: better patch for linux/bitops.h

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
>> On 05/04/2016 02:42 PM, I wrote:
>>
>>> I find it very odd that the other seven functions were not
>>> upgraded. I suggest the attached fix-others.diff would make
>>> things more consistent.
>>
>> Here's a replacement patch.
>> ...
>
>+1, commit it.
>
>Its good for three additional reasons. First, this change means the
>kernel is teaching the next generation the correct way to do things.
>Many developers aspire to be kernel hackers, and they sometimes use
>the code from bitops.h. I've actually run across the code in
>production during an audit where the developers cited bitops.h.
>
>Second, it preserves a "silent and dark" cockpit during analysis. That
>is, when analysis is run, no findings are generated. Auditors and
>security folks like quiet tools, much like the way pilots like their
>cockpits (flashing lights and sounding buzzers usually means something
>is wrong).
>
>Third, the pattern is recognized by the major compilers, so the kernel
>should not have any trouble when porting any of the compilers. I often
>use multiple compiler to tease out implementation defined behavior in
>a effort to achieve greater portability. Here are the citations to
>ensure the kernel is safe with the pattern:
>
>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>
>However, Clang may cause trouble because they don't want the
>responsibility of recognizing the pattern:
>
> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>
>Instead, they provide a defective rotate. The "defect" here is its
>non-constant time due to the branch, so it may not be suitable for
>high-integrity or high-assurance code like linux-crypto:
>
>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>
>Jeff

So you are actually saying outright that we should sacrifice *actual* 
portability in favor of *theoretical* portability?  What kind of twilight zone 
did we just step into?!
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: linux/bitops.h

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 6:20:32 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 7:06 PM, Andi Kleen  wrote:
>> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>> >> Beware that shifting by an amount >= the number of bits in the
>>> >> word remains Undefined Behavior.
>>>
>>> > This construct has been supported as a rotate since at least gcc2.
>>>
>>> How then should we understand the story told in commit d7e35dfa?
>>> Is the story wrong?
>>
>> I don't think Linux runs on a system where it would make a difference
>> (like a VAX), and also gcc always converts it before it could.
>> Even UBSan should not complain because it runs after the conversion
>> to ROTATE.
>>
>From what I understand, its a limitation in the barrel shifter and the
>way the shift bits are handled.
>
>Linux runs on a great number of devices, so its conceivable (likely?)
>a low-cost board would have hardware limitations that not found in
>modern desktops and servers or VAX.
>
>Jeff

This is a compiler feature, though.  And if icc or clang don't support the 
idiom they would never have compiled a working kernel.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: linux/bitops.h

2016-05-04 Thread H. Peter Anvin


On 05/04/16 15:06, John Denker wrote:

On 05/04/2016 02:56 PM, H. Peter Anvin wrote:

Beware that shifting by an amount >= the number of bits in the
word remains Undefined Behavior.



This construct has been supported as a rotate since at least gcc2.


How then should we understand the story told in commit d7e35dfa?
Is the story wrong?

At the very least, something inconsistent is going on.  There
are 8 functions.  Why did d7e35dfa change one of them but
not the other 7?


Yes. d7e35dfa is baloney IMNSHO.  All it does is produce worse code, and 
the description even says so.


As I said, gcc has treated the former code as idiomatic since gcc 2, so 
that support is beyond ancient.


-hpa




--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 2:42:53 PM PDT, John Denker  wrote:
>On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
>
>> it doesn't hit the
>> UB case which Jeffrey was concerned about. 
>
>That should be good enough for present purposes
>
>However, in the interests of long-term maintainability, I
>would suggest sticking in a comment or assertion saying 
>that ror32(,shift) is never called with shift=0.  This
>can be removed if/when bitops.h is upgraded.
>
>There is a track record of compilers doing Bad Things in
>response to UB code, including some very counterintuitive
>Bad Things.
>
>On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
>>>
>>> If bitops.h doesn't do the right thing, we need to
>>> fix bitops.h.
>
>Most of the ror and rol functions in linux/bitops.h
>should be considered unsafe, as currently implemented.
>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/bitops.h?id=04974df8049fc4240d22759a91e035082ccd18b4#n103
>
>I don't see anything in the file that suggests any limits
>on the range of the second argument.  So at best it is an
>undocumented trap for the unwary.  This has demonstrably
>been a problem in the past.  The explanation in the attached
>fix-rol32.diff makes amusing reading.
>
>Of the eight functions
>  ror64, rol64, ror32, rol32, ror16, rol16, ror8, and rol8,
>only one of them can handle shifting by zero, namely rol32.
>It was upgraded on Thu Dec 3 22:04:01 2015; see the attached
>fix-rol32.diff.
>
>I find it very odd that the other seven functions were not
>upgraded.  I suggest the attached fix-others.diff would make
>things more consistent.
>
>Beware that shifting by an amount >= the number of bits in the
>word remains Undefined Behavior.  This should be either documented
>or fixed.  It could be fixed easily enough.

This construct has been supported as a rotate since at least gcc2.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin

On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
> 
> If we are all agreed that what is in bitops.h is considered valid,
> then we can start converting people over to using the version defined
> in bitops.h, and if there is some compiler issue we need to work
> around, at least we only need to put the workaround in a single header
> file.
> 

Yes, please.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 11:22:25 AM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 1:49 PM,   wrote:
>> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
>>> > +static inline u32 rotl32(u32 v, u8 n)
>>> > +{
>>> > +   return (v << n) | (v >> (sizeof(v) * 8 - n));
>>> > +}
>>>
>>> That's undefined behavior when n=0.
>>
>> Sure, but it's never called with n = 0; I've double checked and the
>> compiler seems to do the right thing with the above pattern as well.
>
>> Hmm, it looks like there is a "standard" version rotate left and
>right
>> defined in include/linux/bitops.h.  So I suspect it would make sense
>> to use rol32 as defined in bitops.h --- and this is probably
>something
>
>bitops.h could work in this case, but its not an ideal solution. GCC
>does not optimize the code below as expected under all use cases
>because GCC does not recognize it as a rotate (see
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
>
>return (v << n) | (v >> (sizeof(v) * 8 - n));
>
>And outside of special cases like Salsa, ChaCha and BLAKE2, the code
>provided in bitops.h suffers UB on arbitrary data. So I think care
>needs to be taken when selecting functions from bitops.h.
>
>> that we should do for the rest of crypto/*.c, where people seem to be
>> defininig their own version of something like rotl32 (I copied the
>> contents of crypto/chacha20_generic.c to lib/chacha20, so this
>pattern
>> of defining one's own version of rol32 isn't new).
>
>Yeah, I kind of thought there was some duplication going on.
>
>But I think bitops.h should be fixed. Many folks don't realize the
>lurking UB, and many folks don't realize its not always optimized
>well.
>
>>> I think the portable way to do a rotate that avoids UB is the
>>> following. GCC, Clang and ICC recognize the pattern, and emit a
>rotate
>>> instruction.
>>>
>>> static const unsigned int MASK=31;
>>> return (v<>(-n&MASK));
>>>
>>> You should also avoid the following because its not constant time
>due
>>> to the branch:
>>>
>>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>>
>>
>> Where is this coming from?  I don't see this construct in the patch.
>
>My bad... It was a general observation. I've seen folks try to correct
>the UB by turning to something like that.
>
>Jeff

We don't care about UB, we care about gcc, and to a lesser extent LLVM and ICC. 
 If bitops.h doesn't do the right thing, we need to fix bitops.h.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin

On May 4, 2016 10:30:41 AM PDT, ty...@mit.edu wrote:
>On Tue, May 03, 2016 at 10:50:25AM +0200, Stephan Mueller wrote:
>> > +/*
>> > + * crng_init =  0 --> Uninitialized
>> > + *2 --> Initialized
>> > + *3 --> Initialized from input_pool
>> > + */
>> > +static int crng_init = 0;
>> 
>> shouldn't that be an atomic_t ?
>
>The crng_init variable only gets incremented (it progresses from
>0->1->2->3) and it's protected by the primary_crng->lock spinlock.
>There are a few places where we read it without first taking the lock,
>but that's where we depend on it getting incremented and where if we
>race with another CPU which has just bumped the crng_init status, it's
>not critical.  (Or we can take the lock and then recheck the crng_init
>if we really need to be sure.)
>
>> > +static void _initialize_crng(struct crng_state *crng)
>> > +{
>> > +  int i;
>> > +  unsigned long   rv;
>> 
>> Why do you use unsigned long here? I thought the state[i] is unsigned
>int.
>
>Because it gets filled in via arch_get_random_seed_long(&rv) and
>arch_get_random_log(&rv).  If that means we get 64 bits and we only
>use 32 bits, that's OK
>
>> Would it make sense to add the ChaCha20 self test vectors from
>RFC7539 here to 
>> test that the ChaCha20 works?
>
>We're not doing that for any of the other ciphers and hashes.  We
>didn't do that for SHA1, and the chacha20 code where I took this from
>didn't check for this as well.  What threat are you most worried
>about.  We don't care about chacha20 being exactly chacha20, so long
>as it's cryptographically strong.  In fact I think I removed a
>potential host order byteswap in the set key operation specifically
>because we don't care and interop.
>
>If this is required for FIPS mode, we can add that later.  I was
>primarily trying to keep the patch small to be easier for people to
>look at it, so I've deliberately left off bells and whistles that
>aren't strictly needed to show that the new approach is sound.
>
>> > +  if (crng_init++ >= 2)
>> > +  wake_up_interruptible(&crng_init_wait);
>> 
>> Don't we have a race here with the crng_init < 3 check in crng_reseed
>
>> considering multi-core systems?
>
>No, because we are holding the primary_crng->lock spinlock.  I'll add
>a comment explaining the locking protections which is intended for
>crng_init where we declare it.
>
>
>> > +  if (num < 16 || num > 32) {
>> > +  WARN_ON(1);
>> > +  pr_err("crng_reseed: num is %d?!?\n", num);
>> > +  }
>> > +  num_words = (num + 3) / 4;
>> > +  for (i = 0; i < num_words; i++)
>> > +  primary_crng.state[i+4] ^= tmp[i];
>> > +  primary_crng.init_time = jiffies;
>> > +  if (crng_init < 3) {
>> 
>> Shouldn't that one be if (crng_init < 3 && num >= 16) ?
>
>I'll just change the above WRN_ON test to be:
>
> BUG_ON(num < 16 || num > 32);
>
>This really can't happen, and I had it as a WARN_ON with a printk for
>debugging purpose in case I was wrong about how the code works.
>
>> > +  crng_init = 3;
>> > +  process_random_ready_list();
>> > +  wake_up_interruptible(&crng_init_wait);
>> > +  pr_notice("random: crng_init 3\n");
>> 
>> Would it make sense to be more descriptive here to allow readers of
>dmesg to 
>> understand the output?
>
>Yes, what we're currently printing during the initialization:
>
>random: crng_init 1
>random: crng_init 2
>random: crng_init 3
>
>was again mostly for debugging purposes.  What I'm thinking about
>doing is changing crng_init 2 and 3 messages to be:
>
>random: crng fast init done
>random: crng conservative init done
>
>> > +  }
>> > +  ret = 1;
>> > +out:
>> > +  spin_unlock_irqrestore(&primary_crng.lock, flags);
>> 
>> memzero_explicit of tmp?
>
>Good point, I've added a call to memzero_explicit().
>
>> > +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
>> > +{
>> > +  unsigned long v, flags;
>> > +  struct crng_state *crng = &primary_crng;
>> > +
>> > +  if (crng_init > 2 &&
>> > +  time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
>> > +  crng_reseed(&input_pool);
>> > +  spin_lock_irqsave(&crng->lock, flags);
>> > +  if (arch_get_random_long(&v))
>> > +  crng->state[14] ^= v;
>> > +  chacha20_block(&crng->state[0], out);
>> > +  if (crng->state[12] == 0)
>> > +  crng->state[13]++;
>
>> What is the purpose to only cover the 2nd 32 bit value of the nonce
>with 
>> arch_get_random?
>> 
>> state[12]++? Or why do you increment the nonce?
>
>In Chacha20, its output is a funcrtion of the state array, where
>state[0..3] is a constant specified by the Chacha20 definition,
>state[4..11] is the Key, and state[12..15] is the IV.  The
>chacha20_block() function increments state[12] each time it is called,
>so state[12] is being used as the block counter.  The increment of
>state[13] is used to make state[13] to be the upper 32-bits of the
>block counter.  We *should* be reseeding more often than 2**32 calls
>to chacha20_block

Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread H. Peter Anvin

On 07/25/2014 11:30 AM, Andy Lutomirski wrote:
> - 32-bit GDT code segments [huge attack surface]
> - 64-bit GDT code segments [probably pointless]

I presume you mean s/GDT/LDT/.

We already don't allow 64-bit LDT code segments.  Also, it is unclear to
me how 32-bit LDT segments have a huge attack surface, given that there
will realistically always be a 32-bit *GDT* segment present.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v5] random: introduce getrandom(2) system call

2014-07-24 Thread H. Peter Anvin

On 07/24/2014 01:54 PM, Andy Lutomirski wrote:
> 
> Or that someone writes userspace code that gets -EPERM/-EACCESS on
> getrandom with GRND_RANDOM and falls back to something worse than
> getrandom w/o GRND_RANDOM.
> 

-ENXIO?

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] random: limit the contribution of the hw rng to at most half

2014-07-17 Thread H. Peter Anvin

On 07/17/2014 03:08 PM, Theodore Ts'o wrote:
> 
> There was another complaint more recently, that wasn't related to
> nordrand.  And I was rather disturbed that in
> add_interrupt_randomness() 97% of the entropy was coming from RDSEED.
> I'm willing to have some of the entropy come from RDSEED by default,
> but 97% seems to really way too much.
> 

OK, the that formula was Linus' suggestion.

> And the emergency refill code in random_read() was extremely
> problematic.  First of all, we're dropping 512 bytes on the stack,
> which is kind of bad, but hopefully all of the code paths which call
> random_read() won't have a terribly deep stack.

512 bytes is fine here (1K would possibly not have been.)

> Secondly, even after the emergency refill, we block anyway.

Not unless we don't get any data back.  For a nonzero return we loop
back to extract_entropy_user().

> Remember, regardless of whether or not you believe (as a former head
> of NSA's technology directorate recently claimed) that the DUAL-EC p
> and q were generated using NSA's standard high-quality HW RNG system
> which they use to generate all of their crypto variables, or (as the
> NIST advisory panel has recently concluded) that it was highly likely
> that DUAL-EC was backdoored, it's the optics that matter, because
> those of us without top-drawer SI security clearances can't prove
> things one way or another.

Dual-EC was also heavily criticized by the civilian cryptographic
community since at least 2006.

> And when 99.95% of the entropy when reading large quantities of keying
> material from /dev/random is coming from RDSEED, that just has
> terrible, terrible optics

I just want to make sure we don't negatively impact the real security of
users because of "optics".  We already have a lot of problems with
people extracting long-living keys from /dev/urandom because /dev/random
is too slow.

I have no objection to the khwrngd route -- in fact, as you know, I have
been heavily encouraging the development of khwrngd with exactly this in
mind -- but it is just an issue of timing.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] random: limit the contribution of the hw rng to at most half

2014-07-17 Thread H. Peter Anvin

On 07/17/2014 03:03 AM, Theodore Ts'o wrote:
> For people who don't trust a hardware RNG which can not be audited,
> the changes to add support for RDSEED can be troubling since 97% or
> more of the entropy will be contributed from the in-CPU hardware RNG.
> 
> We now have a in-kernel khwrngd, so for those people who do want to
> implicitly trust the CPU-based system, we could create an arch-rng
> hw_random driver, and allow khwrng refill the entropy pool.  This
> allows system administrator whether or not they trust the CPU (I
> assume the NSA will trust RDRAND/RDSEED implicitly :-), and if so,
> what level of entropy derating they want to use.
> 
> The reason why this is a really good idea is that if different people
> use different levels of entropy derating, it will make it much more
> difficult to design a backdoor'ed hwrng that can be generally
> exploited in terms of the output of /dev/random when different attack
> targets are using differing levels of entropy derating.
> 
> Signed-off-by: Theodore Ts'o 

I saw exactly one complaint to that nature, but that was from someone
who really wanted the "nordrand" option (at which point I observed that
it had inadvertently left RDSEED enabled which quickly got rectified.)
The implication was that this was a request from a specific customer who
presumably have their own "audited" hardware RNG.

There may have been other complaints (justified or not) but if so I
haven't seen them.  I'm wondering if we are overgeneralizing here and if
so if it wouldn't be better to defer this until the hwrng supplier for
this is ready, which probably won't happen in time for 3.17 just given
the current timeline.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> But in our current predicament, not everybody trusts a few potentially easily 
> manipulated gates that have no other purpose than produce white noise which 
> are developed by the biggest chip vendor in the US. Gates which have other 
> purposes may not be that easily manipulated.
>

Incidentally, I disagree with the "easily manipulated" bit.  Yes, I have
seen the paper which says that you can do it in such a way that it
doesn't show up on *visual* examination.  However, put an electrical
probe on it and it shows up immediately.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> Ohh, ok, thanks for fixing that. :-) 
> 
> Though what makes me wonder is the following: why are some RNGs forced to use 
> the hw_random framework whereas some others are not? What is the driver for 
> that?
> 
> The current state of random.c vs. drivers/char/hw_random and the strong in-
> kernel separation between both makes me wonder. Isn't that all kind of 
> inconsistent?
> 

The main differences are speed of access, trivial interface, and
architectural guarantees.  You also don't have to deal with enumeration,
DMA engines, interrupts, indirect access, or bus drivers, which all are
utterly unacceptable on a synchronous path.

That being said, it is getting clear that we most likely would be better
off with the kernel directly feeding from at least a subset of the
hw_random drivers, rather than waiting for user space to come along and
launch a daemon... after $DEITY knows how many other processes have
already been launched.  There are patches being worked on to make that
happen, although there are a fair number of potential issues, including
the fact that some of the hw_random drivers are believed to be dodgy --
for example, the TPM driver: some TPMs are believed to not contain any
entropy element and simply rely on a factory-seeded nonvolatile counter
(since the TPM has to have support for nonvolatile counters anyway, this
hardware is already present.)

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 04:01 PM, Stephan Mueller wrote:
> Hi Peter,
> 
> some time back when the RDRAND instruction was debated, a patch was offered 
> for driver/char/random.c that in essence turned /dev/random into a frontend 
> for RDRAND in case that instruction was available. The patch kind of 
> monopolized the noise sources such that if a user space "random number hog" 
> pulled data from /dev/random endlessly, the (almost) only noise source was 
> RDRAND. As that patch treated RDRAND to provide entropy, the blocking 
> behavior 
> went away for /dev/random.
> 

This is false in a number of ways.

First of all... we NEVER pulled either /dev/random or /dev/urandom
directly from RDRAND.  We used RDRAND directly for kernel internal
randomness uses.  Users did object to this.

> That patch did not sit well with some developers and it got finally changed 
> such that the output of RDRAND is now just XORed with the output of the 
> "classic" /dev/random behavior -- /dev/random is still blocking. 

Mixing in RDRAND into /dev/random and /dev/urandom is actually

> With the current development cycle for 3.15, the function arch_random_refill 
> is added as presented in [1]. It now uses RDSEED instead of RDRAND. Yet, the 
> way this function is called in random_read seems (as I have no system with an 
> RDSEED, I cannot test) to show the very same behavior as the aforementioned 
> RDRAND patch: the blocking behavior of /dev/random will be gone and RDSEED 
> will monopolize the noise sources in case of a user space hog.

There is a huge difference between this and what people objected to
earlier: we filter everything through the kernel random number pool
system, which would require a herculean mathematical effort to reverse
even if the output of RDSEED was 100% predictable.

> Note, I do not see an issue with the patch that adds RDSEED as part of 
> add_interrupt_randomness outlined in [2]. The reason is that this patch does 
> not monopolizes the noise sources.
> 
> I do not want to imply that Intel (or any other chip manufacturer that will 
> hook into arch_random_refill) intentionally provides bad entropy (and this 
> email shall not start a discussion about entropy again), but I would like to 
> be able to only use noise sources that I can fully audit. As it is with 
> hardware, I am not able to see what it is doing.

I have to point out the irony in this given your previous proposals,
however...

> Thus, may I ask that arch_random_refill is revised such that it will not 
> monopolize the noise sources? If somebody wants that, he can easily use rngd.

Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the
"nordrand" option to the kernel.  These options are there for a reason.

Now when you mention it, though, the nordrand option should turn off
RDSEED as well as RDRAND.  It currently doesn't; that is a bug, plain
and simple.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] crypto: sha{256,512}_ssse3 - remove asmlinkage from static functions

2014-04-17 Thread H. Peter Anvin

On 04/17/2014 09:58 PM, Herbert Xu wrote:
>>
>> It doesn't make sense, sorry.  The right thing to drop here is not
>> "asmlinkage", it is "static": this is an external declaration.
> 
> It's a function pointer that's static, not the function that
> it's pointing to.
> 

{facepalm} Right, function *pointer*.

Duh.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] crypto: sha{256,512}_ssse3 - remove asmlinkage from static functions

2014-04-17 Thread H. Peter Anvin

On 04/17/2014 08:28 AM, Marek Vasut wrote:
> On Wednesday, April 16, 2014 at 06:19:50 PM, Jianyu Zhan wrote:
>> Commit 128ea04a9885("lto: Make asmlinkage __visible") restricts
>> asmlinkage to externally_visible, this causes compilation warnings:
>>
>> arch/x86/crypto/sha256_ssse3_glue.c:56:1:
>> warning: ‘externally_visible’ attribute have effect only on public
>> objects [-Wattributes]
>>
>> static asmlinkage void (*sha256_transform_asm)(const char *, u32 *,
>> u64); ^
>>
>> arch/x86/crypto/sha512_ssse3_glue.c:55:1:
>> warning: ‘externally_visible’ attribute have effect only on public
>> objects [-Wattributes] static asmlinkage void
>> (*sha512_transform_asm)(const char *, u64 *, ^
>>
>> Drop asmlinkage here to avoid such warnings.
>>
>> Also see Commit 8783dd3a37a5853689e1("irqchip: Remove asmlinkage from
>> static functions")
>>
>> Signed-off-by: Jianyu Zhan 
> 
> Makes sense, please add my humble
> 
> Reviewed-by: Marek Vasut 
> 

It doesn't make sense, sorry.  The right thing to drop here is not
"asmlinkage", it is "static": this is an external declaration.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] x86/crypto: ghash: use C implementation for setkey()

2014-03-27 Thread H. Peter Anvin

On 03/27/2014 04:46 AM, Ard Biesheuvel wrote:
> On 27 March 2014 12:36, Herbert Xu  wrote:
>> On Thu, Mar 27, 2014 at 12:29:00PM +0100, Ard Biesheuvel wrote:
>>> The GHASH setkey() function uses SSE registers but fails to call
>>> kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
>>> then having to deal with the restriction that they cannot be called from
>>> interrupt context, move the setkey() implementation to the C domain.
>>
>> Note that setkey cannot be called from interrupt context since
>> allocation/setkey is supposed to be slow-path material.
>>
>> But your approach is fine by me.
> 
> I agree that it makes little sense to call this from atomic context,
> but that still means (I think, but the x86 guys should confirm) that
> you are supposed to call kernel_fpu_begin() and kernel_fpu_end().
> 

Yes.  I'm suspecting calling kernel_fpu_begin() for a single GF
operation is probably not worth it, so I'm fine with reimplementing it
in integer logic.

Acked-by: H. Peter Anvin 

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-24 Thread H. Peter Anvin

On 03/24/2014 09:10 AM, Mathias Krause wrote:
> The recent addition of the AVX2 variant of the SHA1 hash function wrongly
> disabled the AVX variant by introducing a flaw in the feature test. Fixed
> in patch 1.
> 
> The alignment calculations of the AVX2 assembler implementation are
> questionable, too. Especially the page alignment of the stack pointer is
> broken in multiple ways. Fixed in patch 2. In patch 3 another issue for
> code alignment is fixed.
> 
> Please apply!
> 
> Mathias Krause (3):
>   crypto: x86/sha1 - re-enable the AVX variant
>   crypto: x86/sha1 - fix stack alignment of AVX2 variant
>   crypto: x86/sha1 - reduce size of the AVX2 asm implementation
> 
>  arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++--
>  arch/x86/crypto/sha1_ssse3_glue.c  |   26 --
>  2 files changed, 18 insertions(+), 16 deletions(-)
> 

For all these patches:

Reviewed-by: H. Peter Anvin 

Thank you for doing these.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-24 Thread H. Peter Anvin

On 03/24/2014 09:10 AM, Mathias Krause wrote:
> The recent addition of the AVX2 variant of the SHA1 hash function wrongly
> disabled the AVX variant by introducing a flaw in the feature test. Fixed
> in patch 1.
> 
> The alignment calculations of the AVX2 assembler implementation are
> questionable, too. Especially the page alignment of the stack pointer is
> broken in multiple ways. Fixed in patch 2. In patch 3 another issue for
> code alignment is fixed.
> 
> Please apply!
> 

Yikes.  Yes... the alignment calculation is confused in the extreme.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 1/1] crypto: SHA1 transform x86_64 AVX2

2014-03-20 Thread H. Peter Anvin

On 03/20/2014 05:09 PM, Herbert Xu wrote:
> On Thu, Mar 20, 2014 at 03:23:27PM -0700, H. Peter Anvin wrote:
>> Again... Herbert, David... yours or mine?
> 
> Yes I will be taking this through cryptodev.
> 

Thanks!

Acked-by: H. Peter Anvin 

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v6 1/1] crypto: SHA1 transform x86_64 AVX2

2014-03-20 Thread H. Peter Anvin

Again... Herbert, David... yours or mine?

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 1/1] crypto: SHA1 transform x86_64 AVX2

2014-03-19 Thread H. Peter Anvin

Herbert, do you want to take this or should I?

In the former case, please feel free to add my:

Acked-by: H. Peter Anvin 

-hpa


On 03/19/2014 02:21 PM, chandramouli narayanan wrote:
> From 8948c828aa929a3e2531ca88e3f05fbeb1ff53db Mon Sep 17 00:00:00 2001
> From: Chandramouli Narayanan 
> Date: Wed, 19 Mar 2014 09:30:12 -0700
> Subject: [PATCH v4 1/1] crypto: SHA1 transform x86_64 AVX2
> 
> This git patch adds x86_64 AVX2 optimization of SHA1
> transform to crypto support. The patch has been tested with 3.14.0-rc1
> kernel.
> 
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, tcrypt shows AVX2 performance improvement
> from 3% for 256 bytes update to 16% for 1024 bytes update over
> AVX implementation.
> 
> This patch adds sha1_avx2_transform(), the glue, build and
> configuration changes needed for AVX2 optimization of
> SHA1 transform to crypto support.
> 
> sha1-ssse3 is one module which adds the necessary optimization
> support (SSSE3/AVX/AVX2) for the low-level SHA1 transform function. With 
> better
> optimization support, transform function is overridden as the case may be.
> In the case of AVX2, due to performance reasons across datablock sizes,
> the AVX or AVX2 transform function is used at run-time as it suits best.
> The Makefile change therefore appends the necessary objects to the linkage.
> Due to this, the patch merely appends AVX2 transform to the existing build mix
> and Kconfig support and leaves the configuration build support as is.
> 
> Changes noted from the initial version of this patch are based on the
> feedback from the community:
> a) check for BMI2 in addition to AVX2 support since
> __sha1_transform_avx2() uses rorx
> b) Since the module build has dependency on 64bit, it is
> redundant to check it in the code here.
> c) coding style cleanup
> d) simplification of the assembly code where macros are repetitively used.
> 
> Signed-off-by: Chandramouli Narayanan 
> ---
>  arch/x86/crypto/Makefile   |   3 +
>  arch/x86/crypto/sha1_avx2_x86_64_asm.S | 706 
> +
>  arch/x86/crypto/sha1_ssse3_glue.c  |  50 ++-
>  crypto/Kconfig |   4 +-
>  4 files changed, 754 insertions(+), 9 deletions(-)
>  create mode 100644 arch/x86/crypto/sha1_avx2_x86_64_asm.S
> 
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index 6ba54d6..61d6e28 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -79,6 +79,9 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
>  aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o
>  ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
>  sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
> +ifeq ($(avx2_supported),yes)
> +sha1-ssse3-y += sha1_avx2_x86_64_asm.o
> +endif
>  crc32c-intel-y := crc32c-intel_glue.o
>  crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
>  crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
> diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S 
> b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> new file mode 100644
> index 000..559eb6c
> --- /dev/null
> +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> @@ -0,0 +1,706 @@
> +/*
> + *   Implement fast SHA-1 with AVX2 instructions. (x86_64)
> + *
> + * This file is provided under a dual BSD/GPLv2 license.  When using or
> + * redistributing this file, you may do so under either license.
> + *
> + * GPL LICENSE SUMMARY
> + *
> + * Copyright(c) 2014 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * Contact Information:
> + * Ilya Albrekht 
> + * Maxim Locktyukhin 
> + * Ronen Zohar 
> + * Chandramouli Narayanan 
> + *
> + * BSD LICENSE
> + *
> + * Copyright(c) 2014 Intel Corporation.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the docu

Re: [PATCH 2/2] SHA1 transform: x86_64 AVX2 optimization - glue & build-v2

2014-03-17 Thread H. Peter Anvin

On 03/17/2014 09:53 AM, chandramouli narayanan wrote:
> On second thoughts, with sha1-sse3-(CONFIG_AS_AVX2) +=
> sha1_avx2_x86_64_asm.o, I have build issues and sha1_transform_avx2
> undefined in sha1-sss3.ko. 
> 
> I can rid #ifdef CONFIG_AS_AVX2 in patch1. The following works though:
> ifeq ($(avx2_supported),yes)
> sha1-ssse3-y += sha1_avx2_x86_64_asm.o
> endif

Yes, the sad thing is that the CONFIG_AS_* things aren't real config
symbols, despite the name.  They might be in the future when Kconfig can
run test probes (something we have needed for a very long time.)

The "yes" versus "y", though, is a total faceplant.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources

2014-03-16 Thread H. Peter Anvin

On 03/04/2014 02:39 PM, Matt Mackall wrote:
> 
> [temporarily coming out of retirement to provide a clue]
> 
> The pool mixing function is intentionally _reversible_. This is a
> crucial security property.
> 
> That means, if I have an initial secret pool state X, and hostile
> attacker controlled data Y, then we can do:
> 
> X' = mix(X, Y)
> 
>  and 
> 
> X = unmix(X', Y)
> 
> We can see from this that the combination of (X' and Y) still contain
> the information that was originally in X. Since it's clearly not in Y..
> it must all remain in X'.
> 

This of course assumes that the attacker doesn't know the state of the
pool X.

The other thing to note is that reversible doesn't necessarily mean
linear (the current mixing function is linear.)  AES, for example, is
reversible (if and only if you possess the key) but is highly nonlinear.

I'm not saying we should use AES to mix the pool -- it is almost
guaranteed to be too expensive.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources

2014-03-16 Thread H. Peter Anvin

On 03/03/2014 03:51 PM, Kees Cook wrote:
> When bringing a new RNG source online, it seems like it would make sense
> to use some of its bytes to make the system entropy pool more random,
> as done with all sorts of other devices that contain per-device or
> per-boot differences.
> 
> Signed-off-by: Kees Cook 

I would like to raise again the concept of at least optionally using a
kernel thread, rather than a user-space daemon, to feed hwrng output to
the kernel pools.  The main service rngd provides is FIPS tests, but
those FIPS tests were withdrawn as a standard over 10 years ago and are
known to be extremely weak, at the very best a minimal sanity check.
Furthermore, they are completely useless against the output of any RNG
which contains a cryptographic whitener, which is the vast majority of
commercial sources -- this is especially so since rngd doesn't expect to
have to do any data reduction for output coming from hwrng.

Furthermore, if more than one hwrng device is available, rngd will only
be able to read one of them because of the way /dev/hwrng is implemented
in the kernel.

For contrast, the kernel could do data reduction just fine by only
crediting the entropy coming out of each hwrng driver with a fractional
amount.

That does *not* mean that there aren't random number generators which
require significant computation better done in user space.  For example,
an audio noise daemon or a lava lamp camera which requires video processing.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1] crypto: aesni - fix build on x86 (32bit)

2014-01-06 Thread H. Peter Anvin

On 01/06/2014 03:39 PM, Tim Chen wrote:
> 
> Will renaming the file to aesni_intel_avx-x86_64.S make things clearer
> now?
> 
> Tim

Yes.

Acked-by: H. Peter Anvin 

Herbert, can you pick it up?

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1] crypto: aesni - fix build on x86 (32bit)

2014-01-06 Thread H. Peter Anvin

On 01/06/2014 12:26 PM, Borislav Petkov wrote:
> On Mon, Jan 06, 2014 at 10:10:55AM -0800, Tim Chen wrote:
>> Yes, the code is in the file named aesni_intel_avx.S. So it should be
>> clear that the code is meant for x86_64.
> 
> How do you deduce aesni_intel_avx.S is meant for x86_64 only from the
> name?
> 
> Shouldn't it be called aesni_intel_avx-x86_64.S, as is the naming
> convention in arch/x86/crypto/
> 

Quite.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1] crypto: aesni - fix build on x86 (32bit)

2014-01-06 Thread H. Peter Anvin

On 01/06/2014 09:57 AM, Tim Chen wrote:
> On Mon, 2014-01-06 at 09:45 -0800, H. Peter Anvin wrote:
>> Can the code be adjusted to compile for 32 bit x86 or is that pointless?
>>
> 
> Code was optimized for wide registers.  So it is only meant for x86_64.
> 

Aren't the "wide registers" the vector registers?  Or are you also
relying on 64-bit integer registers (in which case we should just rename
the file to make it clear it is x86-64 only.)

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v1] crypto: aesni - fix build on x86 (32bit)

2014-01-06 Thread H. Peter Anvin

Can the code be adjusted to compile for 32 bit x86 or is that pointless?

Tim Chen  wrote:
>
>
>On Mon, 2013-12-30 at 15:52 +0200, Andy Shevchenko wrote:
>> It seems commit d764593a "crypto: aesni - AVX and AVX2 version of
>AESNI-GCM
>> encode and decode" breaks a build on x86_32 since it's designed only
>for
>> x86_64. This patch makes a compilation unit conditional to
>CONFIG_64BIT and
>> functions usage to CONFIG_X86_64.
>
>Thanks for catching and fixing it.
>
>Tim
>> 
>> Signed-off-by: Andy Shevchenko 

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] CPU Jitter RNG: inclusion into kernel crypto API and /dev/random

2013-11-10 Thread H. Peter Anvin

On 11/10/2013 09:21 AM, Stephan Mueller wrote:
> 
> Here you say it: we *assume*. And please bear in mind that we all know for a 
> fact that the theory behind quantum physics is not complete, because it does 
> not work together with the theory of relativity. That again is a hint that 
> there is NO proof that decay is really unpredictable.
> 

Honestly, if you want to be taken seriously, this is not the kind of
thing to put in your discussion.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3.11-rc1] crypto: Fix boot failure due to module dependency.

2013-07-19 Thread H. Peter Anvin

On 07/19/2013 04:26 PM, Greg Kroah-Hartman wrote:
>>
>> RAID has effectively the same issue, and we just "solved" it by
>> compiling in all the accelerators into the top-level module.
> 
> Then there's nothing to be done in udev or kmod, right?
> 

I don't know.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3.11-rc1] crypto: Fix boot failure due to module dependency.

2013-07-19 Thread H. Peter Anvin

On 07/19/2013 04:16 PM, Greg Kroah-Hartman wrote:
> 
> udev isn't doing any module loading, 'modprobe' is just being called for
> any new module alias that shows up in the system, and all of the drivers
> that match it then get loaded.
> 
> How is it a problem if a module is attempted to be loaded that is
> already loaded?  How is it a problem if a different module is loaded for
> a device already bound to a driver?  Both of those should be total
> "no-ops" for the kernel.
> 
> But, I don't know anything about the cpu code, how is loading a module
> causing problems?  That sounds like it needs to be fixes, as any root
> user can load modules whenever they want, you can't protect the kernel
> from doing that.
> 

The issue here seems to be the dynamic binding nature of the crypto
subsystem.  When something needs crypto, it will request the appropriate
crypto module (e.g. crct10dif), which may race with detecting a specific
hardware accelerator based on CPUID or device information (e.g.
crct10dif_pclmul).

RAID has effectively the same issue, and we just "solved" it by
compiling in all the accelerators into the top-level module.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3.11-rc1] crypto: Fix boot failure due to module dependency.

2013-07-18 Thread H. Peter Anvin

On 07/18/2013 03:17 PM, Rafael J. Wysocki wrote:
>>
>> alias x86cpu:vendor:*:family:*:model:*:feature:*0081* crct10dif_pclmul
>>
>> This should cause udev to load the crct10dif_pclml module when cpu
>> support the PCLMULQDQ (feature code 0081).  I did my testing during
>> development on 3.10 and the module was indeed loaded.
>>
>> However, I found that the following commit under 3.11-rc1 broke
>> the mechanism after some bisection.
>>
>> commit ac212b6980d8d5eda705864fc5a8ecddc6d6eacc
>> Author: Rafael J. Wysocki 
>> Date:   Fri May 3 00:26:22 2013 +0200
>>
>>  ACPI / processor: Use common hotplug infrastructure
>>   Split the ACPI processor driver into two parts, one that is
>>  non-modular, resides in the ACPI core and handles the enumeration
>>  and hotplug of processors and one that implements the rest of the
>>  existing processor driver functionality.
>>  Rafael, can you check and see if this can be fixed so those
>> optimized
>> crypto modules for Intel cpu that support them can be loaded?
> 
> I think this is an ordering issue between udev startup and the time when
> devices are registered.
> 
> I wonder what happens if you put those modules into the initramfs image?
> 

OK, this bothers me on some pretty deep level... a set of changes
exclusively in drivers/acpi breaking functionality which had nothing to
do with ACPI, specifically CPU-feature-based module loading.

Please let me know what the investigation comes up with, or if I need to
get more directly involved.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/11] Added macro to check for AVX2 feature.

2013-03-22 Thread H. Peter Anvin

Just syntactic overhead.  We should probably discuss it among ourselves first 
though.

Tim Chen  wrote:

>On Fri, 2013-03-22 at 17:21 -0700, H. Peter Anvin wrote:
>> I really, really hate these macros... Not sure they are worth the
>extra noise.
>> 
>
>I can do without the macro and I'll remove it.
>
>Wonder the reason that such macro should be avoided?
>
>Tim

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/11] Added macro to check for AVX2 feature.

2013-03-22 Thread H. Peter Anvin

I really, really hate these macros... Not sure they are worth the extra noise.

Tim Chen  wrote:

>Macro to facilitate checking the availability of the AVX2 feature.
>
>Signed-off-by: Tim Chen 
>---
> arch/x86/include/asm/cpufeature.h | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/arch/x86/include/asm/cpufeature.h
>b/arch/x86/include/asm/cpufeature.h
>index 2d9075e..db98ec7 100644
>--- a/arch/x86/include/asm/cpufeature.h
>+++ b/arch/x86/include/asm/cpufeature.h
>@@ -313,6 +313,7 @@ extern const char * const x86_power_flags[32];
> #define cpu_has_cx16  boot_cpu_has(X86_FEATURE_CX16)
> #define cpu_has_eager_fpu boot_cpu_has(X86_FEATURE_EAGER_FPU)
> #define cpu_has_topoext   boot_cpu_has(X86_FEATURE_TOPOEXT)
>+#define cpu_has_avx2  boot_cpu_has(X86_FEATURE_AVX2)
> 
> #ifdef CONFIG_X86_64
> 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3 v2] Optimize CRC32C calculation using PCLMULQDQ in crc32c-intel module

2012-09-27 Thread H. Peter Anvin


On 09/27/2012 03:44 PM, Tim Chen wrote:

Version 2
This version of the patch series fixes compilation errors for
32 bit x86 targets.

Version 1
This patch series optimized CRC32C calculations with PCLMULQDQ
instruction for crc32c-intel module.  It speeds up the original
implementation by 1.6x for 1K buffer and by 3x for buffer 4k or
more.  The tcrypt module was enhanced for doing speed test
on crc32c calculations.



Herbert - you are handling this one, right?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] crypto: camellia-x86_64 - module init/exit functions should be static

2012-03-22 Thread H. Peter Anvin

On 03/21/2012 05:46 PM, Herbert Xu wrote:
> On Wed, Mar 21, 2012 at 05:40:03PM -0700, Randy Dunlap wrote:
>> On 03/15/2012 01:11 PM, Jussi Kivilinna wrote:
>>
>>> This caused conflict with twofish-x86_64-3way when compiled into kernel,
>>> same function names and not static.
>>
>> Have these patches been merged anywhere?
>> I'm still seeing build problems in linux-next 20120321.
> 
> Thanks for the reminder, I'll push these through today.

Please push these to Linus ASAP, it is breaking the x86-64 allyesconfig
build upstream right now.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH 00/16] Crypto keys and module signing [ver #2]

2011-12-05 Thread H. Peter Anvin

On 11/29/2011 03:42 PM, David Howells wrote:
> 
> I have provided a couple of subtypes: DSA and RSA.  Both types have signature
> verification facilities available within the kernel, and both can be used for
> module signature verification with any encryption algorithm known by the PGP
> parser, provided the appropriate algorithm is compiled directly into the
> kernel.
> 

Do we really need the complexity of a full OpenPGP parser?  Parsers are
notorious security problems.  Furthermore, using DSA in anything but a
hard legacy application is not something you want to encourage, so why
support DSA?

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Feed entropy pool via high-resolution clocksources

2011-06-19 Thread H. Peter Anvin

On 06/19/2011 08:07 AM, Herbert Xu wrote:
> On Sun, Jun 19, 2011 at 09:38:43AM -0400, Neil Horman wrote:
>>
>> It sounds to me like, if its desireous to bypass the entropy pool, then we
>> should bypass the /dev/random path altogether.  Why not write a hwrng driver
>> that can export access to the rdrand instruction via a misc device.
> 
> I presume the rdrand instruction can be used from user-space
> directly.
> 

Yes, it can.

Again, RDRAND is not suitable for /dev/random (as opposed to
/dev/urandom users.)  /dev/urandom is used both by user space (and here
the only reason to hook it up to /dev/urandom is compatibility with
existing userspace; we are working separately to enabling user space
users like OpenSSL to use RDRAND directly) and by kernel users via the
internal APIs.

/dev/random as far as I can tell is only ever fed to userspace, however,
the guarantees that it is at least supposed to give are very, very
strict.  RDRAND do not fulfill those criteria, but we should be able to
use it as part of its implementation.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Feed entropy pool via high-resolution clocksources

2011-06-18 Thread H. Peter Anvin

On 06/17/2011 01:28 PM, Matt Mackall wrote:
>>
>> The one use case that it is cryptographically insufficient for is to
>> seed a new PRNG, which probably means it is unsuitable for being fed
>> as-is into /dev/random.
> 
> The thing to understand about the input side of /dev/random is that it's
> COMPLETELY immune to untrusted data. So there's absolutely no harm in
> sending it data of questionable entropy so long as you don't tell it to
> account it. And, of course, if it DOES contain entropy, it makes things
> better.
> 
> Think of it this way: I have a coin in my pocket. You, the attacker,
> tell me to flip it. You can do that any number of times and not improve
> your guess about the coin's state over your initial guess. This is what
> it means to have a reversible mixing function: no number of iterations
> reduces the degrees of freedom of the pool.
> 

What I meant is that it is unsuitable to *bypass the pool* for
/dev/random.  I think we can -- and almost certainly should -- use
RDRAND on the input side; we just have to figure out the accounting.

However, RDRAND is high enough quality (and high enough bandwidth) that
it should be not just possible but desirable to completely bypass the
pool system for /dev/urandom users and pull straight from the RDRAND
instruction.  I don't actually know what the exact numbers look like,
but the stall conditions being looked at are of the order of "every core
in the socket trying to execute RDRAND at the same time."

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] tsc: wire up entropy generation function

2011-06-14 Thread H. Peter Anvin

On 06/13/2011 04:10 PM, Matt Mackall wrote:
> 
> Well the issue is that samples are going to be roughly aligned to some
> multiple of the bus frequency. If an interrupt occurs on bus cycle X,
> this code will be hit at roughly TSC cycle X*M+d.
> 
>> This is correct; at the very least I would multiply the low 32 bits of
>> the TSC with a 32-bit prime number before mixing.
> 
> This multiply doesn't actually accomplish anything. Mixing known values
> into the pool is harmless because the mixing function is fully
> reversable. ie:
> 
>  unmix(sample, mix(sample, pool)) = pool
> 
> If you didn't know the contents of pool before, you can't guess it
> after.
> 
> The danger comes if you (a) already know pool and (b) can narrow sample
> to a tractable set of values. Then you can calculate a set of possible
> pool' values and compare the resulting output to confirm the actual
> state of pool' (aka state extension attack). Sticking a constant
> multiplier on sample doesn't affect this attack at all.
> 

It only matters if you actually truncate it to LSBs.

>> However, the big issue with this is that it's recursive... what causes
>> this to be invoked... probably an interrupt, which is going to have been
>> invoked by a timer, quite possible the TSC deadline timer.  Oops.
> 
> ..and the sampling function already mixes in a TSC timestamp with the
> provided timestamp.

Right.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] tsc: wire up entropy generation function

2011-06-13 Thread H. Peter Anvin

On 06/13/2011 05:39 PM, Kent Borg wrote:
> I was assuming that drivers, responding to an interrupt from some
> external event, would want to make this call.
> Such as a network driver.

Those already are doing this.

> Two points:
> 
> 1. Why look at the high-order bits?  How are they going to have values
> that cannot be inferred?  If you are looking for entropy, the low-order
> bits is where they're going keep it.  (Think Willie Sutton.)  If looking
> at the low-byte is cheaper, then let's be cheap.  If, however, if
> looking at more bytes *is* as cheap, then there is no harm.  But in any
> case let's keep the code lean enough that it can be called from an
> interrupt service routine.

Consider the case where the TSC is running in steps of 64 and you're
using the low 6 bits.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] tsc: wire up entropy generation function

2011-06-13 Thread H. Peter Anvin

On 06/13/2011 03:27 PM, Venkatesh Pallipadi wrote:
> On Mon, Jun 13, 2011 at 3:06 PM, Jarod Wilson  wrote:
>> TSC is high enough resolution that we can use its low-order byte to
>> stir new data into the random number generator entropy pool.
> 
> From what I vaguely remember from years past, rdtsc, especially last
> few bits of it are not very good as random number source. As they are
> based on lower bus frequency and a multiplier. May be things have
> changed these days. Adding Peter and Suresh for comments.

This is correct; at the very least I would multiply the low 32 bits of
the TSC with a 32-bit prime number before mixing.

However, the big issue with this is that it's recursive... what causes
this to be invoked... probably an interrupt, which is going to have been
invoked by a timer, quite possible the TSC deadline timer.  Oops.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v4] crypto: Add PCLMULQDQ accelerated GHASH implementation

2009-11-03 Thread H. Peter Anvin

On 11/03/2009 01:03 AM, Ingo Molnar wrote:
>>
>> .macro xmm_num opd xmm
>> .ifc \xmm,%xmm0
>> \opd = 0
>> .endif
>> .ifc \xmm,%xmm1
>> \opd = 1
>> .endif
>> .ifc \xmm,%xmm2
>> \opd = 2
>> .endif
>> .ifc \xmm,%xmm3
>> \opd = 3
>> .endif
>> .ifc \xmm,%xmm4
>> \opd = 4
>> .endif
>> .ifc \xmm,%xmm5
>> \opd = 5
>> .endif
>> .ifc \xmm,%xmm6
>> \opd = 6
>> .endif
>> .ifc \xmm,%xmm7
>> \opd = 7
>> .endif
>> .ifc \xmm,%xmm8
>> \opd = 8
>> .endif
>> .ifc \xmm,%xmm9
>> \opd = 9
>> .endif
>> .ifc \xmm,%xmm10
>> \opd = 10
>> .endif
>> .ifc \xmm,%xmm11
>> \opd = 11
>> .endif
>> .ifc \xmm,%xmm12
>> \opd = 12
>> .endif
>> .ifc \xmm,%xmm13
>> \opd = 13
>> .endif
>> .ifc \xmm,%xmm14
>> \opd = 14
>> .endif
>> .ifc \xmm,%xmm15
>> \opd = 15
>> .endif
>> .endm
>>
>> .macro PSHUFB_XMM xmm1 xmm2
>> xmm_num pshufb_opd1 \xmm1
>> xmm_num pshufb_opd2 \xmm2
>> .if (pshufb_opd1 < 8) && (pshufb_opd2 < 8)
>> .byte 0x66, 0x0f, 0x38, 0x00, 0xc0 | pshufb_opd1 | (pshufb_opd2<<3)
>> .elseif (pshufb_opd1 >= 8) && (pshufb_opd2 < 8)
>> .byte 0x66, 0x41, 0x0f, 0x38, 0x00, 0xc0 | (pshufb_opd1-8) | (pshufb_opd2<<3)
>> .elseif (pshufb_opd1 < 8) && (pshufb_opd2 >= 8)
>> .byte 0x66, 0x44, 0x0f, 0x38, 0x00, 0xc0 | pshufb_opd1 | ((pshufb_opd2-8)<<3)
>> .else
>> .byte 0x66, 0x45, 0x0f, 0x38, 0x00, 0xc0 | (pshufb_opd1-8) | 
>> ((pshufb_opd2-8)<<3)
>> .endif
>> .endm
> 
> Looks far too clever, i like it :-) We have quite a few assembly macros 
> in arch/x86/include/asm/. The above one could be put into calling.h for 
> example.
> 

I would really like to see something like that, with only one minor
tweak: please use submacros to generate the REX and MODRM bytes, since
we are *guaranteed* to want to do the same thing again.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -v2 4/5] x86: Move kernel_fpu_using to irq_is_fpu_using in asm/i387.h

2009-08-05 Thread H. Peter Anvin


Herbert Xu wrote:


Peter, do you want to apply this patch in your tree or would
you prefer for it to go through my tree along with the rest of
the series?



I'll take it tomorrow... want to double-check that we don't have any 
real issues w.r.t. corruption there.


-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 6/7] x86: Move kernel_fpu_using to asm/i387.h

2009-06-17 Thread H. Peter Anvin

Huang Ying wrote:
> 
> After some thinking, I think something as follow may be more
> appropriate:
> 
> /* This may be useful for someone else */
> static inline bool fpu_using(void)
> {
>   return !(read_cr0() & X86_CR0_TS);
> }
> 
> static inline bool irq_fpu_using(void)
> {
>   return in_interrupt() && fpu_using();
> }
> 

Yes, looks good.  I'll pull in the patch as soon as I get it.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 6/7] x86: Move kernel_fpu_using to asm/i387.h

2009-06-17 Thread H. Peter Anvin

Herbert Xu wrote:
> On Wed, Jun 17, 2009 at 10:06:44AM -0700, H. Peter Anvin wrote:
> 
>> Huang: if I recall correctly, these functions were originally designed
>> to deal with the fact that VIA processors generate spurious #TS faults
>> due to broken design of the Padlock instructions.  The AES and PCLMUL
>> instructions actually use SSE registers and so will require different
>> structure.
> 
> No irq_ts_save was the one designed for the VIA, the Intel stuff
> does save the FPU state.
> 

Ah, sorry, misremembered.  Great!  Will apply.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 6/7] x86: Move kernel_fpu_using to asm/i387.h

2009-06-17 Thread H. Peter Anvin

Ingo Molnar wrote:
>>  
>> +static inline int kernel_fpu_using(void)
>> +{
>> +if (in_interrupt() && !(read_cr0() & X86_CR0_TS))
>> +return 1;
>> +return 0;
>> +}
>> +
> 
> Looks sane to me. Herbert, do you ack it?
> 

Although I have to say, the structure of:

if (boolean test)
return 1;
return 0;

... truly was hit with the ugly stick.  It really should be:

static inline bool kernel_fpu_using(void)
{
return in_interrupt() && !(read_cr0() && C86_CR0_TS);
}

Huang: if I recall correctly, these functions were originally designed
to deal with the fact that VIA processors generate spurious #TS faults
due to broken design of the Padlock instructions.  The AES and PCLMUL
instructions actually use SSE registers and so will require different
structure.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

70 matches

Mail list logo