Re: [PATCH 1/1] x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline code before returning to long mode

2019-01-07 Thread Benjamin Gilbert
On Mon, Jan 07, 2019 at 02:03:15PM -0600, Wei Huang wrote:
> On 1/7/19 2:25 AM, Kirill A. Shutemov wrote:
> > On Fri, Jan 04, 2019 at 05:44:11AM +, Wei Huang wrote:
> >> In some old AMD KVM implementations, the guest's EFER.LME bit is cleared by
> >> KVM when the hypervisor detects that the guest sets CR0.PG to 0. This causes
> >> the guest OS to reboot when it tries to return from the 32-bit trampoline
> >> code because the CPU is in an incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1,
> >> but EFER.LME=0. As a precaution, this patch sets EFER.LME=1 as part of the
> >> long mode activation procedure. This extra step won't cause any harm when
> >> Linux is booting on a bare-metal machine.
> >>
> >> Signed-off-by: Wei Huang 
> > 
> > Thanks for tracking this down.
> 
> BTW I think this patch _might_ be related to the recent reboot issue
> reported in https://lkml.org/lkml/2018/7/1/836 since the symptoms are
> exactly the same.

The problem in that case turned out to be https://lkml.org/lkml/2018/7/4/723
which was fixed by d503ac531a.

--Benjamin Gilbert
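
The inconsistent register state named in the quoted patch description can be
written down as a small check. This is only a sketch under stated
assumptions: it uses the architectural bit positions (EFER.LME is bit 8,
CR0.PG is bit 31, CR4.PAE is bit 5), the function name is hypothetical, and
it models nothing beyond the four values the description lists.

```python
# Architectural bit positions on x86-64.
EFER_LME = 1 << 8    # EFER.LME: long mode enable
CR0_PG = 1 << 31     # CR0.PG: paging enable
CR4_PAE = 1 << 5     # CR4.PAE: physical address extension


def long_mode_state_ok(cr0, cr4, efer, cs_l):
    """Hypothetical helper: is a 64-bit code segment consistent with the
    control registers?  Only the bits named in the patch description are
    examined."""
    if not cs_l:
        # This sketch only reasons about 64-bit code segments (CS.L=1).
        return True
    return bool(cr0 & CR0_PG) and bool(cr4 & CR4_PAE) and bool(efer & EFER_LME)


# The state from the patch description: PAE=1, PG=1, CS.L=1, but LME=0.
print(long_mode_state_ok(CR0_PG, CR4_PAE, 0, True))
# The same state after setting EFER.LME=1, as the patch does.
print(long_mode_state_ok(CR0_PG, CR4_PAE, EFER_LME, True))
```

The first call reports an inconsistent state and the second a consistent one,
which is the difference the extra EFER.LME write is meant to guarantee.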


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Benjamin Gilbert
On Fri, Jul 06, 2018 at 11:11:10AM -0700, Andi Kleen wrote:
> There are valid use cases to override the flags. I use it sometimes too,
> and know some other people do too.
> 
> But you need to know what you're doing. 
> 
> Perhaps a warning during build would be reasonable. So if you ask
> for a build log you would see it.

In our case, the package is presumably passing LDFLAGS="" to override the
LDFLAGS environment variable already set by the packaging system.  This has
worked for years without a problem.

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-04 Thread Benjamin Gilbert
On Wed, Jul 04, 2018 at 06:08:57PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > > >> images, etc.
> > > > >
> > > > 
> > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > > 
> > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > too with the same symptoms
> > > 
> > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> > 
> > -flto in LDFLAGS screws up this part of paging_prepare():
> 
> I've got it wrong. *Any* LDFLAGS option passed to make this way:
> 
>   make LDFLAGS="..."
> 
> would cause an issue. Even empty.
> 
> It overrides all assignments to the variable in the makefile.
> As a result, the image is built without -pie and the linker doesn't generate
> position-independent code.
> 
> Looks like the patch below helps, but my make-fu is poor.

Sure enough, we're passing LDFLAGS="" to make.  Your patch fixes the boot
failure for me.

--Benjamin Gilbert
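
The override behaviour described in the quoted message can be illustrated
with a toy model of GNU make's precedence for a single variable. This is a
sketch, not make itself: the function name, the tuple layout, and the
"-m elf_x86_64" flag are assumptions chosen for illustration, and only the
":=" and "+=" operators plus the "override" directive are modelled.

```python
def effective_value(makefile_assignments, command_line=None):
    """Toy model of GNU make precedence for one variable.

    makefile_assignments: list of (op, text, is_override) tuples, where op is
        ":=" or "+=" and is_override marks an `override` directive.
    command_line: the value from `make VAR=...`, or None if not given.
    """
    value = command_line if command_line is not None else ""
    for op, text, is_override in makefile_assignments:
        if command_line is not None and not is_override:
            # A command-line value makes make ignore ordinary assignments.
            continue
        if op == ":=":
            value = text
        elif op == "+=":
            value = (value + " " + text).strip()
        else:
            raise ValueError("unsupported operator: %r" % op)
    return value


# Hypothetical makefile contents: a base flag, then -pie appended.
plain = [(":=", "-m elf_x86_64", False), ("+=", "-pie", False)]
print(repr(effective_value(plain)))        # no command-line value given
print(repr(effective_value(plain, "")))    # make LDFLAGS=""

# The same append written with `override` survives the command line.
forced = [(":=", "-m elf_x86_64", False), ("+=", "-pie", True)]
print(repr(effective_value(forced, "")))
```

In this model an empty command-line value is enough to drop -pie, matching
the "Even empty" observation above. GNU make's `override` directive is one
documented way for a makefile to keep such a flag; whether the patch
referenced in the quoted message takes that route is not stated here.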


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Benjamin Gilbert
On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> I don't know how to solve it. As far as I know we don't support compiling
> kernel with LTO in mainline.
> 
> Any suggestions?
> 
> Benjamin, do you change LDFLAGS or CFLAGS when compiling the kernel?

We're using the standard build flags as far as I can tell.  In particular,
we don't enable LTO, and I've verified that -flto isn't in the build logs.

Here's a sample image:

https://users.developer.core-os.net/bgilbert/4.17/vmlinuz-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/vmlinux-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/System.map

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-02 Thread Benjamin Gilbert
On Mon, Jul 02, 2018 at 12:34:50PM +0300, Kirill A. Shutemov wrote:
> Could you check if you can trigger the issue with my changes to config and
> the way I run KVM?

Yes, the issue still triggers in that case.  I've also verified that the
kernel boots normally with your qemu command if the commit is reverted.

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-01 Thread Benjamin Gilbert
On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig, 
> up to and including 4.17.3, fail to boot on AMD64 running in (at least) 
> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.  
> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level 
> paging boot if kernel is above 4G") fixes it.  I've attached our kernel 
> config for reference, and am happy to test patches, provide sample QCOW 
> images, etc.

Adding linux-x86_64, LKML.

--Benjamin Gilbert


config.gz
Description: application/gzip


[PATCH 0/3] Drop config options left over from in-kernel firmware

2018-01-23 Thread Benjamin Gilbert
5620a0d1aacd ("firmware: delete in-kernel firmware") left behind several
config options which no longer do anything.  Remove them and fix up
documentation.

Benjamin Gilbert (3):
  USB: serial: keyspan: Drop firmware Kconfig options
  firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
  firmware: Fix up docs referring to FIRMWARE_IN_KERNEL

 Documentation/driver-api/firmware/built-in-fw.rst |  7 +-
 Documentation/x86/microcode.txt                   |  5 +-
 arch/arc/configs/axs101_defconfig                 |  1 -
 arch/arc/configs/axs103_defconfig                 |  1 -
 arch/arc/configs/axs103_smp_defconfig             |  1 -
 arch/arc/configs/haps_hs_defconfig                |  1 -
 arch/arc/configs/haps_hs_smp_defconfig            |  1 -
 arch/arc/configs/hsdk_defconfig                   |  1 -
 arch/arc/configs/nsim_700_defconfig               |  1 -
 arch/arc/configs/nsim_hs_defconfig                |  1 -
 arch/arc/configs/nsim_hs_smp_defconfig            |  1 -
 arch/arc/configs/nsimosci_defconfig               |  1 -
 arch/arc/configs/nsimosci_hs_defconfig            |  1 -
 arch/arc/configs/nsimosci_hs_smp_defconfig        |  1 -
 arch/arc/configs/tb10x_defconfig                  |  1 -
 arch/arc/configs/vdk_hs38_defconfig               |  1 -
 arch/arc/configs/vdk_hs38_smp_defconfig           |  1 -
 arch/arm/configs/cns3420vb_defconfig              |  1 -
 arch/arm/configs/magician_defconfig               |  1 -
 arch/arm/configs/mini2440_defconfig               |  1 -
 arch/arm/configs/mv78xx0_defconfig                |  1 -
 arch/arm/configs/mxs_defconfig                    |  1 -
 arch/arm/configs/orion5x_defconfig                |  1 -
 arch/arm/configs/tegra_defconfig                  |  1 -
 arch/arm/configs/vf610m4_defconfig                |  1 -
 arch/m68k/configs/amiga_defconfig                 |  1 -
 arch/m68k/configs/apollo_defconfig                |  1 -
 arch/m68k/configs/atari_defconfig                 |  1 -
 arch/m68k/configs/bvme6000_defconfig              |  1 -
 arch/m68k/configs/hp300_defconfig                 |  1 -
 arch/m68k/configs/mac_defconfig                   |  1 -
 arch/m68k/configs/multi_defconfig                 |  1 -
 arch/m68k/configs/mvme147_defconfig               |  1 -
 arch/m68k/configs/mvme16x_defconfig               |  1 -
 arch/m68k/configs/q40_defconfig                   |  1 -
 arch/m68k/configs/sun3_defconfig                  |  1 -
 arch/m68k/configs/sun3x_defconfig                 |  1 -
 arch/mips/configs/ar7_defconfig                   |  1 -
 arch/mips/configs/ath25_defconfig                 |  1 -
 arch/mips/configs/ath79_defconfig                 |  1 -
 arch/mips/configs/pic32mzda_defconfig             |  1 -
 arch/mips/configs/qi_lb60_defconfig               |  1 -
 arch/mips/configs/rm200_defconfig                 |  9 ---
 arch/mips/configs/rt305x_defconfig                |  1 -
 arch/mips/configs/xway_defconfig                  |  1 -
 arch/mn10300/configs/asb2364_defconfig            |  1 -
 arch/powerpc/configs/44x/warp_defconfig           |  1 -
 arch/powerpc/configs/c2k_defconfig                | 12
 arch/powerpc/configs/g5_defconfig                 | 12
 arch/powerpc/configs/maple_defconfig              | 12
 arch/powerpc/configs/mpc512x_defconfig            |  1 -
 arch/powerpc/configs/pmac32_defconfig             | 12
 arch/powerpc/configs/ppc6xx_defconfig             |  1 -
 arch/powerpc/configs/ps3_defconfig                |  1 -
 arch/powerpc/configs/wii_defconfig                |  1 -
 arch/s390/configs/zfcpdump_defconfig              |  1 -
 arch/sh/configs/polaris_defconfig                 |  1 -
 arch/tile/configs/tilegx_defconfig                |  1 -
 arch/tile/configs/tilepro_defconfig               |  1 -
 arch/x86/Kconfig                                  |  6 +-
 drivers/base/Kconfig                              | 28 ++--
 drivers/usb/serial/Kconfig                        | 78 ---
 62 files changed, 11 insertions(+), 222 deletions(-)

-- 
2.13.6
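
For the defconfig files, the cleanup in this series comes down to deleting
every line that sets or unsets a removed option. A minimal sketch of that
operation follows, assuming the usual defconfig line forms
("CONFIG_FOO=..." and "# CONFIG_FOO is not set"). The function name is
hypothetical, and this is not claimed to be how the patches were generated.

```python
import re


def drop_option(defconfig_text, option):
    """Hypothetical helper: return defconfig_text without the lines that set
    or unset CONFIG_<option>.  Options that merely share a prefix (for
    example FOO_BAR when dropping FOO) are left alone."""
    pattern = re.compile(
        r"^(?:# )?CONFIG_%s(?:=.*| is not set)$" % re.escape(option)
    )
    kept = [
        line for line in defconfig_text.splitlines()
        if not pattern.match(line)
    ]
    return "".join(line + "\n" for line in kept)


sample = (
    "# CONFIG_PREVENT_FIRMWARE_BUILD is not set\n"
    "# CONFIG_FIRMWARE_IN_KERNEL is not set\n"
    "CONFIG_SCSI=y\n"
)
print(drop_option(sample, "FIRMWARE_IN_KERNEL"), end="")
```

Anchoring the pattern at both ends keeps an option such as
USB_SERIAL_KEYSPAN_PDA in place when only USB_SERIAL_KEYSPAN is named.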



[PATCH 2/3] firmware: Drop FIRMWARE_IN_KERNEL Kconfig option

2018-01-23 Thread Benjamin Gilbert
It doesn't actually do anything.  Merge its help text into
EXTRA_FIRMWARE.

Fixes: 5620a0d1aacd ("firmware: delete in-kernel firmware")
Fixes: 0946b2fb38fd ("firmware: cleanup FIRMWARE_IN_KERNEL message")
Signed-off-by: Benjamin Gilbert <benjamin.gilb...@coreos.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Robin H. Johnson <robb...@gentoo.org>
---
 arch/arc/configs/axs101_defconfig          |  1 -
 arch/arc/configs/axs103_defconfig          |  1 -
 arch/arc/configs/axs103_smp_defconfig      |  1 -
 arch/arc/configs/haps_hs_defconfig         |  1 -
 arch/arc/configs/haps_hs_smp_defconfig     |  1 -
 arch/arc/configs/hsdk_defconfig            |  1 -
 arch/arc/configs/nsim_700_defconfig        |  1 -
 arch/arc/configs/nsim_hs_defconfig         |  1 -
 arch/arc/configs/nsim_hs_smp_defconfig     |  1 -
 arch/arc/configs/nsimosci_defconfig        |  1 -
 arch/arc/configs/nsimosci_hs_defconfig     |  1 -
 arch/arc/configs/nsimosci_hs_smp_defconfig |  1 -
 arch/arc/configs/tb10x_defconfig           |  1 -
 arch/arc/configs/vdk_hs38_defconfig        |  1 -
 arch/arc/configs/vdk_hs38_smp_defconfig    |  1 -
 arch/arm/configs/cns3420vb_defconfig       |  1 -
 arch/arm/configs/magician_defconfig        |  1 -
 arch/arm/configs/mini2440_defconfig        |  1 -
 arch/arm/configs/mv78xx0_defconfig         |  1 -
 arch/arm/configs/mxs_defconfig             |  1 -
 arch/arm/configs/orion5x_defconfig         |  1 -
 arch/arm/configs/tegra_defconfig           |  1 -
 arch/arm/configs/vf610m4_defconfig         |  1 -
 arch/m68k/configs/amiga_defconfig          |  1 -
 arch/m68k/configs/apollo_defconfig         |  1 -
 arch/m68k/configs/atari_defconfig          |  1 -
 arch/m68k/configs/bvme6000_defconfig       |  1 -
 arch/m68k/configs/hp300_defconfig          |  1 -
 arch/m68k/configs/mac_defconfig            |  1 -
 arch/m68k/configs/multi_defconfig          |  1 -
 arch/m68k/configs/mvme147_defconfig        |  1 -
 arch/m68k/configs/mvme16x_defconfig        |  1 -
 arch/m68k/configs/q40_defconfig            |  1 -
 arch/m68k/configs/sun3_defconfig           |  1 -
 arch/m68k/configs/sun3x_defconfig          |  1 -
 arch/mips/configs/ar7_defconfig            |  1 -
 arch/mips/configs/ath25_defconfig          |  1 -
 arch/mips/configs/ath79_defconfig          |  1 -
 arch/mips/configs/pic32mzda_defconfig      |  1 -
 arch/mips/configs/qi_lb60_defconfig        |  1 -
 arch/mips/configs/rt305x_defconfig         |  1 -
 arch/mips/configs/xway_defconfig           |  1 -
 arch/mn10300/configs/asb2364_defconfig     |  1 -
 arch/powerpc/configs/44x/warp_defconfig    |  1 -
 arch/powerpc/configs/mpc512x_defconfig     |  1 -
 arch/powerpc/configs/ppc6xx_defconfig      |  1 -
 arch/powerpc/configs/ps3_defconfig         |  1 -
 arch/powerpc/configs/wii_defconfig         |  1 -
 arch/s390/configs/zfcpdump_defconfig       |  1 -
 arch/sh/configs/polaris_defconfig          |  1 -
 arch/tile/configs/tilegx_defconfig         |  1 -
 arch/tile/configs/tilepro_defconfig        |  1 -
 drivers/base/Kconfig                       | 28 +---
 53 files changed, 5 insertions(+), 75 deletions(-)

diff --git a/arch/arc/configs/axs101_defconfig b/arch/arc/configs/axs101_defconfig
index ec7c849a5c8e..09f85154c5a4 100644
--- a/arch/arc/configs/axs101_defconfig
+++ b/arch/arc/configs/axs101_defconfig
@@ -44,7 +44,6 @@ CONFIG_IP_PNP_RARP=y
 CONFIG_DEVTMPFS=y
 # CONFIG_STANDALONE is not set
 # CONFIG_PREVENT_FIRMWARE_BUILD is not set
-# CONFIG_FIRMWARE_IN_KERNEL is not set
 CONFIG_SCSI=y
 CONFIG_BLK_DEV_SD=y
 CONFIG_NETDEVICES=y
diff --git a/arch/arc/configs/axs103_defconfig b/arch/arc/configs/axs103_defconfig
index 63d3cf69e0b0..09fed3ef22b6 100644
--- a/arch/arc/configs/axs103_defconfig
+++ b/arch/arc/configs/axs103_defconfig
@@ -44,7 +44,6 @@ CONFIG_IP_PNP_RARP=y
 CONFIG_DEVTMPFS=y
 # CONFIG_STANDALONE is not set
 # CONFIG_PREVENT_FIRMWARE_BUILD is not set
-# CONFIG_FIRMWARE_IN_KERNEL is not set
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_SCSI=y
 CONFIG_BLK_DEV_SD=y
diff --git a/arch/arc/configs/axs103_smp_defconfig b/arch/arc/configs/axs103_smp_defconfig
index f613ecac14a7..ea2f6d817d1a 100644
--- a/arch/arc/configs/axs103_smp_defconfig
+++ b/arch/arc/configs/axs103_smp_defconfig
@@ -45,7 +45,6 @@ CONFIG_IP_PNP_RARP=y
 CONFIG_DEVTMPFS=y
 # CONFIG_STANDALONE is not set
 # CONFIG_PREVENT_FIRMWARE_BUILD is not set
-# CONFIG_FIRMWARE_IN_KERNEL is not set
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_SCSI=y
 CONFIG_BLK_DEV_SD=y
diff --git a/arch/arc/configs/haps_hs_defconfig b/arch/arc/configs/haps_hs_defconfig
index db04ea4dd2d9..ab231c040efe 100644
--- a/arch/arc/configs/haps_hs_defconfig
+++ b/arch/arc/configs/haps_hs_defconfig
@@ -40,7 +40,6 @@ CONFIG_INET=y
 CONFIG_DEVTMPFS=y
 # CONFIG_STANDALONE is not set
 # CONFIG_PREVENT_FIRMWARE_BUILD is not set
-# CONFIG_FIRMWARE_IN_KERNEL is not set
 # CONFIG_BLK_DEV is not set
 CONFIG_NETDEVICES=y
 # CONFIG_NET_VENDOR_ARC is not set
diff --git a/arch/arc/con

[PATCH 3/3] firmware: Fix up docs referring to FIRMWARE_IN_KERNEL

2018-01-23 Thread Benjamin Gilbert
We've removed the option, so stop talking about it.

Signed-off-by: Benjamin Gilbert <benjamin.gilb...@coreos.com>
Cc: Borislav Petkov <b...@suse.de>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
---
 Documentation/driver-api/firmware/built-in-fw.rst | 7 +--
 Documentation/x86/microcode.txt                   | 5 ++---
 arch/x86/Kconfig                                  | 6 +++---
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/Documentation/driver-api/firmware/built-in-fw.rst b/Documentation/driver-api/firmware/built-in-fw.rst
index 7300e66857f8..396cdf591ac5 100644
--- a/Documentation/driver-api/firmware/built-in-fw.rst
+++ b/Documentation/driver-api/firmware/built-in-fw.rst
@@ -11,13 +11,8 @@ options:
   * CONFIG_EXTRA_FIRMWARE
   * CONFIG_EXTRA_FIRMWARE_DIR
 
-This should not be confused with CONFIG_FIRMWARE_IN_KERNEL, this is for drivers
-which enables firmware to be built as part of the kernel build process. This
-option, CONFIG_FIRMWARE_IN_KERNEL, will build all firmware for all drivers
-enabled which ship its firmware inside the Linux kernel source tree.
-
 There are a few reasons why you might want to consider building your firmware
-into the kernel with CONFIG_EXTRA_FIRMWARE though:
+into the kernel with CONFIG_EXTRA_FIRMWARE:
 
 * Speed
 * Firmware is needed for accessing the boot device, and the user doesn't
diff --git a/Documentation/x86/microcode.txt b/Documentation/x86/microcode.txt
index f57e1b45e628..aacd2f5e1a46 100644
--- a/Documentation/x86/microcode.txt
+++ b/Documentation/x86/microcode.txt
@@ -108,12 +108,11 @@ packages already put them there.
 
 
 The loader supports also loading of a builtin microcode supplied through
-the regular firmware builtin method CONFIG_FIRMWARE_IN_KERNEL. Only
-64-bit is currently supported.
+the regular firmware builtin method CONFIG_EXTRA_FIRMWARE. Only 64-bit is
+currently supported.
 
 Here's an example:
 
-CONFIG_FIRMWARE_IN_KERNEL=y
 CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
 CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 20da391b5f32..6d27d53de60d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1255,9 +1255,9 @@ config MICROCODE
  CONFIG_BLK_DEV_INITRD in order for the loader to be able to scan the
  initrd for microcode blobs.
 
- In addition, you can build-in the microcode into the kernel. For that you
- need to enable FIRMWARE_IN_KERNEL and add the vendor-supplied microcode
- to the CONFIG_EXTRA_FIRMWARE config option.
+ In addition, you can build the microcode into the kernel. For that you
+ need to add the vendor-supplied microcode to the CONFIG_EXTRA_FIRMWARE
+ config option.
 
 config MICROCODE_INTEL
 bool "Intel microcode loading support"
-- 
2.13.6



[PATCH 1/3] USB: serial: keyspan: Drop firmware Kconfig options

2018-01-23 Thread Benjamin Gilbert
The USB_SERIAL_KEYSPAN_* firmware options no longer do anything.

Fixes: 5620a0d1aacd ("firmware: delete in-kernel firmware")
Signed-off-by: Benjamin Gilbert <benjamin.gilb...@coreos.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Johan Hovold <jo...@kernel.org>
---
 arch/mips/configs/rm200_defconfig     |  9
 arch/powerpc/configs/c2k_defconfig    | 12 --
 arch/powerpc/configs/g5_defconfig     | 12 --
 arch/powerpc/configs/maple_defconfig  | 12 --
 arch/powerpc/configs/pmac32_defconfig | 12 --
 drivers/usb/serial/Kconfig            | 78 ---
 6 files changed, 135 deletions(-)

diff --git a/arch/mips/configs/rm200_defconfig b/arch/mips/configs/rm200_defconfig
index 99679e514042..5f71aa598b06 100644
--- a/arch/mips/configs/rm200_defconfig
+++ b/arch/mips/configs/rm200_defconfig
@@ -325,15 +325,6 @@ CONFIG_USB_SERIAL_EDGEPORT=m
 CONFIG_USB_SERIAL_EDGEPORT_TI=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/c2k_defconfig b/arch/powerpc/configs/c2k_defconfig
index f1552af9eecc..4bb832a41d55 100644
--- a/arch/powerpc/configs/c2k_defconfig
+++ b/arch/powerpc/configs/c2k_defconfig
@@ -272,18 +272,6 @@ CONFIG_USB_SERIAL_EDGEPORT=m
 CONFIG_USB_SERIAL_EDGEPORT_TI=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig
index 063817fee61c..67c39f4acede 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -189,18 +189,6 @@ CONFIG_USB_SERIAL_GARMIN=m
 CONFIG_USB_SERIAL_IPW=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/maple_defconfig b/arch/powerpc/configs/maple_defconfig
index 078cdb427fc9..59e47ec85336 100644
--- a/arch/powerpc/configs/maple_defconfig
+++ b/arch/powerpc/configs/maple_defconfig
@@ -82,18 +82,6 @@ CONFIG_USB_SERIAL_CYPRESS_M8=m
 CONFIG_USB_SERIAL_GARMIN=m
 CONFIG_USB_SERIAL_IPW=m
 CONFIG_USB_SERIAL_KEYSPAN=y
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_TI=m
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
diff --git a/arch/powerpc/configs/pmac32_defconfig b/arch/powerpc/configs/pmac32_defconfig
index 1aab9a62a681..62948d198d7f 100644
--- a/arch/powerpc/configs/pmac32_defconfig
+++ b/arch/powerpc/configs/pmac32_defconfig
@@ -264,18 +264,6 @@ CONFIG_USB_SERIAL_VISOR=m
 CONFIG_USB_SERIAL_IPAQ=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_APPLEDISPLAY=m
 CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
 CONFIG_EXT2_FS=y
diff --git a/drivers/usb/serial/Kconfig b/drivers/usb/serial/Kconfig
index a8d5f2e4878d..716a3aa142ff 100644
--- 

[PATCH 3/3] firmware: Fix up docs referring to FIRMWARE_IN_KERNEL

2018-01-23 Thread Benjamin Gilbert
We've removed the option, so stop talking about it.

Signed-off-by: Benjamin Gilbert 
Cc: Borislav Petkov 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
---
 Documentation/driver-api/firmware/built-in-fw.rst | 7 +--
 Documentation/x86/microcode.txt   | 5 ++---
 arch/x86/Kconfig  | 6 +++---
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/Documentation/driver-api/firmware/built-in-fw.rst b/Documentation/driver-api/firmware/built-in-fw.rst
index 7300e66857f8..396cdf591ac5 100644
--- a/Documentation/driver-api/firmware/built-in-fw.rst
+++ b/Documentation/driver-api/firmware/built-in-fw.rst
@@ -11,13 +11,8 @@ options:
   * CONFIG_EXTRA_FIRMWARE
   * CONFIG_EXTRA_FIRMWARE_DIR
 
-This should not be confused with CONFIG_FIRMWARE_IN_KERNEL, this is for drivers
-which enables firmware to be built as part of the kernel build process. This
-option, CONFIG_FIRMWARE_IN_KERNEL, will build all firmware for all drivers
-enabled which ship its firmware inside the Linux kernel source tree.
-
 There are a few reasons why you might want to consider building your firmware
-into the kernel with CONFIG_EXTRA_FIRMWARE though:
+into the kernel with CONFIG_EXTRA_FIRMWARE:
 
 * Speed
 * Firmware is needed for accessing the boot device, and the user doesn't
diff --git a/Documentation/x86/microcode.txt b/Documentation/x86/microcode.txt
index f57e1b45e628..aacd2f5e1a46 100644
--- a/Documentation/x86/microcode.txt
+++ b/Documentation/x86/microcode.txt
@@ -108,12 +108,11 @@ packages already put them there.
 
 
 The loader supports also loading of a builtin microcode supplied through
-the regular firmware builtin method CONFIG_FIRMWARE_IN_KERNEL. Only
-64-bit is currently supported.
+the regular firmware builtin method CONFIG_EXTRA_FIRMWARE. Only 64-bit is
+currently supported.
 
 Here's an example:
 
-CONFIG_FIRMWARE_IN_KERNEL=y
 CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
 CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 20da391b5f32..6d27d53de60d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1255,9 +1255,9 @@ config MICROCODE
  CONFIG_BLK_DEV_INITRD in order for the loader to be able to scan the
  initrd for microcode blobs.
 
- In addition, you can build-in the microcode into the kernel. For that you
- need to enable FIRMWARE_IN_KERNEL and add the vendor-supplied microcode
- to the CONFIG_EXTRA_FIRMWARE config option.
+ In addition, you can build the microcode into the kernel. For that you
+ need to add the vendor-supplied microcode to the CONFIG_EXTRA_FIRMWARE
+ config option.
 
 config MICROCODE_INTEL
bool "Intel microcode loading support"
-- 
2.13.6



[PATCH 1/3] USB: serial: keyspan: Drop firmware Kconfig options

2018-01-23 Thread Benjamin Gilbert
The USB_SERIAL_KEYSPAN_* firmware options no longer do anything.

Fixes: 5620a0d1aacd ("firmware: delete in-kernel firmware")
Signed-off-by: Benjamin Gilbert 
Cc: Greg Kroah-Hartman 
Cc: Johan Hovold 
---
 arch/mips/configs/rm200_defconfig |  9 
 arch/powerpc/configs/c2k_defconfig| 12 --
 arch/powerpc/configs/g5_defconfig | 12 --
 arch/powerpc/configs/maple_defconfig  | 12 --
 arch/powerpc/configs/pmac32_defconfig | 12 --
 drivers/usb/serial/Kconfig| 78 ---
 6 files changed, 135 deletions(-)

diff --git a/arch/mips/configs/rm200_defconfig b/arch/mips/configs/rm200_defconfig
index 99679e514042..5f71aa598b06 100644
--- a/arch/mips/configs/rm200_defconfig
+++ b/arch/mips/configs/rm200_defconfig
@@ -325,15 +325,6 @@ CONFIG_USB_SERIAL_EDGEPORT=m
 CONFIG_USB_SERIAL_EDGEPORT_TI=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/c2k_defconfig b/arch/powerpc/configs/c2k_defconfig
index f1552af9eecc..4bb832a41d55 100644
--- a/arch/powerpc/configs/c2k_defconfig
+++ b/arch/powerpc/configs/c2k_defconfig
@@ -272,18 +272,6 @@ CONFIG_USB_SERIAL_EDGEPORT=m
 CONFIG_USB_SERIAL_EDGEPORT_TI=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig
index 063817fee61c..67c39f4acede 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -189,18 +189,6 @@ CONFIG_USB_SERIAL_GARMIN=m
 CONFIG_USB_SERIAL_IPW=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_KLSI=m
 CONFIG_USB_SERIAL_KOBIL_SCT=m
 CONFIG_USB_SERIAL_MCT_U232=m
diff --git a/arch/powerpc/configs/maple_defconfig b/arch/powerpc/configs/maple_defconfig
index 078cdb427fc9..59e47ec85336 100644
--- a/arch/powerpc/configs/maple_defconfig
+++ b/arch/powerpc/configs/maple_defconfig
@@ -82,18 +82,6 @@ CONFIG_USB_SERIAL_CYPRESS_M8=m
 CONFIG_USB_SERIAL_GARMIN=m
 CONFIG_USB_SERIAL_IPW=m
 CONFIG_USB_SERIAL_KEYSPAN=y
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_SERIAL_TI=m
 CONFIG_EXT2_FS=y
 CONFIG_EXT4_FS=y
diff --git a/arch/powerpc/configs/pmac32_defconfig b/arch/powerpc/configs/pmac32_defconfig
index 1aab9a62a681..62948d198d7f 100644
--- a/arch/powerpc/configs/pmac32_defconfig
+++ b/arch/powerpc/configs/pmac32_defconfig
@@ -264,18 +264,6 @@ CONFIG_USB_SERIAL_VISOR=m
 CONFIG_USB_SERIAL_IPAQ=m
 CONFIG_USB_SERIAL_KEYSPAN_PDA=m
 CONFIG_USB_SERIAL_KEYSPAN=m
-CONFIG_USB_SERIAL_KEYSPAN_MPR=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
-CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19=y
-CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
-CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
-CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
 CONFIG_USB_APPLEDISPLAY=m
 CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
 CONFIG_EXT2_FS=y
diff --git a/drivers/usb/serial/Kconfig b/drivers/usb/serial/Kconfig
index a8d5f2e4878d..716a3aa142ff 100644
--- a/drivers/usb/serial/Kconfig
+++ b/drivers/usb/serial/Kconfig
@@ -322,84 +322,6 @@ config USB_SERI

Re: [tip:x86/pti] x86/kaslr: Fix the vaddr_end mess

2018-01-04 Thread Benjamin Gilbert
On Thu, Jan 04, 2018 at 02:10:44PM -0800, tip-bot for Thomas Gleixner wrote:
> + BUILD_BUG_ON)(vaddr_end != CPU_ENTRY_AREA_BASE);
                ^^

Note typo.

--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-04 Thread Benjamin Gilbert
On Thu, Jan 04, 2018 at 01:28:59PM +0100, Thomas Gleixner wrote:
> On Wed, 3 Jan 2018, Andy Lutomirski wrote:
> > Our memory map code is utter shite.  This kind of bug should not be
> > possible without a giant warning at boot that something is screwed up.
> 
> You're right it's utter shite and the KASLR folks who added this insanity
> of making vaddr_end depend on a gazillion of config options and not
> documenting it in mm.txt or elsewhere where it's obvious to find should
> really sit back and think hard about their half baken 'security' features.
> 
> Just look at the insanity of comment above the vaddr_end ifdef maze.
> 
> Benjamin, can you test the patch below please?

Seems to work!

Thanks,
--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 05:37:42PM -0800, Benjamin Gilbert wrote:
> I was caught by the fact that 4.14.11 has PAGE_TABLE_ISOLATION default y
> but 4.15-rc6 doesn't.  Retesting.

It turns out that 4.15-rc6 has the same problem as 4.14.11.

--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 04:37:53PM -0800, Andy Lutomirski wrote:
> Maybe try rebuilding a bad kernel with free_ldt_pgtables() modified
> to do nothing, and the read /sys/kernel/debug/page_tables/current (or
> current_kernel, or whatever it's called).  The problem may be obvious.

current_kernel attached.  I have not seen any crashes with
free_ldt_pgtables() stubbed out.

--Benjamin Gilbert


current_kernel.gz
Description: application/gzip


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 04:33:03PM -0800, Benjamin Gilbert wrote:
> I haven't been able to reproduce this on 4.15-rc6.

This is bad data.  I was caught by the fact that 4.14.11 has
PAGE_TABLE_ISOLATION default y but 4.15-rc6 doesn't.  Retesting.

--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 04:27:04PM -0800, Andy Lutomirski wrote:
> How much memory does the affected system have?  It sounds like something
> is mapped in the LDT region and is getting corrupted because the LDT code
> expects to own that region.

We've seen this on systems from 1 to 7 GB.

--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 10:20:16AM +0100, Greg Kroah-Hartman wrote:
> Ick, not good, any chance you can test 4.15-rc6 to verify that the issue
> is also there (or not)?

I haven't been able to reproduce this on 4.15-rc6.

--Benjamin Gilbert


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 11:34:46PM +0100, Thomas Gleixner wrote:
> Can you please send me your .config and a full dmesg ?

I've attached a serial log from a local QEMU.  I can rerun with a higher
loglevel if need be.

--Benjamin Gilbert


config-4.14.11.gz
Description: application/gzip


console.txt.gz
Description: application/gzip


Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

2018-01-03 Thread Benjamin Gilbert
On Wed, Jan 03, 2018 at 04:48:33PM +0100, Ingo Molnar wrote:
> first please test the latest WIP.x86/pti branch which has a couple of fixes.

I'm still seeing the problem with that branch (3ffdeb1a02be, plus a couple
of local patches which shouldn't affect the resulting binary).

--Benjamin Gilbert


[TRIVIAL PATCH] Kill blk_congestion_wait() stub for !CONFIG_BLOCK

2007-06-18 Thread Benjamin Gilbert
blk_congestion_wait() doesn't exist anymore, but there's still a stub
in blkdev.h for the !CONFIG_BLOCK case.  Kill it.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 include/linux/blkdev.h |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index db5b00a..fae138b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -868,11 +868,6 @@ void kblockd_flush_work(struct work_struct *work);
  */
 #define buffer_heads_over_limit 0
 
-static inline long blk_congestion_wait(int rw, long timeout)
-{
-   return io_schedule_timeout(timeout);
-}
-
 static inline long nr_blockdev_pages(void)
 {
return 0;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] [CRYPTO] Add optimized SHA-1 implementation for x86_64

2007-06-11 Thread Benjamin Gilbert
Add optimized implementation of the SHA-1 hash function for x86_64, ported
from the x86 implementation in Nettle (which is LGPLed).

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/x8664_ksyms.c |3 
 arch/x86_64/lib/Makefile |2 
 arch/x86_64/lib/sha1.S   |  281 ++
 include/linux/cryptohash.h   |2 
 lib/Kconfig  |7 +
 5 files changed, 293 insertions(+), 2 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..bc641ab 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,3 +61,5 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+EXPORT_SYMBOL(sha_transform);
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index c943271..6c8110b 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_SMP)   += msr-on-cpu.o
 
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
usercopy.o getuser.o putuser.o  \
-   thunk.o clear_page.o copy_page.o bitstr.o bitops.o
+   thunk.o clear_page.o copy_page.o bitstr.o bitops.o sha1.o
 lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o copy_user_nocache.o
diff --git a/arch/x86_64/lib/sha1.S b/arch/x86_64/lib/sha1.S
new file mode 100644
index 000..f928ac3
--- /dev/null
+++ b/arch/x86_64/lib/sha1.S
@@ -0,0 +1,281 @@
+/*
+ * sha1-x86_64 - x86_64-optimized SHA1 hash algorithm
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
+ * Ported from x86 to x86_64 by Benjamin Gilbert
+ *
+ * Copyright (C) 2004, Niels Möller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+
+/* Register usage.  r12-15 must be saved if they will be used.  Accessing
+   r8-r15 takes an extra instruction byte. */
+#define P_STATE %rdi    /* Pointer parameter */
+#define P_DATA  %rsi    /* Pointer parameter */
+#define DATA    %rdx    /* Pointer parameter */
+#define SA      %edi    /* Reuses P_STATE */
+#define SB      %esi    /* Reuses P_DATA */
+#define SC      %eax
+#define SD      %ebx    /* Callee-saved */
+#define SE      %ebp    /* Callee-saved */
+#define TMP     %ecx
+#define TMP2    %r8d    /* Used by F3 */
+#define CONST   %r9d
+#define STATE   %r10
+
+/* Constants */
+#define K1VALUE $0x5A827999     /* Rounds  0-19 */
+#define K2VALUE $0x6ED9EBA1     /* Rounds 20-39 */
+#define K3VALUE $0x8F1BBCDC     /* Rounds 40-59 */
+#define K4VALUE $0xCA62C1D6     /* Rounds 60-79 */
+
+/* Convert stack offsets in 32-bit words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via P_DATA into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)                          \
+       movl    OFFSET(index)(P_DATA), register;        \
+       bswap   register;                               \
+       movl    register, OFFSET(index)(DATA)
+
+/* push/pop wrappers that update the DWARF unwind table */
+#define PUSH(regname)                                  \
+       push    %regname;                               \
+       CFI_ADJUST_CFA_OFFSET   8;                      \
+       CFI_REL_OFFSET  regname, 0
+
+#define POP(regname)                                   \
+       pop     %regname;                               \
+       CFI_ADJUST_CFA_OFFSET   -8;                     \
+       CFI_RESTORE     regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i], and also left in TMP, the only
+ * register that is used.
+ */
+#define EXPAND(i)                                      \
+       movl    OFFSET(i % 16)(DATA), TMP;              \
+   xorl  

[PATCH] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert
Add x86-optimized implementation of the SHA-1 hash function, taken from
Nettle under the LGPL.  This code will be enabled on kernels compiled for
486es or better; kernels which support 386es will use the generic
implementation (since we need BSWAP).

We disable building lib/sha1.o when an optimized implementation is
available, as the library link order for x86 (and x86_64) would otherwise
ignore the optimized version.  The existing optimized implementation for ARM
does not do this; the library link order for that architecture appears to
favor the arch/arm/ version automatically.  I've left this situation alone
since I'm not familiar with the ARM code, but a !ARM condition could be
added to CONFIG_SHA1_GENERIC if it makes sense.

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/i386/kernel/i386_ksyms.c |5 +
 arch/i386/lib/Makefile|1 
 arch/i386/lib/sha1.S  |  299 +
 include/linux/cryptohash.h|9 +
 lib/Kconfig   |   13 ++
 lib/Makefile  |3 
 6 files changed, 328 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/i386_ksyms.c b/arch/i386/kernel/i386_ksyms.c
index e3d4b73..812bc4e 100644
--- a/arch/i386/kernel/i386_ksyms.c
+++ b/arch/i386/kernel/i386_ksyms.c
@@ -1,4 +1,5 @@
 #include 
+#include 
 #include 
 #include 
 
@@ -28,3 +29,7 @@ EXPORT_SYMBOL(__read_lock_failed);
 #endif
 
 EXPORT_SYMBOL(csum_partial);
+
+#ifdef CONFIG_SHA1_X86
+EXPORT_SYMBOL(sha_transform);
+#endif
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 22d8ac5..69f4845 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -6,6 +6,7 @@
 lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
bitops.o semaphore.o
 
+lib-$(CONFIG_SHA1_X86) += sha1.o
 lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
 
 obj-$(CONFIG_SMP)  += msr-on-cpu.o
diff --git a/arch/i386/lib/sha1.S b/arch/i386/lib/sha1.S
new file mode 100644
index 000..a84d829
--- /dev/null
+++ b/arch/i386/lib/sha1.S
@@ -0,0 +1,299 @@
+/*
+ * x86-optimized SHA1 hash algorithm (i486 and above)
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
+ *
+ * Copyright (C) 2004, Niels Möller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+
+/* Register usage */
+#define SA      %eax
+#define SB      %ebx
+#define SC      %ecx
+#define SD      %edx
+#define SE      %ebp
+#define DATA    %esp
+#define TMP     %edi
+#define TMP2    %esi    /* Used by SWAP and F3 */
+#define TMP3    64(%esp)
+
+/* Constants */
+#define K1VALUE $0x5A827999     /* Rounds  0-19 */
+#define K2VALUE $0x6ED9EBA1     /* Rounds 20-39 */
+#define K3VALUE $0x8F1BBCDC     /* Rounds 40-59 */
+#define K4VALUE $0xCA62C1D6     /* Rounds 60-79 */
+
+/* Convert stack offsets in words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via TMP2 into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)                          \
+       movl    OFFSET(index)(TMP2), register;          \
+       bswap   register;                               \
+       movl    register, OFFSET(index)(DATA)
+
+/* Sets the workspace word at the given index to TMP. */
+#define CLEAR(index)                                   \
+       movl    TMP, OFFSET(index)(DATA)
+
+/* pushl/popl wrappers that update the DWARF unwind table */
+#define PUSH(regname)                                  \
+       pushl   %regname;                               \
+       CFI_ADJUST_CFA_OFFSET   4;                      \
+       CFI_REL_OFFSET  regname, 0
+
+#define POP(regname)                                   \
+       popl    %regname;                               \
+       CFI_ADJUST_CFA_OFFSET   -4;                     \
+       CFI_RESTORE     regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i], and also left in TMP, the only
+ * register that is used.
+

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert

Benjamin Gilbert wrote:
> Jan Engelhardt wrote:
>> UTF-8 please. Hint: it should most likely be an ö.
>
> Whoops, I had thought I had gotten that right.  I'll get updates for
> parts 2 and 3 sent out on Monday.

I'm sending the corrected parts 2 and 3 as replies to this email.  The
UTF-8 fix is the *only* thing that has changed.  The patches themselves
are moot in their current form, but I wanted to make sure they were
archived with the correct attribution.


--Benjamin Gilbert


Re: [PATCH 3/3] [CRYPTO] Add optimized SHA-1 implementation for x86_64

2007-06-11 Thread Benjamin Gilbert

Andi Kleen wrote:
> Benjamin Gilbert <[EMAIL PROTECTED]> writes:
>> +#define EXPAND(i)                                      \
>> +       movl    OFFSET(i % 16)(DATA), TMP;              \
>> +       xorl    OFFSET((i + 2) % 16)(DATA), TMP;        \
>
> Such overlapping memory accesses are somewhat dangerous as they tend
> to stall some CPUs.  Better probably to do a quad load and then extract.


OFFSET(i) is defined as 4*(i), so they don't actually overlap. 
(Arguably that macro should go away.)
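
For reference, a minimal C sketch of what EXPAND computes. Only the first two instructions of the macro are quoted above, so the last two operands here are an assumption taken from the W[i - 8] and W[i - 3] terms in the patch comment. The names rol32_sketch and expand_sketch are made up for illustration and do not appear in the patch.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rol32_sketch(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Compute W[i] for i >= 16 in a 16-word circular buffer.
 * On entry data[i % 16] holds W[i - 16]; on return it holds W[i].
 * Every access is one aligned 32-bit word, at byte offset 4 * index.
 */
static uint32_t expand_sketch(uint32_t data[16], unsigned i)
{
	uint32_t tmp = data[i % 16];    /* W[i - 16] */

	tmp ^= data[(i + 2) % 16];      /* W[i - 14] */
	tmp ^= data[(i + 8) % 16];      /* W[i - 8], assumed from the comment */
	tmp ^= data[(i + 13) % 16];     /* W[i - 3], assumed from the comment */
	tmp = rol32_sketch(tmp, 1);
	data[i % 16] = tmp;
	return tmp;
}
```

The four indices differ modulo 16 (offsets 0, 2, 8 and 13), so the four loads hit four separate words and never share bytes.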



> I haven't checked in detail if it's possible but it's suspicious you
> never use quad operations for anything. You keep at least half
> the CPU's bits idle all the time.


SHA-1 fundamentally wants to work with 32-bit quantities.  It might be 
possible to use quad operations for some things, with sufficient 
cleverness, but I doubt it'd be worth the effort.



> Gut feeling is that the unroll factor is far too large.
> Have you tried a smaller one? That would save icache
> which is very important in the kernel.


That seems to be the consensus.  I'll see if I can find some time to try 
[EMAIL PROTECTED]'s suggestion and report back.


I don't think, though, that cache footprint is the *only* thing that 
matters.  Leaving aside /dev/urandom, there are cases where throughput 
matters a lot.  This patch set came out of some work on a hashing block 
device driver in which SHA is, by far, the biggest CPU user.  One could 
imagine content-addressable filesystems, or even IPsec under the right 
workloads, being in a similar situation.


Would it be more palatable to roll the patch as an optimized CryptoAPI 
module rather than as a lib/sha1.c replacement?  That wouldn't help 
/dev/urandom, of course, but for other cases it would allow the user to 
ask for the optimized version if needed, and not pay the footprint costs 
otherwise.


--Benjamin Gilbert



Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert

[EMAIL PROTECTED] wrote:
> /* Majority: (x&y)|(y&z)|(z&x) = (x & z) + ((x ^ z) & y) */
> #define F3(x,y,z,dest)          \
>         movl    z, TMP;         \
>         andl    x, TMP;         \
>         addl    TMP, dest;      \
>         movl    z, TMP;         \
>         xorl    x, TMP;         \
>         andl    y, TMP;         \
>         addl    TMP, dest
>
> Since y is the most recently computed result (it's rotated in the
> previous round), I arranged the code to delay its use as late as
> possible.
>
> Now you have one more register to play with.


Okay, thanks.  It doesn't actually give one more register except in the 
F3 rounds (TMP2 is normally used to hold the magic constants) but it's a 
good cleanup.
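
The identity in the F3 comment can be checked exhaustively per bit.  The
following is a minimal C sketch, not code from the patch; the names maj_or,
maj_add, and maj_forms_agree are made up for illustration.  Both forms are
purely bitwise, and the two addends never have a 1 in the same bit position
(so the addition cannot carry), which means the 8 input combinations for a
single bit position cover every case.

```c
#include <stdint.h>

/* Textbook SHA-1 majority: a result bit is 1 when at least two inputs are 1. */
static uint32_t maj_or(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (y & z) | (z & x);
}

/* Form used by the F3 macro: two bit-disjoint terms combined by addition. */
static uint32_t maj_add(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) + ((x ^ z) & y);
}

/* Returns 1 if both forms agree for all 8 per-bit input combinations,
   with each combination replicated across the whole 32-bit word. */
static int maj_forms_agree(void)
{
    unsigned bits;

    for (bits = 0; bits < 8; bits++) {
        uint32_t x = (bits & 1) ? 0xFFFFFFFFu : 0;
        uint32_t y = (bits & 2) ? 0xFFFFFFFFu : 0;
        uint32_t z = (bits & 4) ? 0xFFFFFFFFu : 0;

        if (maj_or(x, y, z) != maj_add(x, y, z))
            return 0;
    }
    return 1;
}
```

The additive form matters for the assembly version because each term can be
accumulated straight into dest with addl, without a second scratch register.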



> A faster way is to unroll 5 iterations and do:
>       e += F(b, c, d) + K + rol32(a, 5) + W[i  ]; b = rol32(b, 30);
>       d += F(a, b, c) + K + rol32(e, 5) + W[i+1]; a = rol32(a, 30);
>       c += F(e, a, b) + K + rol32(d, 5) + W[i+2]; e = rol32(e, 30);
>       b += F(d, e, a) + K + rol32(c, 5) + W[i+3]; d = rol32(d, 30);
>       a += F(c, d, e) + K + rol32(b, 5) + W[i+4]; c = rol32(c, 30);
> then loop over that 4 times each.  This is somewhat larger, but
> still reasonably compact; only 20 of the 80 rounds are written out
> long-hand.


I got this code from Nettle, originally, and I never looked at the SHA-1 
round structure very closely.  I'll give that approach a try.
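
For reference, the five-round pattern quoted above can be sketched in
portable C roughly as follows.  This is an illustration only, not the code
that was tested or posted in this thread; sha1_block_unrolled5 and
FIVE_ROUNDS are hypothetical names, a flat 80-word W array is used for
simplicity, and F1/F2/F3 are the usual SHA-1 choose/parity/majority
functions (F3 in the additive form discussed above).

```c
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

#define F1(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))          /* choose */
#define F2(x, y, z) ((x) ^ (y) ^ (z))                    /* parity */
#define F3(x, y, z) (((x) & (z)) + (((x) ^ (z)) & (y)))  /* majority */

/* Five rounds with the variable roles rotated by hand, so no register
   shuffling is needed.  Uses a, b, c, d, e and W from the enclosing scope. */
#define FIVE_ROUNDS(F, K, i) do {                                        \
    e += F(b, c, d) + (K) + rol32(a, 5) + W[(i)    ]; b = rol32(b, 30);  \
    d += F(a, b, c) + (K) + rol32(e, 5) + W[(i) + 1]; a = rol32(a, 30);  \
    c += F(e, a, b) + (K) + rol32(d, 5) + W[(i) + 2]; e = rol32(e, 30);  \
    b += F(d, e, a) + (K) + rol32(c, 5) + W[(i) + 3]; d = rol32(d, 30);  \
    a += F(c, d, e) + (K) + rol32(b, 5) + W[(i) + 4]; c = rol32(c, 30);  \
} while (0)

/* Process one 64-byte block, updating the five-word state in place. */
void sha1_block_unrolled5(uint32_t state[5], const unsigned char in[64])
{
    uint32_t W[80];
    uint32_t a, b, c, d, e;
    int i;

    /* Load the block as sixteen big-endian 32-bit words. */
    for (i = 0; i < 16; i++)
        W[i] = (uint32_t)in[4 * i] << 24 | (uint32_t)in[4 * i + 1] << 16 |
               (uint32_t)in[4 * i + 2] << 8 | (uint32_t)in[4 * i + 3];
    /* Message schedule expansion. */
    for (; i < 80; i++)
        W[i] = rol32(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 1);

    a = state[0]; b = state[1]; c = state[2]; d = state[3]; e = state[4];

    /* Each group of 20 rounds is 4 passes over the 5-round body. */
    for (i = 0; i < 20; i += 5) FIVE_ROUNDS(F1, 0x5A827999u, i);
    for (; i < 40; i += 5)      FIVE_ROUNDS(F2, 0x6ED9EBA1u, i);
    for (; i < 60; i += 5)      FIVE_ROUNDS(F3, 0x8F1BBCDCu, i);
    for (; i < 80; i += 5)      FIVE_ROUNDS(F2, 0xCA62C1D6u, i);

    state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
}
```

Because 20 is a multiple of 5, the variable roles are back in their original
positions at every change of round function, so the same macro body serves
all four groups.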


Thanks
--Benjamin Gilbert


Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert

Matt Mackall wrote:

> In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
> Were your tests with or without the latest /dev/urandom fixes? This
> one in particular:
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.21.y.git;a=commitdiff;h=374f167dfb97c1785515a0c41e32a66b414859a8


With.  I just tried 2.6.11 (the oldest that will boot) on the Pentium IV 
box and got 3.7 MB/s, so if it's a regression it's been around for a while.


--Benjamin Gilbert


Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert

Benjamin Gilbert wrote:
> Jan Engelhardt wrote:
>> UTF-8 please. Hint: it should most likely be an ö.
>
> Whoops, I had thought I had gotten that right.  I'll get updates for
> parts 2 and 3 sent out on Monday.


I'm sending the corrected parts 2 and 3 as replies to this email.  The 
UTF-8 fix is the *only* thing that has changed.  The patches themselves 
are moot in their current form, but I wanted to make sure they were 
archived with the correct attribution.


--Benjamin Gilbert


[PATCH] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert
Add x86-optimized implementation of the SHA-1 hash function, taken from
Nettle under the LGPL.  This code will be enabled on kernels compiled for
486es or better; kernels which support 386es will use the generic
implementation (since we need BSWAP).

We disable building lib/sha1.o when an optimized implementation is
available, as the library link order for x86 (and x86_64) would otherwise
ignore the optimized version.  The existing optimized implementation for ARM
does not do this; the library link order for that architecture appears to
favor the arch/arm/ version automatically.  I've left this situation alone
since I'm not familiar with the ARM code, but a !ARM condition could be
added to CONFIG_SHA1_GENERIC if it makes sense.

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/i386/kernel/i386_ksyms.c |    5 +
 arch/i386/lib/Makefile        |    1 
 arch/i386/lib/sha1.S          |  299 +
 include/linux/cryptohash.h    |    9 +
 lib/Kconfig                   |   13 ++
 lib/Makefile                  |    3 
 6 files changed, 328 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/i386_ksyms.c b/arch/i386/kernel/i386_ksyms.c
index e3d4b73..812bc4e 100644
--- a/arch/i386/kernel/i386_ksyms.c
+++ b/arch/i386/kernel/i386_ksyms.c
@@ -1,4 +1,5 @@
 #include <linux/module.h>
+#include <linux/cryptohash.h>
 #include <asm/checksum.h>
 #include <asm/desc.h>
 
@@ -28,3 +29,7 @@ EXPORT_SYMBOL(__read_lock_failed);
 #endif
 
 EXPORT_SYMBOL(csum_partial);
+
+#ifdef CONFIG_SHA1_X86
+EXPORT_SYMBOL(sha_transform);
+#endif
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 22d8ac5..69f4845 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -6,6 +6,7 @@
 lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
bitops.o semaphore.o
 
+lib-$(CONFIG_SHA1_X86) += sha1.o
 lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
 
 obj-$(CONFIG_SMP)  += msr-on-cpu.o
diff --git a/arch/i386/lib/sha1.S b/arch/i386/lib/sha1.S
new file mode 100644
index 000..a84d829
--- /dev/null
+++ b/arch/i386/lib/sha1.S
@@ -0,0 +1,299 @@
+/*
+ * x86-optimized SHA1 hash algorithm (i486 and above)
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert [EMAIL PROTECTED]
+ *
+ * Copyright (C) 2004, Niels Möller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+/* Register usage */
+#define SA      %eax
+#define SB      %ebx
+#define SC      %ecx
+#define SD      %edx
+#define SE      %ebp
+#define DATA    %esp
+#define TMP     %edi
+#define TMP2    %esi    /* Used by SWAP and F3 */
+#define TMP3    64(%esp)
+
+/* Constants */
+#define K1VALUE $0x5A827999     /* Rounds  0-19 */
+#define K2VALUE $0x6ED9EBA1     /* Rounds 20-39 */
+#define K3VALUE $0x8F1BBCDC     /* Rounds 40-59 */
+#define K4VALUE $0xCA62C1D6     /* Rounds 60-79 */
+
+/* Convert stack offsets in words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via TMP2 into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)                   \
+        movl    OFFSET(index)(TMP2), register;  \
+        bswap   register;                       \
+        movl    register, OFFSET(index)(DATA)
+
+/* Sets the workspace word at the given index to TMP. */
+#define CLEAR(index)                            \
+        movl    TMP, OFFSET(index)(DATA)
+
+/* pushl/popl wrappers that update the DWARF unwind table */
+#define PUSH(regname)                           \
+        pushl   %regname;                       \
+        CFI_ADJUST_CFA_OFFSET   4;              \
+        CFI_REL_OFFSET  regname, 0
+
+#define POP(regname)                            \
+        popl    %regname;                       \
+        CFI_ADJUST_CFA_OFFSET   -4;             \
+        CFI_RESTORE     regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i

[PATCH] [CRYPTO] Add optimized SHA-1 implementation for x86_64

2007-06-11 Thread Benjamin Gilbert
Add optimized implementation of the SHA-1 hash function for x86_64, ported
from the x86 implementation in Nettle (which is LGPLed).

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/x8664_ksyms.c |    3 
 arch/x86_64/lib/Makefile         |    2 
 arch/x86_64/lib/sha1.S           |  281 ++
 include/linux/cryptohash.h       |    2 
 lib/Kconfig                      |    7 +
 5 files changed, 293 insertions(+), 2 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..bc641ab 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -3,6 +3,7 @@
 
 #include <linux/module.h>
 #include <linux/smp.h>
+#include <linux/cryptohash.h>
 
 #include <asm/semaphore.h>
 #include <asm/processor.h>
@@ -60,3 +61,5 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+EXPORT_SYMBOL(sha_transform);
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index c943271..6c8110b 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_SMP)   += msr-on-cpu.o
 
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
usercopy.o getuser.o putuser.o  \
-   thunk.o clear_page.o copy_page.o bitstr.o bitops.o
+   thunk.o clear_page.o copy_page.o bitstr.o bitops.o sha1.o
 lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o copy_user_nocache.o
diff --git a/arch/x86_64/lib/sha1.S b/arch/x86_64/lib/sha1.S
new file mode 100644
index 000..f928ac3
--- /dev/null
+++ b/arch/x86_64/lib/sha1.S
@@ -0,0 +1,281 @@
+/*
+ * sha1-x86_64 - x86_64-optimized SHA1 hash algorithm
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert [EMAIL PROTECTED]
+ * Ported from x86 to x86_64 by Benjamin Gilbert
+ *
+ * Copyright (C) 2004, Niels Möller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+/* Register usage.  r12-15 must be saved if they will be used.  Accessing
+   r8-r15 takes an extra instruction byte. */
+#define P_STATE %rdi    /* Pointer parameter */
+#define P_DATA  %rsi    /* Pointer parameter */
+#define DATA    %rdx    /* Pointer parameter */
+#define SA      %edi    /* Reuses P_STATE */
+#define SB      %esi    /* Reuses P_DATA */
+#define SC      %eax
+#define SD      %ebx    /* Callee-saved */
+#define SE      %ebp    /* Callee-saved */
+#define TMP     %ecx
+#define TMP2    %r8d    /* Used by F3 */
+#define CONST   %r9d
+#define STATE   %r10
+
+/* Constants */
+#define K1VALUE $0x5A827999     /* Rounds  0-19 */
+#define K2VALUE $0x6ED9EBA1     /* Rounds 20-39 */
+#define K3VALUE $0x8F1BBCDC     /* Rounds 40-59 */
+#define K4VALUE $0xCA62C1D6     /* Rounds 60-79 */
+
+/* Convert stack offsets in 32-bit words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via P_DATA into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)                   \
+        movl    OFFSET(index)(P_DATA), register;\
+        bswap   register;                       \
+        movl    register, OFFSET(index)(DATA)
+
+/* push/pop wrappers that update the DWARF unwind table */
+#define PUSH(regname)                           \
+        push    %regname;                       \
+        CFI_ADJUST_CFA_OFFSET   8;              \
+        CFI_REL_OFFSET  regname, 0
+
+#define POP(regname)                            \
+        pop     %regname;                       \
+        CFI_ADJUST_CFA_OFFSET   -8;             \
+        CFI_RESTORE     regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i], and also left in TMP, the only
+ * register that is used.
+ */
+#define EXPAND(i)                               \
+        movl

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-10 Thread Benjamin Gilbert

Matt Mackall wrote:

> On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
>> It's not just the loop unrolling; it's the register allocation and
>> spilling.  For comparison, I built SHATransform() from the
>> drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and
>> SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty
>> close to what you tested back then.  The resulting code is 49% MOV
>> instructions, and 80% of *those* involve memory.  gcc4 is somewhat
>> better, but it still spills a whole lot, both for the 2.6.11 unrolled
>> code and for the current lib/sha1.c.
>
> Wait, your benchmark is comparing against the unrolled code?


No, it's comparing the current lib/sha1.c to the optimized code in the 
patch.  I was just pointing out that the unrolled code you were likely 
testing against, back then, may not have been very good.  (Though I 
assumed that you were talking about the unrolled code in random.c, not 
the code in CryptoAPI, so that might change the numbers some.  It 
appears from the post you linked below that the unrolled CryptoAPI code 
still beat the rolled version?)



> How big is the -code- footprint?


About 3700 bytes for the 32-bit version of sha_transform().


> Whoa. We've regressed something horrible here:
>
> http://groups.google.com/group/linux.kernel/msg/fba056363c99d4f9?dmode=source&hl=en
>
> In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
> Were your tests with or without the latest /dev/urandom fixes? This
> one in particular:
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.21.y.git;a=commitdiff;h=374f167dfb97c1785515a0c41e32a66b414859a8


I'm not in front of that machine right now; I can check tomorrow.  For 
what it's worth, I've seen equivalent performance (a few MB/s) on a 
range of fairly-recent kernels.


--Benjamin Gilbert


Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Benjamin Gilbert

Jan Engelhardt wrote:

> On Jun 8 2007 17:42, Benjamin Gilbert wrote:
>
>> @@ -0,0 +1,299 @@
>> +/*
>> + * x86-optimized SHA1 hash algorithm (i486 and above)
>> + *
>> + * Originally from Nettle
>> + * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
>> + *
>> + * Copyright (C) 2004, Niels M?ller
>> + * Copyright (C) 2006-2007 Carnegie Mellon University
>
> UTF-8 please. Hint: it should most likely be an ö.


Whoops, I had thought I had gotten that right.  I'll get updates for 
parts 2 and 3 sent out on Monday.


Thanks
--Benjamin Gilbert


Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Benjamin Gilbert

Jeff Garzik wrote:

> Matt Mackall wrote:
>> Have you benchmarked this against lib/sha1.c? Please post the results.
>> Until then, I'm frankly skeptical that your unrolled version is faster
>> because when I introduced lib/sha1.c the rolled version therein won by
>> a significant margin and had 1/10th the cache footprint.


See the benchmark tables in patch 0 at the head of this thread. 
Performance improved by at least 25% in every test, and 40-60% was more 
common for the 32-bit version (on a Pentium IV).


It's not just the loop unrolling; it's the register allocation and 
spilling.  For comparison, I built SHATransform() from the 
drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and 
SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty 
close to what you tested back then.  The resulting code is 49% MOV 
instructions, and 80% of *those* involve memory.  gcc4 is somewhat 
better, but it still spills a whole lot, both for the 2.6.11 unrolled 
code and for the current lib/sha1.c.


In contrast, the assembly implementation in this patch only has to go to 
memory for data and workspace (with one small exception in the F3 
rounds), and the workspace has a fifth of the cache footprint of the 
default implementation.


> Yes. And it also depends on the CPU as well.  Testing on a server-class
> x86 CPU (often with bigger L2, and perhaps even L1, cache) will produce
> different result than from popular but less-capable "value" CPUs.


Good point.  I benchmarked the 32-bit assembly code on a couple more boxes:

=== AMD Duron, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    104     72     31%
    1      64      16     52     36     31%
    2      64      64     45     29     36%
    3     256      16     33     23     30%
    4     256      64     27     17     37%
    5     256     256     24     14     42%
    6    1024      16     29     20     31%
    7    1024     256     20     11     45%
    8    1024    1024     19     11     42%
    9    2048      16     28     20     29%
   10    2048     256     19     11     42%
   11    2048    1024     18     10     44%
   12    2048    2048     18     10     44%
   13    4096      16     28     19     32%
   14    4096     256     18     10     44%
   15    4096    1024     18     10     44%
   16    4096    4096     18     10     44%
   17    8192      16     27     19     30%
   18    8192     256     18     10     44%
   19    8192    1024     18     10     44%
   20    8192    4096     17     10     41%
   21    8192    8192     17     10     41%

=== Classic Pentium, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    145    144      1%
    1      64      16     72     61     15%
    2      64      64     65     52     20%
    3     256      16     46     39     15%
    4     256      64     39     32     18%
    5     256     256     36     29     19%
    6    1024      16     40     33     18%
    7    1024     256     30     23     23%
    8    1024    1024     29     23     21%
    9    2048      16     39     32     18%
   10    2048     256     29     22     24%
   11    2048    1024     28     22     21%
   12    2048    2048     28     22     21%
   13    4096      16     38     32     16%
   14    4096     256     28     22     21%
   15    4096    1024     28     21     25%
   16    4096    4096     27     21     22%
   17    8192      16     38     32     16%
   18    8192     256     28     22     21%
   19    8192    1024     28     21     25%
   20    8192    4096     27     21     22%
   21    8192    8192     27     21     22%

The improvement isn't as good, but it's still noticeable.
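
The Change column is not defined in the message.  The figures are consistent
with the relative reduction in cycles per byte, (C - asm) / C, rounded to the
nearest percent.  A small C sketch under that assumption (percent_change is a
hypothetical helper, not part of the benchmark harness):

```c
/* Relative reduction in cycles/byte, as a percentage rounded to the
   nearest integer.  Assumes 0 < asm_cycles <= c_cycles.  Integer
   arithmetic only: (200*diff + c) / (2*c) == floor(100*diff/c + 0.5). */
static int percent_change(int c_cycles, int asm_cycles)
{
    int diff = c_cycles - asm_cycles;

    return (200 * diff + c_cycles) / (2 * c_cycles);
}
```

For example, the first Duron row (104 cycles/byte in C, 72 in assembly) gives
32/104, or about 30.8%, which rounds to the 31% shown in the table.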

--Benjamin Gilbert



[PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-08 Thread Benjamin Gilbert
Add x86-optimized implementation of the SHA-1 hash function, taken from
Nettle under the LGPL.  This code will be enabled on kernels compiled for
486es or better; kernels which support 386es will use the generic
implementation (since we need BSWAP).

We disable building lib/sha1.o when an optimized implementation is
available, as the library link order for x86 (and x86_64) would otherwise
ignore the optimized version.  The existing optimized implementation for ARM
does not do this; the library link order for that architecture appears to
favor the arch/arm/ version automatically.  I've left this situation alone
since I'm not familiar with the ARM code, but a !ARM condition could be
added to CONFIG_SHA1_GENERIC if it makes sense.

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/i386/kernel/i386_ksyms.c |    5 +
 arch/i386/lib/Makefile        |    1 
 arch/i386/lib/sha1.S          |  299 +
 include/linux/cryptohash.h    |    9 +
 lib/Kconfig                   |   13 ++
 lib/Makefile                  |    3 
 6 files changed, 328 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/i386_ksyms.c b/arch/i386/kernel/i386_ksyms.c
index e3d4b73..812bc4e 100644
--- a/arch/i386/kernel/i386_ksyms.c
+++ b/arch/i386/kernel/i386_ksyms.c
@@ -1,4 +1,5 @@
 #include <linux/module.h>
+#include <linux/cryptohash.h>
 #include <asm/checksum.h>
 #include <asm/desc.h>
 
@@ -28,3 +29,7 @@ EXPORT_SYMBOL(__read_lock_failed);
 #endif
 
 EXPORT_SYMBOL(csum_partial);
+
+#ifdef CONFIG_SHA1_X86
+EXPORT_SYMBOL(sha_transform);
+#endif
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 22d8ac5..69f4845 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -6,6 +6,7 @@
 lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
bitops.o semaphore.o
 
+lib-$(CONFIG_SHA1_X86) += sha1.o
 lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
 
 obj-$(CONFIG_SMP)  += msr-on-cpu.o
diff --git a/arch/i386/lib/sha1.S b/arch/i386/lib/sha1.S
new file mode 100644
index 000..28aa4b7
--- /dev/null
+++ b/arch/i386/lib/sha1.S
@@ -0,0 +1,299 @@
+/*
+ * x86-optimized SHA1 hash algorithm (i486 and above)
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
+ *
+ * Copyright (C) 2004, Niels M�ller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include 
+#include 
+
+/* Register usage */
+#define SA	%eax
+#define SB	%ebx
+#define SC	%ecx
+#define SD	%edx
+#define SE	%ebp
+#define DATA	%esp
+#define TMP	%edi
+#define TMP2	%esi	/* Used by SWAP and F3 */
+#define TMP3	64(%esp)
+
+/* Constants */
+#define K1VALUE	$0x5A827999	/* Rounds  0-19 */
+#define K2VALUE	$0x6ED9EBA1	/* Rounds 20-39 */
+#define K3VALUE	$0x8F1BBCDC	/* Rounds 40-59 */
+#define K4VALUE	$0xCA62C1D6	/* Rounds 60-79 */
+
+/* Convert stack offsets in words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via TMP2 into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)	\
+	movl	OFFSET(index)(TMP2), register;	\
+	bswap	register;	\
+	movl	register, OFFSET(index)(DATA)
+
+/* Sets the workspace word at the given index to TMP. */
+#define CLEAR(index)	\
+	movl	TMP, OFFSET(index)(DATA)
+
+/* pushl/popl wrappers that update the DWARF unwind table */
+#define PUSH(regname)	\
+	pushl	%regname;	\
+	CFI_ADJUST_CFA_OFFSET	4;	\
+	CFI_REL_OFFSET	regname, 0
+
+#define POP(regname)	\
+	popl	%regname;	\
+	CFI_ADJUST_CFA_OFFSET	-4;	\
+	CFI_RESTORE	regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i], and also left in TMP, the only
+ * register that is used.
+
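
For readers unfamiliar with why this code needs BSWAP: SHA-1 treats each 64-byte input block as sixteen big-endian 32-bit words, so on little-endian x86 every word is byte-swapped as it is copied into the workspace. The following is a minimal C sketch of that load step, not the assembly from the patch; the names bswap32_sketch and load_block_sketch are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (what the BSWAP instruction does). */
static uint32_t bswap32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Fill the 16-word workspace from a 64-byte input block, reading each
 * word as big-endian.  Assembling the word byte by byte gives the same
 * result on any host; on little-endian x86 it is equivalent to a plain
 * 32-bit load followed by a byte swap.
 */
static void load_block_sketch(uint32_t data[16], const unsigned char *in)
{
	unsigned int i;

	for (i = 0; i < 16; i++)
		data[i] = (uint32_t)in[4 * i] << 24 |
			  (uint32_t)in[4 * i + 1] << 16 |
			  (uint32_t)in[4 * i + 2] << 8 |
			  (uint32_t)in[4 * i + 3];
}
```

The byte-wise form is used here only so the sketch does not depend on host endianness.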

[PATCH 3/3] [CRYPTO] Add optimized SHA-1 implementation for x86_64

2007-06-08 Thread Benjamin Gilbert
Add optimized implementation of the SHA-1 hash function for x86_64, ported
from the x86 implementation in Nettle (which is LGPLed).

The code has been tested with tcrypt and the NIST test vectors.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/x8664_ksyms.c |    3 
 arch/x86_64/lib/Makefile         |    2 
 arch/x86_64/lib/sha1.S           |  281 ++
 include/linux/cryptohash.h       |    2 
 lib/Kconfig                      |    7 +
 5 files changed, 293 insertions(+), 2 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..bc641ab 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -3,6 +3,7 @@
 
 #include <linux/module.h>
 #include <linux/smp.h>
+#include <linux/cryptohash.h>
 
 #include <asm/semaphore.h>
 #include <asm/processor.h>
@@ -60,3 +61,5 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+EXPORT_SYMBOL(sha_transform);
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index c943271..6c8110b 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_SMP)   += msr-on-cpu.o
 
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
 	usercopy.o getuser.o putuser.o  \
-	thunk.o clear_page.o copy_page.o bitstr.o bitops.o
+	thunk.o clear_page.o copy_page.o bitstr.o bitops.o sha1.o
 lib-y += memcpy.o memmove.o memset.o copy_user.o rwlock.o copy_user_nocache.o
diff --git a/arch/x86_64/lib/sha1.S b/arch/x86_64/lib/sha1.S
new file mode 100644
index 000..48f4fde
--- /dev/null
+++ b/arch/x86_64/lib/sha1.S
@@ -0,0 +1,281 @@
+/*
+ * sha1-x86_64 - x86_64-optimized SHA1 hash algorithm
+ *
+ * Originally from Nettle
+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
+ * Ported from x86 to x86_64 by Benjamin Gilbert
+ *
+ * Copyright (C) 2004, Niels Möller
+ * Copyright (C) 2006-2007 Carnegie Mellon University
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.  A copy of the GNU Lesser General
+ * Public License should have been distributed along with this library in the
+ * file LICENSE.LGPL.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
+ * for more details.
+ */
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+
+/* Register usage.  r12-15 must be saved if they will be used.  Accessing
+   r8-r15 takes an extra instruction byte. */
+#define P_STATE	%rdi	/* Pointer parameter */
+#define P_DATA	%rsi	/* Pointer parameter */
+#define DATA	%rdx	/* Pointer parameter */
+#define SA	%edi	/* Reuses P_STATE */
+#define SB	%esi	/* Reuses P_DATA */
+#define SC	%eax
+#define SD	%ebx	/* Callee-saved */
+#define SE	%ebp	/* Callee-saved */
+#define TMP	%ecx
+#define TMP2	%r8d	/* Used by F3 */
+#define CONST	%r9d
+#define STATE	%r10
+
+/* Constants */
+#define K1VALUE	$0x5A827999	/* Rounds  0-19 */
+#define K2VALUE	$0x6ED9EBA1	/* Rounds 20-39 */
+#define K3VALUE	$0x8F1BBCDC	/* Rounds 40-59 */
+#define K4VALUE	$0xCA62C1D6	/* Rounds 60-79 */
+
+/* Convert stack offsets in 32-bit words to offsets in bytes */
+#define OFFSET(i) 4*(i)
+
+/* Reads the input via P_DATA into register, byteswaps it, and stores it in
+   the DATA array. */
+#define SWAP(index, register)	\
+	movl	OFFSET(index)(P_DATA), register;	\
+	bswap	register;	\
+	movl	register, OFFSET(index)(DATA)
+
+/* push/pop wrappers that update the DWARF unwind table */
+#define PUSH(regname)	\
+	push	%regname;	\
+	CFI_ADJUST_CFA_OFFSET	8;	\
+	CFI_REL_OFFSET	regname, 0
+
+#define POP(regname)	\
+	pop	%regname;	\
+	CFI_ADJUST_CFA_OFFSET	-8;	\
+	CFI_RESTORE	regname
+
+/*
+ * expand(i) is the expansion function
+ *
+ *   W[i] = (W[i - 16] ^ W[i - 14] ^ W[i - 8] ^ W[i - 3]) <<< 1
+ *
+ * where W[i] is stored in DATA[i mod 16].
+ *
+ * Result is stored back in W[i], and also left in TMP, the only
+ * register that is used.
+ */
+#define EXPAND(i)	\
+	movl	OFFSET(i % 16)(DATA), TMP;	\
+   xorl  
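
The archived patch is cut off inside EXPAND, so as an illustration of the expansion comment above, here is a minimal C sketch of computing W[i] in place in a 16-word circular buffer. It is not the assembly from the patch. The names rol32_sketch, expand_sketch and expand_matches_full_schedule are hypothetical; the last one only cross-checks the circular-buffer form against a plain 80-word schedule.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32_sketch(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Compute W[i] for 16 <= i < 80 in place, with W[j] held in data[j % 16].
 * As in the comment in the patch, the result is written back to the slot
 * for W[i] and also returned (the assembly leaves it in TMP).
 */
static uint32_t expand_sketch(uint32_t data[16], unsigned int i)
{
	uint32_t tmp = data[i % 16]		/* W[i - 16] shares the slot of W[i] */
		     ^ data[(i - 14) % 16]	/* W[i - 14] */
		     ^ data[(i - 8) % 16]	/* W[i - 8]  */
		     ^ data[(i - 3) % 16];	/* W[i - 3]  */

	tmp = rol32_sketch(tmp, 1);
	data[i % 16] = tmp;
	return tmp;
}

/* Cross-check against a plain 80-word schedule; returns 1 if they agree. */
static int expand_matches_full_schedule(const uint32_t block[16])
{
	uint32_t full[80], ring[16];
	unsigned int i;

	memcpy(full, block, 16 * sizeof(uint32_t));
	memcpy(ring, block, 16 * sizeof(uint32_t));
	for (i = 16; i < 80; i++) {
		full[i] = rol32_sketch(full[i - 16] ^ full[i - 14] ^
				       full[i - 8] ^ full[i - 3], 1);
		if (expand_sketch(ring, i) != full[i])
			return 0;
	}
	return 1;
}
```

Keeping only 16 words works because W[i] never depends on anything older than W[i - 16], which is exactly the slot being overwritten.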

[PATCH 0/3] Add optimized SHA-1 implementations for x86 and x86_64

2007-06-08 Thread Benjamin Gilbert
The following 3-part series adds assembly implementations of the SHA-1
transform for x86 and x86_64.  For x86_64 the optimized code is always
selected; on x86 it is selected if the kernel is compiled for i486 or above
(since the code needs BSWAP).  These changes primarily improve the
performance of the CryptoAPI SHA-1 module and of /dev/urandom.  I've
included some performance data from my test boxes below.

This version incorporates feedback from Herbert Xu.  Andrew, I'm sending
this to you because of the (admittedly tiny) intersection with arm and s390
in part 1.

-

tcrypt performance tests:

=== Pentium IV in 32-bit mode, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
       block   update    (C)  (asm)
    0      16      16    229    114     50%
    1      64      16    142     76     46%
    2      64      64     79     35     56%
    3     256      16     59     34     42%
    4     256      64     44     24     45%
    5     256     256     43     17     60%
    6    1024      16     51     36     29%
    7    1024     256     30     13     57%
    8    1024    1024     28     12     57%
    9    2048      16     66     30     55%
   10    2048     256     31     12     61%
   11    2048    1024     27     13     52%
   12    2048    2048     26     13     50%
   13    4096      16     49     30     39%
   14    4096     256     28     12     57%
   15    4096    1024     28     11     61%
   16    4096    4096     26     13     50%
   17    8192      16     49     29     41%
   18    8192     256     27     11     59%
   19    8192    1024     26     11     58%
   20    8192    4096     25     10     60%
   21    8192    8192     25     10     60%

=== Intel Core 2 in 64-bit mode, average of 5 trials ===
Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
       block   update    (C)  (asm)
    0      16      16    112     81     28%
    1      64      16     55     39     29%
    2      64      64     42     27     36%
    3     256      16     35     25     29%
    4     256      64     24     14     42%
    5     256     256     22     12     45%
    6    1024      16     31     22     29%
    7    1024     256     17      9     47%
    8    1024    1024     16      9     44%
    9    2048      16     30     22     27%
   10    2048     256     16      8     50%
   11    2048    1024     16      8     50%
   12    2048    2048     16      8     50%
   13    4096      16     29     21     28%
   14    4096     256     16      8     50%
   15    4096    1024     15      8     47%
   16    4096    4096     15      7     53%
   17    8192      16     29     22     24%
   18    8192     256     16      8     50%
   19    8192    1024     15      7     53%
   20    8192    4096     15      7     53%
   21    8192    8192     15      7     53%

I've also done informal tests on other boxes, and the performance
improvement has been in the same ballpark.

On the aforementioned Pentium IV, /dev/urandom throughput goes from 3.7 MB/s
to 5.6 MB/s with the patches; on the Core 2, it increases from 5.5 MB/s to
8.1 MB/s.

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
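
A note on reading the tcrypt tables in the cover letter above: the Change column appears to be the reduction in cycles per byte relative to the C version, rounded to the nearest percent. That reading is an inference from the numbers, not something the post states. A small C sketch of the arithmetic, with the hypothetical helper names percent_reduction and percent_increase:

```c
/*
 * Reduction in cycles/byte going from the C code to the assembly,
 * as a percentage of the C figure, rounded to nearest.
 */
static unsigned int percent_reduction(unsigned int c_cycles,
				      unsigned int asm_cycles)
{
	return (100 * (c_cycles - asm_cycles) + c_cycles / 2) / c_cycles;
}

/*
 * Throughput gain as a percentage of the old figure, truncated.
 * Arguments are in tenths of MB/s so the sketch stays in integers
 * (3.7 MB/s is passed as 37).
 */
static unsigned int percent_increase(unsigned int old_tenths,
				     unsigned int new_tenths)
{
	return 100 * (new_tenths - old_tenths) / old_tenths;
}
```

For example, percent_reduction(229, 114) gives 50, matching test 0 on the Pentium IV, and percent_increase(37, 56) gives 51 for the /dev/urandom figures on the same machine.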


[PATCH 1/3] [CRYPTO] Move sha_init() into cryptohash.h

2007-06-08 Thread Benjamin Gilbert
Make sha_init() a static inline in cryptohash.h rather than an (unexported)
function in lib/sha1.c, in preparation for making sha1.c optional.  This
also allows some cleanups:

- Modular code can now use sha_init() rather than reimplementing it

- The optimized implementation of SHA-1 for ARM no longer needs to
  reimplement sha_init() in assembly

Signed-off-by: Benjamin Gilbert <[EMAIL PROTECTED]>
---

 arch/arm/lib/sha1.S          |   16 
 arch/s390/crypto/sha1_s390.c |    6 +-
 drivers/crypto/padlock-sha.c |    8 ++--
 include/linux/cryptohash.h   |   14 +-
 lib/sha1.c                   |   14 --
 5 files changed, 16 insertions(+), 42 deletions(-)

diff --git a/arch/arm/lib/sha1.S b/arch/arm/lib/sha1.S
index ff6ece4..5be800c 100644
--- a/arch/arm/lib/sha1.S
+++ b/arch/arm/lib/sha1.S
@@ -188,19 +188,3 @@ ENTRY(sha_transform)
 .L_sha_K:
 	.word	0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
 
-
-/*
- * void sha_init(__u32 *buf)
- */
-
-.L_sha_initial_digest:
-	.word	0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
-
-ENTRY(sha_init)
-
-	str	lr, [sp, #-4]!
-	adr	r1, .L_sha_initial_digest
-	ldmia	r1, {r1, r2, r3, ip, lr}
-	stmia	r0, {r1, r2, r3, ip, lr}
-	ldr	pc, [sp], #4
-
diff --git a/arch/s390/crypto/sha1_s390.c b/arch/s390/crypto/sha1_s390.c
index af4460e..fed9a2e 100644
--- a/arch/s390/crypto/sha1_s390.c
+++ b/arch/s390/crypto/sha1_s390.c
@@ -42,11 +42,7 @@ static void sha1_init(struct crypto_tfm *tfm)
 {
 	struct s390_sha1_ctx *sctx = crypto_tfm_ctx(tfm);
 
-	sctx->state[0] = 0x67452301;
-	sctx->state[1] = 0xEFCDAB89;
-	sctx->state[2] = 0x98BADCFE;
-	sctx->state[3] = 0x10325476;
-	sctx->state[4] = 0xC3D2E1F0;
+	sha_init(sctx->state);
 	sctx->count = 0;
 }
 
diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
index a781fd2..b47d708 100644
--- a/drivers/crypto/padlock-sha.c
+++ b/drivers/crypto/padlock-sha.c
@@ -107,12 +107,8 @@ static void padlock_do_sha1(const char *in, char *out, int count)
 	char buf[128+16];
 	char *result = NEAREST_ALIGNED(buf);
 
-	((uint32_t *)result)[0] = 0x67452301;
-	((uint32_t *)result)[1] = 0xEFCDAB89;
-	((uint32_t *)result)[2] = 0x98BADCFE;
-	((uint32_t *)result)[3] = 0x10325476;
-	((uint32_t *)result)[4] = 0xC3D2E1F0;
- 
+	sha_init((uint32_t *)result);
+
 	asm volatile (".byte 0xf3,0x0f,0xa6,0xc8" /* rep xsha1 */
 		      : "+S"(in), "+D"(result)
 		      : "c"(count), "a"(0));
diff --git a/include/linux/cryptohash.h b/include/linux/cryptohash.h
index c118b2a..a172401 100644
--- a/include/linux/cryptohash.h
+++ b/include/linux/cryptohash.h
@@ -4,7 +4,19 @@
 #define SHA_DIGEST_WORDS 5
 #define SHA_WORKSPACE_WORDS 80
 
-void sha_init(__u32 *buf);
+/**
+ * sha_init - initialize the vectors for a SHA1 digest
+ * @buf: vector to initialize
+ */
+static inline void sha_init(__u32 *buf)
+{
+	buf[0] = 0x67452301;
+	buf[1] = 0xefcdab89;
+	buf[2] = 0x98badcfe;
+	buf[3] = 0x10325476;
+	buf[4] = 0xc3d2e1f0;
+}
+
 void sha_transform(__u32 *digest, const char *data, __u32 *W);
 
 __u32 half_md4_transform(__u32 buf[4], __u32 const in[8]);
diff --git a/lib/sha1.c b/lib/sha1.c
index 4c45fd5..815816f 100644
--- a/lib/sha1.c
+++ b/lib/sha1.c
@@ -79,17 +79,3 @@ void sha_transform(__u32 *digest, const char *in, __u32 *W)
 	digest[4] += e;
 }
 EXPORT_SYMBOL(sha_transform);
-
-/**
- * sha_init - initialize the vectors for a SHA1 digest
- * @buf: vector to initialize
- */
-void sha_init(__u32 *buf)
-{
-	buf[0] = 0x67452301;
-	buf[1] = 0xefcdab89;
-	buf[2] = 0x98badcfe;
-	buf[3] = 0x10325476;
-	buf[4] = 0xc3d2e1f0;
-}
-
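
To show how a caller would pair sha_init() with a block transform after this change, here is a minimal user-space sketch in C. It copies the initial values from the patch, but everything else is an assumption made for illustration: sha_transform_sketch is a plain C stand-in (it omits the kernel function's workspace argument), and sha1_short_sketch only handles messages shorter than 56 bytes so that padding fits in one block.

```c
#include <stdint.h>
#include <string.h>

#define SHA_DIGEST_WORDS 5

/* Same initial values as the static inline added by the patch. */
static inline void sha_init_sketch(uint32_t *buf)
{
	buf[0] = 0x67452301;
	buf[1] = 0xefcdab89;
	buf[2] = 0x98badcfe;
	buf[3] = 0x10325476;
	buf[4] = 0xc3d2e1f0;
}

static uint32_t rol32_sketch(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Plain C stand-in for the transform: consumes one 64-byte block. */
static void sha_transform_sketch(uint32_t *digest, const unsigned char *in)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	unsigned int i;

	for (i = 0; i < 16; i++)	/* big-endian load of the input words */
		w[i] = (uint32_t)in[4 * i] << 24 | (uint32_t)in[4 * i + 1] << 16 |
		       (uint32_t)in[4 * i + 2] << 8 | (uint32_t)in[4 * i + 3];
	for (i = 16; i < 80; i++)	/* message schedule expansion */
		w[i] = rol32_sketch(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

	a = digest[0]; b = digest[1]; c = digest[2]; d = digest[3]; e = digest[4];
	for (i = 0; i < 80; i++) {
		if (i < 20) {
			f = (b & c) | (~b & d);
			k = 0x5A827999;
		} else if (i < 40) {
			f = b ^ c ^ d;
			k = 0x6ED9EBA1;
		} else if (i < 60) {
			f = (b & c) | (b & d) | (c & d);
			k = 0x8F1BBCDC;
		} else {
			f = b ^ c ^ d;
			k = 0xCA62C1D6;
		}
		t = rol32_sketch(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol32_sketch(b, 30); b = a; a = t;
	}
	digest[0] += a; digest[1] += b; digest[2] += c;
	digest[3] += d; digest[4] += e;
}

/* Hash a message shorter than 56 bytes (one padded block). */
static void sha1_short_sketch(uint32_t digest[SHA_DIGEST_WORDS], const char *msg)
{
	unsigned char block[64] = { 0 };
	size_t len = strlen(msg);	/* caller guarantees len < 56 */

	memcpy(block, msg, len);
	block[len] = 0x80;		/* padding marker */
	block[62] = (unsigned char)((len * 8) >> 8);	/* bit length, big-endian */
	block[63] = (unsigned char)(len * 8);
	sha_init_sketch(digest);
	sha_transform_sketch(digest, block);
}
```

Calling sha1_short_sketch(digest, "abc") leaves the standard test-vector digest a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d in digest[0..4].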



Re: Failure to release lock after CPU hot-unplug canceled

2007-01-09 Thread Benjamin Gilbert

Heiko Carstens wrote:
> On Tue, Jan 09, 2007 at 05:57:40PM +0530, Srivatsa Vaddagiri wrote:
>> On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
>>> The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
>>> again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
>>> returns NOTIFY_BAD the rest of the entries in the callchain won't be
>>> called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
>>> entry.
>>> So we might even end up with a mutex_unlock(&workqueue_mutex) even if
>>> mutex_lock(&workqueue_mutex) hasn't been called...
>>
>> This is a known problem. Gautham had sent out patches to address them
>>
>> http://lkml.org/lkml/2006/11/14/93
>>
>> Looks like they are in latest mm tree. Perhaps the testcase should be
>> retried against latest mm.
>
> Ah, nice! Wasn't aware of that. But I still think we should have a
> CPU_DOWN_FAILED in case CPU_DOWN_PREPARED failed.
> Also the slab cache code hasn't been changed to make use of the
> new CPU_LOCK_[ACQUIRE|RELEASE] stuff. I'm going to send patches in reply
> to this mail.


2.6.20-rc3-mm1 plus your patches fixes it for me.

Thanks
--Benjamin Gilbert
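
The imbalance Heiko describes in the quoted text can be reproduced with a small user-space model in C. This is only a sketch under stated assumptions: the notifier chain is modeled as an array of function pointers, lock_depth stands in for workqueue_mutex, and all names (sketch_notifier, call_chain, try_prepare, and so on) are hypothetical. The limit_rollback option shows one possible way to keep lock and unlock balanced; it is not claimed to be what the patches referenced in the thread do.

```c
#include <stddef.h>

enum sketch_event { SKETCH_PREPARE, SKETCH_CANCELED };
enum sketch_result { SKETCH_NOTIFY_OK, SKETCH_NOTIFY_BAD };

typedef enum sketch_result (*sketch_notifier)(enum sketch_event ev);

/* Stand-in for the mutex: +1 on lock, -1 on unlock; negative means a bug. */
static int lock_depth;

/* Locks on PREPARE and unlocks on CANCELED, like the workqueue callback. */
static enum sketch_result workqueue_like_cb(enum sketch_event ev)
{
	if (ev == SKETCH_PREPARE)
		lock_depth++;
	else
		lock_depth--;
	return SKETCH_NOTIFY_OK;
}

/* A callback that vetoes the operation during PREPARE. */
static enum sketch_result failing_cb(enum sketch_event ev)
{
	return ev == SKETCH_PREPARE ? SKETCH_NOTIFY_BAD : SKETCH_NOTIFY_OK;
}

/*
 * Deliver ev to the first n entries, stopping at the first NOTIFY_BAD.
 * Returns how many callbacks were invoked; *bad is set if one failed.
 */
static size_t call_chain(const sketch_notifier *chain, size_t n,
			 enum sketch_event ev, int *bad)
{
	size_t i;

	*bad = 0;
	for (i = 0; i < n; i++) {
		if (chain[i](ev) == SKETCH_NOTIFY_BAD) {
			*bad = 1;
			return i + 1;
		}
	}
	return n;
}

/*
 * Run PREPARE and, on failure, CANCELED.  With limit_rollback == 0 the
 * CANCELED event goes to every entry (the behavior described in the
 * thread); otherwise only to entries that actually saw PREPARE.
 * Returns the resulting lock depth.
 */
static int try_prepare(const sketch_notifier *chain, size_t n,
		       int limit_rollback)
{
	int bad, ignored;
	size_t called;

	lock_depth = 0;
	called = call_chain(chain, n, SKETCH_PREPARE, &bad);
	if (bad)
		call_chain(chain, limit_rollback ? called : n,
			   SKETCH_CANCELED, &ignored);
	return lock_depth;
}
```

With the chain { failing_cb, workqueue_like_cb }, try_prepare(chain, 2, 0) returns -1, an unlock without a matching lock, while try_prepare(chain, 2, 1) returns 0.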


