date:20170303

[GIT PULL] parisc architecture fixes for 4.11

2017-03-03 Thread Helge Deller

Hi Linus,

Please pull the changes for the parisc architecture for v4.11 from
  git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git 
parisc-4.11-1

Nothing really important in this patchset: Fix resource leaks in error paths,
coding style cleanups and code removal.

Thanks,
Helge


Arvind Yadav (3):
  parisc: eisa: Remove coding style errors
  parisc: eisa: Fix resource leaks in error paths
  parisc: ccio-dma: Handle return NULL error from ioremap_nocache

Dan Carpenter (1):
  parisc: fix a printk

Helge Deller (1):
  parisc: Define access_ok() as macro

John David Anglin (1):
  parisc: Remove flush_user_dcache_range and flush_user_icache_range

 arch/parisc/include/asm/cacheflush.h |   2 -
 arch/parisc/include/asm/uaccess.h    |   6 +-
 arch/parisc/kernel/cache.c           |  18 --
 arch/parisc/kernel/signal.c          |  13 ++--
 arch/parisc/mm/fault.c               |   4 +-
 drivers/parisc/ccio-dma.c            |   6 +-
 drivers/parisc/eisa.c                | 122 +++
 7 files changed, 83 insertions(+), 88 deletions(-)

[PATCH ALT5] audit: ignore module syscalls on inode child

2017-03-03 Thread Richard Guy Briggs

Tracefs or debugfs were causing hundreds to thousands of null PATH
records to be associated with the init_module and finit_module SYSCALL
records on a few modules when the following rule was in place for
startup:
-a always,exit -F arch=x86_64 -S init_module -F key=mod-load

In __audit_inode_child, return immediately upon detecting module-related
syscalls.

See https://github.com/linux-audit/audit-kernel/issues/8
Test case: https://github.com/linux-audit/audit-testsuite/issues/42

Signed-off-by: Richard Guy Briggs 
---
 kernel/auditsc.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4db32e8..d7fe943 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1868,6 +1868,12 @@ void __audit_inode_child(struct inode *parent,
 
 	if (!context->in_syscall)
 		return;
+	switch (context->major) {
+	case __NR_init_module:
+	case __NR_delete_module:
+	case __NR_finit_module:
+		return;
+	}
 
 	if (inode)
 		handle_one(inode);
-- 
1.7.1
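
The early return added by this hunk can be tried outside the kernel with
a small user-space sketch. Everything below is a stand-in for
illustration: struct ctx_sketch and records_inode_child() are
hypothetical names, and the syscall numbers are the x86_64 values written
out by hand rather than the kernel's __NR_* macros.

```c
#include <stdbool.h>

/* x86_64 syscall numbers, used here for illustration only. */
enum {
	NR_INIT_MODULE   = 175,
	NR_DELETE_MODULE = 176,
	NR_FINIT_MODULE  = 313,
};

/* Hypothetical stand-in for the two audit_context fields the hunk reads. */
struct ctx_sketch {
	bool in_syscall;
	int major;	/* syscall number of the call being audited */
};

/* Returns true when a child-inode PATH record would still be collected. */
static bool records_inode_child(const struct ctx_sketch *ctx)
{
	if (!ctx->in_syscall)
		return false;

	switch (ctx->major) {
	case NR_INIT_MODULE:
	case NR_DELETE_MODULE:
	case NR_FINIT_MODULE:
		return false;	/* module syscalls: skip, as in the patch */
	}

	return true;
}
```

With this shape, any other syscall number still falls through to the
existing inode handling, so only the three module-related calls lose
their child PATH records.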

Re: [PATCH 2/3] mtd: Add support for reading MTD devices via the nvmem API

2017-03-03 Thread Richard Weinberger

Am 03.03.2017 um 15:11 schrieb Boris Brezillon:
>> And add a list of successfully added notifiers, along with their
>> data pointer, to the MTD device. That's simple and would also remove
>> the need for notifier to have a private list of their instances as I
>> had to do here.
> 
> And then you're abusing the notifier concept. As said earlier, a
> notifier is not necessarily using the device, and thus, don't
> necessarily need private data.
> It's not only about what is the simplest solution for your use case,
> but also what other users want/need.

Yes, please don't use the mtd_notifier.
I strongly vote to embed the nvmem pointer into struct mtd_info.

Thanks,
//richard

Re: [PATCHv2] omap3isp: add support for CSI1 bus

2017-03-03 Thread Sakari Ailus

Hi Pavel,

On Thu, Mar 02, 2017 at 01:38:48PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > Ok, how about this one?
> > > omap3isp: add rest of CSI1 support
> > > 
> > > CSI1 needs one more bit to be set up. Do just that.
> > > 
> > > It is not as straightforward as I'd like, see the comments in the code
> > > for explanation.
> ...
> > > + if (isp->phy_type == ISP_PHY_TYPE_3430) {
> > > + struct media_pad *pad;
> > > + struct v4l2_subdev *sensor;
> > > + const struct isp_ccp2_cfg *buscfg;
> > > +
> > > + pad = media_entity_remote_pad(&ccp2->pads[CCP2_PAD_SINK]);
> > > + sensor = media_entity_to_v4l2_subdev(pad->entity);
> > > + /* Struct isp_bus_cfg has union inside */
> > > + buscfg = &((struct isp_bus_cfg *)sensor->host_priv)->bus.ccp2;
> > > +
> > > + csiphy_routing_cfg_3430(&isp->isp_csiphy2,
> > > + ISP_INTERFACE_CCP2B_PHY1,
> > > + enable, !!buscfg->phy_layer,
> > > + buscfg->strobe_clk_pol);
> > 
> > You should do this through omap3isp_csiphy_acquire(), and not call
> > csiphy_routing_cfg_3430() directly from here.
> 
> Well, unfortunately omap3isp_csiphy_acquire() does have csi2
> assumptions hard-coded :-(.
> 
> This will probably fail.
> 
>   rval = omap3isp_csi2_reset(phy->csi2);
>   if (rval < 0)
>   goto done;

Yes. It needs to be fixed. :-)

>   
> And this will oops:
> 
> static int omap3isp_csiphy_config(struct isp_csiphy *phy)
> {
>   struct isp_csi2_device *csi2 = phy->csi2;
>   struct isp_pipeline *pipe = to_isp_pipeline(&csi2->subdev.entity);
>   struct isp_bus_cfg *buscfg = pipe->external->host_priv;

There seems to be some more work left, yes. :-I

> 
> > > @@ -1137,10 +1159,19 @@ int omap3isp_ccp2_init(struct isp_device *isp)
> > >   if (isp->revision == ISP_REVISION_2_0) {
> > >   ccp2->vdds_csib = devm_regulator_get(isp->dev, "vdds_csib");
> > >   if (IS_ERR(ccp2->vdds_csib)) {
> > > + if (PTR_ERR(ccp2->vdds_csib) == -EPROBE_DEFER)
> > > + return -EPROBE_DEFER;
> > 
> > This should go to a separate patch.
> 
> Ok, easy enough.
> 
> > >   dev_dbg(isp->dev,
> > >   "Could not get regulator vdds_csib\n");
> > >   ccp2->vdds_csib = NULL;
> > >   }
> > > + /*
> > > +  * If we set up ccp2->phy here,
> > > +  * omap3isp_csiphy_acquire() will go ahead and assume
> > > +  * csi2, dereferencing some null pointers.
> > > +  *
> > > +  * ccp2->phy = &isp->isp_csiphy2;
> > 
> > That needs to be fixed separately.
> 
> See analysis above. Yes, it would be nice to fix it. Can you provide
> some hints how to do that? Maybe even patch to test? :-).

If I only will have the time. Let's see if I can find some time this
week-end.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ai...@iki.fi XMPP: sai...@retiisi.org.uk
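
For readers unfamiliar with the -EPROBE_DEFER check quoted above, here is
a minimal user-space sketch of the error-pointer pattern it relies on.
The sketch_* helpers are hypothetical reimplementations that only imitate
the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idea; sketch_get_optional() is
an illustrative name, not a function from the omap3isp driver.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO	4095
#define SKETCH_EPROBE_DEFER	517	/* kernel-internal errno value */
#define SKETCH_ENODEV		19

/* Encode a negative errno in a pointer, and decode it again. */
static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Mirrors the quoted hunk: propagate a deferral, but treat any other
 * lookup failure as "regulator not present" and carry on with NULL.
 */
static int sketch_get_optional(void *lookup_result, void **out)
{
	if (sketch_is_err(lookup_result)) {
		if (sketch_ptr_err(lookup_result) == -SKETCH_EPROBE_DEFER)
			return -SKETCH_EPROBE_DEFER;
		*out = NULL;
		return 0;
	}
	*out = lookup_result;
	return 0;
}
```

The point of the separate check is that a deferred probe is retried
later, whereas silently storing NULL would make the driver run without
the regulator for good.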

Re: [media] omap3isp: Correctly set IO_OUT_SEL and VP_CLK_POL for CCP2 mode

2017-03-03 Thread Pavel Machek

Hi!

> [auto build test ERROR on linuxtv-media/master]
> [also build test ERROR on v4.10 next-20170303]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 

Yes, the patch is against Sakari's ccp2 branch. It should work ok there.

I don't think you can do much to fix the automated system


Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



[GIT PULL] libnvdimm fixes for 4.11-rc1

2017-03-03 Thread Dan Williams

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

...to receive a fix and regression test case for nvdimm namespace
label compatibility.

Details:
An "nvdimm namespace label" is metadata on an nvdimm that provisions
dimm capacity into a "namespace" that can host a block device /
dax-filesystem, or a device-dax character device. A namespace is an
object that other operating environments and platform firmware need to
comprehend for capabilities like booting from an nvdimm. The label
metadata contains a checksum that Linux was not calculating correctly,
leading to other environments rejecting the Linux label.

These have received a build success notification from the kbuild
robot, and a positive test result from Nick who reported the problem.

---

The following changes since commit c470abd4fde40ea6a0846a2beab642a578c0b8cd:

  Linux 4.10 (2017-02-19 14:34:00 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

for you to fetch changes up to 86ef58a4e35e8fa66afb5898cf6dec6a3bb29f67:

  nfit, libnvdimm: fix interleave set cookie calculation (2017-03-01
00:49:42 -0800)


Dan Williams (2):
  tools/testing/nvdimm: make iset cookie predictable
  nfit, libnvdimm: fix interleave set cookie calculation

 drivers/acpi/nfit/core.c         | 16 +++-
 drivers/nvdimm/namespace_devs.c  | 18 ++
 drivers/nvdimm/nd.h              |  1 +
 drivers/nvdimm/region_devs.c     |  9 +
 include/linux/libnvdimm.h        |  2 ++
 tools/testing/nvdimm/test/nfit.c | 14 +++---
 6 files changed, 48 insertions(+), 12 deletions(-)

commit 86ef58a4e35e8fa66afb5898cf6dec6a3bb29f67
Author: Dan Williams 
Date:   Tue Feb 28 18:32:48 2017 -0800

nfit, libnvdimm: fix interleave set cookie calculation

The interleave-set cookie is a sum that sanity checks that the composition
of an interleave set has not changed from when the namespace was initially
created.  The checksum is calculated by sorting the DIMMs by their
location in the interleave-set. The comparison for the sort must be
64-bit wide, not byte-by-byte as performed by memcmp() in the broken
case.

Fix the implementation to accept correct cookie values in addition to
the Linux "memcmp" order cookies, but only allow correct cookies to be
generated going forward. It does mean that namespaces created by
third-party-tooling, or created by newer kernels with this fix, will not
validate on older kernels. However, there are a couple mitigating
conditions:

1/ platforms with namespace-label capable NVDIMMs are not widely
   available.

2/ interleave-sets with a single-dimm are by definition not affected
   (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.

The cookie stored in the namespace label will be fixed by any write to the
namespace label; the most straightforward way to achieve this is to
write to the "alt_name" attribute of a namespace in sysfs.

Cc: 
Fixes: eaf961536e16 ("libnvdimm, nfit: add interleave-set
state-tracking infrastructure")
Reported-by: Nicholas Moulin 
Tested-by: Nicholas Moulin 
Signed-off-by: Dan Williams 

commit df06a2d57711a1472ced72207373eeb6422d4721
Author: Dan Williams 
Date:   Wed Mar 1 00:03:37 2017 -0800

tools/testing/nvdimm: make iset cookie predictable

For testing changes to the iset cookie algorithm we need a value that is
constant from run-to-run.

Stop including dynamic data in the emulated region_offset values. Also,
pick values that sort in a different order depending on whether the
comparison is a memcmp() of two 8-byte arrays or subtraction of two
64-bit values.

Signed-off-by: Dan Williams
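
The difference between the two orderings named in the commit messages
above can be shown with a short user-space sketch. It is not the nfit
code: put_le64(), cmp_bytewise() and cmp_u64() are hypothetical helpers,
and the sketch assumes the 8-byte values are stored little-endian, which
is what makes a memcmp() of the bytes disagree with a numeric comparison.

```c
#include <stdint.h>
#include <string.h>

/* Store v as 8 little-endian bytes so the result is host-independent. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Byte-by-byte ordering, as memcmp() on the stored representation. */
static int cmp_bytewise(uint64_t a, uint64_t b)
{
	uint8_t ba[8], bb[8];

	put_le64(ba, a);
	put_le64(bb, b);
	return memcmp(ba, bb, sizeof(ba));
}

/* Numeric 64-bit ordering; written without subtraction to avoid overflow. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

For example, 0x0100 and 0x00ff compare in opposite directions under the
two functions, because the lowest byte (0x00 against 0xff) is examined
first in the byte-wise case. A sort built on one comparison can therefore
order the DIMMs differently from a sort built on the other, which changes
the resulting cookie.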

Re: [RFC PATCH 2/2] mtd: devices: m25p80: Enable spi-nor bounce buffer support

2017-03-03 Thread Vignesh R


   
>>> Not really, I am debugging another issue with UBIFS on DRA74 EVM (ARM
>>> cortex-a15) wherein pages allocated by vmalloc are in highmem region
>>> that are not addressable using 32 bit addresses and is backed by LPAE.
>>> So, a 32 bit DMA cannot access these buffers at all.
>>> When dma_map_sg() is called to map these pages by spi_map_buf() the
>>> physical address is just truncated to 32 bit in pfn_to_dma() (as part of
>>> dma_map_sg() call). This results in random crashes as DMA starts
>>> accessing random memory during SPI read.
>>>
>>> IMO, there may be more undiscovered caveats with using dma_map_sg() for
>>> non-kmalloc'd buffers, and it's better that spi-nor starts handling these
>>> buffers instead of relying on spi_map_msg() and working around every
>>> time something pops up.
>>>
>> Ok, I had a closer look at the SPI framework, and it seems there's a
>> way to tell the core that a specific transfer cannot use DMA
>> (->can_dma()). The first thing you should do is fix the spi-davinci
>> driver:
>>
>> 1/ implement ->can_dma()
>> 2/ patch davinci_spi_bufs() to take the decision to do DMA or not on a
>>    per-xfer basis and not on a per-device basis
>>

This would lead to poor performance, defeating the entire purpose of using DMA.

>> Then we can start thinking about how to improve perfs by using a bounce
>> buffer for large transfers, but I'm still not sure this should be done
>> at the MTD level...

If it's at the SPI level, then I guess each individual driver that cannot
handle vmalloc'd buffers will have to implement bounce buffer logic.

Or the SPI core can be extended in a way similar to this RFC. That is, the SPI
master driver will set a flag to request that the SPI core use a bounce
buffer for vmalloc'd buffers. And spi_map_buf() just uses the bounce buffer
when buf does not belong to the kmalloc region, based on the flag.

Mark, Cyrille, Is that what you prefer?

-- 
Regards
Vignesh
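
The address truncation described in the quoted text can be illustrated
with a small user-space sketch. Both helpers are hypothetical and are not
part of the SPI core or the DMA API: truncate_to_32() shows what is lost
when a physical address above 4 GiB is squeezed into 32 bits, and
needs_bounce() is one possible check for deciding that a buffer cannot be
handed to a 32-bit DMA engine directly.

```c
#include <stdint.h>

/* What a 32-bit DMA address field keeps of a wider physical address. */
static uint32_t truncate_to_32(uint64_t phys)
{
	return (uint32_t)phys;
}

/*
 * Hypothetical helper: nonzero when [phys, phys + len) is not fully
 * reachable under dma_mask, so a bounce buffer (or PIO) would be needed.
 */
static int needs_bounce(uint64_t phys, uint64_t len, uint64_t dma_mask)
{
	if (len == 0)
		return 0;
	return phys > dma_mask || len - 1 > dma_mask - phys;
}
```

A page at physical address 0x100001000 truncates to 0x1000, so the DMA
engine would read or write unrelated memory, which matches the random
crashes described above. Under the same assumptions, needs_bounce()
flags that page for a mask of 0xffffffff.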

Re: [PATCH] ARM: dts: exynos: Use thermal fuse value for thermal zone 0 on Exynos5420

2017-03-03 Thread Javier Martinez Canillas

Hello Krzysztof,

On 02/11/2017 05:14 PM, Krzysztof Kozlowski wrote:
> In Odroid XU3 Lite board, the temperature levels reported for thermal
> zone 0 were weird. In warm room:
>   /sys/class/thermal/thermal_zone0/temp:32000
>   /sys/class/thermal/thermal_zone1/temp:51000
>   /sys/class/thermal/thermal_zone2/temp:55000
>   /sys/class/thermal/thermal_zone3/temp:54000
>   /sys/class/thermal/thermal_zone4/temp:51000
> 
> Sometimes after booting the value was even equal to ambient temperature
> which is highly unlikely to be a real temperature of sensor in SoC.
> 
> The thermal sensor's calibration (trimming) is based on fused values.
> In case of the board above, the fused values are: 35, 52, 43, 58 and 43
> (corresponding to each TMU device).  However driver defined a minimum value
> for fused data as 40 and for smaller values it was using a hard-coded 55
> instead.  This led to mapping data from sensor to wrong temperatures
> for thermal zone 0.
> 
> Various vendor 3.10 trees (Hardkernel's based on Samsung LSI, Artik 10)
> do not impose any limits on fused values.  Since we do not have any
> knowledge about these limits, use 0 as a minimum accepted fused value.
> This should essentially allow accepting any reasonable fused value thus
> behaving like vendor driver.
> 
> The exynos5420-tmu-sensor-conf.dtsi is copied directly from existing
> exynos4412 with one change - the samsung,tmu_min_efuse_value.
> 
> Signed-off-by: Krzysztof Kozlowski 
> 
> ---
> 
> Testing on other Exynos542x boards is much appreciated. Especially I
> wonder what efuse values are there.

I tested on both Exynos5422 Odroid XU4 and Exynos5800 Peach Pi boards.

The temperature levels reported for these two boards in a warm room are:

Odroid XU4

# cat /sys/class/thermal/thermal_zone*/temp
5
5
54000
51000
48000

Peach Pi

# cat /sys/class/thermal/thermal_zone*/temp
42000
44000
27000 <-- weird value for thermal zone 2 like zone 0 in your XU3 lite
45000
45000

And the efuse values for the TMU devices are:

Odroid XU4

TMU0 = 45
TMU1 = 44
TMU2 = 44
TMU3 = 46
TMU4 = 46

Peach Pi

TMU0 = 44
TMU1 = 46
TMU2 = 36
TMU3 = 53
TMU4 = 46

The fused value for TMU2 is < 40, which explains the weird temperature
level for thermal zone 2 on the Peach Pi. After your patch, the values make more sense:

# cat /sys/class/thermal/thermal_zone*/temp
41000
42000
45000
43000
43000

I wonder though if 0 is the best value or if we should just lower more
to cover the used e-fuse values in Exynos5 boards. But as you said,
we have no knowledge about these limits...

Reviewed-by: Javier Martinez Canillas 
Tested-by: Javier Martinez Canillas 

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America
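
The fallback behaviour described in the quoted commit message can be
sketched in a few lines of user-space C. The limits 40 and 0 and the
default 55 come from that message; pick_trim() and code_to_celsius() are
hypothetical names, and the one-point calibration formula with its offset
of 25 is an assumption made for illustration, not something stated in
this thread.

```c
/* Values taken from the quoted commit message. */
#define OLD_MIN_EFUSE	40
#define NEW_MIN_EFUSE	0
#define DEFAULT_EFUSE	55

/* Pick the trim value the way the message describes it. */
static int pick_trim(int fused, int min_efuse)
{
	return fused < min_efuse ? DEFAULT_EFUSE : fused;
}

/*
 * Assumed one-point calibration: temperature = code - trim + 25.
 * The offset 25 and the formula itself are assumptions, not from the thread.
 */
static int code_to_celsius(int code, int trim)
{
	return code - trim + 25;
}
```

Under these assumptions, a fused value of 35 that is replaced by 55
lowers the computed temperature by 20 degrees, which would be in line
with zone 0 on the XU3 Lite reading 32000 while the other zones read
51000 to 55000.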

[PULL 0/3] Xtensa improvements for v4.11

2017-03-03 Thread Max Filippov

Hi Linus,

please pull the following batch of updates for the Xtensa architecture.

The following changes since commit 69973b830859bc6529a7a0468ba0d80ee5117826:

  Linux 4.9 (2016-12-11 11:17:54 -0800)

are available in the git repository at:

  git://github.com/jcmvbkbc/linux-xtensa.git tags/xtensa-20170303

for you to fetch changes up to b46dcfa378b0cdea1ee832802c9e36750e0fffa9:

  xtensa: allow merging vectors into .text section (2017-03-01 12:32:50 -0800)


Xtensa improvements for v4.11:

- clean up bootable image build targets: provide separate 'Image',
  'zImage' and 'uImage' make targets that only build corresponding
  image type. Make 'all' build all images appropriate for a platform;
- allow merging vectors code into .text section as a preparation step
  for XIP support;
- fix handling external FDT when the kernel is built without
  BLK_DEV_INITRD support.


Max Filippov (3):
  xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
  xtensa: clean up bootable image build targets
  xtensa: allow merging vectors into .text section

 arch/xtensa/Makefile                   |  8 +++
 arch/xtensa/boot/Makefile              | 23 +--
 arch/xtensa/boot/boot-elf/Makefile     |  2 +-
 arch/xtensa/boot/boot-redboot/Makefile |  2 +-
 arch/xtensa/boot/boot-uboot/Makefile   | 14 
 arch/xtensa/include/asm/vectors.h      |  4 
 arch/xtensa/kernel/setup.c             |  7 --
 arch/xtensa/kernel/vmlinux.lds.S       | 41 ++
 8 files changed, 71 insertions(+), 30 deletions(-)
 delete mode 100644 arch/xtensa/boot/boot-uboot/Makefile

-- 
Thanks.
-- Max

[PATCH v4 1/5] staging: nvec: Remove unnecessary cast on void pointer

2017-03-03 Thread simran singhal

The following Coccinelle script was used to detect this:

@r@
expression x;
void* e;
type T;
identifier f;
@@
(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T*)x)->f
|
- (T*)
  e
)

Signed-off-by: simran singhal 
---
 drivers/staging/nvec/nvec_kbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/nvec/nvec_kbd.c b/drivers/staging/nvec/nvec_kbd.c
index e881e6b..a01f486 100644
--- a/drivers/staging/nvec/nvec_kbd.c
+++ b/drivers/staging/nvec/nvec_kbd.c
@@ -58,7 +58,7 @@ static int nvec_keys_notifier(struct notifier_block *nb,
  unsigned long event_type, void *data)
 {
int code, state;
-   unsigned char *msg = (unsigned char *)data;
+   unsigned char *msg = data;
 
if (event_type == NVEC_KB_EVT) {
int _size = (msg[0] & (3 << 5)) >> 5;
-- 
2.7.4
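
The cast removed by this patch is redundant because C converts void * to any object pointer type implicitly. A minimal sketch of the same pattern follows; msg_size_field() is a hypothetical helper written for illustration and is not part of the nvec driver. It only mirrors the bit extraction visible in the hunk above.

```c
/*
 * Hypothetical helper (not driver code): read the 2-bit field stored in
 * bits 6:5 of the first message byte, using the same expression as the
 * hunk above.
 */
static int msg_size_field(void *data)
{
	/* No cast needed: void * converts implicitly to unsigned char *. */
	unsigned char *msg = data;

	return (msg[0] & (3 << 5)) >> 5;
}
```

Passing a buffer whose first byte is 0x40 (only bit 6 set) yields 2.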

Re: [RFC PATCH v2 00/32] x86: Secure Encrypted Virtualization (AMD)

2017-03-03 Thread Brijesh Singh


Hi Bjorn,

On 03/03/2017 02:33 PM, Bjorn Helgaas wrote:

On Thu, Mar 02, 2017 at 10:12:01AM -0500, Brijesh Singh wrote:

This RFC series provides support for AMD's new Secure Encrypted Virtualization
(SEV) feature. This RFC is built upon Secure Memory Encryption (SME) RFCv4 [1].


What kernel version is this series based on?



This patch series is based off of the master branch of tip.
  Commit a27cb9e1b2b4 ("Merge branch 'WIP.sched/core'")
  Tom's RFC v4 patches (http://marc.info/?l=linux-mm&m=148725973013686&w=2)

I accidentally rebased the SEV RFCv2 patches on the updated SME v4
instead of the original SME v4, so you may need to apply patch [1].


[1] http://marc.info/?l=linux-mm&m=148857523132253&w=2

Optionally, I have posted the full git tree here [2]

[2] https://github.com/codomania/tip/branches

[v5 00/20] x86: Enable User-Mode Instruction Prevention

2017-03-03 Thread Ricardo Neri

This is v5 of this series. The four previous submissions can be found
here [1], here [2], here [3], and here [4]. This version addresses the
comments received in v4 and improves the handling of emulation in
64-bit builds. Please see details in the change log.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, the segment selectors to the current task state and the
local descriptor table.

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

There is a caveat, however. Certain applications rely on some of these
instructions to function. An example is applications that use WineHQ[6].
For instance, these applications rely on sidt returning a non-accessible
memory location[6]. During the discussions, it was proposed that the fault
could be relayed to user space so that the emulation is performed in
user mode. However, this would break existing applications until, for
instance, they update to a new WineHQ version. This approach would also
require UMIP to be disabled by default. The consensus in this forum is to
always enable it.

This patchset initially treated tasks running in virtual-8086 mode as a
special case. However, I received clarification that DOSEMU[7] does not
support applications that use these instructions. It relies on WineHQ for
this [8]. Furthermore, the applications for which the concern was raised
run in protected mode [6].

Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive the
SIGSEGV signal when UMIP instructions cause a general protection fault.

=== How are UMIP-protected instructions emulated?

This version keeps UMIP enabled at all times and by default. If a general
protection fault caused by the instructions protected by UMIP is
detected, such fault will be fixed-up by returning dummy values as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments in x86_32, the base of GDT and IDT was always a 4-byte
   address, even for 16-bit operands. Thus, my emulation code does the
   same. In all cases, the limit of the table is set to 0.
 * STR and SLDT return 0 as the segment selector. This looks appropriate
   since we are providing a dummy value as the base address of the global
   descriptor table.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. That is, the following bits are enabled:
   CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask. As per the Intel 64 and IA-32 Architectures Software Developer's
   Manual, SMSW returns a 16-bit result for memory operands. However, when
   the operand is a register, the result can be up to CR0[63:0]. Since
   the emulation code only kicks in on x86_32, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both
   protected and virtual-8086 mode.
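
As a rough illustration of the SMSW item above, the dummy value can be assembled from the six CR0 bits listed there. This is only a sketch: the SKETCH_CR0_* names and both helper functions are made up for this example and are not the identifiers used in the series.

```c
#include <stdint.h>

/* CR0 bit positions taken from the list above; names are illustrative. */
#define SKETCH_CR0_PE (UINT32_C(1) << 0)   /* Protection Enable */
#define SKETCH_CR0_MP (UINT32_C(1) << 1)   /* Monitor Coprocessor */
#define SKETCH_CR0_ET (UINT32_C(1) << 4)   /* Extension Type */
#define SKETCH_CR0_NE (UINT32_C(1) << 5)   /* Numeric Error */
#define SKETCH_CR0_WP (UINT32_C(1) << 16)  /* Write Protect */
#define SKETCH_CR0_AM (UINT32_C(1) << 18)  /* Alignment Mask */

/* Value returned for a register operand on x86_32: CR0[31:0]. */
static uint32_t sketch_smsw_reg(void)
{
	return SKETCH_CR0_PE | SKETCH_CR0_MP | SKETCH_CR0_ET |
	       SKETCH_CR0_NE | SKETCH_CR0_WP | SKETCH_CR0_AM;
}

/* Value returned for a memory operand: only the low 16 bits. */
static uint16_t sketch_smsw_mem(void)
{
	return (uint16_t)(sketch_smsw_reg() & 0xffff);
}
```

With these six bits the register form evaluates to 0x50033 and the memory form to 0x0033.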

=== How is this series laid out?

++ Fix bugs in MPX address evaluator
I found the Intel MPX (Memory Protection Extensions) code that parses
opcodes and the memory locations contained in the general purpose
registers used as operands very useful. I put some of this code in
a separate library file that both MPX and UMIP can access, which avoids
code duplication. Before creating the new library, I fixed a couple of
bugs that I found in how MPX determines the address contained in the
instruction and operands.

++ Provide a new x86 instruction evaluating library
With bugs fixed, the MPX evaluating code is relocated to a new insn-eval.c
library. The basic functionality of this library is extended to obtain the
segment descriptor selected by either segment override prefixes or the
default segment implied by the registers involved in the calculation of the
effective address. It was also extended to obtain the default address and
operand sizes as well as

[v5 02/20] x86/mpx: Do not use SIB index if index points to R/ESP

2017-03-03 Thread Ricardo Neri

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when memory addressing is used
(i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
used in the computation of the memory address.

In these cases the address is simply the value present in the register
pointed to by the base part of the SIB byte plus the displacement byte.

An example of such an instruction could be

insn -0x80(%rsp)

This is represented as:

 [opcode] 4c 23 80

  ModR/M=0x4c: mod: 0x1, reg: 0x1, r/m: 0x4 (R/ESP)
  SIB=0x23: sc: 0, index: 0b100 (R/ESP), base: 0b011 (R/EBX)
  Displacement: -0x80

The correct address is (base) + displacement; no index is used.

We can achieve the desired effect of not using the index by making
get_reg_offset return -EDOM in this particular case. This value indicates
to callers that they should not use the index to calculate the address.
-EINVAL continues to indicate an error when decoding the SIB byte.

Care is taken to allow R12 to be used as index, which is a valid scenario.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index ff112e3..d9e92d6 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
regno += 8;
+   /*
+* If mod !=3, register R/ESP (regno=4) is not used as index in
+* the address computation. Check is done after looking at REX.X
+* This is because R12 (regno=12) can be used as an index.
+*/
+   if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
+   return -EDOM;
break;
 
case REG_TYPE_BASE:
@@ -159,11 +166,19 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
goto out_err;
 
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-   if (indx_offset < 0)
+   /*
+* A negative offset generally means a error, except
+* -EDOM, which means that the contents of the register
+* should not be used as index.
+*/
+   if (unlikely(indx_offset == -EDOM))
+   indx = 0;
+   else if (unlikely(indx_offset < 0))
goto out_err;
+   else
+   indx = regs_get_register(regs, indx_offset);
 
base = regs_get_register(regs, base_offset);
-   indx = regs_get_register(regs, indx_offset);
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3
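
The field layout and the index rule from the commit message above can be sketched in standalone C. All function names below are hypothetical and written only for illustration; this is not the kernel's get_reg_offset(). The sketch assumes the standard ModR/M and SIB layout (2-bit mod or scale, two 3-bit fields) and models REX.X as a plain boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModR/M byte: mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0. */
static unsigned int modrm_mod(uint8_t modrm) { return modrm >> 6; }
static unsigned int modrm_reg(uint8_t modrm) { return (modrm >> 3) & 7; }
static unsigned int modrm_rm(uint8_t modrm)  { return modrm & 7; }

/* SIB byte: scale in bits 7:6, index in bits 5:3, base in bits 2:0. */
static unsigned int sib_scale(uint8_t sib) { return sib >> 6; }
static unsigned int sib_base(uint8_t sib)  { return sib & 7; }

/* REX.X extends the index field, so R12 (regno 12) stays usable. */
static unsigned int sib_index(uint8_t sib, bool rex_x)
{
	return ((sib >> 3) & 7) + (rex_x ? 8 : 0);
}

/* The index is ignored when mod != 3 and the index regno is 4 (R/ESP). */
static bool sib_index_is_used(uint8_t modrm, uint8_t sib, bool rex_x)
{
	return !(modrm_mod(modrm) != 3 && sib_index(sib, rex_x) == 4);
}

/* Effective address from already-fetched register values. */
static long sib_effective_address(long base, long indx, uint8_t modrm,
				  uint8_t sib, bool rex_x, long disp)
{
	if (!sib_index_is_used(modrm, sib, rex_x))
		indx = 0;

	return base + indx * (1L << sib_scale(sib)) + disp;
}
```

For the bytes in the example (ModR/M 0x4c, SIB 0x23) the index field is 4, so the index value is dropped and the result is base plus displacement. With REX.X set, the same SIB byte selects regno 12 and the index is used.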

[v5 01/20] x86/mpx: Use signed variables to compute effective addresses

2017-03-03 Thread Ricardo Neri

Even though memory addresses are unsigned, the operands used to compute the
effective address do have a sign. This is true for the r/m part of the
ModRM byte, the base and index parts of the SIB byte as well as the
displacement. Thus, signed variables shall be used when computing the
effective address from these operands. Once the signed effective address
has been computed, it is cast to an unsigned long to determine the
linear address.

Variables are renamed to better reflect the type of address being
computed.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 5126dfd..ff112e3 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,7 +138,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-   unsigned long addr, base, indx;
+   unsigned long linear_addr;
+   long eff_addr, base, indx;
int addr_offset, base_offset, indx_offset;
insn_byte_t sib;
 
@@ -150,7 +151,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+   eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +164,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
base = regs_get_register(regs, base_offset);
indx = regs_get_register(regs, indx_offset);
-   addr = base + indx * (1 << X86_SIB_SCALE(sib));
+   eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+   eff_addr = regs_get_register(regs, addr_offset);
}
-   addr += insn->displacement.value;
+   eff_addr += insn->displacement.value;
}
-   return (void __user *)addr;
+   linear_addr = (unsigned long)eff_addr;
+
+   return (void __user *)linear_addr;
 out_err:
return (void __user *)-1;
 }
-- 
2.9.3
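
The approach described in this patch (compute with signed operands, then cast once at the end) can be sketched outside the kernel as follows. Both functions are hypothetical illustrations, not the patched mpx_get_addr_ref(). The sign extension of an 8-bit displacement is added here only to show where a negative operand comes from.

```c
#include <stdint.h>

/* Sign-extend an 8-bit displacement byte, e.g. 0x80 becomes -128. */
static long sign_extend_disp8(uint8_t byte)
{
	return (long)byte - ((byte & 0x80) ? 0x100 : 0);
}

/*
 * Compute the effective address with signed arithmetic, then convert it
 * to an unsigned value in a single, final step.
 */
static unsigned long linear_from_operands(long base, long indx,
					  unsigned int scale_bits, long disp)
{
	long eff_addr = base + indx * (1L << scale_bits) + disp;

	return (unsigned long)eff_addr;
}
```

For example, a base of 0x1000 with the displacement byte 0x80 gives 0x1000 - 0x80 = 0xf80.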

[v5 07/20] x86/insn-eval: Add utility function to get segment descriptor

2017-03-03 Thread Ricardo Neri

The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses as
well as the base address of the segment. Thus, given a segment selector,
we ought to look at the segment descriptor to correctly calculate the linear
address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is the initial implementation for subsequent functions that
will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 61 
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8d45df8..8608adf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,9 +5,13 @@
  */
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 enum reg_type {
@@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @seg:   Segment selector
+ * @desc:  Pointer to the selected segment descriptor
+ *
+ * Given a segment selector, obtain a memory pointer to the segment
+ * descriptor. Both global and local descriptor tables are supported.
+ * desc will contain the address of the descriptor.
+ *
+ * Return: 0 if success, -EINVAL if failure
+ */
+static int get_desc(unsigned short seg, struct desc_struct **desc)
+{
+   struct desc_ptr gdt_desc = {0, 0};
+   unsigned long desc_base;
+
+   if (!desc)
+   return -EINVAL;
+
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+   if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+   seg >>= 3;
+
+   mutex_lock(&current->active_mm->context.lock);
+   if (unlikely(!current->active_mm->context.ldt ||
+seg >= current->active_mm->context.ldt->size)) {
+   *desc = NULL;
+   mutex_unlock(&current->active_mm->context.lock);
+   return -EINVAL;
+   }
+
+   *desc = &current->active_mm->context.ldt->entries[seg];
+   mutex_unlock(&current->active_mm->context.lock);
+   return 0;
+   }
+#endif
+   native_store_gdt(&gdt_desc);
+
+   /*
+* Bits [15:3] of the segment selector contain the index. Such
+* index needs to be multiplied by 8. However, as the index
+* least significant bit is already in bit 3, we don't have
+* to perform the multiplication.
+*/
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+   if (desc_base > gdt_desc.size) {
+   *desc = NULL;
+   return -EINVAL;
+   }
+
+   *desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+   return 0;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3
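
The selector arithmetic that get_desc() relies on can be checked in isolation. The sketch below assumes the standard x86 selector layout (RPL in bits 1:0, table indicator in bit 2, index in bits 15:3). The SEL_* macros and sel_* helpers are illustrative names, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_RPL_MASK 0x3u  /* bits 1:0, requested privilege level */
#define SEL_TI_MASK  0x4u  /* bit 2, set when the selector refers to the LDT */

static unsigned int sel_rpl(uint16_t sel)   { return sel & SEL_RPL_MASK; }
static bool sel_is_ldt(uint16_t sel)        { return (sel & SEL_TI_MASK) != 0; }
static unsigned int sel_index(uint16_t sel) { return sel >> 3; }

/*
 * Byte offset of the descriptor inside its table. Descriptors are 8 bytes
 * long and the index already starts at bit 3, so masking off the low
 * three bits equals index * 8.
 */
static unsigned int sel_table_offset(uint16_t sel)
{
	return sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
}
```

A selector of 0x2b decodes to index 5 in the GDT with RPL 3, and its descriptor sits at byte offset 0x28.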

[v5 01/20] x86/mpx: Use signed variables to compute effective addresses

2017-03-03 Thread Ricardo Neri

Even though memory addresses are unsigned. The operands used to compute the
effective address do have a sign. This is true for the r/m part of the
ModRM byte, the base and index parts of the SiB byte as well as the
displacement. Thus, signed variables shall be used when computing the
effective address from these operands. Once the signed effective address
has been computed, it is casted to an unsigned long to determine the
linear address.

Variables are renamed to better reflect the type of address being
computed.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 5126dfd..ff112e3 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,7 +138,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs 
*regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-   unsigned long addr, base, indx;
+   unsigned long linear_addr;
+   long eff_addr, base, indx;
int addr_offset, base_offset, indx_offset;
insn_byte_t sib;
 
@@ -150,7 +151,7 @@ static void __user *mpx_get_addr_ref(struct insn *insn, 
struct pt_regs *regs)
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+   eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +164,18 @@ static void __user *mpx_get_addr_ref(struct insn *insn, 
struct pt_regs *regs)
 
base = regs_get_register(regs, base_offset);
indx = regs_get_register(regs, indx_offset);
-   addr = base + indx * (1 << X86_SIB_SCALE(sib));
+   eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+   eff_addr = regs_get_register(regs, addr_offset);
}
-   addr += insn->displacement.value;
+   eff_addr += insn->displacement.value;
}
-   return (void __user *)addr;
+   linear_addr = (unsigned long)eff_addr;
+
+   return (void __user *)linear_addr;
 out_err:
return (void __user *)-1;
 }
-- 
2.9.3

[v5 07/20] x86/insn-eval: Add utility function to get segment descriptor

2017-03-03 Thread Ricardo Neri

The segment descriptor contains information that is relevant to how linear
address need to be computed. It contains the default size of addresses as
well as the base address of the segment. Thus, given a segment selector,
we ought look at segment descriptor to correctly calculate the linear
address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is the initial implementation for subsequent functions that
will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 61 
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8d45df8..8608adf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,9 +5,13 @@
  */
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 enum reg_type {
@@ -294,6 +298,63 @@ static int get_reg_offset(struct insn *insn, struct 
pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @seg:   Segment selector
+ * @desc:  Pointer to the selected segment descriptor
+ *
+ * Given a segment selector, obtain a memory pointer to the segment
+ * descriptor. Both global and local descriptor tables are supported.
+ * desc will contain the address of the descriptor.
+ *
+ * Return: 0 if success, -EINVAL if failure
+ */
+static int get_desc(unsigned short seg, struct desc_struct **desc)
+{
+   struct desc_ptr gdt_desc = {0, 0};
+   unsigned long desc_base;
+
+   if (!desc)
+   return -EINVAL;
+
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+   if ((seg & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+   seg >>= 3;
+
+   mutex_lock(>active_mm->context.lock);
+   if (unlikely(!current->active_mm->context.ldt ||
+seg >= current->active_mm->context.ldt->size)) {
+   *desc = NULL;
+   mutex_unlock(>active_mm->context.lock);
+   return -EINVAL;
+   }
+
+   *desc = >active_mm->context.ldt->entries[seg];
+   mutex_unlock(>active_mm->context.lock);
+   return 0;
+   }
+#endif
+   native_store_gdt(_desc);
+
+   /*
+* Bits [15:3] of the segment selector contain the index. Such
+* index needs to be multiplied by 8. However, as the index
+* least significant bit is already in bit 3, we don't have
+* to perform the multiplication.
+*/
+   desc_base = seg & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+   if (desc_base > gdt_desc.size) {
+   *desc = NULL;
+   return -EINVAL;
+   }
+
+   *desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+   return 0;
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3
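
The selector arithmetic that get_desc() relies on can be sketched in user
space as follows. This is only an illustration, not part of the patch: the
SEL_* macros and sel_*() helpers are hypothetical names that mirror
SEGMENT_RPL_MASK and SEGMENT_TI_MASK, and the sketch assumes the usual x86
selector layout (index in bits [15:3], table indicator in bit 2, RPL in
bits [1:0]).

```c
#include <stdint.h>

/* Hypothetical stand-ins for SEGMENT_RPL_MASK and SEGMENT_TI_MASK. */
#define SEL_RPL_MASK 0x3u
#define SEL_TI_MASK  0x4u

/* Requested privilege level encoded in bits [1:0] of the selector. */
static unsigned int sel_rpl(uint16_t sel)
{
	return sel & SEL_RPL_MASK;
}

/* Non-zero when bit 2 selects the LDT rather than the GDT. */
static int sel_uses_ldt(uint16_t sel)
{
	return (sel & SEL_TI_MASK) != 0;
}

/* Descriptor index held in bits [15:3]. */
static unsigned int sel_index(uint16_t sel)
{
	return (unsigned int)sel >> 3;
}

/*
 * Byte offset into the descriptor table. Descriptors are 8 bytes long,
 * so index * 8 is the selector with its three low bits cleared.
 */
static unsigned int sel_byte_offset(uint16_t sel)
{
	return sel & ~(SEL_RPL_MASK | SEL_TI_MASK);
}
```

Masking instead of shifting and multiplying gives the same offset, which
is the point made by the comment in get_desc().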

[v5 08/20] x86/insn-eval: Add utility function to get segment descriptor base address

2017-03-03 Thread Ricardo Neri

With segmentation, the base address of the segment descriptor is needed
to compute a linear address. The segment descriptor used in the address
computation depends on either any segment override prefixes in the
instruction or the default segment determined by the registers involved
in the address computation. Thus, both the instruction and the
register (specified as the offset from the base of pt_regs) are given as
inputs, along with a boolean variable to select between override and
default.

The segment selector is determined by get_seg_selector with the inputs
described above. Once the selector is known, the base address is
determined. In protected mode, the selector is used to obtain the segment
descriptor and then its base address. In 64-bit user mode, the segment
base address is zero except when FS or GS are used. In virtual-8086 mode,
the base address is computed as the value of the segment selector shifted 4
positions to the left.
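
As a rough user-space illustration of the two base computations described
above (not the kernel code): v8086_seg_base(), desc_raw_base() and
linear_addr() are hypothetical helpers, and the sketch assumes the standard
8-byte descriptor layout with base[23:0] in bits [39:16] and base[31:24] in
bits [63:56].

```c
#include <stdint.h>

/* Virtual-8086 mode: the base is the 16-bit selector shifted left by 4. */
static uint32_t v8086_seg_base(uint16_t sel)
{
	return (uint32_t)sel << 4;
}

/*
 * Protected mode: reassemble the 32-bit base from a raw 8-byte descriptor.
 * Bits [39:16] hold base[23:0]; bits [63:56] hold base[31:24].
 */
static uint32_t desc_raw_base(uint64_t raw)
{
	uint32_t low = (uint32_t)((raw >> 16) & 0xffffffu);
	uint32_t high = (uint32_t)((raw >> 56) & 0xffu);

	return low | (high << 24);
}

/* Linear address = segment base + effective address (32-bit arithmetic). */
static uint32_t linear_addr(uint32_t base, uint32_t eff_addr)
{
	return base + eff_addr;
}
```

For example, selector 0x1234 in virtual-8086 mode gives a base of 0x12340,
so offset 0x10 maps to linear address 0x12350.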

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 ++
 arch/x86/lib/insn-eval.c | 66 
 2 files changed, 68 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 754211b..b201742 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,5 +15,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff, bool use_default_seg);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8608adf..383ca83 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -355,6 +355,72 @@ static int get_desc(unsigned short seg, struct desc_struct **desc)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address contained in descriptor
+ * @regs:  Set of registers containing the segment selector
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:  Operand offset, in pt_regs, for which the selector is needed
+ * @use_default_seg: Use the default segment instead of prefix overrides
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. It may be zero in
+ * certain cases for 64-bit builds and/or 64-bit applications. In virtual-8086
+ * mode, the segment selector shifted 4 positions to the left. -1L in case of
+ * error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff, bool use_default_seg)
+{
+   struct desc_struct *desc;
+   unsigned short seg;
+   enum segment seg_type;
+   int ret;
+
+   seg_type = resolve_seg_selector(insn, regoff, use_default_seg);
+
+   seg = get_segment_selector(regs, seg_type);
+   if (seg < 0)
+   return -1L;
+
+   if (v8086_mode(regs))
+   /*
+* Base is simply the segment selector shifted 4
+* positions to the left.
+*/
+   return (unsigned long)(seg << 4);
+
+#ifdef CONFIG_X86_64
+   if (user_64bit_mode(regs)) {
+   /*
+* Only FS or GS will have a base address, the rest of
+* the segments' bases are forced to 0.
+*/
+   unsigned long base;
+
+   if (seg_type == SEG_FS)
+   rdmsrl(MSR_FS_BASE, base);
+   else if (seg_type == SEG_GS)
+   /*
+* swapgs was called at the kernel entry point. Thus,
+* MSR_KERNEL_GS_BASE will have the user-space GS base.
+*/
+   rdmsrl(MSR_KERNEL_GS_BASE, base);
+   else
+   base = 0;
+   return base;
+   }
+#endif
+   ret = get_desc(seg, &desc);
+   if (ret)
+   return -1L;
+
+   return get_desc_base(desc);
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte

[v5 03/20] x86/mpx: Do not use R/EBP as base in the SIB byte with Mod = 0

2017-03-03 Thread Ricardo Neri

Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when a SIB byte is used and the
base of the SIB byte points to R/EBP (i.e., base = 5) and the mod part
of the ModRM byte is zero, the value of such register will not be used
as part of the address computation. To signal this, -EDOM is
returned to tell callers that they should ignore the value.

Also, for this particular case, a 32-bit displacement should follow
the SIB byte if the mod part of ModRM is equal to zero. The instruction
decoder ensures that this is the case.
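
The rule above can be sketched outside the kernel as follows. This is only
an illustration under stated assumptions: modrm_mod_bits(), sib_base_bits(),
sib_base_regno() and sib_effective_addr() are hypothetical helpers that take
raw ModRM and SIB bytes rather than a struct insn, and register values are
passed in directly instead of being read from pt_regs.

```c
#include <errno.h>
#include <stdint.h>

/* mod field: bits [7:6] of the ModRM byte. */
static unsigned int modrm_mod_bits(uint8_t modrm)
{
	return modrm >> 6;
}

/* base field: bits [2:0] of the SIB byte. */
static unsigned int sib_base_bits(uint8_t sib)
{
	return sib & 0x7;
}

/*
 * Register number used as SIB base, or -EDOM when base = 5 and mod = 0,
 * in which case no base register takes part in the address computation.
 * The check happens before REX.B is applied, so it covers R13 as well.
 */
static int sib_base_regno(uint8_t modrm, uint8_t sib, int rex_b)
{
	unsigned int regno = sib_base_bits(sib);

	if (regno == 5 && modrm_mod_bits(modrm) == 0)
		return -EDOM;
	if (rex_b)
		regno += 8;
	return (int)regno;
}

/* base + index * 2^scale + displacement; pass base = 0 for the -EDOM case. */
static uint64_t sib_effective_addr(uint64_t base, uint64_t index,
				   unsigned int scale_bits, int32_t disp)
{
	return base + index * (UINT64_C(1) << scale_bits) +
	       (uint64_t)(int64_t)disp;
}
```

A caller would treat -EDOM as "use zero for the base" and any other negative
value as a decoding error, which is the distinction the patch introduces in
mpx_get_addr_ref().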

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index d9e92d6..ef7eb67 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -121,6 +121,17 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 
case REG_TYPE_BASE:
regno = X86_SIB_BASE(insn->sib.value);
+   /*
+* If mod is 0 and register R/EBP (regno=5) is indicated in the
+* base part of the SIB byte, the value of such register should
+* not be used in the address computation. Also, a 32-bit
+* displacement is expected in this case; the instruction
+* decoder takes care of it. This is true for both R13 and
+* R/EBP as REX.B will not be decoded.
+*/
+   if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+   return -EDOM;
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -161,16 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
+   /*
+* Negative values in the base and index offsets mean
+* an error when decoding the SIB byte, except -EDOM,
+* which means that the registers should not be used
+* in the address computation.
+*/
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-   if (base_offset < 0)
+   if (unlikely(base_offset == -EDOM))
+   base = 0;
+   else if (unlikely(base_offset < 0))
goto out_err;
+   else
+   base = regs_get_register(regs, base_offset);
 
indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-   /*
-* A negative offset generally means a error, except
-* -EDOM, which means that the contents of the register
-* should not be used as index.
-*/
if (unlikely(indx_offset == -EDOM))
indx = 0;
else if (unlikely(indx_offset < 0))
@@ -178,7 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
else
indx = regs_get_register(regs, indx_offset);
 
-   base = regs_get_register(regs, base_offset);
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.9.3

[v5 10/20] x86/insn-eval: Do not use R/EBP as base if mod in ModRM is zero

2017-03-03 Thread Ricardo Neri

Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when the mod part of the ModRM
byte is zero and R/EBP is specified in the R/M part of such byte, the value
of the aforementioned register should not be used in the address
computation. Instead, a 32-bit displacement is expected. The instruction
decoder takes care of setting the displacement to the expected value.
Returning -EDOM signals callers that they should ignore the value of such
register when computing the address encoded in the instruction operands.

Also, callers should exercise care to correctly interpret this particular
case. In IA-32e 64-bit mode, the address is given by the displacement plus
the value of the RIP. In IA-32e compatibility mode, the value of EIP is
ignored. This correction is done in insn_get_addr_ref().
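
A minimal user-space sketch of this case, under stated assumptions:
modrm_rm_regno() and disp32_only_addr() are hypothetical helpers working on
a raw ModRM byte, and the rip argument is whatever RIP value the caller
chooses as the base (the patch uses regs->ip).

```c
#include <errno.h>
#include <stdint.h>

/*
 * Register number encoded in the r/m field, or -EDOM when r/m = 5 and
 * mod = 0, meaning that only a 32-bit displacement follows. The check
 * happens before REX.B is applied, so it also covers R13.
 */
static int modrm_rm_regno(uint8_t modrm, int rex_b)
{
	unsigned int mod = modrm >> 6;
	unsigned int regno = modrm & 0x7;

	if (regno == 5 && mod == 0)
		return -EDOM;
	if (rex_b)
		regno += 8;
	return (int)regno;
}

/*
 * Effective address for the displacement-only form: relative to RIP in
 * 64-bit mode, a plain sign-extended displacement otherwise.
 */
static uint64_t disp32_only_addr(int is_64bit_mode, uint64_t rip,
				 int32_t disp)
{
	uint64_t base = is_64bit_mode ? rip : 0;

	return base + (uint64_t)(int64_t)disp;
}
```

The caller would first check for -EDOM from modrm_rm_regno() and only then
fall back to disp32_only_addr() instead of reading a register.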

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cda6c71..ea10b03 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -250,6 +250,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
switch (type) {
case REG_TYPE_RM:
regno = X86_MODRM_RM(insn->modrm.value);
+   /* if mod=0, register R/EBP is not used in the address
+* computation. Instead, a 32-bit displacement is expected;
+* the instruction decoder takes care of reading such
+* displacement. This is true for both R/EBP and R13, as the
+* REX.B bit is not decoded.
+*/
+   if (regno == 5 && X86_MODRM_MOD(insn->modrm.value) == 0)
+   return -EDOM;
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -599,9 +607,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-   if (addr_offset < 0)
+   /* -EDOM means that we must ignore the address_offset.
+* The only case in which we see this value is when
+* R/M points to R/EBP. In such a case, in 64-bit mode
+* the effective address is relative to the RIP.
+*/
+   if (addr_offset == -EDOM) {
+   eff_addr = 0;
+#ifdef CONFIG_X86_64
+   if (user_64bit_mode(regs))
+   eff_addr = (long)regs->ip;
+#endif
+   } else if (addr_offset < 0) {
goto out_err;
-   eff_addr = regs_get_register(regs, addr_offset);
+   } else {
+   eff_addr = regs_get_register(regs, addr_offset);
+   }
}
eff_addr += insn->displacement.value;
}
-- 
2.9.3

[v5 09/20] x86/insn-eval: Add functions to get default operand and address sizes

2017-03-03 Thread Ricardo Neri

These functions read the default values of the address and operand sizes
as specified in the segment descriptor. This information is determined
from the D and L bits. Hence, it can be used for both IA-32e 64-bit and
32-bit legacy modes. For virtual-8086 mode, the default address and
operand sizes are always 2 bytes.

The D bit is only meaningful for code segments. Thus, these functions
always use the code segment selector contained in regs.
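
The mapping from the L and D bits to default sizes can be sketched in user
space as below. This is an illustration only: the helpers are hypothetical,
they operate on a raw 8-byte descriptor rather than struct desc_struct, and
they assume the standard layout with L in bit 53 and D in bit 54.

```c
#include <stdint.h>

/* L (64-bit code segment) flag: bit 53 of the raw descriptor. */
static unsigned int desc_raw_l(uint64_t raw)
{
	return (unsigned int)((raw >> 53) & 1);
}

/* D (default operation size) flag: bit 54 of the raw descriptor. */
static unsigned int desc_raw_d(uint64_t raw)
{
	return (unsigned int)((raw >> 54) & 1);
}

/* Default address size in bytes, or 0 for the invalid L=1, D=1 setting. */
static unsigned char default_address_bytes(uint64_t raw)
{
	switch ((desc_raw_l(raw) << 1) | desc_raw_d(raw)) {
	case 0:
		return 2;	/* 16-bit code segment */
	case 1:
		return 4;	/* 32-bit code segment */
	case 2:
		return 8;	/* 64-bit code segment */
	default:
		return 0;
	}
}

/* Default operand size in bytes; 64-bit code still defaults to 4. */
static unsigned char default_operand_bytes(uint64_t raw)
{
	switch ((desc_raw_l(raw) << 1) | desc_raw_d(raw)) {
	case 0:
		return 2;
	case 1:
	case 2:
		return 4;
	default:
		return 0;
	}
}
```

Note that a 64-bit code segment yields 8-byte addresses but 4-byte operands,
which is why the two functions cannot share a single table.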

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/lib/insn-eval.c | 80 
 2 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b201742..a0d81fc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
 int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs);
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff, bool use_default_seg);
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 383ca83..cda6c71 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -421,6 +421,86 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_seg_default_address_bytes - Obtain default address size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default address size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * address size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default address size of segment
+ */
+unsigned char insn_get_seg_default_address_bytes(struct pt_regs *regs)
+{
+   struct desc_struct *desc;
+   unsigned short seg;
+   int ret;
+
+   if (v8086_mode(regs))
+   return 2;
+
+   seg = (unsigned short)regs->cs;
+
+   ret = get_desc(seg, &desc);
+   if (ret)
+   return 0;
+
+   switch ((desc->l << 1) | desc->d) {
+   case 0: /* Legacy mode. 16-bit addresses. CS.L=0, CS.D=0 */
+   return 2;
+   case 1: /* Legacy mode. 32-bit addresses. CS.L=0, CS.D=1 */
+   return 4;
+   case 2: /* IA-32e 64-bit mode. 64-bit addresses. CS.L=1, CS.D=0 */
+   return 8;
+   case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+   /* fall through */
+   default:
+   return 0;
+   }
+}
+
+/**
+ * insn_get_seg_default_operand_bytes - Obtain default operand size of segment
+ * @regs:  Set of registers containing the segment selector
+ *
+ * Obtain the default operand size as indicated in the segment descriptor
+ * selected in regs' code segment selector. In protected mode, the default
+ * operand size is determined by inspecting the L and D bits of the segment
+ * descriptor. In virtual-8086 mode, the default is always two bytes.
+ *
+ * Return: Default operand size of segment
+ */
+unsigned char insn_get_seg_default_operand_bytes(struct pt_regs *regs)
+{
+   struct desc_struct *desc;
+   unsigned short seg;
+   int ret;
+
+   if (v8086_mode(regs))
+   return 2;
+
+   seg = (unsigned short)regs->cs;
+
+   ret = get_desc(seg, &desc);
+   if (ret)
+   return 0;
+
+   switch ((desc->l << 1) | desc->d) {
+   case 0: /* Legacy mode. 16-bit or 8-bit operands CS.L=0, CS.D=0 */
+   return 2;
+   case 1: /* Legacy mode. 32-bit or 8-bit operands CS.L=0, CS.D=1 */
+   /* fall through */
+   case 2: /* IA-32e 64-bit mode. 32- or 8-bit opnds. CS.L=1, CS.D=0 */
+   return 4;
+   case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+   /* fall through */
+   default:
+   return 0;
+   }
+}
+
+/**
  * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Set of registers indicated by the ModRM byte
-- 
2.9.3

[v5 15/20] x86/cpufeature: Add User-Mode Instruction Prevention definitions

2017-03-03 Thread Ricardo Neri

User-Mode Instruction Prevention (UMIP) is a security feature present in new
Intel processors that, when enabled, prevents the execution of a subset of
instructions in user mode (CPL > 0). Attempting to execute such
instructions in user mode causes a general protection exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
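
The bit arithmetic behind the definitions in this patch can be checked with
a small user-space sketch. The names below are hypothetical stand-ins, not
the kernel macros: FEATURE_UMIP mirrors X86_FEATURE_UMIP and CR4_UMIP_BIT
mirrors X86_CR4_UMIP_BIT, with the values taken from the diff.

```c
#include <stdint.h>

/* Feature numbers are encoded as word * 32 + bit. */
#define FEATURE_UMIP (16 * 32 + 2)
#define CR4_UMIP_BIT 11

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature within its capability word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature & 31;
}

/* Mask in the style of the DISABLE_* macros: 1 << (feature & 31). */
static uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/* CR4 mask for the UMIP enable bit. */
static unsigned long cr4_umip_mask(void)
{
	return 1UL << CR4_UMIP_BIT;
}
```

With these values UMIP lands in capability word 16, which is why
DISABLE_UMIP is folded into DISABLED_MASK16.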

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org

Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/include/asm/disabled-features.h| 8 +++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4e77723..0739f1e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,6 +286,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU    (16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 85599ad..4707445 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX   (1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP  0
+#else
+# define DISABLE_UMIP  (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME   (1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -55,7 +61,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_UMIP)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50..d2c2af8 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT   11 /* enable UMIP support */
+#define X86_CR4_UMIP   _BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_VMXE_BIT   13 /* enable VMX virtualization */
 #define X86_CR4_VMXE   _BITUL(X86_CR4_VMXE_BIT)
 #define X86_CR4_SMXE_BIT   14 /* enable safer mode (TXT) */
-- 
2.9.3

[v5 11/20] insn/eval: Incorporate segment base in address computation

2017-03-03 Thread Ricardo Neri

insn_get_addr_ref returns the effective address as defined in section
3.7.5.1, Vol. 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. In order to compute the linear address, we must add
to the effective address the segment base address as set in the segment
descriptor. Furthermore, the segment descriptor to use depends on the
register that is used as the base of the effective address. The effective
base address varies depending on whether the operand is a register or a
memory address and on whether a SiB byte is used.

In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
segment is used or if segmentation is not used. However, the base address
is not necessarily zero if a user program defines its own segments. This
is possible by using a local descriptor table.

Since the effective address is a signed quantity, the unsigned segment
base address is saved in a separate variable and added to the final
effective address.
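
A minimal sketch of the arithmetic described above, assuming a
hypothetical helper named linear_address (not part of the patch): the
signed effective address is converted to an unsigned value and the
unsigned segment base is added, with the sum wrapping modulo the pointer
width.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine a signed effective address with an
 * unsigned segment base. Unsigned addition wraps modulo 2^N, so a
 * negative effective address moves the result below the base.
 */
static inline uintptr_t linear_address(intptr_t eff_addr, uintptr_t seg_base)
{
	return (uintptr_t)eff_addr + seg_base;
}
```

With a zero base (the common flat-segment case) the linear address is
simply the effective address; with a non-zero base taken from a local
descriptor table entry, the two differ by that base.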

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index ea10b03..edb360f 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -566,7 +566,7 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
  */
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-	unsigned long linear_addr;
+	unsigned long linear_addr, seg_base_addr;
 	long eff_addr, base, indx;
 	int addr_offset, base_offset, indx_offset;
 	insn_byte_t sib;
@@ -580,6 +580,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 		if (addr_offset < 0)
 			goto out_err;
 		eff_addr = regs_get_register(regs, addr_offset);
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
+						  false);
 	} else {
 		if (insn->sib.nbytes) {
 			/*
@@ -605,6 +607,8 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			indx = regs_get_register(regs, indx_offset);
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  base_offset, false);
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 			/* -EDOM means that we must ignore the address_offset.
@@ -623,10 +627,12 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else {
 				eff_addr = regs_get_register(regs, addr_offset);
 			}
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							  addr_offset, false);
 		}
 		eff_addr += insn->displacement.value;
 	}
-	linear_addr = (unsigned long)eff_addr;
+	linear_addr = (unsigned long)eff_addr + seg_base_addr;
 
 	return (void __user *)linear_addr;
 out_err:
-- 
2.9.3

[v5 14/20] x86/insn-eval: Add wrapper function for 16-bit and 32-bit address encodings

2017-03-03 Thread Ricardo Neri

Convert the function insn_get_addr_ref into a wrapper function that calls
the correct static address-decoding function depending on the size of the
address. In this way, callers do not need to worry about calling the
correct function, and the number of functions that need to be exposed
decreases.

To this end, the original 32/64-bit insn_get_addr_ref is renamed as
insn_get_addr_ref_32_64 to reflect the type of address encodings that it
handles.

Documentation is added to the new wrapper function and the documentation
for the 32/64-bit address decoding function is improved.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 45 -
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index cb1076d..e633588 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -705,12 +705,21 @@ static inline long __to_signed_long(unsigned long val, int long_bytes)
 #endif
 }
 
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * insn_get_addr_ref_32_64 - Obtain a 32/64-bit address referred by instruction
+ * @insn:	Instruction struct with ModRM and SiB bytes and displacement
+ * @regs:	Set of registers referred by the instruction
+ *
+ * This function is to be used with 32-bit and 64-bit address encodings. Obtain
+ * the memory address referred by the instruction's ModRM bytes and
+ * displacement. Also, the segment used as base is determined by either any
+ * segment override prefixes in insn or the default segment of the registers
+ * involved in the linear address computation.
+ *
+ * Return: linear address referenced by instruction and registers
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+static void __user *insn_get_addr_ref_32_64(struct insn *insn,
+					    struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr;
 	long eff_addr, base, indx, tmp;
@@ -795,3 +804,29 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 out_err:
 	return (void __user *)-1;
 }
+
+/**
+ * insn_get_addr_ref - Obtain the linear address referred by instruction
+ * @insn:	Instruction structure containing ModRM byte and displacement
+ * @regs:	Set of registers referred by the instruction
+ *
+ * Obtain the memory address referred by the instruction's ModRM bytes and
+ * displacement. Also, the segment used as base is determined by either any
+ * segment override prefixes in insn or the default segment of the registers
+ * involved in the address computation.
+ *
+ * Return: linear address referenced by instruction and registers
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+	switch (insn->addr_bytes) {
+	case 2:
+		return insn_get_addr_ref_16(insn, regs);
+	case 4:
+		/* fall through */
+	case 8:
+		return insn_get_addr_ref_32_64(insn, regs);
+	default:
+		return (void __user *)-1;
+	}
+}
-- 
2.9.3

[v5 13/20] x86/insn-eval: Add support to resolve 16-bit addressing encodings

2017-03-03 Thread Ricardo Neri

Tasks running in virtual-8086 mode or in protected mode with code
segment descriptors that specify 16-bit default address sizes via the
D bit will use 16-bit addressing form encodings as described in the Intel
64 and IA-32 Architectures Software Developer's Manual Volume 2A Section
2.1.5. 16-bit addressing encodings differ in several ways from the
32-bit/64-bit addressing form encodings: the r/m part of the ModRM byte
points to different registers and, in some cases, addresses can be
indicated by the addition of the value of two registers. Also, there is
no support for SiB bytes. Thus, a separate function is needed to parse
this form of addressing.

A couple of functions are introduced. get_reg_offset_16 obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. insn_get_addr_ref_16 computes the linear
address indicated by the instruction using the value of the registers
given by ModRM as well as the base address of the segment.

Lastly, the original function insn_get_addr_ref is renamed as
insn_get_addr_ref_32_64. A new insn_get_addr_ref function decides what
type of address decoding must be done based on the number of address bytes
given by the instruction. Documentation for insn_get_addr_ref_32_64 is
also improved.
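
To make the 16-bit addressing forms concrete, here is a standalone
sketch that computes a 16-bit effective address from the ModRM byte. It
is not the code in this patch: struct regs16 and eff_addr_16 are
hypothetical names, the segment base is left out, and the caller is
assumed to have handled mod == 3 (register operand) and to pass a
displacement of 0 when the encoding carries none. The register pairs
follow the same r/m ordering as the regoff1/regoff2 tables in the diff.

```c
#include <stdint.h>

/* Hypothetical register snapshot; only the registers 16-bit forms use. */
struct regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Hypothetical helper: 16-bit effective address for a memory operand.
 * modrm is the raw ModRM byte; disp is the sign-extended displacement
 * (0 if the encoding has none). Not valid for mod == 3.
 */
static uint16_t eff_addr_16(const struct regs16 *r, uint8_t modrm,
			    int16_t disp)
{
	unsigned int mod = modrm >> 6;
	unsigned int rm = modrm & 7;
	uint16_t sum;

	switch (rm) {
	case 0: sum = r->bx + r->si; break;
	case 1: sum = r->bx + r->di; break;
	case 2: sum = r->bp + r->si; break;
	case 3: sum = r->bp + r->di; break;
	case 4: sum = r->si; break;
	case 5: sum = r->di; break;
	/* mod == 0 with r/m == 6 means "displacement only, no register". */
	case 6: sum = (mod == 0) ? 0 : r->bp; break;
	default: sum = r->bx; break;
	}

	/* The result wraps at 16 bits, as 16-bit addressing does. */
	return (uint16_t)(sum + (uint16_t)disp);
}
```

The linear address would then be this value plus the base of the
applicable segment, which the sketch deliberately omits.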

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 137 +++
 1 file changed, 137 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a9a1704..cb1076d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -306,6 +306,73 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16 - Obtain offset of register indicated by instruction
+ * @insn:	Instruction structure containing ModRM and SiB bytes
+ * @regs:	Set of registers referred by the instruction
+ * @offs1:	Offset of the first operand register
+ * @offs2:	Offset of the second operand register, if applicable.
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+			     int *offs1, int *offs2)
+{
+	/* 16-bit addressing can use one or two registers */
+	static const int regoff1[] = {
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, bx),
+	};
+
+	static const int regoff2[] = {
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+		-EDOM,
+		-EDOM,
+		-EDOM,
+		-EDOM,
+	};
+
+	if (!offs1 || !offs2)
+		return -EINVAL;
+
+	/* operand is a register, use the generic function */
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		*offs1 = insn_get_reg_offset_modrm_rm(insn, regs);
+		*offs2 = -EDOM;
+		return 0;
+	}
+
+	*offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+	*offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+	/*
+	 * If no displacement is indicated in the mod part of the ModRM byte,
+	 * (mod part is 0) and the r/m part of the same byte is 6, no register
+	 * is used to calculate the operand address. An r/m part of 6 means that
+	 * the second register offset is already invalid.
+	 */
+	if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+	    (X86_MODRM_RM(insn->modrm.value) == 6))
+		*offs1 = -EDOM;
+
+	return 0;
+}
+
+/**
  * get_desc() - Obtain address of segment descriptor
  * @seg:   Segment selector
  * @desc:  Pointer to the selected segment descriptor
@@ -559,6 +626,76 @@ int insn_get_reg_offset_sib_index(struct insn *insn, 
struct pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+/**
+ * insn_get_addr_ref_16 - Obtain the 16-bit address referred by

[v5 12/20] x86/insn: Support both signed 32-bit and 64-bit effective addresses

2017-03-03 Thread Ricardo Neri

The 32-bit and 64-bit address encodings are identical. This means that we
can use the same function in both cases. In order to reuse the function for
32-bit address encodings, we must sign-extend our 32-bit signed operands to
64-bit signed variables (only for 64-bit builds). To decide on whether sign
extension is needed, we rely on the address size as given by the
instruction structure.

Lastly, before computing the linear address, we must truncate our signed
64-bit effective address if the address size is 32-bit.
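
The two steps described above can be sketched in portable C as follows.
This is an illustration under stated assumptions rather than the patch's
code: to_signed_64 and truncate_eff_addr are hypothetical names,
fixed-width types stand in for the kernel's long on a 64-bit build, and
the uint32_t to int32_t conversion relies on the usual two's-complement
behavior.

```c
#include <stdint.h>

/*
 * Hypothetical helper: interpret a register value as a signed operand.
 * For 4-byte addresses only the low 32 bits are used and the sign bit
 * (bit 31) is extended into the upper half.
 */
static inline int64_t to_signed_64(uint64_t val, int addr_bytes)
{
	if (addr_bytes == 4)
		return (int64_t)(int32_t)(uint32_t)val;
	return (int64_t)val;
}

/*
 * Hypothetical helper: truncate the signed 64-bit effective address back
 * to 32 bits when the address size is 4 bytes.
 */
static inline uint64_t truncate_eff_addr(int64_t eff_addr, int addr_bytes)
{
	if (addr_bytes == 4)
		return (uint64_t)eff_addr & UINT64_C(0xffffffff);
	return (uint64_t)eff_addr;
}
```

For example, with a base of 0x1000, an index register holding 0xffffffff
(that is, -1 as a 32-bit value) and a scale of 4, sign extension gives an
effective address of 0x1000 - 4 = 0xffc; without it the index would be
treated as a large positive number.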

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 44 
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index edb360f..a9a1704 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+static inline long __to_signed_long(unsigned long val, int long_bytes)
+{
+#ifdef CONFIG_X86_64
+	return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
+#else
+	return (long)val;
+#endif
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
@@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
 	unsigned long linear_addr, seg_base_addr;
-	long eff_addr, base, indx;
-	int addr_offset, base_offset, indx_offset;
+	long eff_addr, base, indx, tmp;
+	int addr_offset, base_offset, indx_offset, addr_bytes;
 	insn_byte_t sib;
 
 	insn_get_modrm(insn);
 	insn_get_sib(insn);
 	sib = insn->sib.value;
+	addr_bytes = insn->addr_bytes;
 
 	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
 		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 		if (addr_offset < 0)
 			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
+		tmp = regs_get_register(regs, addr_offset);
+		eff_addr = __to_signed_long(tmp, addr_bytes);
 		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
 						  false);
 	} else {
@@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			 * in the address computation.
 			 */
 			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-			if (unlikely(base_offset == -EDOM))
+			if (unlikely(base_offset == -EDOM)) {
 				base = 0;
-			else if (unlikely(base_offset < 0))
+			} else if (unlikely(base_offset < 0)) {
 				goto out_err;
-			else
-				base = regs_get_register(regs, base_offset);
+			} else {
+				tmp = regs_get_register(regs, base_offset);
+				base = __to_signed_long(tmp, addr_bytes);
+			}
 
 			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
-			if (unlikely(indx_offset == -EDOM))
+			if (unlikely(indx_offset == -EDOM)) {
 				indx = 0;
-			else if (unlikely(indx_offset < 0))
+			} else if (unlikely(indx_offset < 0)) {
 				goto out_err;
-			else
-				indx = regs_get_register(regs, indx_offset);
+			} else {
+				tmp = regs_get_register(regs, indx_offset);
+				indx = __to_signed_long(tmp, addr_bytes);
+			}
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
 			seg_base_addr = insn_get_seg_base(regs, insn,
@@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)

[v5 12/20] x86/insn: Support both signed 32-bit and 64-bit effective addresses

2017-03-03 Thread Ricardo Neri

The 32-bit and 64-bit address encodings are identical. This means that we
can use the same function in both cases. In order to reuse the function for
32-bit address encodings, we must sign-extend our 32-bit signed operands to
64-bit signed variables (only for 64-bit builds). To decide on whether sign
extension is needed, we rely on the address size as given by the
instruction structure.

Lastly, before computing the linear address, we must truncate our signed
64-bit effective address if the address size is 32-bit.
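
As an illustration only, the two steps described above can be sketched in
user space as follows. The helper names sign_extend_operand and
truncate_effective_address are hypothetical and are not part of this patch;
the sketch assumes that only 4-byte and 8-byte address sizes are of interest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: interpret a register value as a signed operand.
 * With 4-byte addresses only the low 32 bits are meaningful, so they are
 * sign-extended to 64 bits. Otherwise the value is used unchanged.
 */
static int64_t sign_extend_operand(uint64_t val, int addr_bytes)
{
	/* Assumption of this sketch: only 32-bit and 64-bit address sizes. */
	assert(addr_bytes == 4 || addr_bytes == 8);

	if (addr_bytes == 4)
		return (int64_t)(int32_t)(uint32_t)val;
	return (int64_t)val;
}

/*
 * Hypothetical helper: with 4-byte addresses the effective address wraps
 * around at 4 GiB, so only its low 32 bits are kept.
 */
static uint64_t truncate_effective_address(int64_t eff_addr, int addr_bytes)
{
	assert(addr_bytes == 4 || addr_bytes == 8);

	if (addr_bytes == 4)
		return (uint64_t)eff_addr & 0xffffffffULL;
	return (uint64_t)eff_addr;
}
```

For example, a 32-bit register holding 0xfffffffc is treated as -4, so adding
it to a base of 0x1000 yields 0xffc rather than an address above 4 GiB.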

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 44 
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index edb360f..a9a1704 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -559,6 +559,15 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_INDEX);
 }
 
+static inline long __to_signed_long(unsigned long val, int long_bytes)
+{
+#ifdef CONFIG_X86_64
+   return long_bytes == 4 ? (long)((int)((val) & 0xffffffff)) : (long)val;
+#else
+   return (long)val;
+#endif
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
@@ -567,19 +576,21 @@ int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
unsigned long linear_addr, seg_base_addr;
-   long eff_addr, base, indx;
-   int addr_offset, base_offset, indx_offset;
+   long eff_addr, base, indx, tmp;
+   int addr_offset, base_offset, indx_offset, addr_bytes;
insn_byte_t sib;
 
insn_get_modrm(insn);
insn_get_sib(insn);
sib = insn->sib.value;
+   addr_bytes = insn->addr_bytes;
 
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   eff_addr = regs_get_register(regs, addr_offset);
+   tmp = regs_get_register(regs, addr_offset);
+   eff_addr = __to_signed_long(tmp, addr_bytes);
seg_base_addr = insn_get_seg_base(regs, insn, addr_offset,
  false);
} else {
@@ -591,20 +602,24 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 * in the address computation.
 */
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-   if (unlikely(base_offset == -EDOM))
+   if (unlikely(base_offset == -EDOM)) {
base = 0;
-   else if (unlikely(base_offset < 0))
+   } else if (unlikely(base_offset < 0)) {
goto out_err;
-   else
-   base = regs_get_register(regs, base_offset);
+   } else {
+   tmp = regs_get_register(regs, base_offset);
+   base = __to_signed_long(tmp, addr_bytes);
+   }
 
indx_offset = get_reg_offset(insn, regs, 
REG_TYPE_INDEX);
-   if (unlikely(indx_offset == -EDOM))
+   if (unlikely(indx_offset == -EDOM)) {
indx = 0;
-   else if (unlikely(indx_offset < 0))
+   } else if (unlikely(indx_offset < 0)) {
goto out_err;
-   else
-   indx = regs_get_register(regs, indx_offset);
+   } else {
+   tmp = regs_get_register(regs, indx_offset);
+   indx = __to_signed_long(tmp, addr_bytes);
+   }
 
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
seg_base_addr = insn_get_seg_base(regs, insn,
@@ -625,13 +640,18 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
} else if (addr_offset < 0) {
goto out_err;
} else {
-   eff_addr = regs_get_register(regs, addr_offset);
+   tmp = regs_get_register(regs, addr_offset);
+   eff_addr = __to_signed_long(tmp,

[v5 16/20] x86: Add emulation code for UMIP instructions

2017-03-03 Thread Ricardo Neri

The User-Mode Instruction Prevention (UMIP) feature present in recent Intel
processors prevents a group of instructions from being executed with
CPL > 0. An attempt to do so causes a general protection fault.

Rather than relaying this fault to the user space (in the form of a SIGSEGV
signal), the instructions protected by UMIP can be emulated to provide
dummy results. This preserves the current kernel behavior and does not
reveal the system resources that UMIP intends to protect (the global
descriptor and interrupt descriptor tables, the segment selectors of the
local descriptor table and the task state and the machine status word).

This emulation is needed because certain applications (e.g., WineHQ) rely
on this subset of instructions to function.

The instructions protected by UMIP can be split into two groups: those that
return a kernel memory address (sgdt and sidt) and those that return a
value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications
such as WineHQ rely on the result being located in the kernel memory space.
The result is emulated as a hard-coded value that lies close to the top
of the kernel memory. The limits for the GDT and the IDT are set to zero.
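
As a rough user-space sketch of what such a dummy result looks like for a
32-bit base, the 6-byte buffer holds a 2-byte limit followed by a 4-byte
base. The function name and EXAMPLE_DUMMY_BASE below are hypothetical; the
base is an arbitrary example value near the top of a 32-bit address space
and is not claimed to be the constant used by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SGDT/SIDT result with a 32-bit base: 2-byte limit + 4-byte base. */
#define DESC_TABLE_RESULT_SIZE 6

/* Arbitrary example value, chosen only for illustration. */
#define EXAMPLE_DUMMY_BASE 0xfffe0000u

/*
 * Hypothetical helper: fill buf with a dummy descriptor-table result.
 * The values are stored in host byte order, which matches the layout
 * an x86 (little-endian) CPU would produce.
 */
static size_t build_dummy_table_result(uint8_t *buf, size_t len, uint32_t base)
{
	uint16_t limit = 0;	/* the emulated limit is always zero */

	assert(len >= DESC_TABLE_RESULT_SIZE);

	memcpy(buf, &limit, sizeof(limit));
	memcpy(buf + sizeof(limit), &base, sizeof(base));
	return DESC_TABLE_RESULT_SIZE;
}
```

A caller would then copy the returned number of bytes to the memory operand
of the faulting instruction.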

The instructions sldt and str return a segment selector relative to the
base address of the global descriptor table. Since the actual address of
such table is not revealed, it makes sense to emulate the result as zero.

The instruction smsw is emulated to return the value that the register CR0
has at boot time, as set in head_32.S.
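
To make the smsw case concrete, here is a small sketch of how the flags
combine. The bit positions are the architectural CR0 bit positions; the
macro and function names are local to this example. It assumes a memory
destination, for which SMSW stores only the low 16 bits of CR0.

```c
#include <stdint.h>

/* CR0 bit positions as defined by the x86 architecture. */
#define EX_CR0_PE (1UL << 0)	/* Protection Enable */
#define EX_CR0_MP (1UL << 1)	/* Monitor Coprocessor */
#define EX_CR0_ET (1UL << 4)	/* Extension Type */
#define EX_CR0_NE (1UL << 5)	/* Numeric Error */
#define EX_CR0_WP (1UL << 16)	/* Write Protect */
#define EX_CR0_AM (1UL << 18)	/* Alignment Mask */

/* Same set of flags that the patch groups under CR0_STATE. */
#define EX_CR0_STATE (EX_CR0_PE | EX_CR0_MP | EX_CR0_ET | EX_CR0_NE | \
		      EX_CR0_WP | EX_CR0_AM)

/* Hypothetical helper: value seen by SMSW with a 16-bit destination. */
static uint16_t emulated_smsw16(void)
{
	return (uint16_t)(EX_CR0_STATE & 0xffff);
}
```

Since WP and AM live above bit 15, they do not appear in the 16-bit result,
which is 0x0033 for this set of flags.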

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref inspects the segment descriptor pointed to by the registers
in pt_regs. This ensures that we correctly obtain the segment base address
and the address and operand sizes even if the user space application uses
a local descriptor table.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/umip.h |  15 +++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/umip.c  | 264 
 3 files changed, 280 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index 0000000..077b236
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,15 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs)
+{
+   return false;
+}
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 84c0059..0ded7b1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -122,6 +122,7 @@ obj-$(CONFIG_EFI)   += sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)   += umip.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y  += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index 0000000..e932f40
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,264 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention. The instructions are:
+ *sgdt
+ *sldt
+ *sidt
+ *str
+ *smsw
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * == Base addresses of GDT and IDT
+ * Some applications rely on finding the global descriptor table (GDT) and the
+ * interrupt descriptor table (IDT) in kernel memory in order to function.
+ * For x86_32, the selected values do not match any particular hole, but it
+ * suffices to provide a memory location within kernel memory.
+ *
+ * == CR0 flags for SMSW
+ * Use the flags given when booting, as found in head_32.S
+ */
+
+#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
+  X86_CR0_WP | X86_CR0_AM)
+#define UMIP_DUMMY_GDT_BASE 0xfffe
+#define UMIP_DUMMY_IDT_BASE 0x
+
+/*
+ * Definitions for x86 page fault error code bits. Only a simple
+ * pagefault during a write in user

[v5 17/20] x86/umip: Force a page fault when unable to copy emulated result to user

2017-03-03 Thread Ricardo Neri

fixup_umip_exception will be called from do_general_protection. If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired by
force_sig_info_fault, is introduced to model the page fault.
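
From the point of view of a user space signal handler, the difference between
the two cases shows up in si_code. The following sketch is not part of the
patch; the enum and function names are hypothetical, and it assumes a Linux
libc in which signals sent with SEND_SIG_PRIV are reported with SI_KERNEL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

enum segv_kind {
	SEGV_KIND_NOT_SEGV,	/* some other signal */
	SEGV_KIND_UNMAPPED,	/* SEGV_MAPERR: si_addr holds the bad address */
	SEGV_KIND_ACCESS,	/* SEGV_ACCERR: mapping exists, access denied */
	SEGV_KIND_KERNEL,	/* sent by the kernel, no fault address */
	SEGV_KIND_OTHER,
};

/* Hypothetical helper: classify a SIGSEGV by its si_code. */
static enum segv_kind classify_segv(const siginfo_t *info)
{
	assert(info != NULL);

	if (info->si_signo != SIGSEGV)
		return SEGV_KIND_NOT_SEGV;

	switch (info->si_code) {
	case SEGV_MAPERR:
		return SEGV_KIND_UNMAPPED;
	case SEGV_ACCERR:
		return SEGV_KIND_ACCESS;
#ifdef SI_KERNEL	/* Linux-specific */
	case SI_KERNEL:
		return SEGV_KIND_KERNEL;
#endif
	default:
		return SEGV_KIND_OTHER;
	}
}
```

A handler installed with SA_SIGINFO could call this on the siginfo_t it
receives and consult si_addr only in the SEGV_KIND_UNMAPPED case.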

Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 45 +++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index e932f40..5b6a7cf 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -170,6 +170,41 @@ static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
 }
 
 /**
+ * __force_sig_info_umip_fault - Force a SIGSEGV with SEGV_MAPERR
+ * @address:   Address that caused the signal
+ * @regs:  Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void __force_sig_info_umip_fault(void __user *address,
+   struct pt_regs *regs)
+{
+   siginfo_t info;
+   struct task_struct *tsk = current;
+
+   if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)) {
+   printk_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%lx in %lx\n",
+  tsk->comm, task_pid_nr(tsk), regs->ip,
+  regs->sp, UMIP_PF_USER | UMIP_PF_WRITE,
+  regs->ip);
+   }
+
+   tsk->thread.cr2 = (unsigned long)address;
+   tsk->thread.error_code  = UMIP_PF_USER | UMIP_PF_WRITE;
+   tsk->thread.trap_nr = X86_TRAP_PF;
+
+   info.si_signo   = SIGSEGV;
+   info.si_errno   = 0;
+   info.si_code= SEGV_MAPERR;
+   info.si_addr= address;
+   force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/**
  * fixup_umip_exception - Fixup #GP faults caused by UMIP
  * @regs:  Registers as saved when entering the #GP trap
  *
@@ -254,8 +289,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
} else {
uaddr = insn_get_addr_ref(&insn, regs);
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-   if (nr_copied  > 0)
-   return false;
+   if (nr_copied  > 0) {
+   /*
+* If copy fails, send a signal and tell caller that
+* fault was fixed up
+*/
+   __force_sig_info_umip_fault(uaddr, regs);
+   return true;
+   }
}
 
/* increase IP to let the program keep going */
-- 
2.9.3

[v5 18/20] x86/traps: Fixup general protection faults caused by UMIP

2017-03-03 Thread Ricardo Neri

If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception will emulate dummy results for these
instructions. If emulation is successful, the result is passed to the
user space program and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception also caters for the case when
the fault originated while running in virtual-8086 mode.
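
The resulting control flow can be modeled roughly as below. This is a
user-space sketch with hypothetical names, not kernel code; the copy-failure
outcome reflects the behavior added by the preceding patch in this series.

```c
#include <stdbool.h>

enum gp_outcome {
	GP_RESUME_USER,		/* dummy result delivered, no signal */
	GP_SIGSEGV_MAPERR,	/* result could not be copied to user memory */
	GP_DEFAULT_HANDLING,	/* not a UMIP case: usual #GP handling */
};

/* Hypothetical description of one general protection fault. */
struct gp_fault {
	bool from_user;		/* fault happened with CPL > 0 */
	bool umip_insn;		/* sgdt, sldt, sidt, str or smsw */
	bool copy_ok;		/* emulated result reached user memory */
};

static enum gp_outcome model_general_protection(const struct gp_fault *f)
{
	if (f->from_user && f->umip_insn) {
		/* In both sub-cases the fixup reports the fault as handled. */
		return f->copy_ok ? GP_RESUME_USER : GP_SIGSEGV_MAPERR;
	}
	return GP_DEFAULT_HANDLING;
}
```

The point of the model is that the UMIP check runs before any other #GP
handling and short-circuits it when the fixup succeeds.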

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/traps.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 948443e..86efbcb 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -492,6 +493,9 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
 
+   if (user_mode(regs) && fixup_umip_exception(regs))
+   return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.9.3

[v5 20/20] selftests/x86: Add tests for User-Mode Instruction Prevention

2017-03-03 Thread Ricardo Neri

Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel catches it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed without
causing such #GP. If no #GP exceptions occur, we expect to exit
virtual-8086 mode from INT 0x80.

The instructions protected by UMIP are executed in representative use
cases:
 a) the memory address of the result is given in the form of a displacement
from the base of the data segment
 b) the memory address of the result is given in a general purpose register
 c) the result is stored directly in a general purpose register.

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not have
the UMIP feature. Instead, results are printed for verification.
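
For reference, the memory layout that the test prints can be decoded as in
the sketch below: each SIDT/SGDT result is a 2-byte limit immediately
followed by a 4-byte base, which is why consecutive results in the test are
6 bytes apart. The struct and function names are hypothetical and the code
is independent of the selftest itself.

```c
#include <stdint.h>
#include <string.h>

/* Decoded SGDT/SIDT result as stored by a 16-bit or 32-bit program. */
struct table_result {
	uint16_t limit;
	uint32_t base;
};

/*
 * Hypothetical helper: read a 6-byte descriptor-table result located at
 * the given offset. memcpy is used because the base is not naturally
 * aligned (it starts 2 bytes after the limit).
 */
static struct table_result read_table_result(const uint8_t *mem, size_t offset)
{
	struct table_result r;

	memcpy(&r.limit, mem + offset, sizeof(r.limit));
	memcpy(&r.base, mem + offset + sizeof(r.limit), sizeof(r.base));
	return r;
}
```

With this layout, a result stored at offset 2054 has its limit at 2054 and
its base at 2056, matching the offsets used in the printf calls of the test.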

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..377b773 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+   "umip:\n\t"
+   /* addressing via displacements */
+   "smsw (2052)\n\t"
+   "sidt (2054)\n\t"
+   "sgdt (2060)\n\t"
+   /* addressing via registers */
+   "mov $2066, %bx\n\t"
+   "smsw (%bx)\n\t"
+   "mov $2068, %bx\n\t"
+   "sidt (%bx)\n\t"
+   "mov $2074, %bx\n\t"
+   "sgdt (%bx)\n\t"
+   /* register operands, only for smsw */
+   "smsw %ax\n\t"
+   "mov %ax, (2080)\n\t"
+   "int $0x80\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -218,6 +234,27 @@ int main(void)
v86.regs.eax = (unsigned int)-1;
do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
 
+   /* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */
+   do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests");
+   printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2054),
+  *(unsigned long  *)(addr + 2056));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2060),
+  *(unsigned long  *)(addr + 2062));
+   printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2068),
+  *(unsigned long  *)(addr + 2070));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2074),
+  *(unsigned long  *)(addr + 2076));
+   printf("[INFO]\tResults of SMSW via register operands:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080));
+
/* Execute a null pointer */
v86.regs.cs = 0;
v86.regs.ss = 0;
-- 
2.9.3

[v5 05/20] x86/insn-eval: Add utility functions to get register offsets

2017-03-03 Thread Ricardo Neri

The function get_reg_offset takes as argument an enumeration that
indicates the type of offset that is returned: the R/M part of the ModRM
byte, the index of the SIB byte or the base of the SIB byte. Callers of
this function would need the definition of such enumeration. This is not
needed. Instead, helper functions can be added for this purpose. These
functions are useful in cases when, for instance, the caller needs to
decide whether the operand is a register or a memory location by
looking at the mod part of the ModRM byte.
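
As an illustration of that last point, a caller could inspect the ModRM byte
along these lines. The macros below mirror the usual ModRM field layout
(mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0) but are defined locally
for this sketch, and operand_is_register is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM byte layout: mod[7:6] reg[5:3] r/m[2:0] */
#define EX_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
#define EX_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
#define EX_MODRM_RM(modrm)  ((modrm) & 0x07)

/*
 * Hypothetical helper: mod == 3 means that the r/m field names a register
 * operand; any other value of mod selects a memory operand.
 */
static bool operand_is_register(uint8_t modrm)
{
	return EX_MODRM_MOD(modrm) == 3;
}
```

When the operand is a register, the caller would use the r/m helper added
here to find that register's offset in pt_regs; otherwise it would compute a
memory address instead.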

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  3 +++
 arch/x86/lib/insn-eval.c | 51 
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1..754211b 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,8 @@
 #include 
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 23cf010..78df1c9 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
return regoff[regno];
 }
 
+/**
+ * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
+ * @insn:  Instruction structure containing the ModRM byte
+ * @regs:  Set of registers indicated by the ModRM byte
+ *
+ * Obtain the register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register.
+ *
+ * Return: Register indicated by r/m, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
+/**
+ * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the base part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's base, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+
+/**
+ * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the index part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's index, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_INDEX);
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
-- 
2.9.3

[v5 20/20] selftests/x86: Add tests for User-Mode Instruction Prevention

2017-03-03 Thread Ricardo Neri

Certain user space programs that run on virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel catches it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed without
causing such #GP. If no #GP exceptions occur, we expect to exit virtual-
8086 mode from INT 0x80.

The instructions protected by UMIP are executed in representative use
cases:
 a) the memory address of the result is given in the form of a displacement
from the base of the data segment
 b) the memory address of the result is given in a general purpose register
 c) the result is stored directly in a general purpose register.

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not have
the UMIP feature. Instead, results are printed for verification.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c 
b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0..377b773 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+   "umip:\n\t"
+   /* addressing via displacements */
+   "smsw (2052)\n\t"
+   "sidt (2054)\n\t"
+   "sgdt (2060)\n\t"
+   /* addressing via registers */
+   "mov $2066, %bx\n\t"
+   "smsw (%bx)\n\t"
+   "mov $2068, %bx\n\t"
+   "sidt (%bx)\n\t"
+   "mov $2074, %bx\n\t"
+   "sgdt (%bx)\n\t"
+   /* register operands, only for smsw */
+   "smsw %ax\n\t"
+   "mov %ax, (2080)\n\t"
+   "int $0x80\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -218,6 +234,27 @@ int main(void)
v86.regs.eax = (unsigned int)-1;
do_test(&v86, vmcode_int80 - vmcode, VM86_INTx, 0x80, "int80");
 
+   /* UMIP -- should exit with INTx 0x80 unless UMIP was not disabled */
+   do_test(&v86, umip - vmcode, VM86_INTx, 0x80, "UMIP tests");
+   printf("[INFO]\tResults of UMIP-protected instructions via displacements:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2052));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2054),
+  *(unsigned long  *)(addr + 2056));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2060),
+  *(unsigned long  *)(addr + 2062));
+   printf("[INFO]\tResults of UMIP-protected instructions via addressing in registers:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2066));
+   printf("[INFO]\tSIDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2068),
+  *(unsigned long  *)(addr + 2070));
+   printf("[INFO]\tSGDT: limit[0x%04x]base[0x%08lx]\n",
+  *(unsigned short *)(addr + 2074),
+  *(unsigned long  *)(addr + 2076));
+   printf("[INFO]\tResults of SMSW via register operands:\n");
+   printf("[INFO]\tSMSW:[0x%04x]\n", *(unsigned short *)(addr + 2080));
+
/* Execute a null pointer */
v86.regs.cs = 0;
v86.regs.ss = 0;
-- 
2.9.3

[v5 05/20] x86/insn-eval: Add utility functions to get register offsets

2017-03-03 Thread Ricardo Neri

The function insn_get_reg_offset takes as argument an enumeration that
indicates the type of offset that is returned: the R/M part of the ModRM
byte, the index of the SIB byte or the base of the SIB byte. Callers of
this function would need the definition of such enumeration. This is not
needed. Instead, helper functions can be added for this purpose. These
functions are useful in cases when, for instance, the caller
needs to decide whether the operand is a register or a memory location by
looking at the mod part of the ModRM byte.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  3 +++
 arch/x86/lib/insn-eval.c | 51 
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1..754211b 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,8 @@
 #include 
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs);
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 23cf010..78df1c9 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -98,6 +98,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
return regoff[regno];
 }
 
+/**
+ * insn_get_reg_offset_modrm_rm - Obtain register in r/m part of ModRM byte
+ * @insn:  Instruction structure containing the ModRM byte
+ * @regs:  Set of registers indicated by the ModRM byte
+ *
+ * Obtain the register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register.
+ *
+ * Return: Register indicated by r/m, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_modrm_rm(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
+/**
+ * insn_get_reg_offset_sib_base - Obtain register in base part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the base part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's base, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_base(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_BASE);
+}
+
+/**
+ * insn_get_reg_offset_sib_index - Obtain register in index part of SiB byte
+ * @insn:  Instruction structure containing the SiB byte
+ * @regs:  Set of registers indicated by the SiB byte
+ *
+ * Obtain the register indicated by the index part of the SiB byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of SiB does not refer to a register.
+ *
+ * Return: Register indicated by SiB's index, as an offset within struct pt_regs
+ */
+int insn_get_reg_offset_sib_index(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_INDEX);
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
-- 
2.9.3

[v5 19/20] x86: Enable User-Mode Instruction Prevention

2017-03-03 Thread Ricardo Neri

User-Mode Instruction Prevention (UMIP) is enabled by setting a bit in
%cr4 and disabled by clearing it.

It makes sense to enable UMIP at some point while booting, before user
space comes up. As with SMAP and SMEP, it is not critical to have it
enabled very early during boot, because UMIP is relevant only when there
is a user space to be protected from. Given the similarities in relevance,
it makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: Alexandre Julliard 
Cc: Stas Sergeev 
Cc: x...@kernel.org
Cc: linux-ms...@vger.kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 ++
 arch/x86/kernel/cpu/common.c | 16 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..b7f1226 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1735,6 +1735,16 @@ config X86_SMAP
 
  If unsure, say Y.
 
+config X86_INTEL_UMIP
+   def_bool y
+   depends on CPU_SUP_INTEL
+   prompt "Intel User Mode Instruction Prevention" if EXPERT
+   ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 58094a1..9f59eb5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -311,6 +311,19 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+   if (cpu_feature_enabled(X86_FEATURE_UMIP) &&
+   cpu_has(c, X86_FEATURE_UMIP))
+   cr4_set_bits(X86_CR4_UMIP);
+   else
+   /*
+* Make sure UMIP is disabled in case it was enabled in a
+* previous boot (e.g., via kexec).
+*/
+   cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1080,9 +1093,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP */
+   /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+   setup_umip(c);
 
/*
 * The vendor-specific functions might have changed features.
-- 
2.9.3

[v5 06/20] x86/insn-eval: Add utility functions to get segment selector

2017-03-03 Thread Ricardo Neri

When computing a linear address and segmentation is used, we need to know
the base address of the segment involved in the computation. In most
cases, the segment base address will be zero, as in USER_DS/USER32_DS.
However, a user space program may define its own segments via a local
descriptor table. In such a case, the segment base address may not be
zero. Thus, the segment base address is needed to correctly calculate the
linear address.

The segment selector to be used when computing a linear address is
determined either by any segment selector override prefixes in the
instruction or by inference from the registers involved in the computation
of the effective address, in that order. Also, there are cases when the
overrides shall be ignored.

For clarity, this process can be split into two steps: resolving the
relevant segment and, once known, reading the applicable segment selector.
The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into the pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
registers. Lastly, segmentation is possible in 64-bit mode via FS and GS.
In these two cases, base addresses are obtained from the relevant MSRs.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 195 +++
 1 file changed, 195 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 78df1c9..8d45df8 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum reg_type {
REG_TYPE_RM = 0,
@@ -15,6 +16,200 @@ enum reg_type {
REG_TYPE_BASE,
 };
 
+enum segment {
+   SEG_CS = 0x2e,
+   SEG_SS = 0x36,
+   SEG_DS = 0x3e,
+   SEG_ES = 0x26,
+   SEG_FS = 0x64,
+   SEG_GS = 0x65
+};
+
+/**
+ * resolve_seg_selector() - obtain segment selector
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:    Offset, in pt_regs, of the operand for which the selector is needed
+ * @get_default:   Resolve default segment selector (i.e., ignore overrides)
+ *
+ * The segment selector to which an effective address refers depends on
+ * a) segment selector overrides instruction prefixes or b) the operand
+ * register indicated in the ModRM or SiB byte.
+ *
+ * For case a), the function inspects any prefixes in the insn instruction;
+ * insn can be null to indicate that selector override prefixes shall be
+ * ignored. This is useful when the use of prefixes is forbidden (e.g.,
+ * obtaining the code selector). For case b), the operand register shall be
+ * represented as the offset from the base address of pt_regs. Also, regoff
+ * can be -EINVAL for cases in which registers are not used as operands (e.g.,
+ * when the mod and r/m parts of the ModRM byte are 0 and 5, respectively).
+ *
+ * This function returns the segment selector to utilize as per the conditions
+ * described above. Please note that this function does not return the value
+ * of the segment selector. The value of the segment selector needs to be
+ * obtained using get_segment_selector and passing the segment selector type
+ * resolved by this function.
+ *
+ * Return: Segment selector to use, among CS, SS, DS, ES, FS or GS.
+ */
+static int resolve_seg_selector(struct insn *insn, int regoff, bool get_default)
+{
+   int i;
+
+   if (!insn)
+   return -EINVAL;
+
+   if (get_default)
+   goto default_seg;
+   /*
+* Check first if we have selector overrides. Having more than
+* one selector override leads to undefined behavior. We
+* only use the first one and return.
+*/
+   for (i = 0; i < insn->prefixes.nbytes; i++) {
+   switch (insn->prefixes.bytes[i]) {
+   case SEG_CS:
+   return SEG_CS;
+   case SEG_SS:
+   return SEG_SS;
+   case SEG_DS:
+   return SEG_DS;
+   case SEG_ES:
+   return SEG_ES;
+   case SEG_FS:
+   return SEG_FS;
+   case SEG_GS:
+   return SEG_GS;
+   default:
+   return -EINVAL;
+   }
+   }
+
+default_seg:
+   /*
+* If no

[v5 04/20] x86/mpx, x86/insn: Relocate insn util functions to a new insn-kernel

2017-03-03 Thread Ricardo Neri

Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.

Thus, these functions are relocated to a new insn-eval.c file. The reason
to not relocate these utilities into insn.c is that the latter solely
analyses instructions given by a struct insn without any knowledge of the
meaning of the values of instruction operands. The new insn-eval.c is
meant to resolve effective and user space linear addresses based on the
contents of the instruction operands as well as the contents
of pt_regs structure.

These utilities come with a separate header. This is to avoid taking insn.c
out of sync from the instructions decoders under tools/obj and tools/perf.
This also avoids adding cumbersome #ifdef's for the #include'd files
required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation
changes.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  16 
 arch/x86/lib/Makefile|   2 +-
 arch/x86/lib/insn-eval.c | 160 +++
 arch/x86/mm/mpx.c| 153 +
 4 files changed, 179 insertions(+), 152 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index 000..5cab1b1
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..675d7b0 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index 000..23cf010
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,160 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum reg_type {
+   REG_TYPE_RM = 0,
+   REG_TYPE_INDEX,
+   REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+   int regno = 0;
+
+   static const int regoff[] = {
+   offsetof(struct pt_regs, ax),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, dx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, sp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+   offsetof(struct pt_regs, r8),
+   offsetof(struct pt_regs, r9),
+   offsetof(struct pt_regs, r10),
+   offsetof(struct pt_regs, r11),
+   offsetof(struct pt_regs, r12),
+   offsetof(struct pt_regs, r13),
+   offsetof(struct pt_regs, r14),
+   offsetof(struct pt_regs, r15),
+#endif
+   };
+   int nr_registers = ARRAY_SIZE(regoff);
+   /*
+* Don't possibly decode a 32-bit instructions as
+* reading a 64-bit-only register.
+*/
+   if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+   nr_registers -= 8;
+
+   switch (type) {
+   case REG_TYPE_RM:
+   regno = X86_MODRM_RM(insn->modrm.value);
+   if (X86_REX_B(insn->rex_prefix.value))
+   regno += 8;
+   break;
+
+   case REG_TYPE_INDEX:
+   regno = X86_SIB_INDEX(insn->sib.value);
+

Re: [PATCH v3 03/25] dt-bindings: timer: Document Owl timer

2017-03-03 Thread Andreas Färber

On 03.03.2017 at 07:20, Rob Herring wrote:
> On Tue, Feb 28, 2017 at 12:39:08PM +, Mark Rutland wrote:
>> On Tue, Feb 28, 2017 at 07:35:13AM +0100, Andreas Färber wrote:
>>> The Actions Semi S500 SoC contains a timer block with two 2 Hz and two
>>> 32-bit timers. The S900 SoC timer block has four 32-bit timers.
>>>
>>> Signed-off-by: Andreas Färber 
>>> ---
>>>  v2 -> v3:
>>>  * Adopted interrupt-names
>>>  * Changed compatible for S500
>>>  * Added S900 compatible and interrupt names
>>>  
>>>  v2: new
>>>  
>>>  .../devicetree/bindings/timer/actions,owl-timer.txt  | 20 
>>> 
>>>  1 file changed, 20 insertions(+)
>>>  create mode 100644 
>>> Documentation/devicetree/bindings/timer/actions,owl-timer.txt
>>>
>>> diff --git a/Documentation/devicetree/bindings/timer/actions,owl-timer.txt 
>>> b/Documentation/devicetree/bindings/timer/actions,owl-timer.txt
>>> new file mode 100644
>>> index 000..5b4834d
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/timer/actions,owl-timer.txt
>>> @@ -0,0 +1,20 @@
>>> +Actions Semi Owl Timer
>>> +
>>> +Required properties:
>>> +- compatible  :  "actions,s500-timer" for S500
>>> + "actions,s900-timer" for S900
>>> +- reg :  Offset and length of the register set for the device.
>>> +- interrupts  :  Should contain the interrupts.
>>> +- interrupt-names :  Valid names are: "2Hz0", "2Hz1",
>>> +  "Timer0", "Timer1", "Timer2", 
>>> "Timer3"
>>
>> Sticking to lower case for the names would be preferable.

Done.

> With that,
> 
> Acked-by: Rob Herring 

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 0/2] fix the traced mt-exec deadlock

2017-03-03 Thread Eric W. Biederman

ebied...@xmission.com (Eric W. Biederman) writes:

> ebied...@xmission.com (Eric W. Biederman) writes:
>
>> The big lesson for me, and what was not obvious from your change
>> description is that we are changing the user space visible semantics
>> of exec+ptrace and that cred_guard_mutex is not at all the problem (as
>> we always take cred_guard_mutex in a killable or interruptible way).
>
> Just to follow up.
>
> Because the cred_guard_mutex is fine as is we don't need to move
> de_thread out from under cred_guard_mutex.  We just need to change
> de_thread to wait until all of the other threads are zombies.
> Which should remove about half your proposed patch.
>
> The other key thing is that knowing it isn't cred_guard_mutex lets us
> know that this kind of deadlock goes all of the way back to when
> CLONE_THREAD was merged into the kernel.
>
> Interestingly enough, looking at zap_other_threads and notify_count I
> have found a second bug.  When a multi-threaded process becomes a
> zombie we don't send the notification to the parent process until the
> non-leader threads have been reaped.  Which means ptrace can mess up
> sending SIGCHLD to the parent.

Bah. I was misreading the code.  Nothing but exec uses notify_count
and group_exit_task.

Eric

Re: [RFC PATCH v2 06/32] x86/pci: Use memremap when walking setup data

2017-03-03 Thread Tom Lendacky


On 3/3/2017 2:42 PM, Bjorn Helgaas wrote:
> On Thu, Mar 02, 2017 at 10:13:10AM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky 
>>
>> The use of ioremap will force the setup data to be mapped decrypted even
>> though setup data is encrypted.  Switch to using memremap which will be
>> able to perform the proper mapping.
>
> How should callers decide whether to use ioremap() or memremap()?
>
> memremap() existed before SME and SEV, and this code is used even if
> SME and SEV aren't supported, so the rationale for this change should
> not need the decryption argument.

When SME or SEV is active an ioremap() will remove the encryption bit
from the pagetable entry when it is mapped.  This allows MMIO, which
doesn't support SME/SEV, to be performed successfully.  So my take is
that ioremap() should be used for MMIO and memremap() for pages in RAM.

>> Signed-off-by: Tom Lendacky 
>> ---
>>  arch/x86/pci/common.c |4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index a4fdfa7..0b06670 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -691,7 +691,7 @@ int pcibios_add_device(struct pci_dev *dev)
>>
>> pa_data = boot_params.hdr.setup_data;
>> while (pa_data) {
>> -   data = ioremap(pa_data, sizeof(*rom));
>> +   data = memremap(pa_data, sizeof(*rom), MEMREMAP_WB);
>
> I can't quite connect the dots here.  ioremap() on x86 would do
> ioremap_nocache().  memremap(MEMREMAP_WB) would do arch_memremap_wb(),
> which is ioremap_cache().  Is making a cacheable mapping the important
> difference?

The memremap(MEMREMAP_WB) will actually check to see if it can perform
a __va(pa_data) in try_ram_remap() and then fallback to the
arch_memremap_wb().  So it's actually the __va() vs the ioremap_cache()
that is the difference.

Thanks,
Tom

>> if (!data)
>> return -ENOMEM;
>>
>> @@ -710,7 +710,7 @@ int pcibios_add_device(struct pci_dev *dev)
>> }
>> }
>> pa_data = data->next;
>> -   iounmap(data);
>> +   memunmap(data);
>> }
>> set_dma_domain_ops(dev);
>> set_dev_domain_options(dev);
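
The decision order Tom describes for memremap(MEMREMAP_WB) can be modeled in
user space roughly as below. This is only a toy sketch and not kernel code:
every model_* name, the enum and the single fake RAM range are invented for
illustration, and the real try_ram_remap() applies more conditions than a
plain range check. Only the ordering (use the existing RAM mapping when the
region is RAM, otherwise fall back to a cached ioremap-style mapping) comes
from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: which kind of mapping the model hands back. */
enum model_mapping {
	MODEL_DIRECT_MAP,	/* stands in for __va(pa) on ordinary RAM */
	MODEL_IOREMAP_CACHE,	/* stands in for arch_memremap_wb()/ioremap_cache() */
};

/* Hypothetical RAM range [start, end), replacing the kernel's memory map. */
struct model_ram_range {
	uint64_t start;
	uint64_t end;
};

/* True if [pa, pa + size) lies entirely inside the modeled RAM range. */
static bool model_is_ram(const struct model_ram_range *ram,
			 uint64_t pa, uint64_t size)
{
	uint64_t len = ram->end - ram->start;

	/* Written without pa + size so the check cannot overflow. */
	return pa >= ram->start && size <= len && pa - ram->start <= len - size;
}

/* Model of the MEMREMAP_WB path: try the RAM mapping first, then fall back. */
static enum model_mapping model_memremap_wb(const struct model_ram_range *ram,
					    uint64_t pa, uint64_t size)
{
	if (model_is_ram(ram, pa, size))
		return MODEL_DIRECT_MAP;
	return MODEL_IOREMAP_CACHE;
}
```

In this model, setup data that lives in RAM always takes the first branch,
which matches the "ioremap() for MMIO, memremap() for pages in RAM" rule of
thumb given in the reply.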

Re: [PATCH] zswap: Zero-filled pages handling

2017-03-03 Thread Dan Streetman

On Sat, Feb 25, 2017 at 12:18 PM, Sarbojit Ganguly
 wrote:
> On 25 February 2017 at 20:12, Srividya Desireddy
>  wrote:
>> From: Srividya Desireddy 
>> Date: Thu, 23 Feb 2017 15:04:06 +0530
>> Subject: [PATCH] zswap: Zero-filled pages handling

your email is base64-encoded; please send plain text emails.

>>
>> Zswap is a cache which compresses the pages that are being swapped out
>> and stores them into a dynamically allocated RAM-based memory pool.
>> Experiments have shown that around 10-20% of pages stored in zswap
>> are zero-filled pages (i.e. contents of the page are all zeros), but

20%?  that's a LOT of zero pages...which seems like applications are
wasting a lot of memory.  what kind of workload are you testing with?

>> these pages are handled as normal pages by compressing and allocating
>> memory in the pool.
>>
>> This patch adds a check in zswap_frontswap_store() to identify a zero-filled
>> page before compression of the page. If the page is a zero-filled page, set
>> zswap_entry.zeroflag and skip the compression of the page and allocation
>> of memory in zpool. In zswap_frontswap_load(), check if the zeroflag is
>> set for the page in zswap_entry. If the flag is set, memset the page with
>> zero. This saves the decompression time during load.
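
The store/load idea described in the quoted text can be tried out in user
space with something like the sketch below. It is not the patch itself: the
sketch_* names and the fixed 4096-byte page size are assumptions made for
illustration, and the load helper only shows the memset shortcut, with the
decompression path left to the caller.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Return 1 if every word of the page is zero, 0 otherwise.
 * The buffer must be aligned for unsigned long and SKETCH_PAGE_SIZE long.
 */
static int sketch_page_is_zero_filled(const void *ptr)
{
	const unsigned long *page = ptr;
	size_t pos;

	for (pos = 0; pos < SKETCH_PAGE_SIZE / sizeof(*page); pos++) {
		if (page[pos])
			return 0;
	}
	return 1;
}

/*
 * Load path: an entry marked as zero-filled is rebuilt with memset()
 * instead of being decompressed. Returns 1 if the page was produced
 * here, 0 if the caller still has to decompress it.
 */
static int sketch_load_page(int is_zero_entry, void *dst)
{
	if (is_zero_entry) {
		memset(dst, 0, SKETCH_PAGE_SIZE);
		return 1;
	}
	return 0;
}
```

The scan stops at the first non-zero word, so pages with real data usually
pay for only a few comparisons before falling through to compression.
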
>>
>> The overhead of checking for a zero-filled page is minimal compared to
>> the time saved by avoiding compression and allocation for zero-filled
>> pages. Although the compressed size of a zero-filled page is very small,
>> with this patch the load time of a zero-filled page is reduced by 80%
>> compared to baseline.
>
> Is it possible to share the benchmark details?

Was there an answer to this?

>
>
>>
>> Signed-off-by: Srividya Desireddy 
>> ---
>>  mm/zswap.c |   48 +---
>>  1 file changed, 45 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 067a0d6..a574008 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -49,6 +49,8 @@
>>  static u64 zswap_pool_total_size;
>>  /* The number of compressed pages currently stored in zswap */
>>  static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>> +/* The number of zero filled pages swapped out to zswap */
>> +static atomic_t zswap_zero_pages = ATOMIC_INIT(0);
>>
>>  /*
>>   * The statistics below are not protected from concurrent access for
>> @@ -140,6 +142,8 @@ struct zswap_pool {
>>   *  decompression
>>   * pool - the zswap_pool the entry's data is in
>>   * handle - zpool allocation handle that stores the compressed page data
>> + * zeroflag - the flag is set if the content of the page is filled with
>> + *zeros
>>   */
>>  struct zswap_entry {
>> struct rb_node rbnode;
>> @@ -148,6 +152,7 @@ struct zswap_entry {
>> unsigned int length;
>> struct zswap_pool *pool;
>> unsigned long handle;
>> +   unsigned char zeroflag;

instead of a flag, we can use length == 0; the length will never be 0
for any actually compressed page.
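
A minimal sketch of that suggestion, assuming a trimmed-down entry with only
the two fields that matter here (struct sketch_entry and both helpers are
hypothetical names, not taken from mm/zswap.c):

```c
#include <stdbool.h>

/* Hypothetical, reduced entry: only the fields this sketch needs. */
struct sketch_entry {
	unsigned int length;	/* compressed length; 0 marks a zero-filled page */
	unsigned long handle;	/* allocation handle; unused when length == 0 */
};

/* A real compressed page always has length > 0, so 0 is free as a marker. */
static bool sketch_entry_is_zero_page(const struct sketch_entry *e)
{
	return e->length == 0;
}

/* Mark an entry as zero-filled; no pool allocation is attached to it. */
static void sketch_entry_mark_zero_page(struct sketch_entry *e)
{
	e->length = 0;
	e->handle = 0;
}
```

Compared with a separate zeroflag byte, this keeps the entry the same size
and needs no extra initialisation beyond what length already gets.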

>>  };
>>
>>  struct zswap_header {
>> @@ -236,6 +241,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
>> if (!entry)
>> return NULL;
>> entry->refcount = 1;
>> +   entry->zeroflag = 0;
>> RB_CLEAR_NODE(&entry->rbnode);
>> return entry;
>>  }
>> @@ -306,8 +312,12 @@ static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
>>   */
>>  static void zswap_free_entry(struct zswap_entry *entry)
>>  {
>> -   zpool_free(entry->pool->zpool, entry->handle);
>> -   zswap_pool_put(entry->pool);
>> +   if (entry->zeroflag)
>> +   atomic_dec(&zswap_zero_pages);
>> +   else {
>> +   zpool_free(entry->pool->zpool, entry->handle);
>> +   zswap_pool_put(entry->pool);
>> +   }
>> zswap_entry_cache_free(entry);
>> atomic_dec(&zswap_stored_pages);
>> zswap_update_total_size();
>> @@ -877,6 +887,19 @@ static int zswap_shrink(void)
>> return ret;
>>  }
>>
>> +static int zswap_is_page_zero_filled(void *ptr)
>> +{
>> +   unsigned int pos;
>> +   unsigned long *page;
>> +
>> +   page = (unsigned long *)ptr;
>> +   for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
>> +   if (page[pos])
>> +   return 0;
>> +   }
>> +   return 1;
>> +}
>> +
>>  /*
>>  * frontswap hooks
>>  **/
>> @@ -917,6 +940,15 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>> goto reject;
>> }
>>
>> +   src = kmap_atomic(page);
>> +   if (zswap_is_page_zero_filled(src)) {
>> +   kunmap_atomic(src);
>> +   entry->offset = offset;
>> +   entry->zeroflag = 1;
>> +   atomic_inc(&zswap_zero_pages);
>> +   goto insert_entry;
>> +   }
>> +
>> /* if entry is successfully added, it

Re: net/ipv4: division by 0 in tcp_select_window

2017-03-03 Thread Eric Dumazet

On Fri, 2017-03-03 at 10:25 -0800, Eric Dumazet wrote:
> On Fri, Mar 3, 2017 at 10:10 AM, Dmitry Vyukov  wrote:
> > Hello,
> >
> > The following program triggers division by 0 in tcp_select_window:
> >
> > https://gist.githubusercontent.com/dvyukov/ef28c0fd2ab57a655508ef7621b12e6c/raw/079011e2a9523a390b0621cbc1e5d9d5e637fd6d/gistfile1.txt
> 
> Yeah, tcp_disconnect() should never have existed in the first place.
> 
> We'll send a patch, unless you take care of this before us .

Could you try this first patch ?

Probably others will also be needed.

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 40d893556e6701ace6a02903e53c45822d6fa56d..2187ebf1f270d19e6dd019b8f9df5eef8d018e03 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -552,7 +552,8 @@ void tcp_write_timer_handler(struct sock *sk)
struct inet_connection_sock *icsk = inet_csk(sk);
int event;
 
-   if (sk->sk_state == TCP_CLOSE || !icsk->icsk_pending)
+   if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ||
+   !icsk->icsk_pending)
goto out;
 
if (time_after(icsk->icsk_timeout, jiffies)) {
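
The (1 << state) & mask idiom used in the new condition tests membership in
a set of states with a single AND. A small user-space sketch follows. The
SK_* and SKF() names are local to this example, and the numeric values are
meant to mirror the kernel's TCP state numbering as I understand it, so treat
them as assumptions rather than a copy of the header.

```c
#include <stdbool.h>

/* State numbers assumed to match the kernel's TCP state enum. */
enum {
	SK_ESTABLISHED = 1,
	SK_CLOSE = 7,
	SK_LISTEN = 10,
};

/* One bit per state, in the spirit of the TCPF_* flags. */
#define SKF(state) (1u << (state))

/*
 * True if the write timer handler should bail out early: the socket is
 * closed or listening, or no timer event is pending.
 */
static bool sketch_timer_should_skip(int state, bool timer_pending)
{
	return (SKF(state) & (SKF(SK_CLOSE) | SKF(SK_LISTEN))) || !timer_pending;
}
```

The old condition only covered the closed state. With the mask, a socket that
has gone back to listening is skipped as well.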

Re: Hundreds of null PATH records for *init_module syscall audit logs

2017-03-03 Thread Richard Guy Briggs

On 2017-02-28 23:15, Steve Grubb wrote:
> On Tuesday, February 28, 2017 10:37:04 PM EST Richard Guy Briggs wrote:
> > Sorry, I forgot to include Cc: in this cover letter for context to the 4
> > alt patches.
> > 
> > On 2017-02-28 22:15, Richard Guy Briggs wrote:
> > > The background to this is:
> > >   https://github.com/linux-audit/audit-kernel/issues/8
> > > 
> > > In short, audit SYSCALL records for *init_module were occasionally
> > > accompanied by hundreds to thousands of null PATH records.
> > > 
> > > I chatted with Al Viro and Eric Paris about this Friday afternoon and
> > > they seemed to vaguely recall this issue and didn't have any solid
> > > recommendations as to what was the right thing to do (other than the
> > > same suggestion from both that I won't print here).
> > > 
> > > It was reproducible on a number of vintages of distributions with
> > > default kernels, but triggering on very few of the many modules loaded
> > > at boot time.  It was reproduced with fs-nfs4 and nfsv4 modules on
> > > tracefs, but there are reports of it also happening with debugfs.  It
> > > was triggering only in __audit_inode_child with a parent that was not
> > > found in the task context's audit names_list.
> > > 
> > > I have four potential solutions listed in my order of preference and I'd
> > > like to get some feedback about which one would be the most acceptable.
> 
> 0.5 - Notice that we are in *init_module & delete_module and inhibit 
> generation of any record type except SYSCALL and KERN_MODULE ? There are some 
> classification routines for -F perms=wrxa that might be used to create a new 
> class for loading/deleting modules that sets a flag that we use to suppress 
> some record types.

Ok, I was partially able to do this.

If I try and catch it in audit_log_start() which is the common point for
all the record types to be able to limit to just SYSCALL and
KERN_MODULE, there will already be a linked list of hundreds to
thousands of audit_names and will still print a non-zero items count in
the SYSCALL record.  This also sounds like a potentially lazy way to
deal with other record spam (like setuid BPRM_FCAPS).

If I catch it in __audit_inode_child in the same place as I caught the
filesystem type, it is effective for only the PATH record, which is all
that is a problem at the moment.

It touches nine arch-related files, which is a lot more disruptive than
I was hoping.

> > > 1 - In __audit_inode_child, return immediately upon detecting TRACEFS and
> > > 
> > > DEBUGFS (and potentially other filesystems identified, via s_magic).
> 
> XFS creates them too. Who knows what else.

Why would this happen?  I would assume it is a mounted filesystem.  Do
you have a sample of the extra records?

This brings me back to the original reaction I had to your suggestion
which is: Are you certain there is never a circumstance where *_module
syscalls involve a file?  Say, the module itself on loading pulls
in other files from the mounted filesystem?

> -Steve
> 
> > > 2 - In __audit_inode_child, return after not finding the parent in that
> > > 
> > > task context's audit names_list.
> > > 
> > > 3 - In __audit_inode_child, mark the parent and its child as "hidden"
> > > 
> > > when the parent isn't found in that task context's audit names_list.
> > > This will still result in an "items=" count that does not match the
> > > number of accompanying PATH records for that SYSCALL record, which
> > > may upset userspace tools but would still indicate suppressed
> > > records.
> > > 
> > > 4 - In __audit_inode_child, when the parent isn't found, store the
> > > 
> > > child's dentry in the child's (new or not) audit_names structure
> > > (properly refcounted with dget) and store the parent's dentry in its
> > > newly created audit_names structure (via dget_parent), then if the
> > > name isn't available at PATH record generation time, use that stored
> > > value (with dentry_path_raw and released with dput)
> > >
> > > Is there another more elegant solution that I've missed that catches
> > > things before they get anywhere near audit_inode_child (called from
> > > tracefs' notifiers)?
> > > 
> > > I'll thread onto this message tested patches for all four solutions.

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635
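
Option 1 in the thread above (bail out early based on the filesystem's
s_magic) comes down to a check like the following. This is a stand-alone
sketch, not one of the posted patches: the function name is hypothetical and
the two magic values are written from memory of the kernel's uapi magic.h,
so they should be verified against the header before being relied on.

```c
#include <stdbool.h>

/* Assumed values of DEBUGFS_MAGIC and TRACEFS_MAGIC; verify before use. */
#define SKETCH_DEBUGFS_MAGIC 0x64626720UL
#define SKETCH_TRACEFS_MAGIC 0x74726163UL

/*
 * True if PATH records for objects on this filesystem should be
 * suppressed, keyed on the superblock magic number.
 */
static bool sketch_skip_path_record(unsigned long s_magic)
{
	switch (s_magic) {
	case SKETCH_DEBUGFS_MAGIC:
	case SKETCH_TRACEFS_MAGIC:
		return true;
	default:
		return false;
	}
}
```

As Steve's XFS remark suggests, a fixed list like this only covers the
filesystems someone thought to add, which is the main weakness of this option.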

Re: [PATCH v2 3/3] ARM64: dts: meson-gx: Add MALI nodes for GXBB and GXL

2017-03-03 Thread Kevin Hilman

Neil Armstrong  writes:

> Hi Andreas,
> On 03/02/2017 01:31 PM, Andreas Färber wrote:
>> Hi Neil,
>> 
>> Am 01.03.2017 um 11:46 schrieb Neil Armstrong:
>>> The same MALI-450 MP3 GPU is present in the GXBB and GXL SoCs.
>> 
>> First of all, any reason you're upper-casing Mali in the commit message?
>> ARM doesn't.
>
> No reason, only a typo, indeed it was lower-case in v1.
> Will fix in v2.
>
>> 
>>>
>>> The node is simply added in the meson-gxbb.dtsi file.
>> 
>> The GXBB part looks fine on a quick look.
>> 
>>>
>>> For GXL, since a lot is shared with the GXM that has a MALI-T820 IP, this
>>> patch adds a new meson-gxl-mali.dtsi and is included in the SoC specific
>>> dtsi files.
>> 
>> This part is slightly confusing though.
>> 
>> What exactly is the GXL vs. GXM difference that this can't be handled by
>> overriding node properties compatible/interrupts/clocks? I am missing a
>> GXM patch in this series as rationale for doing it this way.
>> 
>> In particular I am wondering whether the whole GXM-inherits-from-GXL
>> concept is flawed and should be adjusted if this leads to secondary
>> .dtsi files like this: My proposal would be to instead create a
>> meson-gxl-gxm.dtsi, that meson-gxl.dtsi and meson-gxm.dtsi can inherit
>> the current common parts from, then the Mali bits can simply go into
>> meson-gxl.dtsi without extra #includes needed in S905X and S905D. While
>> it's slightly more work to split once again, I think it would be cleaner.
>
> The GXL and GXM differences are very small :
>  - They share the same clock tree
>  - They share the same pinctrl and even the same pinout (S905D and S912 are 
> pin-to-pin compatible)
>  - They share all the peripherals
>
> The only changes are :
>  - Enhanced video encoding and decoding support, this will need a 
> family-specific compatible when pushed
>  - Slight differences in the Video Processing Unit, this is why I 
> introduced family-specific compatibles
>  - A secondary Cortex-A53 cluster
>  - A secondary SCPI cpufreq clock entry
>  - A different Mali core, but with the same interrupts (less but they share 
> the same lower interrupts), clocks and memory space
>
> This is why it was decided to have a sub-dtsi: a secondary dtsi would 
> simply copy 99% of the GXL dtsi.
> We could also have an intermediate dtsi. I'm OK with that for boards, 
> but less so for a SoC dtsi,
> since it could lead to some confusion.
>
> Finally, yes I could have added the mali node to the GXL dtsi, but the 
> midgard Mali dt-bindings are not upstream
> and the family is too big and recent enough to consider having stable 
> bindings for now.
>
> Nevertheless, nothing is final, this gxl-mali.dtsi could be merged into the 
> GXL dtsi in the future when we
> have proper dt-bindings and a real support of the T820 Mali on the S912.
>
> Kevin, what's your thought about this ?

I don't have a strong preference.  I'm OK with a separate Mali .dtsi due
to the significant overlap between GXL/GXM in terms of clocks, interrupts
etc.

However, if the plan is to #include this from GXM .dts files, would a
better name be meson-gx-mali.dtsi?

Kevin

[bug] regulator: fixed, gpio: probe fails on unset regulator-name

2017-03-03 Thread Harald Geyer


Hi!

Documentation/devicetree/bindings/regulator/regulator.txt says that the
regulator-name property is optional. However, fixed and gpio regulators
fail in probe with the following message if the property is not present:

| reg-fixed-voltage regulators:sensor_supply: Failed to allocate supply name
| reg-fixed-voltage: probe of regulators:sensor_supply failed with error -12

This is caused by the following code in both drivers:

	drvdata->desc.name = devm_kstrdup(&pdev->dev,
					  config->supply_name,
					  GFP_KERNEL);
	if (drvdata->desc.name == NULL) {
		dev_err(&pdev->dev, "Failed to allocate supply name\n");
		return -ENOMEM;
	}

If config->supply_name == NULL, then devm_kstrdup() also returns NULL.

I don't know whether the binding document or the implementation is wrong,
so I can't propose a fix for this. Sorry.
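
A minimal user-space sketch of the failure mode described above. The
helper names model_kstrdup() and model_probe_name() are hypothetical
stand-ins, not kernel APIs; they only mirror the behaviour that a
kstrdup()-style helper returns NULL for a NULL source string, which the
probe code then cannot tell apart from an allocation failure and reports
as -ENOMEM (the -12 in the log):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a kstrdup()-style helper: NULL in, NULL out. */
static char *model_kstrdup(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;

	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Mirrors the check in the quoted probe code: a missing name and an
 * out-of-memory condition both end up as -ENOMEM.
 */
static int model_probe_name(const char *supply_name, char **out)
{
	*out = model_kstrdup(supply_name);
	if (*out == NULL)
		return -ENOMEM;
	return 0;
}
```

Whether the right fix is a default name in the driver or a stricter
binding document is left open here, as in the report itself.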

TIA,
Harald

--
If you want to support my work:
see http://friends.ccbib.org/harald/supporting/
or donate via peercoin to P98LRdhit3gZbHDBe7ta5jtXrMJUms4p7w
or CLAM xASPBtezLNqj4cUe8MT5nZjthRSEjrRQXN

[PATCH] ARM: dts: imx6q-bx50v3/imx6q-b450v3/imx6q-b650v3: fix at25 spi-clk frequency issue

2017-03-03 Thread Ken Lin

Change the maximum SPI clock frequency from 20MHz to 10MHz to meet the
operating voltage range requirement recommended in the AT25 datasheet.

Signed-off-by: Ken Lin 
---
 arch/arm/boot/dts/imx6q-bx50v3.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/imx6q-bx50v3.dtsi b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
index f99e6efda3e2..614c40e5f772 100644
--- a/arch/arm/boot/dts/imx6q-bx50v3.dtsi
+++ b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
@@ -103,7 +103,7 @@
 
m25_eeprom: m25p80@0 {
compatible = "atmel,at25";
-   spi-max-frequency = <20000000>;
+   spi-max-frequency = <10000000>;
size = <0x8000>;
pagesize = <64>;
reg = <0>;
-- 
2.11.0

[PATCH] staging: ks7010: clean up code

2017-03-03 Thread Ernestas Kulik

This fixes type warnings generated by sparse, replaces instances of
ntohs() with be16_to_cpu() and removes unused fields in structs.

Signed-off-by: Ernestas Kulik 
---
 drivers/staging/ks7010/ks7010_sdio.c |  12 ++--
 drivers/staging/ks7010/ks_hostif.c   |  30 +
 drivers/staging/ks7010/ks_hostif.h   | 124 +--
 drivers/staging/ks7010/ks_wlan.h |   2 +-
 4 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/ks7010/ks7010_sdio.c b/drivers/staging/ks7010/ks7010_sdio.c
index 6f9f746a3a61..8e644ff8eca8 100644
--- a/drivers/staging/ks7010/ks7010_sdio.c
+++ b/drivers/staging/ks7010/ks7010_sdio.c
@@ -297,7 +297,8 @@ static int write_to_device(struct ks_wlan_private *priv, unsigned char *buffer,
hdr = (struct hostif_hdr *)buffer;
 
DPRINTK(4, "size=%d\n", hdr->size);
-   if (hdr->event < HIF_DATA_REQ || HIF_REQ_MAX < hdr->event) {
+   if (le16_to_cpu(hdr->event) < HIF_DATA_REQ ||
+   HIF_REQ_MAX < le16_to_cpu(hdr->event)) {
DPRINTK(1, "unknown event=%04X\n", hdr->event);
return 0;
}
@@ -361,13 +362,14 @@ int ks_wlan_hw_tx(struct ks_wlan_private *priv, void *p, unsigned long size,
 
hdr = (struct hostif_hdr *)p;
 
-   if (hdr->event < HIF_DATA_REQ || HIF_REQ_MAX < hdr->event) {
+   if (le16_to_cpu(hdr->event) < HIF_DATA_REQ ||
+   HIF_REQ_MAX < le16_to_cpu(hdr->event)) {
DPRINTK(1, "unknown event=%04X\n", hdr->event);
return 0;
}
 
/* add event to hostt buffer */
-   priv->hostt.buff[priv->hostt.qtail] = hdr->event;
+   priv->hostt.buff[priv->hostt.qtail] = le16_to_cpu(hdr->event);
priv->hostt.qtail = (priv->hostt.qtail + 1) % SME_EVENT_BUFF_SIZE;
 
DPRINTK(4, "event=%04X\n", hdr->event);
@@ -406,7 +408,7 @@ static void ks_wlan_hw_rx(void *dev, uint16_t size)
struct rx_device_buffer *rx_buffer;
struct hostif_hdr *hdr;
unsigned char read_status;
-   unsigned short event = 0;
+   __le16 event = 0;
 
DPRINTK(4, "\n");
 
@@ -459,7 +461,7 @@ static void ks_wlan_hw_rx(void *dev, uint16_t size)
DPRINTK(4, "READ_STATUS=%02X\n", read_status);
 
if (atomic_read(&priv->psstatus.confirm_wait)) {
-   if (IS_HIF_CONF(event)) {
+   if (IS_HIF_CONF(le16_to_cpu(event))) {
DPRINTK(4, "IS_HIF_CONF true !!\n");
atomic_dec(&priv->psstatus.confirm_wait);
}
diff --git a/drivers/staging/ks7010/ks_hostif.c b/drivers/staging/ks7010/ks_hostif.c
index da7c42ef05f5..4ad4a0c72ca8 100644
--- a/drivers/staging/ks7010/ks_hostif.c
+++ b/drivers/staging/ks7010/ks_hostif.c
@@ -339,7 +339,7 @@ void hostif_data_indication(struct ks_wlan_private *priv)
get_WORD(priv); /* Reserve Area */
 
eth_hdr = (struct ether_hdr *)(priv->rxp);
-   eth_proto = ntohs(eth_hdr->h_proto);
+   eth_proto = be16_to_cpu((__force __be16)eth_hdr->h_proto);
DPRINTK(3, "ether protocol = %04X\n", eth_proto);
 
/* source address check */
@@ -1200,7 +1200,7 @@ int hostif_data_request(struct ks_wlan_private *priv, struct sk_buff *packet)
 
/* for WPA */
eth_hdr = (struct ether_hdr *)&packet->data[0];
-   eth_proto = ntohs(eth_hdr->h_proto);
+   eth_proto = be16_to_cpu((__force __be16)eth_hdr->h_proto);
 
/* for MIC FAILURE REPORT check */
if (eth_proto == ETHER_PROTOCOL_TYPE_EAP
@@ -1208,7 +1208,7 @@ int hostif_data_request(struct ks_wlan_private *priv, struct sk_buff *packet)
aa1x_hdr = (struct ieee802_1x_hdr *)(eth_hdr + 1);
if (aa1x_hdr->type == IEEE802_1X_TYPE_EAPOL_KEY) {
eap_key = (struct wpa_eapol_key *)(aa1x_hdr + 1);
-   keyinfo = ntohs(eap_key->key_info);
+   keyinfo = be16_to_cpu((__force __be16)eap_key->key_info);
}
}
 
@@ -1867,7 +1867,7 @@ void hostif_receive(struct ks_wlan_private *priv, unsigned char *p,
 static
 void hostif_sme_set_wep(struct ks_wlan_private *priv, int type)
 {
-   uint32_t val;
+   __le32 val;
 
switch (type) {
case SME_WEP_INDEX_REQUEST:
@@ -1916,13 +1916,13 @@ void hostif_sme_set_wep(struct ks_wlan_private *priv, int type)
 }
 
 struct wpa_suite_t {
-   unsigned short size;
+   __le16 size;
unsigned char suite[4][CIPHER_ID_LEN];
 } __packed;
 
 struct rsn_mode_t {
-   uint32_t rsn_mode;
-   uint16_t rsn_capability;
+   __le32 rsn_mode;
+   __le16 rsn_capability;
 } __packed;
 
 static
@@ -1930,7 +1930,7 @@ void hostif_sme_set_rsn(struct ks_wlan_private *priv, int type)
 {
struct wpa_suite_t wpa_suite;
struct rsn_mode_t rsn_mode;
-   uint32_t val;
+   __le32 val;
 
memset(&wpa_suite, 0, sizeof(wpa_suite));
 
@@ -1982,7 +1982,8 @@ void hostif_sme_set_rsn(struct ks_wlan_private *priv, int type)
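
For readers unfamiliar with the sparse annotations used in the patch
above: __le16 and __be16 mark values stored in a fixed byte order, and
le16_to_cpu()/be16_to_cpu() convert them to the CPU's native order. The
following is a user-space sketch of what those conversions compute. The
model_* helpers are illustrative assumptions, not the kernel
implementations, and the sample bytes are chosen for the example (0x888e
is the EAPOL EtherType):

```c
#include <stdint.h>

/*
 * Interpret two bytes as a little-endian 16-bit value. Working on bytes
 * makes the result independent of the host's own byte order.
 */
static uint16_t model_le16_to_cpu(const uint8_t bytes[2])
{
	return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Interpret two bytes as a big-endian (network order) 16-bit value. */
static uint16_t model_be16_to_cpu(const uint8_t bytes[2])
{
	return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

Reading the wire bytes 0x88 0x8e with the wrong helper yields 0x8e88
instead of 0x888e, which is the class of mistake the type annotations let
sparse catch at build time.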

Re: [RFC PATCH v2 01/32] x86: Add the Secure Encrypted Virtualization CPU feature

2017-03-03 Thread Brijesh Singh


Hi Boris,

On 03/03/2017 10:59 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:12:09AM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature.  SME is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR).  Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/msr-index.h   |2 ++
 arch/x86/kernel/cpu/amd.c  |   22 ++
 arch/x86/kernel/cpu/scattered.c|1 +
 4 files changed, 22 insertions(+), 4 deletions(-)


So this patchset is not really ontop of Tom's patchset because this
patch doesn't apply. The reason is, Tom did the SME bit this way:

https://lkml.kernel.org/r/20170216154236.19244.7580.st...@tlendack-t1.amdoffice.net

but it should've been in scattered.c.


diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index cabda87..c3f58d9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_CPB,  CPUID_EDX,  9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK,CPUID_EDX, 11, 0x80000007, 0 },
{ X86_FEATURE_SME,  CPUID_EAX,  0, 0x8000001f, 0 },
+   { X86_FEATURE_SEV,  CPUID_EAX,  1, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }


... and here it is in scattered.c, as it should be. So you've used an
older version of the patch, it seems.

Please sync with Tom to see whether he's reworked the v4 version of that
patch already. If yes, then you could send only the SME and SEV adding
patches as a reply to this message so that I can continue reviewing in
the meantime.



Just realized my error: I actually ended up using Tom's recent updates to
v4 instead of the original v4. Here is the diff. If you have Tom's v4
applied, then apply this diff before applying the SEV v2 version. Sorry
about that.


Optionally, you can also pull the complete tree from github [1].

[1] https://github.com/codomania/tip/tree/sev-rfc-v2


diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91c40fa..b91e2495 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2153,8 +2153,8 @@
mem_encrypt=on: Activate SME
mem_encrypt=off:Do not activate SME

-   Refer to the SME documentation for details on when
-   memory encryption can be activated.
+   Refer to Documentation/x86/amd-memory-encryption.txt
+   for details on when memory encryption can be activated.

mem_sleep_default=  [SUSPEND] Default system suspend mode:
s2idle  - Suspend-To-Idle
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
index 0938e89..0b72ff2 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -7,9 +7,9 @@ DRAM.  SME can therefore be used to protect the contents 
of DRAM from physical

 attacks on the system.

 A page is encrypted when a page table entry has the encryption bit set 
(see

-below how to determine the position of the bit).  The encryption bit can be
-specified in the cr3 register, allowing the PGD table to be encrypted. Each
-successive level of page tables can also be encrypted.
+below on how to determine its position).  The encryption bit can be 
specified
+in the cr3 register, allowing the PGD table to be encrypted. Each 
successive

+level of page tables can also be encrypted.

 Support for SME can be determined through the CPUID instruction. The CPUID
 function 0x8000001f reports information related to SME:
@@ -17,13 +17,14 @@ function 0x8000001f reports information related to SME:
0x8000001f[eax]:
Bit[0] indicates support for SME
0x8000001f[ebx]:
-   Bit[5:0]  pagetable bit number used to activate memory
- encryption
-   Bit[11:6] reduction in physical address space, in bits, when
- memory encryption is enabled (this only affects system
- physical addresses, not guest physical addresses)
-
-If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to
+   Bits[5:0]  pagetable bit number used to activate memory
+  encryption
+   Bits[11:6] reduction in physical address space, in bits, when
+  memory encryption is enabled (this only affects
+
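
The bit layout discussed in this message can be sketched in user-space C
as follows. This is an illustration only: struct sme_info and
decode_leaf_8000001f() are hypothetical names, the register values are
passed in by the caller rather than read with the CPUID instruction, and
the BIOS enable check via the MSRs mentioned in the commit message is
deliberately left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of CPUID leaf 0x8000001f. */
struct sme_info {
	bool sme;			/* EAX bit 0: SME supported */
	bool sev;			/* EAX bit 1: SEV supported */
	unsigned int enc_bit;		/* EBX bits 5:0: page table bit number */
	unsigned int phys_reduction;	/* EBX bits 11:6: address bits lost */
};

/* Decode raw EAX/EBX values as described in the quoted documentation. */
static struct sme_info decode_leaf_8000001f(uint32_t eax, uint32_t ebx)
{
	struct sme_info info;

	info.sme = (eax & 0x1) != 0;
	info.sev = ((eax >> 1) & 0x1) != 0;
	info.enc_bit = ebx & 0x3f;
	info.phys_reduction = (ebx >> 6) & 0x3f;
	return info;
}
```

As the commit message notes, a set CPUID bit alone is not enough; the
feature should only be reported as available once the BIOS has enabled
it as well.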

Re: [RFC PATCH v2 00/32] x86: Secure Encrypted Virtualization (AMD)

2017-03-03 Thread Borislav Petkov

On Fri, Mar 03, 2017 at 02:33:23PM -0600, Bjorn Helgaas wrote:
> On Thu, Mar 02, 2017 at 10:12:01AM -0500, Brijesh Singh wrote:
> > This RFC series provides support for AMD's new Secure Encrypted Virtualization
> > (SEV) feature. This RFC is built upon Secure Memory Encryption (SME) RFCv4 [1].
> 
> What kernel version is this series based on?

Yeah, see that mail in [1]:

https://lkml.kernel.org/r/20170216154158.19244.66630.st...@tlendack-t1.amdoffice.net

"This patch series is based off of the master branch of tip.
  Commit a27cb9e1b2b4 ("Merge branch 'WIP.sched/core'")"

$ git describe a27cb9e1b2b4
v4.10-rc7-681-ga27cb9e1b2b4

So you need the SME pile first and then that SEV pile. But the first
patch needs refreshing as it is using a different base than the SME
pile. :-)

Tom, Brijesh, perhaps you guys could push a full tree somewhere - github
or so - for people to pull, in addition to the patchset on lkml.

Thanks.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

Re: [RFC] arm64: support HAVE_ARCH_RARE_WRITE

2017-03-03 Thread Andy Lutomirski

On Thu, Mar 2, 2017 at 7:00 AM, Hoeun Ryu  wrote:
> +unsigned long __rare_write_rw_alias_start = TASK_SIZE_64 / 4;
> +
> +__always_inline unsigned long __arch_rare_write_map(void)
> +{
> +   struct mm_struct *mm = _write_mm;
> +
> +   preempt_disable();
> +
> +   __switch_mm(mm);

...

> +__always_inline unsigned long __arch_rare_write_unmap(void)
> +{
> +   struct mm_struct *mm = current->active_mm;
> +
> +   __switch_mm(mm);
> +

This reminds me: this code imposes constraints on the context in which
it's called.  I'd advise making it very explicit, asserting
correctness, and putting the onus on the caller to set things up.  For
example:

DEBUG_LOCKS_WARN_ON(preemptible() || in_interrupt() || in_nmi());

in both the map and unmap functions, along with getting rid of the
preempt_disable().  I don't think we want the preempt-disabledness to
depend on the arch.  The generic non-arch rare_write helpers can do
the preempt_disable().

This code also won't work if the mm is wacky when called.  On x86, we could do:

DEBUG_LOCKS_WARN_ON(read_cr3() != current->active_mm->pgd);

or similar (since that surely doesn't compile as is).
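
The split suggested above (generic helpers own the preempt_disable(),
arch helpers only assert that the context is already correct) can be
modelled in plain user-space C. Everything named model_* below is a
hypothetical stand-in: the counter imitates the preempt count, and the
warning counter imitates where DEBUG_LOCKS_WARN_ON() would fire. It is a
sketch of the calling convention, not of the actual arm64 or x86 code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static int model_preempt_count;	/* > 0 means preemption is disabled */
static int model_warnings;	/* counts failed context checks */

static bool model_preemptible(void)
{
	return model_preempt_count == 0;
}

/* Arch helper: asserts the calling context instead of fixing it up. */
static void model_arch_rare_write_map(void)
{
	if (model_preemptible())
		model_warnings++;	/* the WARN_ON would fire here */
	/* ... switch to the mm that holds the writable alias ... */
}

static void model_arch_rare_write_unmap(void)
{
	if (model_preemptible())
		model_warnings++;
	/* ... switch back to the previously active mm ... */
}

/* Generic helpers own the disable/enable pairing for every arch. */
static void model_rare_write_begin(void)
{
	model_preempt_count++;		/* models preempt_disable() */
	model_arch_rare_write_map();
}

static void model_rare_write_end(void)
{
	model_arch_rare_write_unmap();
	model_preempt_count--;		/* models preempt_enable() */
}
```

Calling the arch helper directly, without going through the generic
wrapper, trips the check, which is the point of making the constraint
explicit.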

--Andy

Re: [RFC PATCH v2 06/32] x86/pci: Use memremap when walking setup data

2017-03-03 Thread Bjorn Helgaas

On Thu, Mar 02, 2017 at 10:13:10AM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> The use of ioremap will force the setup data to be mapped decrypted even
> though setup data is encrypted.  Switch to using memremap which will be
> able to perform the proper mapping.

How should callers decide whether to use ioremap() or memremap()?

memremap() existed before SME and SEV, and this code is used even if
SME and SEV aren't supported, so the rationale for this change should
not need the decryption argument.

> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/pci/common.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index a4fdfa7..0b06670 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -691,7 +691,7 @@ int pcibios_add_device(struct pci_dev *dev)
>  
>   pa_data = boot_params.hdr.setup_data;
>   while (pa_data) {
> - data = ioremap(pa_data, sizeof(*rom));
> + data = memremap(pa_data, sizeof(*rom), MEMREMAP_WB);

I can't quite connect the dots here.  ioremap() on x86 would do
ioremap_nocache().  memremap(MEMREMAP_WB) would do arch_memremap_wb(),
which is ioremap_cache().  Is making a cacheable mapping the important
difference?

>   if (!data)
>   return -ENOMEM;
>  
> @@ -710,7 +710,7 @@ int pcibios_add_device(struct pci_dev *dev)
>   }
>   }
>   pa_data = data->next;
> - iounmap(data);
> + memunmap(data);
>   }
>   set_dma_domain_ops(dev);
>   set_dev_domain_options(dev);
>

Re: [PATCH] dt-bindings: display: rk3288-mipi-dsi: add reset property

2017-03-03 Thread Brian Norris

On Fri, Mar 03, 2017 at 11:39:45AM +, John Keeping wrote:
> This reset is required in order to fully reset the internal state of the
> MIPI controller.
> 
> Signed-off-by: John Keeping 
> ---
> On Thu, 2 Mar 2017 13:56:46 -0800, Brian Norris wrote:
> > On Fri, Feb 24, 2017 at 12:55:06PM +, John Keeping wrote:
> > > + /*
> > > +  * Note that the reset was not defined in the initial device tree, so
> > > +  * we have to be prepared for it not being found.
> > > +  */
> > > + apb_rst = devm_reset_control_get(dev, "apb");  
> > 
> > Did this reset ever get documented in the device tree bindings? I
> > couldn't find it. Perhaps a follow-up patch is in order?
> 
> Here's a patch to do that.

FWIW:

Reviewed-by: Brian Norris 

Thanks.

[PATCH 05/10] Replaced pr_err by dev_err. Modified debug message

2017-03-03 Thread Ryan Lee

Signed-off-by: Ryan Lee 
---
Replaced 'pr_err' by 'dev_err'. Modified error message.

 sound/soc/codecs/max98927.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/sound/soc/codecs/max98927.c b/sound/soc/codecs/max98927.c
index efc761b..0abf6d3 100755
--- a/sound/soc/codecs/max98927.c
+++ b/sound/soc/codecs/max98927.c
@@ -254,6 +254,7 @@ static const int rate_table[] = {
 static int max98927_set_clock(struct max98927_priv *max98927,
struct snd_pcm_hw_params *params)
 {
+   struct snd_soc_codec *codec = max98927->codec;
/* BCLK/LRCLK ratio calculation */
int blr_clk_ratio = params_channels(params) * max98927->ch_size;
int reg = MAX98927_R0022_PCM_CLK_SETUP;
@@ -268,8 +269,7 @@ static int max98927_set_clock(struct max98927_priv *max98927,
break;
}
if (i == ARRAY_SIZE(rate_table)) {
-   pr_err("%s couldn't get the MCLK to match codec\n",
-   __func__);
+   dev_err(codec->dev, "failed to find proper clock rate.\n");
return -EINVAL;
}
regmap_update_bits(max98927->regmap,
@@ -327,12 +327,12 @@ static int max98927_dai_hw_params(struct snd_pcm_substream *substream,
max98927->ch_size = 32;
break;
default:
-   pr_err("%s: format unsupported %d",
-   __func__, params_format(params));
+   dev_err(codec->dev, "format unsupported %d",
+   params_format(params));
goto err;
}
-   dev_dbg(codec->dev, "%s: format supported %d",
-   __func__, params_format(params));
+   dev_dbg(codec->dev, "format supported %d",
+   params_format(params));
 
/* sampling rate configuration */
switch (params_rate(params)) {
@@ -364,8 +364,8 @@ static int max98927_dai_hw_params(struct snd_pcm_substream *substream,
sampling_rate |= MAX98927_PCM_SR_SET1_SR_48000;
break;
default:
-   pr_err("%s rate %d not supported\n",
-   __func__, params_rate(params));
+   dev_err(codec->dev, "rate %d not supported\n",
+   params_rate(params));
goto err;
}
/* set DAI_SR to correct LRCLK frequency */
@@ -490,7 +490,7 @@ static int max98927_digital_gain_get(struct snd_kcontrol *kcontrol,
struct max98927_priv *max98927 = snd_soc_codec_get_drvdata(codec);
 
ucontrol->value.integer.value[0] = max98927->digital_gain;
-   dev_dbg(codec->dev, "%s: spk_gain setting returned %d\n", __func__,
+   dev_dbg(codec->dev, "%s: digital_gain setting returned %d\n", __func__,
(int) ucontrol->value.integer.value[0]);
return 0;
 }
-- 
2.7.4
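
As a rough illustration of why the patch can drop __func__ from the messages: dev_err() output is prefixed with the driver and device name, so the source of the message is already identified. The user-space sketch below only approximates the two prefix styles. struct fake_device, format_pr_err(), format_dev_err() and the device name used in the usage note are hypothetical; the real kernel helpers also add log levels and are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the parts of struct device used in the prefix. */
struct fake_device {
	const char *driver;	/* driver name, e.g. "max98927" */
	const char *name;	/* device instance name (made up for the example) */
};

/* Approximates the old style: pr_err("%s: ...", __func__, ...). */
static int format_pr_err(char *buf, size_t len, const char *func,
			 const char *msg)
{
	assert(buf && func && msg);
	return snprintf(buf, len, "%s: %s", func, msg);
}

/* Approximates the dev_err() style: "<driver> <device>: <message>". */
static int format_dev_err(char *buf, size_t len,
			  const struct fake_device *dev, const char *msg)
{
	assert(buf && dev && msg);
	return snprintf(buf, len, "%s %s: %s", dev->driver, dev->name, msg);
}
```

With a made-up device named "3-0039" bound to a driver called "max98927", the second helper formats a message as "max98927 3-0039: rate not supported", which says which device instance failed rather than which function printed the line.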

[PATCH 06/10] Added mask variable to apply it in one round after the switch

2017-03-03 Thread Ryan Lee

Signed-off-by: Ryan Lee 
---
Added the mask variable to apply in one round after the switch.

 sound/soc/codecs/max98927.c | 64 ++---
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/sound/soc/codecs/max98927.c b/sound/soc/codecs/max98927.c
index 0abf6d3..9e70883 100755
--- a/sound/soc/codecs/max98927.c
+++ b/sound/soc/codecs/max98927.c
@@ -171,34 +171,31 @@ static int max98927_dai_set_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt)
 {
struct snd_soc_codec *codec = codec_dai->codec;
struct max98927_priv *max98927 = snd_soc_codec_get_drvdata(codec);
+   unsigned int mode = 0;
unsigned int invert = 0;
 
dev_dbg(codec->dev, "%s: fmt 0x%08X\n", __func__, fmt);
 
switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) {
case SND_SOC_DAIFMT_CBS_CFS:
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0021_PCM_MASTER_MODE,
-   MAX98927_PCM_MASTER_MODE_MASK,
-   MAX98927_PCM_MASTER_MODE_SLAVE);
+   mode = MAX98927_PCM_MASTER_MODE_SLAVE;
break;
case SND_SOC_DAIFMT_CBM_CFM:
max98927->master = true;
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0021_PCM_MASTER_MODE,
-   MAX98927_PCM_MASTER_MODE_MASK,
-   MAX98927_PCM_MASTER_MODE_MASTER);
+   mode = MAX98927_PCM_MASTER_MODE_MASTER;
break;
case SND_SOC_DAIFMT_CBS_CFM:
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0021_PCM_MASTER_MODE,
-   MAX98927_PCM_MASTER_MODE_MASK,
-   MAX98927_PCM_MASTER_MODE_HYBRID);
+   mode = MAX98927_PCM_MASTER_MODE_HYBRID;
default:
dev_err(codec->dev, "DAI clock mode unsupported");
return -EINVAL;
}
 
+   regmap_update_bits(max98927->regmap,
+   MAX98927_R0021_PCM_MASTER_MODE,
+   MAX98927_PCM_MASTER_MODE_MASK,
+   mode);
+
switch (fmt & SND_SOC_DAIFMT_INV_MASK) {
case SND_SOC_DAIFMT_NB_NF:
break;
@@ -210,24 +207,28 @@ static int max98927_dai_set_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt)
return -EINVAL;
}
 
+   regmap_update_bits(max98927->regmap,
+   MAX98927_R0020_PCM_MODE_CFG,
+   MAX98927_PCM_MODE_CFG_PCM_BCLKEDGE,
+   invert);
+
/* interface format */
switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
case SND_SOC_DAIFMT_I2S:
max98927->iface |= SND_SOC_DAIFMT_I2S;
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0020_PCM_MODE_CFG,
-   max98927->iface, max98927->iface);
-   break;
+
+   break;
case SND_SOC_DAIFMT_LEFT_J:
max98927->iface |= SND_SOC_DAIFMT_LEFT_J;
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0020_PCM_MODE_CFG,
-   max98927->iface, max98927->iface);
-   break;
+   break;
default:
return -EINVAL;
}
 
+   regmap_update_bits(max98927->regmap,
+   MAX98927_R0020_PCM_MODE_CFG,
+   max98927->iface, max98927->iface);
+
/* pcm channel configuration */
if (max98927->iface & (SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_LEFT_J)) {
regmap_write(max98927->regmap,
@@ -301,29 +302,21 @@ static int max98927_dai_hw_params(struct snd_pcm_substream *substream,
 {
struct snd_soc_codec *codec = dai->codec;
struct max98927_priv *max98927 = snd_soc_codec_get_drvdata(codec);
-   int sampling_rate = 0;
+   unsigned int sampling_rate = 0;
+   unsigned int chan_sz = 0;
 
/* pcm mode configuration */
switch (snd_pcm_format_width(params_format(params))) {
case 16:
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0020_PCM_MODE_CFG,
-   MAX98927_PCM_MODE_CFG_CHANSZ_16,
-   MAX98927_PCM_MODE_CFG_CHANSZ_16);
+   chan_sz = MAX98927_PCM_MODE_CFG_CHANSZ_16;
max98927->ch_size = 16;
break;
case 24:
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0020_PCM_MODE_CFG,
-   MAX98927_PCM_MODE_CFG_CHANSZ_24,
-   MAX98927_PCM_MODE_CFG_CHANSZ_24);
+   chan_sz = MAX98927_PCM_MODE_CFG_CHANSZ_24;
max98927->ch_size = 24;
break;
case 32:
-   regmap_update_bits(max98927->regmap,
-   MAX98927_R0020_PCM_MODE_CFG,
-   MAX98927_PCM_MODE_CFG_CHANSZ_32,
-   MAX98927_PCM_MODE_CFG_CHANSZ_32);
+
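
The pattern this patch moves to (pick a value inside the switch, then do a single masked register update afterwards) can be modelled in user space as below. This is a sketch under assumptions, not the driver code: the MODE_* values, enum clock_role, update_bits() and set_clock_role() are hypothetical, and update_bits() only imitates the read-modify-write semantics of a masked update on a plain integer. Note that in the hunk as posted, the SND_SOC_DAIFMT_CBS_CFM case has no break before default, so it appears to fall through to the error return; the sketch gives every case an explicit break.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical field layout; not the real MAX98927 register values. */
#define MODE_MASK	0x03u
#define MODE_SLAVE	0x00u
#define MODE_HYBRID	0x01u
#define MODE_MASTER	0x03u

enum clock_role { ROLE_SLAVE, ROLE_MASTER, ROLE_HYBRID, ROLE_UNKNOWN };

/* Masked update: only bits set in mask are replaced by bits from val. */
static unsigned int update_bits(unsigned int old, unsigned int mask,
				unsigned int val)
{
	return (old & ~mask) | (val & mask);
}

/* Select the mode in the switch, then apply it once after the switch. */
static int set_clock_role(unsigned int *reg, enum clock_role role)
{
	unsigned int mode;

	assert(reg != NULL);

	switch (role) {
	case ROLE_SLAVE:
		mode = MODE_SLAVE;
		break;
	case ROLE_MASTER:
		mode = MODE_MASTER;
		break;
	case ROLE_HYBRID:
		mode = MODE_HYBRID;
		break;	/* explicit break so this case does not hit default */
	default:
		return -EINVAL;	/* register is left untouched on error */
	}

	*reg = update_bits(*reg, MODE_MASK, mode);
	return 0;
}
```

Keeping the register write outside the switch means an unsupported value returns before any hardware state changes, and each case only has to name its value.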

Re: [RFC PATCH v2 00/32] x86: Secure Encrypted Virtualization (AMD)

2017-03-03 Thread Bjorn Helgaas

On Thu, Mar 02, 2017 at 10:12:01AM -0500, Brijesh Singh wrote:
> This RFC series provides support for AMD's new Secure Encrypted Virtualization
> (SEV) feature. This RFC is built upon Secure Memory Encryption (SME) RFCv4 [1].

What kernel version is this series based on?
