Re: 32-bit powerpc, aty128fb: vmap allocation for size 135168 failed

2017-08-18 Thread Meelis Roos
> Meelis Roos  writes:
> 
> > I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G 
> > RAM and after some time of sddm respawning and X trying to restart, 
> > dmesg is full of messages about vmap allocation failures.
> 
> Did it just start happening? ie. did rc4 work?

It goes back to at least 4.0 - that's the oldest kernel I had lying 
around precompiled. The messages about the ROM signature changed somewhere 
between 4.0 and 4.7 (4.7 already behaves like 4.13), but after some time 
the same vmalloc errors appear.

Maybe userspace has changed to respawn more often, which brings the 
problem out.

I tried to read the code but I do not understand it yet. The warning 
seems to come from the generic pci_map_rom() checking the ROM size and 
returning the ROM pointer to aty128fb (it returns the resource size too, 
but that is ignored). aty128fb then looks at the x86 PCI ROM signature 
again but does not report that the signature is missing. How come?

-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH] X.509: Fix the buffer overflow in the utility function for OID string

2017-08-18 Thread Takashi Iwai
On Sat, 19 Aug 2017 06:19:44 +0200,
Lee, Chun-Yi wrote:
> 
> From: Takashi Iwai 
> 
> The sprint_oid() utility function doesn't properly check the buffer
> size, which causes the warning in vsnprintf() to be triggered.
> For example on a v4.1 kernel:
> 
> [   49.612536] [ cut here ]
> [   49.612543] WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 
> vsnprintf+0x5a7/0x5c0()
> ...
> 
> We can trigger this issue by injecting a maliciously crafted X.509 cert
> in DER format. Just use a hex editor to change the length of the OID so
> that it exceeds the length of the SEQUENCE container. For example:
> 
> 0:d=0  hl=4 l= 980 cons: SEQUENCE
> 4:d=1  hl=4 l= 700 cons:  SEQUENCE
> 8:d=2  hl=2 l=   3 cons:   cont [ 0 ]
>10:d=3  hl=2 l=   1 prim:INTEGER   :02
>13:d=2  hl=2 l=   9 prim:   INTEGER   :9B47FAF791E7D1E3
>24:d=2  hl=2 l=  13 cons:   SEQUENCE
>26:d=3  hl=2 l=   9 prim:OBJECT:sha256WithRSAEncryption
>37:d=3  hl=2 l=   0 prim:NULL
>39:d=2  hl=2 l= 121 cons:   SEQUENCE
>41:d=3  hl=2 l=  22 cons:SET
>43:d=4  hl=2 l=  20 cons: SEQUENCE  <=== the SEQ length is 20
>45:d=5  hl=2 l=   3 prim:  OBJECT:organizationName
>   <=== the original length is 3, change the length of OID to over the 
> length of SEQUENCE
> 
> Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided a
> patch to fix it by checking the bufsize in sprint_oid().
> 
> From: Takashi Iwai 
> Reported-by: Pawel Wieczorkiewicz 
> Cc: David Howells 
> Cc: Rusty Russell 
> Cc: Takashi Iwai 
> Cc: Pawel Wieczorkiewicz 
> Signed-off-by: "Lee, Chun-Yi" 

I seem to have forgotten to give my sign-off:
  Signed-off-by: Takashi Iwai 


thanks,

Takashi

> ---
>  lib/oid_registry.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/oid_registry.c b/lib/oid_registry.c
> index 318f382..41b9e50 100644
> --- a/lib/oid_registry.c
> +++ b/lib/oid_registry.c
> @@ -142,9 +142,9 @@ int sprint_oid(const void *data, size_t datasize, char 
> *buffer, size_t bufsize)
>   }
>   ret += count = snprintf(buffer, bufsize, ".%lu", num);
>   buffer += count;
> - bufsize -= count;
> - if (bufsize == 0)
> + if (bufsize <= count)
>   return -ENOBUFS;
> + bufsize -= count;
>   }
>  
>   return ret;
> -- 
> 2.10.2
> 


Re: [PATCH v2] sched/fair: Make PELT signal more accurate

2017-08-18 Thread Mike Galbraith
On Fri, 2017-08-18 at 16:50 -0700, Joel Fernandes wrote:
> The PELT signals (sa->load_avg and sa->util_avg) are not updated if the
> amount accumulated during a single update doesn't cross a period boundary.
> This is fine when the amount accrued is much smaller than the size of a
> single PELT window (1ms), but if the amount accrued is high then the
> relative error (calculated against what the actual signal would be had we
> updated the averages) can be high - as much as 2% in my testing. On
> plotting signals, I found that the errors are especially high when we
> update just before the period boundary is hit. These errors can be
> significantly reduced if we update the averages more often.
> 
> In order to fix this, this patch also checks how much time has elapsed
> since the last update, and updates the averages if it has been long
> enough (as a threshold I chose 512us).

Ok, I gotta ask:  In order to fix what?  What exactly does the small
but existent overhead increase buy us other than an ever so slightly
different chart?  What is your motivation to care about a microscopic
change in signal shape?

-Mike


[PATCH v2] membarrier: Document scheduler barrier requirements

2017-08-18 Thread Mathieu Desnoyers
Document the membarrier requirement on having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
It is provided by smp_mb__before_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread, which skips the call to switch_mm(). We
currently have an implicit barrier from atomic_dec_and_test() in mmdrop() that
ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many cpumask
update operations as well as load_cr3(). Document that load_cr3() is providing
this barrier.

[ Rebased on top of linux-rcu for-mingo branch.
  Applies on top of "membarrier: Provide expedited private command". ]

Signed-off-by: Mathieu Desnoyers 
CC: Peter Zijlstra 
CC: Paul E. McKenney 
CC: Boqun Feng 
CC: Andrew Hunter 
CC: Maged Michael 
CC: gro...@google.com
CC: Avi Kivity 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Dave Watson 
---
 arch/x86/mm/tlb.c        | 3 +++
 include/linux/sched/mm.h | 4 ++++
 kernel/sched/core.c      | 9 +++++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 014d07a..cd815b6 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -133,6 +133,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 * and neither LOCK nor MFENCE orders them.
 * Fortunately, load_cr3() is serializing and gives the
 * ordering guarantee we need.
+*
+* This full barrier is also required by the membarrier
+* system call.
 */
load_cr3(next->pgd);
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 2b24a69..fe29d06 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -38,6 +38,10 @@ static inline void mmgrab(struct mm_struct *mm)
 extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
+   /*
+* The implicit full barrier implied by atomic_dec_and_test is
+* required by the membarrier system call.
+*/
	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
__mmdrop(mm);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3f29c6a..b0f199f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2654,6 +2654,12 @@ static struct rq *finish_task_switch(struct task_struct 
*prev)
finish_arch_post_lock_switch();
 
fire_sched_in_preempt_notifiers(current);
+   /*
+* When transitioning from a kernel thread to a userspace
+* thread, mmdrop()'s implicit full barrier is required by the
+* membarrier system call, because the current active_mm can
+* become the current mm without going through switch_mm().
+*/
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
@@ -3295,6 +3301,9 @@ static void __sched notrace __schedule(bool preempt)
 * Make sure that signal_pending_state()->signal_pending() below
 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 * done by the caller to avoid the race with signal_wake_up().
+*
+* The membarrier system call requires a full memory barrier
+* after coming from user-space, before storing to rq->curr.
 */
smp_mb__before_spinlock();
	rq_lock(rq, &rf);
-- 
1.9.1



[PATCH] X.509: Fix the buffer overflow in the utility function for OID string

2017-08-18 Thread Lee, Chun-Yi
From: Takashi Iwai 

The sprint_oid() utility function doesn't properly check the buffer
size, which causes the warning in vsnprintf() to be triggered.
For example on a v4.1 kernel:

[   49.612536] [ cut here ]
[   49.612543] WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 
vsnprintf+0x5a7/0x5c0()
...

We can trigger this issue by injecting a maliciously crafted X.509 cert
in DER format. Just use a hex editor to change the length of the OID so
that it exceeds the length of the SEQUENCE container. For example:

0:d=0  hl=4 l= 980 cons: SEQUENCE
4:d=1  hl=4 l= 700 cons:  SEQUENCE
8:d=2  hl=2 l=   3 cons:   cont [ 0 ]
   10:d=3  hl=2 l=   1 prim:INTEGER   :02
   13:d=2  hl=2 l=   9 prim:   INTEGER   :9B47FAF791E7D1E3
   24:d=2  hl=2 l=  13 cons:   SEQUENCE
   26:d=3  hl=2 l=   9 prim:OBJECT:sha256WithRSAEncryption
   37:d=3  hl=2 l=   0 prim:NULL
   39:d=2  hl=2 l= 121 cons:   SEQUENCE
   41:d=3  hl=2 l=  22 cons:SET
   43:d=4  hl=2 l=  20 cons: SEQUENCE  <=== the SEQ length is 20
   45:d=5  hl=2 l=   3 prim:  OBJECT:organizationName
<=== the original length is 3, change the length of OID to over the 
length of SEQUENCE

Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided a
patch to fix it by checking the bufsize in sprint_oid().

From: Takashi Iwai 
Reported-by: Pawel Wieczorkiewicz 
Cc: David Howells 
Cc: Rusty Russell 
Cc: Takashi Iwai 
Cc: Pawel Wieczorkiewicz 
Signed-off-by: "Lee, Chun-Yi" 
---
 lib/oid_registry.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/oid_registry.c b/lib/oid_registry.c
index 318f382..41b9e50 100644
--- a/lib/oid_registry.c
+++ b/lib/oid_registry.c
@@ -142,9 +142,9 @@ int sprint_oid(const void *data, size_t datasize, char 
*buffer, size_t bufsize)
}
ret += count = snprintf(buffer, bufsize, ".%lu", num);
buffer += count;
-   bufsize -= count;
-   if (bufsize == 0)
+   if (bufsize <= count)
return -ENOBUFS;
+   bufsize -= count;
}
 
return ret;
-- 
2.10.2



Re: [PATCH v2 3/3] vfio/pci: Don't probe devices that can't be reset

2017-08-18 Thread Alex Williamson
On Fri, 18 Aug 2017 08:57:09 -0700
David Daney  wrote:

> On 08/18/2017 07:12 AM, Alex Williamson wrote:
> > On Fri, 18 Aug 2017 15:42:31 +0200
> > Jan Glauber  wrote:
> >   
> >> On Thu, Aug 17, 2017 at 07:00:17AM -0600, Alex Williamson wrote:  
> >>> On Thu, 17 Aug 2017 10:14:23 +0200
> >>> Jan Glauber  wrote:
> >>>  
>  If a PCI device supports neither function-level reset, nor slot
>  or bus reset then refuse to probe it. A line is printed to inform
>  the user.  
> >>>
> >>> But that's not what this does, this requires that the device is on a
> >>> reset-able bus.  This is a massive regression.  With this we could no
> >>> longer assign devices on the root complex or any device which doesn't
> >>> return from bus reset and currently makes use of the NO_BUS_RESET flag
> >>> and works happily otherwise.  Full NAK.  Thanks,  
> >>
> >> Looks like I missed the slot reset check. So how about this:
> >>
> >> if (pci_probe_reset_slot(pdev->slot) && pci_probe_reset_bus(pdev->bus)) {
> >>dev_warn(...);
> >>return -ENODEV;
> >> }
> >>
> >> Or am I still missing something here?  
> > 
> > We don't require that a device is on a reset-able bus/slot, so any
> > attempt to impose that requirement means that there are devices that
> > might work perfectly fine that are now excluded from assignment.  The
> > entire premise is unacceptable.  Thanks,  
> 
> 
> You previously rejected the idea to silently ignore bus reset requests 
> on buses that do not support it.
> 
> So this leaves us with two options:
> 
> 1) Do nothing, and crash the kernel on systems with bad combinations of 
> PCIe target devices and cn88xx when vfio_pci is used.
> 
> 2) Do something else.
> 
> We are trying to figure out what that something else should be.  The 
> general concept we are working on is that if vfio_pci wants to reset a 
> device, *and* bus reset is the only option available, *and* cn88xx, then 
> make vfio_pci fail.

But that's not what these attempts do, they say if we can't do a bus or
slot reset, fail the device probe.  The comment is trying to suggest
they do something else, am I misinterpreting the actual code change?
There are plenty of devices out there that don't care if bus reset
doesn't work, they support FLR or PM reset or device specific reset or
just deal without a reset.  We can't suddenly say this new thing is a
requirement and sorry if you were happily using device assignment
before, but there's a slim chance you're on this platform that falls
over if we attempt to do a secondary bus reset.

> What is your opinion of doing that (assuming it is properly implemented)?

It seems like these attempts are trying to completely turn off vfio-pci
on cn88xx, do you just want it unsupported on these platforms?  Should
we blacklist anything where dev->bus->self is this root port?
Otherwise, what's wrong with returning an error if a bus reset fails,
because we should *never* silently ignore the request and pretend that
it worked, perhaps even dev_warn()'ing that the platform doesn't
support bus resets?  Thanks,

Alex


[PATCH v2 1/1] usb:xhci: update condition to select bus->sysdev from parent device

2017-08-18 Thread Thang Q. Nguyen
From: "Thang Q. Nguyen" 

For commit 4c39d4b949d3 ("usb: xhci: use bus->sysdev for DMA
configuration"), sysdev points to devices known to the system firmware
or hardware for DMA parameters.
However, the logic that checks whether the parent is the device known to
the system firmware/hardware does not work in ACPI boot mode. This patch
updates the condition so the check works in both DT and ACPI boot modes.

Signed-off-by: Tung Nguyen 
Signed-off-by: Thang Q. Nguyen 
---
Changes since v1:
 - Updated the code to work with kernel 4.13-rc5
---
 drivers/usb/host/xhci-plat.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index c04144b..1bb9729 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -187,7 +187,10 @@ static int xhci_plat_probe(struct platform_device *pdev)
 * 3. xhci_plat is grandchild of a pci device (dwc3-pci)
 */
	sysdev = &pdev->dev;
-   if (sysdev->parent && !sysdev->of_node && sysdev->parent->of_node)
+   if (sysdev->parent && !is_of_node(sysdev->fwnode) &&
+   !is_acpi_device_node(sysdev->fwnode) &&
+   (is_of_node(sysdev->parent->fwnode) ||
+is_acpi_device_node(sysdev->parent->fwnode)))
sysdev = sysdev->parent;
 #ifdef CONFIG_PCI
else if (sysdev->parent && sysdev->parent->parent &&
-- 
1.8.3.1



Re: [PATCH] staging: lustre: uapi: properly include lustre_errno.h

2017-08-18 Thread Greg Kroah-Hartman
On Fri, Aug 18, 2017 at 11:04:14PM -0400, James Simmons wrote:
> The path for lustre_errno.h was not updated to the new path
> for errno.c. Also, the header lustre_net.h uses code found in
> lustre_errno.h but doesn't include that header directly. It
> is easy to forget to add lustre_errno.h before
> including lustre_net.h, so it is easier to just include
> lustre_errno.h in lustre_net.h. This should resolve the
> build issues seen on the ia64 platform.
> 
> Signed-off-by: James Simmons 
> ---
>  drivers/staging/lustre/lustre/include/lustre_net.h | 1 +
>  drivers/staging/lustre/lustre/ptlrpc/errno.c   | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)

Merge this into the correct patch and send both patch series again
please.

thanks,

greg k-h


Re: [f2fs-dev] [PATCH v3] f2fs: add cur_reserved_blocks to support soft block reservation

2017-08-18 Thread Chao Yu
On 2017/8/18 23:09, Yunlong Song wrote:
> This patch adds cur_reserved_blocks to extend the reserved_blocks sysfs
> interface into a soft threshold, which allows the user to configure it to
> exceed the currently available user space. To ensure there is enough space
> to support the system's activity, this patch does not set the reserved space

I still don't think soft block reservation can be used in the android
scenario, where the f2fs data partition has users, like the system or apps,
that are sensitive to free space. IMO, hard block reservation is enough.

Instead, I expect soft block reservation to be used as a method of global
disk quota control, or of enforced over-provision ratio alteration operated
by an administrator - most likely in a distributed storage scenario, where
a change in a single disk's state (free space) affects the whole system.

Thanks,

> to the configured reserved_blocks value at once; instead, it safely
> increases cur_reserved_blocks in dec_valid_block(,node)_count to only take
> up blocks that have just been obsoleted.
> 
> Signed-off-by: Yunlong Song 
> Signed-off-by: Chao Yu 
> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs |  3 ++-
>  fs/f2fs/f2fs.h  | 13 +++--
>  fs/f2fs/super.c |  3 ++-
>  fs/f2fs/sysfs.c | 15 +--
>  4 files changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
> b/Documentation/ABI/testing/sysfs-fs-f2fs
> index 11b7f4e..ba282ca 100644
> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> @@ -138,7 +138,8 @@ What: /sys/fs/f2fs//reserved_blocks
>  Date:June 2017
>  Contact: "Chao Yu" 
>  Description:
> -  Controls current reserved blocks in system.
> +  Controls current reserved blocks in system, the threshold
> +  is soft, it could exceed current available user space.
>  
>  What:/sys/fs/f2fs//gc_urgent
>  Date:August 2017
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2f20b6b..84ccbdc 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1041,6 +1041,7 @@ struct f2fs_sb_info {
>   block_t discard_blks;   /* discard command candidats */
>   block_t last_valid_block_count; /* for recovery */
>   block_t reserved_blocks;/* configurable reserved blocks 
> */
> + block_t cur_reserved_blocks;/* current reserved blocks */
>  
>   u32 s_next_generation;  /* for NFS support */
>  
> @@ -1515,7 +1516,8 @@ static inline int inc_valid_block_count(struct 
> f2fs_sb_info *sbi,
>  
>   spin_lock(>stat_lock);
>   sbi->total_valid_block_count += (block_t)(*count);
> - avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks;
> + avail_user_block_count = sbi->user_block_count -
> + sbi->cur_reserved_blocks;
>   if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
>   diff = sbi->total_valid_block_count - avail_user_block_count;
>   *count -= diff;
> @@ -1549,6 +1551,10 @@ static inline void dec_valid_block_count(struct 
> f2fs_sb_info *sbi,
>   f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
>   f2fs_bug_on(sbi, inode->i_blocks < sectors);
>   sbi->total_valid_block_count -= (block_t)count;
> + if (sbi->reserved_blocks &&
> + sbi->reserved_blocks != sbi->cur_reserved_blocks)
> + sbi->cur_reserved_blocks = min(sbi->reserved_blocks,
> + sbi->cur_reserved_blocks + count);
>   spin_unlock(>stat_lock);
>   f2fs_i_blocks_write(inode, count, false, true);
>  }
> @@ -1695,7 +1701,7 @@ static inline int inc_valid_node_count(struct 
> f2fs_sb_info *sbi,
>   spin_lock(>stat_lock);
>  
>   valid_block_count = sbi->total_valid_block_count + 1;
> - if (unlikely(valid_block_count + sbi->reserved_blocks >
> + if (unlikely(valid_block_count + sbi->cur_reserved_blocks >
>   sbi->user_block_count)) {
>   spin_unlock(>stat_lock);
>   goto enospc;
> @@ -1738,6 +1744,9 @@ static inline void dec_valid_node_count(struct 
> f2fs_sb_info *sbi,
>  
>   sbi->total_valid_node_count--;
>   sbi->total_valid_block_count--;
> + if (sbi->reserved_blocks &&
> + sbi->reserved_blocks != sbi->cur_reserved_blocks)
> + sbi->cur_reserved_blocks++;
>  
>   spin_unlock(>stat_lock);
>  
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 4c1bdcb..16a805f 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -957,7 +957,7 @@ static int f2fs_statfs(struct dentry *dentry, struct 
> kstatfs *buf)
>   buf->f_blocks = total_count - start_count;
>  

[PATCH] staging: lustre: uapi: properly include lustre_errno.h

2017-08-18 Thread James Simmons
The path for lustre_errno.h was not updated to the new path
for errno.c. Also the header lustre_net.h uses code found in
lustre_errno.h but doesn't include the header directly. It
is easy to forget to add the header lustre_errno.h before
including lustre_net.h so it is just easier to include
lustre_errno.h in lustre_net.h. This should resolve the
build issues seen on the ia64 platform.

Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/include/lustre_net.h | 1 +
 drivers/staging/lustre/lustre/ptlrpc/errno.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h 
b/drivers/staging/lustre/lustre/include/lustre_net.h
index 1ae7e83..bc18131 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -55,6 +55,7 @@
 #include "../../include/uapi/linux/lnet/nidstr.h"
 #include "../../include/linux/lnet/api.h"
 #include "../../include/uapi/linux/lustre/lustre_idl.h"
+#include "lustre_errno.h"
 #include "lustre_ha.h"
 #include "lustre_sec.h"
 #include "lustre_import.h"
diff --git a/drivers/staging/lustre/lustre/ptlrpc/errno.c 
b/drivers/staging/lustre/lustre/ptlrpc/errno.c
index 73f8374..faad8d8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/errno.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/errno.c
@@ -26,7 +26,7 @@
  */
 
 #include "../../include/linux/libcfs/libcfs.h"
-#include "../include/lustre/lustre_errno.h"
+#include "../include/lustre_errno.h"
 
 /*
  * The two translation tables below must define a one-to-one mapping between
-- 
1.8.3.1



Re: [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

2017-08-18 Thread Bjorn Helgaas
On Wed, Aug 16, 2017 at 10:33:12PM +0530, Srinath Mannam wrote:
> Hi Bjorn,
> 
> Thank you for the feedback.
> 
> My comments are in lined.
> 
> On Wed, Aug 16, 2017 at 7:13 PM, Bjorn Helgaas  wrote:
> > On Fri, Aug 04, 2017 at 08:27:28PM +0530, Srinath Mannam wrote:
> >> Concurrency issue is observed during pci enable bridge called
> >> for multiple pci devices initialization in SMP system.
> >>
> >> Setup details:
> >>  - SMP system with 8 ARMv8 cores running Linux kernel(4.11).
> >>  - Two EPs are connected to PCIe RC through bridge as shown
> >>in the below figure.
> >>
> >>[RC]
> >> |
> >>  [BRIDGE]
> >> |
> >>---
> >>   |   |
> >>  [EP][EP]
> >>
> >> Issue description:
> >> After PCIe enumeration completed EP driver probe function called
> >> for both the devices from two CPUS simultaneously.
> >> From EP probe function, pci_enable_device_mem called for both the EPs.
> >> This function called pci_enable_bridge enable for all the bridges
> >> recursively in the path of EP to RC.
> >>
> >> Inside pci_enable_bridge function, at two places concurrency issue is
> >> observed.
> >>
> >> Place 1:
> >>   CPU 0:
> >> 1. Done Atomic increment dev->enable_cnt
> >>in pci_enable_device_flags
> >> 2. Inside pci_enable_resources
> >> 3. Completed pci_read_config_word(dev, PCI_COMMAND, )
> >> 4. Ready to set PCI_COMMAND_MEMORY (0x2) in
> >>pci_write_config_word(dev, PCI_COMMAND, cmd)
> >>   CPU 1:
> >> 1. Check pci_is_enabled in function pci_enable_bridge
> >>and it is true
> >> 2. Check (!dev->is_busmaster) also true
> >> 3. Gone into pci_set_master
> >> 4. Completed pci_read_config_word(dev, PCI_COMMAND, _cmd)
> >> 5. Ready to set PCI_COMMAND_MASTER (0x4) in
> >>pci_write_config_word(dev, PCI_COMMAND, cmd)
> >>
> >> By the time of last point for both the CPUs are read value 0 and
> >> ready to write 2 and 4.
> >> After last point final value in PCI_COMMAND register is 4 instead of 6.
> >>
> >> Place 2:
> >>   CPU 0:
> >> 1. Done Atomic increment dev->enable_cnt in
> >>pci_enable_device_flags
> >>
> >> Signed-off-by: Srinath Mannam 
> >> ---
> >>  drivers/pci/pci.c | 8 ++--
> >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index af0cc34..12721df 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_struct *work);
> >>  static LIST_HEAD(pci_pme_list);
> >>  static DEFINE_MUTEX(pci_pme_list_mutex);
> >>  static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan);
> >> +static DEFINE_MUTEX(pci_bridge_mutex);
> >>
> >>  struct pci_pme_device {
> >>   struct list_head list;
> >> @@ -1348,10 +1349,11 @@ static void pci_enable_bridge(struct pci_dev *dev)
> >>   if (bridge)
> >>   pci_enable_bridge(bridge);
> >>
> >> + mutex_lock(_bridge_mutex);
> >>   if (pci_is_enabled(dev)) {
> >>   if (!dev->is_busmaster)
> >>   pci_set_master(dev);
> >> - return;
> >> + goto end;
> >>   }
> >>
> >>   retval = pci_enable_device(dev);
> >> @@ -1359,6 +1361,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
> >>   dev_err(>dev, "Error enabling bridge (%d), 
> >> continuing\n",
> >>   retval);
> >>   pci_set_master(dev);
> >> +end:
> >> + mutex_unlock(_bridge_mutex);
> >
> > I think this will deadlock because we're holding pci_bridge_mutex
> > while we call pci_enable_device(), which may recursively call
> > pci_enable_bridge(), which would try to acquire pci_bridge_mutex
> > again.  My original suggestion of a mutex in the host bridge would
> > have the same problem.
> 
> This extra check "if (bridge && !pci_is_enabled(bridge))" will resolve
> the deadlock in the present patch.

Oh, I see.  In pci_enable_bridge(X), the recursive call to enable
upstream bridges happens *before* acquiring pci_bridge_mutex, so when
we call pci_enable_device(X) while holding the mutex, any upstream
bridges will have been already enabled.

We recurse all the way to the top of the hierarchy, acquire the mutex,
enable the bridge, release the mutex, then unwind and repeat for each
bridge on the way down to the endpoint.

> > We talked about using device_lock() earlier.  You found some problems
> > with that, and I'd like to understand them better.  You said:
> >
> >> But the pci_enable_bridge is called in the context of the driver
> >> probe function, we will have nexted lock problem.
> >
> > The driver core does hold device_lock() while calling the driver probe
> > function, in this path:
> >
> >   device_initial_probe
> > __device_attach
> >   device_lock(dev)# <-- lock
> >   

Re: [PATCH] dts: Make it easier to enable __symbols__ generation in dtb files

2017-08-18 Thread Tom Rini
On Wed, Aug 16, 2017 at 11:32:08PM -0700, Frank Rowand wrote:
> Hi Tom,
> 
> Some nit picking on the patch comment.  :-)
> 
> 
> On 08/16/17 17:35, Tom Rini wrote:
> > This introduces the variabe DTC_EXTRA_FLAGS to allow for additional
> > flags to be passed to dtc.  While this can have many uses (such as
> > easier testing of new warning flags) the use case I wish to spell out
> > here is passing of -@ to enable __symbols__ to be included in the
> > resulting dtb and thus 'stacked' overlays to be able to be used.
> 
> Not just stacked overlays (I'm guessing by that you mean an
> overlay applied on top of an overlay), but even an overlay on
> top of the base dtb.

Outside of the kernel, some classes of overlays already work today.
"Stacked" is how Pantelis has been referring to overlays that make use
of __symbols__ so that we can resolve references.  I guess this ended up
being referenced as plugin support in dtc.git though?  I'm not quite
sure how to re-word the above to make it clear that it's extending what
can be used.

> > When passing in -@ this will increase the size of the dtb (and resident
> > memory usage) in exchange for allowing more types of overlays to be
> > applied to the dtb file prior to passing it to Linux and additional
> > functionality within the kernel when OF_OVERLAY is enabled and in tools
> > outside of the kernel.
> 
> Not so much additional functionality within the kernel.  It is basic
> functionality for any overlay to be applied in the kernel.

Oh, I thought the kernel already handled using that information.  I'll
think about how to reword that line then.

> > Cc: Jason Kridner 
> > Cc: Drew Fustini 
> > Cc: Robert Nelson 
> > Cc: Lokesh Vutla 
> > Cc: Frank Rowand 
> > Cc: Rob Herring 
> > Cc: Mark Rutland 
> > Cc: Russell King 
> > Cc: Masahiro Yamada 
> > Cc: Michal Marek 
> > CC: linux-kbu...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Tom Rini 
> > ---
> > This would be v3 of my attempt to enable __symbols__ more widely (v2,
> > for people not on the CC, enabled it always, but for some boards).
> > 
> > To be clear, this version of the patch is more flexible in that it
> > allows for arbitrary flags to be passed to dtc, which can have other
> > uses.  But, I'm still spelling out -@ being the motivation behind this
> > for clarity.
> 
> Thank you, that is very useful info to capture in the commit history.
> 
> 
> > Both Frank and Rob were uncomfortable with v2, and Frank
> > suggested this approach to make it easier for users  / distributions /
> > etc to opt-in to this functionality.  For clarity, Frank was suggesting
> > (at least at first) perhaps also a tie-in CONFIG option, which I have
> > not done here as I feel that's adding extra hurdles to something that
> > must already be opt-in.  Given that today many of the immediate uses are
> > for "pre-boot" application of overlays (SoM + carrier + add-ons or EVM +
> > LCD for example), I believe this is an acceptable trade-off to make
> > currently.
> 
> If I understand correctly, then you are following Rob's suggestion of
> passing the DTC_EXTRA_FLAGS value in the dtb make command.
> 
> That seems to be a reasonable approach for now.  Rob did mention that
> he would like to remove the current hard coding of a base DTC_FLAGS
> value, where several architectures initialize it with one of:
> 
>DTC_FLAGS ?= -p 1024
>DTC_FLAGS := -p 1024
> 
> If those initializations can be removed, then we could just use
> DTC_FLAGS in the make command, instead of adding DTC_EXTRA_FLAGS.
> Can you discuss that option with Rob?  (Sorry Rob, I'll be off-grid
> soon.)

With respect to the current users of DTC_FLAGS, we could just change c6x
/ microblaze / openrisc / powerpc to use '+=' not '?=' or ':=' as
one-line patches, from a quick test on microblaze.  Or maybe I'm missing
something?

But we can't pass DTC_FLAGS on the command line directly unless we also
pass in all of the other valid and normal flags (i.e.
-Wno-unit_address_vs_reg).  This is in line with how, for example,
CFLAGS is handled in the kernel, rather than what, based on some
googling, Makefiles can be made to do in general.  Thanks!

-- 
Tom




[PATCH] iio: accel: mma8452: code improvements to handle more than one event

2017-08-18 Thread Harinath Nampally
This driver supports multiple devices such as mma8653, mma8652, mma8452,
mma8453 and fxls8471. Almost all of these devices support more than one
event. The current driver design hardcodes the event-specific information,
so only one event can be supported by the driver, and the current design has
no flexibility to add more events.

This patch improves on that by detaching the event-related information from
the chip_info struct; based on the channel type and event direction, the
corresponding event configuration registers are picked dynamically. Hence
both transient and freefall events can be handled in the read/write
callbacks.

The changes are thoroughly tested on an fxls8471 device on an imx6UL Eval
board using the iio_event_monitor user space program.

After this fix both freefall and transient events are handled by the driver
without any conflicts.

Changes since v2 -> v3
-Fix typo in commit message
-Replace word 'Bugfix' with 'Improvements'
-Describe more accurate commit message
-Replace breaks with returns
-Initialise transient event threshold mask
-Remove unrelated change of IIO_ACCEL channel type check in read/write event 
callbacks

Changes since v1 -> v2
-Fix indentations
-Remove unused fields in mma8452_event_regs struct
-Remove redundant return statement
-Remove unrelated changes like checkpatch.pl warning fixes

Signed-off-by: Harinath Nampally 
---
 drivers/iio/accel/mma8452.c | 272 +++-
 1 file changed, 118 insertions(+), 154 deletions(-)

diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c
index eb6e3dc..8de9928 100644
--- a/drivers/iio/accel/mma8452.c
+++ b/drivers/iio/accel/mma8452.c
@@ -59,7 +59,9 @@
 #define MMA8452_FF_MT_THS  0x17
 #define  MMA8452_FF_MT_THS_MASK0x7f
 #define MMA8452_FF_MT_COUNT0x18
+#define MMA8452_FF_MT_CHAN_SHIFT   3
 #define MMA8452_TRANSIENT_CFG  0x1d
+#define  MMA8452_TRANSIENT_CFG_CHAN(chan)  BIT(chan + 1)
 #define  MMA8452_TRANSIENT_CFG_HPF_BYP BIT(0)
 #define  MMA8452_TRANSIENT_CFG_ELE BIT(4)
 #define MMA8452_TRANSIENT_SRC  0x1e
@@ -69,6 +71,7 @@
 #define MMA8452_TRANSIENT_THS  0x1f
 #define  MMA8452_TRANSIENT_THS_MASKGENMASK(6, 0)
 #define MMA8452_TRANSIENT_COUNT0x20
+#define MMA8452_TRANSIENT_CHAN_SHIFT 1
 #define MMA8452_CTRL_REG1  0x2a
 #define  MMA8452_CTRL_ACTIVE   BIT(0)
 #define  MMA8452_CTRL_DR_MASK  GENMASK(5, 3)
@@ -107,6 +110,26 @@ struct mma8452_data {
const struct mma_chip_info *chip_info;
 };
 
+ /**
+  * struct mma8452_event_regs - chip specific data related to events
+  * @ev_cfg:   event config register address
+  * @ev_src:   event source register address
+  * @ev_ths:   event threshold register address
+  * @ev_ths_mask:  mask for the threshold value
+  * @ev_count: event count (period) register address
+  *
+  * Since not all chips supported by the driver support comparing high pass
+  * filtered data for events (interrupts), different interrupt sources are
+  * used for different chips and the relevant registers are included here.
+  */
+struct mma8452_event_regs {
+   u8 ev_cfg;
+   u8 ev_src;
+   u8 ev_ths;
+   u8 ev_ths_mask;
+   u8 ev_count;
+};
+
 /**
  * struct mma_chip_info - chip specific data
  * @chip_id:   WHO_AM_I register's value
@@ -116,40 +139,12 @@ struct mma8452_data {
  * @mma_scales:scale factors for converting register 
values
  * to m/s^2; 3 modes: 2g, 4g, 8g; 2 integers
  * per mode: m/s^2 and micro m/s^2
- * @ev_cfg:event config register address
- * @ev_cfg_ele:latch bit in event config register
- * @ev_cfg_chan_shift: number of the bit to enable events in X
- * direction; in event config register
- * @ev_src:event source register address
- * @ev_src_xe: bit in event source register that indicates
- * an event in X direction
- * @ev_src_ye: bit in event source register that indicates
- * an event in Y direction
- * @ev_src_ze: bit in event source register that indicates
- * an event in Z direction
- * @ev_ths:event threshold register address
- * @ev_ths_mask:   mask for the threshold value
- * @ev_count:  event count (period) register address
- *
- * Since not all chips supported by the driver support comparing high pass
- * filtered data for events (interrupts), different interrupt sources are
- * used for different chips and the 

[PATCH] serial: earlycon: Only try fdt when specify 'earlycon' exactly

2017-08-18 Thread Jeffy Chen
When moving earlycon early_param handling to serial, the devicetree
earlycons enable condition changed slightly.

We used to do that only for a bare 'console'/'earlycon', but now we
would also do it for 'console='/'earlycon='.

Fix it by using the same condition as before.

Fixes: d503187b6cc4 (of/serial: move earlycon early_param handling to serial)
Signed-off-by: Jeffy Chen 
---

 drivers/tty/serial/earlycon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
index c3651540e1ba..335933e1822c 100644
--- a/drivers/tty/serial/earlycon.c
+++ b/drivers/tty/serial/earlycon.c
@@ -220,7 +220,7 @@ static int __init param_setup_earlycon(char *buf)
if (IS_ENABLED(CONFIG_ACPI_SPCR_TABLE)) {
earlycon_init_is_deferred = true;
return 0;
-   } else {
+   } else if (!buf) {
return early_init_dt_scan_chosen_stdout();
}
}
-- 
2.11.0
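The condition being restored can be modeled in a few lines of standalone C. This is a sketch, not the kernel function: the point is that a bare "earlycon" reaches the param callback with no value buffer, and only that case should fall back to scanning the devicetree chosen node, while "earlycon=uart8250,..." names a console explicitly and must not.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of the '} else if (!buf) {' change: scan the FDT
 * stdout-path only when no console was named on the command line. */
static int should_scan_fdt(const char *buf)
{
	return buf == NULL || buf[0] == '\0';
}
```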




Re: [PATCH v9 2/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions

2017-08-18 Thread Baoquan He
On 08/18/17 at 04:10pm, Ard Biesheuvel wrote:
> On 17 August 2017 at 14:04, Baoquan He  wrote:

> > Thanks a lot for helping improving the patch log, Ingo! Will pay more
> > attention to the description in words and paragraph partition of log.
> >
> >>
> >> So if EFI is detected, iterate EFI memory map and pick the mirror region
> >> to process for adding candidate of randomization slot. If EFI is disabled
> >> or no mirror region found, still process e820 memory map.
> >>
> >> Signed-off-by: Baoquan He 
> 
> 
> Could the x86 people on cc either take these directly, or indicate
> whether they are ok with this going in via the EFI tree?

Thanks for looking into this, Ard. I saw Ingo has put these into
tip/x86/boot.

> 
> >> ---
> >>  arch/x86/boot/compressed/kaslr.c | 68 
> >> ++--
> >>  1 file changed, 66 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/boot/compressed/kaslr.c 
> >> b/arch/x86/boot/compressed/kaslr.c
> >> index 99c7194f7ea6..7de23bb279ce 100644
> >> --- a/arch/x86/boot/compressed/kaslr.c
> >> +++ b/arch/x86/boot/compressed/kaslr.c
> >> @@ -37,7 +37,9 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  #include 
> >> +#include 
> >>
> >>  /* Macros used by the included decompressor code below. */
> >>  #define STATIC
> >> @@ -558,6 +560,65 @@ static void process_mem_region(struct mem_vector 
> >> *entry,
> >>   }
> >>  }
> >>
> >> +#ifdef CONFIG_EFI
> >> +/*
> >> + * Returns true if mirror region found (and must have been processed
> >> + * for slots adding)
> >> + */
> >> +static bool
> >> +process_efi_entries(unsigned long minimum, unsigned long image_size)
> >> +{
> >> + struct efi_info *e = &boot_params->efi_info;
> >> + bool efi_mirror_found = false;
> >> + struct mem_vector region;
> >> + efi_memory_desc_t *md;
> >> + unsigned long pmap;
> >> + char *signature;
> >> + u32 nr_desc;
> >> + int i;
> >> +
> >> + signature = (char *)&e->efi_loader_signature;
> >> + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
> >> + strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
> >> + return false;
> >> +
> >> +#ifdef CONFIG_X86_32
> >> + /* Can't handle data above 4GB at this time */
> >> + if (e->efi_memmap_hi) {
> >> + warn("EFI memmap is above 4GB, can't be handled now on 
> >> x86_32. EFI should be disabled.\n");
> >> + return false;
> >> + }
> >> + pmap =  e->efi_memmap;
> >> +#else
> >> + pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
> >> +#endif
> >> +
> >> + nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
> >> + for (i = 0; i < nr_desc; i++) {
> >> + md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
> >> + if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
> >> + region.start = md->phys_addr;
> >> + region.size = md->num_pages << EFI_PAGE_SHIFT;
> >> + process_mem_region(&region, minimum, image_size);
> >> + efi_mirror_found = true;
> >> +
> >> + if (slot_area_index == MAX_SLOT_AREA) {
> >> + debug_putstr("Aborted EFI scan (slot_areas 
> >> full)!\n");
> >> + break;
> >> + }
> >> + }
> >> + }
> >> +
> >> + return efi_mirror_found;
> >> +}
> >> +#else
> >> +static inline bool
> >> +process_efi_entries(unsigned long minimum, unsigned long image_size)
> >> +{
> >> + return false;
> >> +}
> >> +#endif
> >> +
> >>  static void process_e820_entries(unsigned long minimum,
> >>unsigned long image_size)
> >>  {
> >> @@ -586,13 +647,16 @@ static unsigned long find_random_phys_addr(unsigned 
> >> long minimum,
> >>  {
> >>   /* Check if we had too many memmaps. */
> >>   if (memmap_too_large) {
> >> - debug_putstr("Aborted e820 scan (more than 4 memmap= 
> >> args)!\n");
> >> + debug_putstr("Aborted memory entries scan (more than 4 
> >> memmap= args)!\n");
> >>   return 0;
> >>   }
> >>
> >>   /* Make sure minimum is aligned. */
> >>   minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
> >>
> >> + if (process_efi_entries(minimum, image_size))
> >> + return slots_fetch_random();
> >> +
> >>   process_e820_entries(minimum, image_size);
> >>   return slots_fetch_random();
> >>  }
> >> @@ -652,7 +716,7 @@ void choose_random_location(unsigned long input,
> >>*/
> >>   min_addr = min(*output, 512UL << 20);
> >>
> >> - /* Walk e820 and find a random address. */
> >> + /* Walk available memory entries to find a random address. */
> >>   random_addr = find_random_phys_addr(min_addr, output_size);
> >>   if (!random_addr) {
> >>   warn("Physical KASLR disabled: no suitable memory region!");
> >> --
> >> 2.5.5
> >>


Re: [patch v3 2/3] drivers: jtag: Add Aspeed SoC 24xx and 25xx families JTAG master driver

2017-08-18 Thread kbuild test robot
Hi Oleksandr,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc5 next-20170817]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20170818-114739
config: um-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=um 

All errors (new ones prefixed by >>):

   arch/um/drivers/vde.o: In function `vde_open_real':
   (.text+0xfb1): warning: Using 'getgrnam' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/vde.o: In function `vde_open_real':
   (.text+0xdfc): warning: Using 'getpwuid' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
   arch/um/drivers/vde.o: In function `vde_open_real':
   (.text+0x1115): warning: Using 'getaddrinfo' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoaddr':
   (.text+0xe475): warning: Using 'gethostbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametonetaddr':
   (.text+0xe515): warning: Using 'getnetbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoproto':
   (.text+0xe735): warning: Using 'getprotobyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoport':
   (.text+0xe567): warning: Using 'getservbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   drivers/auxdisplay/img-ascii-lcd.o: In function `img_ascii_lcd_probe':
   drivers/auxdisplay/img-ascii-lcd.c:386: undefined reference to `devm_ioremap_resource'
   drivers/jtag/jtag-aspeed.o: In function `aspeed_jtag_init':
>> drivers/jtag/jtag-aspeed.c:635: undefined reference to `devm_ioremap_resource'
   collect2: error: ld returned 1 exit status

vim +635 drivers/jtag/jtag-aspeed.c

   627  
   628  int aspeed_jtag_init(struct platform_device *pdev,
   629   struct aspeed_jtag *aspeed_jtag)
   630  {
   631  struct resource *res;
   632  int err;
   633  
   634  res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
  > 635  aspeed_jtag->reg_base = devm_ioremap_resource(aspeed_jtag->dev, res);
   636  if (IS_ERR(aspeed_jtag->reg_base)) {
   637  err = -ENOMEM;
   638  goto out_region;
   639  }
   640  
   641  aspeed_jtag->pclk = devm_clk_get(aspeed_jtag->dev, NULL);
   642  if (IS_ERR(aspeed_jtag->pclk)) {
   643  dev_err(aspeed_jtag->dev, "devm_clk_get failed\n");
   644  return PTR_ERR(aspeed_jtag->pclk);
   645  }
   646  
   647  clk_prepare_enable(aspeed_jtag->pclk);
   648  
   649  aspeed_jtag->irq = platform_get_irq(pdev, 0);
   650  if (aspeed_jtag->irq < 0) {
   651  dev_err(aspeed_jtag->dev, "no irq specified\n");
   652  err = -ENOENT;
   653  goto out_region;
   654  }
   655  
   656  /* Enable clock */
   657  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_CTL_ENG_EN |
   658ASPEED_JTAG_CTL_ENG_OUT_EN, ASPEED_JTAG_CTRL);
   659  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_SW_MODE_EN |
   660ASPEED_JTAG_SW_MODE_TDIO, ASPEED_JTAG_SW);
   661  
   662  err = devm_request_irq(aspeed_jtag->dev, aspeed_jtag->irq,
   663 aspeed_jtag_interrupt, 0,
   664 "aspeed-jtag", aspeed_jtag);
   665  if (err) {
   666  dev_err(aspeed_jtag->dev, "aspeed_jtag unable to get IRQ");
   667  goto out_region;
   668  }
   669  dev_dbg(&pdev->dev, "aspeed_jtag:IRQ %d.\n", aspeed_jtag->irq);
   670  
   671  aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_ISR_INST_PAUSE |
   672ASPEED_JTAG_ISR_INST_COMPLETE |
   673ASPEED_JTAG_ISR_DATA_PAUSE |
   674ASPEED_JTAG_ISR_DATA_COMPLETE |
   675ASPEED_JTAG_ISR_INST_PAUSE_EN |
   676   
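An undefined reference to `devm_ioremap_resource` on the um allyesconfig build is the classic symptom of a driver that is selectable on architectures without MMIO support (ARCH=um has no HAS_IOMEM). The usual remedy is a Kconfig dependency; a sketch follows, assuming the config symbol is named JTAG_ASPEED (the actual symbol name and prompt text are guesses, not taken from the posted series):

```
config JTAG_ASPEED
	tristate "Aspeed SoC JTAG controller support"
	depends on HAS_IOMEM
```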

Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT

2017-08-18 Thread Daniel Borkmann

On 08/19/2017 01:40 AM, David Daney wrote:

Here are several improvements and bug fixes for the MIPS eBPF JIT.

The main change is the addition of support for JLT, JLE, JSLT and JSLE
ops, that were recently added.

Also fix WARN output when used with preemptable kernel, and a small
cleanup/optimization in the use of BPF_OP(insn->code).

I suggest that the whole thing go via the BPF/net-next path as there
are dependencies on code that is not yet merged to Linus' tree.


Yes, this would be via net-next.


Still pending are changes to reduce stack usage when the verifier can
determine the maximum stack size.


Awesome, thanks a lot!


Re: [PATCH 1/2] iommu/amd: Fix compiler warning in copy_device_table()

2017-08-18 Thread Baoquan He
On 08/19/17 at 12:40am, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> This was reported by the kbuild bot. The condition in which
> entry would be used uninitialized cannot happen, because
> when there is no iommu this function would never be called.
> But it's not a fast path, so fix the warning anyway.
> 
> Reported-by: kbuild test robot
> Signed-off-by: Joerg Roedel 

Thanks for fixing these code bugs, Joerg.

Ack the series!

Acked-by: Baoquan He 

> ---
>  drivers/iommu/amd_iommu_init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index c7d0325..f2023cd 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -854,7 +854,7 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
>  
>  static bool copy_device_table(void)
>  {
> - u64 int_ctl, int_tab_len, entry, last_entry = 0;
> + u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
>   struct dev_table_entry *old_devtb = NULL;
>   u32 lo, hi, devid, old_devtb_size;
>   phys_addr_t old_devtb_phys;
> -- 
> 2.7.4
> 
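The warning being silenced is the common "assigned only under a condition" shape: the compiler cannot prove the variable is written before use, even when the caller guarantees it. A standalone sketch of the pattern and the fix — the helper below is illustrative, not the driver code:

```c
#include <assert.h>
#include <stdint.h>

/* 'entry' is written only when the loop condition holds, so a
 * maybe-uninitialized warning fires at the return. Initializing it
 * up front, as the patch does for copy_device_table(), silences the
 * false positive without changing behavior on any reachable path. */
static uint64_t last_nonzero(const uint64_t *tbl, int n)
{
	uint64_t entry = 0;	/* the fix: explicit initialization */
	int i;

	for (i = 0; i < n; i++)
		if (tbl[i])
			entry = tbl[i];
	return entry;
}

static uint64_t demo(void)
{
	const uint64_t tbl[] = { 0, 5, 0, 9 };

	return last_nonzero(tbl, 4);
}
```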


Re: [GIT PULL] apparmor updates for next

2017-08-18 Thread James Morris
On Fri, 18 Aug 2017, John Johansen wrote:

> Hi James,
> 
> Please pull these apparmor changes for next.
> 

Pulled, thanks.


-- 
James Morris




Re: [PATCH] selftests: timers: Fix run_destructive_tests target to handle skipped tests

2017-08-18 Thread John Stultz
On Thu, Aug 17, 2017 at 3:48 PM, Shuah Khan  wrote:
> When a test exits with skip exit code of 4, "make run_destructive_tests"
> halts testing. Fix run_destructive_tests target to handle error exit codes.
>
> Reported-by: John Stultz 
> Signed-off-by: Shuah Khan 
> ---
>  tools/testing/selftests/timers/Makefile | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/tools/testing/selftests/timers/Makefile 
> b/tools/testing/selftests/timers/Makefile
> index c805ab048d26..6c1327278d5f 100644
> --- a/tools/testing/selftests/timers/Makefile
> +++ b/tools/testing/selftests/timers/Makefile
> @@ -13,20 +13,20 @@ TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend 
> valid-adjtimex adjtick change_skew
>
>  include ../lib.mk
>
> +define RUN_DESTRUCTIVE_TESTS
> +   @for TEST in $(TEST_GEN_PROGS_EXTENDED); do \
> +   BASENAME_TEST=`basename $$TEST`;\
> +   if [ ! -x $$BASENAME_TEST ]; then   \
> +   echo "selftests: Warning: file $$BASENAME_TEST is not 
> executable, correct this.";\
> +   echo "selftests: $$BASENAME_TEST [FAIL]"; \
> +   else\
> +   cd `dirname $$TEST`; (./$$BASENAME_TEST && echo 
> "selftests: $$BASENAME_TEST [PASS]") || echo "selftests:  $$BASENAME_TEST 
> [FAIL]"; cd -;\
> +   fi; \
> +   done;
> +endef

One more on this...  you might remove rtctest_setdate from the
TEST_GEN_PROGS_EXTENDED list, since it too requires arguments to test,
and it wasn't a part of run_destructive_tests previously.

I see Benjamin added this test not long ago, but I'm not sure exactly
how he expects it to be run (just manually?).

thanks
-john
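The kselftest convention reserves exit code 4 for skipped tests, which is what tripped up run_destructive_tests. A hedged shell sketch of per-test exit-code handling — run_one is a made-up helper for illustration, not part of the Makefile:

```shell
#!/bin/sh
# Classify a test binary's exit status: 0 = pass, 4 = kselftest skip,
# anything else = fail. A skip is reported but no longer halts the run.
run_one() {
	st=0
	"$1" || st=$?
	case $st in
	0) echo "selftests: $1 [PASS]" ;;
	4) echo "selftests: $1 [SKIP]" ;;
	*) echo "selftests: $1 [FAIL]" ;;
	esac
}
```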


[PATCH v8 01/28] x86/mm: Relocate page fault error codes to traps.h

2017-08-18 Thread Ricardo Neri
Up to this point, only fault.c used the definitions of the page fault error
codes. Thus, it made sense to keep them within that file. Other portions of
code might be interested in those definitions too. For instance, the User-
Mode Instruction Prevention emulation code will use such definitions to
emulate a page fault when it is unable to successfully copy the results
of the emulated instructions to user space.

While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andy Lutomirski 
Cc: "Kirill A. Shutemov" 
Cc: Josh Poimboeuf 
Cc: Dave Hansen 
Cc: Paul Gortmaker 
Cc: x...@kernel.org
Reviewed-by: Andy Lutomirski 
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/traps.h | 18 +
 arch/x86/mm/fault.c  | 88 +---
 2 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 01fd0a7f48cd..4a2e5852eacc 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -148,4 +148,22 @@ enum {
X86_TRAP_IRET = 32, /* 32, IRET Exception */
 };
 
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==   0: no page found   1: protection fault
+ *   bit 1 ==   0: read access 1: write access
+ *   bit 2 ==   0: kernel-mode access  1: user-mode access
+ *   bit 3 ==  1: use of reserved bit detected
+ *   bit 4 ==  1: fault was an instruction fetch
+ *   bit 5 ==  1: protection keys block access
+ */
+enum x86_pf_error_code {
+   X86_PF_PROT =   1 << 0,
+   X86_PF_WRITE=   1 << 1,
+   X86_PF_USER =   1 << 2,
+   X86_PF_RSVD =   1 << 3,
+   X86_PF_INSTR=   1 << 4,
+   X86_PF_PK   =   1 << 5,
+};
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2a1fa10c6a98..dc87badd16e9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -29,26 +29,6 @@
 #include 
 
 /*
- * Page fault error code bits:
- *
- *   bit 0 ==   0: no page found   1: protection fault
- *   bit 1 ==   0: read access 1: write access
- *   bit 2 ==   0: kernel-mode access  1: user-mode access
- *   bit 3 ==  1: use of reserved bit detected
- *   bit 4 ==  1: fault was an instruction fetch
- *   bit 5 ==  1: protection keys block access
- */
-enum x86_pf_error_code {
-
-   PF_PROT =   1 << 0,
-   PF_WRITE=   1 << 1,
-   PF_USER =   1 << 2,
-   PF_RSVD =   1 << 3,
-   PF_INSTR=   1 << 4,
-   PF_PK   =   1 << 5,
-};
-
-/*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
  */
@@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, 
unsigned long addr)
 * If it was a exec (instruction fetch) fault on NX page, then
 * do not ignore the fault:
 */
-   if (error_code & PF_INSTR)
+   if (error_code & X86_PF_INSTR)
return 0;
 
instr = (void *)convert_ip_to_linear(current, regs);
@@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, 
unsigned long addr)
  * siginfo so userspace can discover which protection key was set
  * on the PTE.
  *
- * If we get here, we know that the hardware signaled a PF_PK
+ * If we get here, we know that the hardware signaled a X86_PF_PK
  * fault and that there was a VMA once we got in the fault
  * handler.  It does *not* guarantee that the VMA we find here
  * was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info,
/*
 * force_sig_info_fault() is called from a number of
 * contexts, some of which have a VMA and some of which
-* do not.  The PF_PK handing happens after we have a
+* do not.  The X86_PF_PK handing happens after we have a
 * valid VMA, so we should never reach this without a
 * valid VMA.
 */
@@ -695,7 +675,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
error_code,
if (!oops_may_print())
return;
 
-   if (error_code & PF_INSTR) {
+   if (error_code & X86_PF_INSTR) {
unsigned int level;
pgd_t *pgd;
pte_t *pte;
@@ -779,7 +759,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 */
if (current->thread.sig_on_uaccess_err && signal) {
   

[PATCH v8 06/28] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b

2017-08-18 Thread Ricardo Neri
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod != 11b and
ModRM.rm = 100b, indexed register-indirect addressing is used. In other
words, a SIB byte follows the ModRM byte. In the specific case of
SIB.index = 100b, the scale*index portion of the computation of the
effective address is null. To signal callers of this particular situation,
get_reg_offset() can return -EDOM (-EINVAL continues to indicate an
error when decoding the SIB byte).

An example of this situation can be the following instruction:

   8b 4c 23 80   mov -0x80(%rbx,%riz,1),%rcx
   ModRM:0x4c [mod:1b][reg:1b][rm:100b]
   SIB:  0x23 [scale:0b][index:100b][base:11b]
   Displacement: 0x80  (1-byte, as per ModRM.mod = 1b)

The %riz 'register' indicates a null index.

In long mode, a REX prefix may be used. When a REX prefix is present,
REX.X adds a fourth bit to the register selection of SIB.index. This gives
the ability to refer to all the 16 general purpose registers. When REX.X is
1b and SIB.index is 100b, the index is indicated in %r12. In our example,
this would look like:

   42 8b 4c 23 80mov -0x80(%rbx,%r12,1),%rcx
   REX:  0x42 [W:0b][R:0b][X:1b][B:0b]
   ModRM:0x4c [mod:1b][reg:1b][rm:100b]
   SIB:  0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b]
   Displacement: 0x80  (1-byte, as per ModRM.mod = 1b)

%r12 is a valid register to use in the scale*index part of the effective
address computation.

Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 9eec98022510..892aa6468805 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -110,6 +110,15 @@ static int get_reg_offset(struct insn *insn, struct 
pt_regs *regs,
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
regno += 8;
+
+   /*
+* If ModRM.mod != 3 and SIB.index = 4 the scale*index
+* portion of the address computation is null. This is
+* true only if REX.X is 0. In such a case, the SIB index
+* is used in the address computation.
+*/
+   if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
+   return -EDOM;
break;
 
case REG_TYPE_BASE:
@@ -160,11 +169,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, 
struct pt_regs *regs)
goto out_err;
 
indx_offset = get_reg_offset(insn, regs, 
REG_TYPE_INDEX);
-   if (indx_offset < 0)
+
+   /*
+* A negative offset generally means a error, except
+* -EDOM, which means that the contents of the register
+* should not be used as index.
+*/
+   if (indx_offset == -EDOM)
+   indx = 0;
+   else if (indx_offset < 0)
goto out_err;
+   else
+   indx = regs_get_register(regs, indx_offset);
 
base = regs_get_register(regs, base_offset);
-   indx = regs_get_register(regs, indx_offset);
 
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
-- 
2.13.0



[PATCH v8 07/28] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0

2017-08-18 Thread Ricardo Neri
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that if a SIB byte is used and
SIB.base is 101b and ModRM.mod is zero, then the base part of the
effective address computation is null. To signal this
situation, a -EDOM error is returned to tell callers to ignore the
base value present in the register operand.

In this scenario, a 32-bit displacement follows the SIB byte. Displacement
is obtained when the instruction decoder parses the operands.

Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 892aa6468805..53e24ca01f29 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -123,6 +123,14 @@ static int get_reg_offset(struct insn *insn, struct 
pt_regs *regs,
 
case REG_TYPE_BASE:
regno = X86_SIB_BASE(insn->sib.value);
+   /*
+* If ModRM.mod is 0 and SIB.base == 5, the base of the
+* register-indirect addressing is 0. In this case, a
+* 32-bit displacement follows the SIB byte.
+*/
+   if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+   return -EDOM;
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -164,17 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, 
struct pt_regs *regs)
eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
+   /*
+* Negative values in the base and index offset means
+* an error when decoding the SIB byte. Except -EDOM,
+* which means that the registers should not be used
+* in the address computation.
+*/
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
-   if (base_offset < 0)
+   if (base_offset == -EDOM)
+   base = 0;
+   else if (base_offset < 0)
goto out_err;
+   else
+   base = regs_get_register(regs, base_offset);
 
indx_offset = get_reg_offset(insn, regs, 
REG_TYPE_INDEX);
-
-   /*
-* A negative offset generally means a error, except
-* -EDOM, which means that the contents of the register
-* should not be used as index.
-*/
if (indx_offset == -EDOM)
indx = 0;
else if (indx_offset < 0)
@@ -182,8 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, 
struct pt_regs *regs)
else
indx = regs_get_register(regs, indx_offset);
 
-   base = regs_get_register(regs, base_offset);
-
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-- 
2.13.0



[PATCH v8 02/28] x86/boot: Relocate definition of the initial state of CR0

2017-08-18 Thread Ricardo Neri
Both head_32.S and head_64.S utilize the same value to initialize the
control register CR0. Also, other parts of the kernel might want access
to this initial definition (e.g., emulation code for User-Mode Instruction
Prevention uses this state to provide a sane dummy value for CR0 when
emulating the smsw instruction). Thus, relocate this definition to a
header file from which it can be conveniently accessed.

Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Suggested-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/uapi/asm/processor-flags.h | 6 ++
 arch/x86/kernel/head_32.S   | 3 ---
 arch/x86/kernel/head_64.S   | 3 ---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/uapi/asm/processor-flags.h 
b/arch/x86/include/uapi/asm/processor-flags.h
index 185f3d10c194..aae1f2aa7563 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -151,5 +151,11 @@
 #define CX86_ARR_BASE  0xc4
 #define CX86_RCR_BASE  0xdc
 
+/*
+ * Initial state of CR0 for head_32/64.S
+ */
+#define CR0_STATE  (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
+X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
+X86_CR0_PG)
 
 #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 0332664eb158..f64059835863 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -213,9 +213,6 @@ ENTRY(startup_32_smp)
 #endif
 
 .Ldefault_entry:
-#define CR0_STATE  (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
-X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
-X86_CR0_PG)
movl $(CR0_STATE & ~X86_CR0_PG),%eax
movl %eax,%cr0
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 513cbb012ecc..5e1bfdd86b5b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -149,9 +149,6 @@ ENTRY(secondary_startup_64)
 1: wrmsr   /* Make changes effective */
 
/* Setup cr0 */
-#define CR0_STATE  (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
-X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
-X86_CR0_PG)
movl$CR0_STATE, %eax
/* Make changes effective */
movq%rax, %cr0
-- 
2.13.0



[PATCH v8 03/28] ptrace,x86: Make user_64bit_mode() available to 32-bit builds

2017-08-18 Thread Ricardo Neri
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
is selected. This implies that code built with CONFIG_X86_64=n cannot use
it. If a piece of code needs to be built for both CONFIG_X86_64=y and
CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
an #ifdef/#endif, potentially in multiple places.

This can be easily avoided with a single #ifdef/#endif pair within
user_64bit_mode() itself.

Suggested-by: Borislav Petkov 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/ptrace.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..e2afbf689309 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -135,9 +135,9 @@ static inline int v8086_mode(struct pt_regs *regs)
 #endif
 }
 
-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
/*
 * On non-paravirt systems, this is the only long mode CPL 3
@@ -148,8 +148,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
/* Headers are too twisted for this to go in paravirt.h. */
return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+   return false;
+#endif
 }
 
+#ifdef CONFIG_X86_64
 #define current_user_stack_pointer()   current_pt_regs()->sp
 #define compat_user_stack_pointer()current_pt_regs()->sp
 #endif
-- 
2.13.0



[PATCH v8 00/28] x86: Enable User-Mode Instruction Prevention

2017-08-18 Thread Ricardo Neri
This is v8 of this series. The seven previous submissions can be found
here [1], here [2], here [3], here [4], here [5], here [6] and here [7]. This
version addresses the feedback comments from Borislav Petkov received on
v7. Please see details in the change log.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel Processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and interrupt
descriptor tables, the segment selector of the current task state and the
local descriptor table. Hiding these system resources reduces the tools
available to craft privilege escalation attacks such as [8].

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

When enabled, however, UMIP will change the behavior that certain
applications expect from the operating system. For instance, programs
running on WineHQ and DOSEMU2 rely on some of these instructions to
function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the
instruction SMSW when running in virtual-8086 mode[9]. SGDT and SIDT can
also be used on virtual-8086 mode.

In order to not change the behavior of the system, this patchset emulates
SGDT, SIDT and SMSW. This should be sufficient to not break the
applications mentioned above. Regarding the two remaining instructions, STR
and SLDT, the WineHQ team has shown interest in catching the general
protection fault and using it as a vehicle to fix broken applications[10].
Furthermore,
STR and SLDT can only run in protected and long modes.

DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken
unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support
it. Also, this should not pose a security risk as no system resources would
be revealed. Instead, code running inside the KVM would only see the KVM's
GDT, IDT and MSW.

Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive the
SIGSEGV signal when UMIP-protected instructions cause a general protection
fault.

=== How are UMIP-protected instructions emulated?

UMIP is kept enabled at all times when the CONFIG_x86_INTEL_UMIP option is
selected. If a general protection fault caused by the instructions
protected by UMIP is detected, such fault will be trapped and fixed-up. The
return values will be dummy as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments with 32-bit processes, the base of GDT and IDT was always
   a 4-byte address, even for 16-bit operands. Thus, my emulation code does
   the same. In all cases, the limit of the table is set to 0.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. This is, the following bits are enabled:
   CR0.0 for Protection Enable, CR.1 for Monitor Coprocessor, CR.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask and CR0.31 for Paging. As per the Intel 64 and IA-32 Architectures
   Software Developer's Manual, SMSW returns a 16-bit results for memory
   operands. However, when the operand is a register, the results can be up
   to CR0[63:0]. Since the emulation code only kicks-in for 32-bit
   processes, we return up to CR[31:0].
 * The proposed emulation code is handles faults that happens in both
   protected and virtual-8086 mode.
 * Again, STR and SLDT are not emulated.

=== How is this series laid out?

++ Preparatory work
As per suggestions from Andy Lutormirsky and Borislav Petkov, I moved
the x86 page fault error codes to a header. Also, I made user_64bit_mode
available to x86_32 builds. This helps to reuse code and reduce the number
of #ifdef's in these patches. Borislav also suggested to uprobes should use
the existing definitions in arch/x86/include/asm/inat.h instead of hard-
coded values when checking instruction prefixes. I included this change
in the series.

++ Fix bugs in MPX address decoder
I found very useful the code for Intel MPX (Memory Protection Extensions)
used to parse opcodes and the memory locations contained in the general

[PATCH v8 03/28] ptrace,x86: Make user_64bit_mode() available to 32-bit builds

2017-08-18 Thread Ricardo Neri
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64
is selected. This implies that code built with CONFIG_X86_64=n cannot use
it. If a piece of code needs to be built for both CONFIG_X86_64=y and
CONFIG_X86_64=n and wants to use this function, it needs to wrap it in
an #ifdef/#endif; potentially, in multiple places.

This can be easily avoided with a single #ifdef/#endif pair within
user_64bit_mode() itself.

Suggested-by: Borislav Petkov 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/ptrace.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..e2afbf689309 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -135,9 +135,9 @@ static inline int v8086_mode(struct pt_regs *regs)
 #endif
 }
 
-#ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
+#ifdef CONFIG_X86_64
 #ifndef CONFIG_PARAVIRT
/*
 * On non-paravirt systems, this is the only long mode CPL 3
@@ -148,8 +148,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
/* Headers are too twisted for this to go in paravirt.h. */
return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
 #endif
+#else /* !CONFIG_X86_64 */
+   return false;
+#endif
 }
 
+#ifdef CONFIG_X86_64
 #define current_user_stack_pointer()   current_pt_regs()->sp
 #define compat_user_stack_pointer()current_pt_regs()->sp
 #endif
-- 
2.13.0



[PATCH v8 00/28] x86: Enable User-Mode Instruction Prevention

2017-08-18 Thread Ricardo Neri
This is v8 of this series. The seven previous submissions can be found
here [1], here [2], here [3], here [4], here [5], here [6] and here [7]. This
version addresses the feedback comments from Borislav Petkov received on
v7. Please see details in the change log.

=== What is UMIP?

User-Mode Instruction Prevention (UMIP) is a security feature present in
new Intel processors. If enabled, it prevents the execution of certain
instructions if the Current Privilege Level (CPL) is greater than 0. If
these instructions were executed while in CPL > 0, user space applications
could have access to system-wide settings such as the global and local
descriptor tables, and the segment selectors of the current task state and
local descriptor table. Hiding these system resources reduces the tools
available to craft privilege escalation attacks such as [8].

These are the instructions covered by UMIP:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register

If any of these instructions is executed with CPL > 0, a general protection
exception is issued when UMIP is enabled.

=== How does it impact applications?

When enabled, however, UMIP changes the behavior that certain
applications expect from the operating system. For instance, programs
running on WineHQ and DOSEMU2 rely on some of these instructions to
function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the
instruction SMSW when running in virtual-8086 mode[9]. SGDT and SIDT can
also be used in virtual-8086 mode.

In order not to change the behavior of the system, this patchset emulates
SGDT, SIDT and SMSW. This should be sufficient to avoid breaking the
applications mentioned above. Regarding the two remaining instructions, STR
and SLDT, the WineHQ team has shown interest in catching the general
protection fault and using it as a vehicle to fix broken applications[10].
Furthermore, STR and SLDT can only run in protected and long modes.

DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken
unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support
it. Also, this should not pose a security risk, as no system resources would
be revealed. Instead, code running inside the KVM guest would only see the
KVM's GDT, IDT and MSW.

Please note that UMIP is always enabled for both 64-bit and 32-bit Linux
builds. However, emulation of the UMIP-protected instructions is not done
for 64-bit processes. 64-bit user space applications will receive the
SIGSEGV signal when a UMIP-protected instruction causes a general
protection fault.

=== How are UMIP-protected instructions emulated?

UMIP is kept enabled at all times when the CONFIG_X86_INTEL_UMIP option is
selected. If a general protection fault caused by one of the instructions
protected by UMIP is detected, the fault is trapped and fixed up. The
returned values are dummies, as follows:
 
 * SGDT and SIDT return hard-coded dummy values as the base of the global
   descriptor and interrupt descriptor tables. These hard-coded values
   correspond to memory addresses that are near the end of the kernel
   memory map. This is also the case for virtual-8086 mode tasks. In all
   my experiments with 32-bit processes, the base of GDT and IDT was always
   a 4-byte address, even for 16-bit operands. Thus, my emulation code does
   the same. In all cases, the limit of the table is set to 0.
 * SMSW returns the value with which the CR0 register is programmed in
   head_32/64.S at boot time. That is, the following bits are set:
   CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for
   Extension Type, which will always be 1 in recent processors with UMIP;
   CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment
   Mask and CR0.31 for Paging. As per the Intel 64 and IA-32 Architectures
   Software Developer's Manual, SMSW returns a 16-bit result for memory
   operands. However, when the operand is a register, the result can be up
   to CR0[63:0]. Since the emulation code only kicks in for 32-bit
   processes, we return up to CR0[31:0].
 * The proposed emulation code handles faults that happen in both
   protected and virtual-8086 mode.
 * Again, STR and SLDT are not emulated.

=== How is this series laid out?

++ Preparatory work
As per suggestions from Andy Lutomirski and Borislav Petkov, I moved
the x86 page fault error codes to a header. Also, I made user_64bit_mode()
available to x86_32 builds. This helps to reuse code and reduce the number
of #ifdef's in these patches. Borislav also suggested that uprobes should
use the existing definitions in arch/x86/include/asm/inat.h instead of
hard-coded values when checking instruction prefixes. I included this
change in the series.

++ Fix bugs in MPX address decoder
I found very useful the code for Intel MPX (Memory Protection Extensions)
used to parse opcodes and the memory locations contained in the general

[PATCH v8 12/28] x86/insn-eval: Add utility functions to get segment selector

2017-08-18 Thread Ricardo Neri
When computing a linear address while segmentation is in use, we need to
know the base address of the segment involved in the computation. In most
cases, the segment base address will be zero, as in USER_DS/USER32_DS.
However, a user space program may define its own segments via a local
descriptor table. In such a case, the segment base address may not be zero.
Thus, the segment base address is needed to correctly calculate the linear
address.

If running in protected mode, the segment selector to be used when
computing a linear address is either determined by any segment override
prefixes in the instruction or inferred from the registers involved in the
computation of the effective address, in that order. Also, there are cases
in which the segment override prefixes shall be ignored (i.e., code segments
are always selected by the CS segment register; string instructions always
use the ES segment register when the (E)DI register is the operand). In long
mode, segment registers are ignored, except for FS and GS. In these two
cases, base addresses are obtained from the respective MSRs.

For clarity, this process can be split into three steps (and an equal
number of functions): parse the segment override prefixes, if any; resolve
the relevant segment register to use, and, once known, read its value to
obtain the segment selector.

The method to obtain the segment selector depends on several factors. In
32-bit builds, segment selectors are saved into a pt_regs structure
when switching to kernel mode. The same is also true for virtual-8086
mode. In 64-bit builds, segmentation is mostly ignored, except when
running a program in 32-bit legacy mode. In this case, CS and SS can be
obtained from pt_regs. DS, ES, FS and GS can be read directly from
the respective segment registers.

In order to identify the segment registers, a new set of #defines is
introduced. It also includes two special identifiers. One of them
indicates when the default segment register associated with instruction
operands shall be used. Another one indicates that the contents of the
segment register shall be ignored; this identifier is used when in long
mode.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/inat.h |  10 ++
 arch/x86/lib/insn-eval.c| 278 
 2 files changed, 288 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 02aff0867211..1c78580e58be 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -97,6 +97,16 @@
 #define INAT_MAKE_GROUP(grp)   ((grp << INAT_GRP_OFFS) | INAT_MODRM)
 #define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS)
 
+/* Identifiers for segment registers */
+#define INAT_SEG_REG_IGNORE0
+#define INAT_SEG_REG_DEFAULT   1
+#define INAT_SEG_REG_CS2
+#define INAT_SEG_REG_SS3
+#define INAT_SEG_REG_DS4
+#define INAT_SEG_REG_ES5
+#define INAT_SEG_REG_FS6
+#define INAT_SEG_REG_GS7
+
 /* Attribute search APIs */
 extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
 extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 25b2eb3c64c1..86f58ce6c302 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 enum reg_type {
REG_TYPE_RM = 0,
@@ -42,6 +43,283 @@ static bool is_string_insn(struct insn *insn)
}
 }
 
+/**
+ * get_overridden_seg_reg() - obtain segment register to use from prefixes
+ * @insn:  Instruction structure with segment override prefixes
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @regoff:   Operand offset, in pt_regs, used to determine segment register
+ *
+ * The segment register to which an effective address refers depends on
+ * a) whether running in long mode (in such a case segment override prefixes
+ * are ignored); b) whether segment override prefixes must be ignored for certain
+ * registers: always use CS when the register is (R|E)IP; always use ES when
+ * the operand register is (E)DI with a string instruction as defined in the Intel
+ * documentation; c) whether segment override prefixes are found in the instruction
+ * prefixes; d) the default segment register associated with the operand
+ * register.
+ *
+ * This function returns the overridden segment register to use, if any, as per
+ * the conditions described above. Please note that this function
+ * does not return the value in the segment register (i.e., the segment
+ * selector). The segment 

[PATCH v8 13/28] x86/insn-eval: Add utility function to get segment descriptor

2017-08-18 Thread Ricardo Neri
The segment descriptor contains information that is relevant to how linear
addresses need to be computed. It contains the default size of addresses
as well as the base address of the segment. Thus, given a segment
selector, we ought to look at the segment descriptor to correctly calculate
the linear address.

In protected mode, the segment selector might indicate a segment
descriptor from either the global descriptor table or a local descriptor
table. Both cases are considered in this function.

This function is a prerequisite for functions in subsequent commits that
will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 55 
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 86f58ce6c302..9cf2c49afc15 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -6,9 +6,13 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 enum reg_type {
@@ -402,6 +406,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @sel:   Segment selector
+ *
+ * Given a segment selector, obtain a pointer to the segment descriptor.
+ * Both global and local descriptor tables are supported.
+ *
+ * Return: pointer to segment descriptor on success. NULL on error.
+ */
+static struct desc_struct *get_desc(unsigned short sel)
+{
+   struct desc_ptr gdt_desc = {0, 0};
+   struct desc_struct *desc = NULL;
+   unsigned long desc_base;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+   if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+   /* Bits [15:3] contain the index of the desired entry. */
+   sel >>= 3;
+
+   mutex_lock(&current->active_mm->context.lock);
+   /* The size of the LDT refers to the number of entries. */
+   if (!current->active_mm->context.ldt ||
+   sel >= current->active_mm->context.ldt->nr_entries) {
+   mutex_unlock(&current->active_mm->context.lock);
+   return NULL;
+   }
+
+   desc = &current->active_mm->context.ldt->entries[sel];
+   mutex_unlock(&current->active_mm->context.lock);
+   return desc;
+   }
+#endif
+   native_store_gdt(&gdt_desc);
+
+   /*
+* Segment descriptors have a size of 8 bytes. Thus, the index is
+* multiplied by 8 to obtain the memory offset of the desired descriptor
+* from the base of the GDT. As bits [15:3] of the segment selector
+* contain the index, it can be regarded as multiplied by 8 already.
+* All that remains is to clear bits [2:0].
+*/
+   desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+   if (desc_base > gdt_desc.size)
+   return NULL;
+
+   desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+   return desc;
+}
+
+/**
  * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Structure with register values as seen when entering kernel mode
-- 
2.13.0



[PATCH v8 15/28] x86/insn-eval: Add function to get default params of code segment

2017-08-18 Thread Ricardo Neri
Obtain the default values of the address and operand sizes as specified in
the D and L bits of the segment descriptor selected by the register
CS. The function can be used for both protected and long modes.
For virtual-8086 mode, the default address and operand sizes are always 2
bytes.

The returned parameters are encoded in a signed 8-bit data type. Auxiliary
macros are provided to encode and decode such values.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  5 
 arch/x86/lib/insn-eval.c | 59 
 2 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7f3c7fe72cd0..e8c3e7cd1673 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -11,9 +11,14 @@
 #include 
 #include 
 
+#define INSN_CODE_SEG_ADDR_SZ(params) ((params >> 4) & 0xf)
+#define INSN_CODE_SEG_OPND_SZ(params) (params & 0xf)
+#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
+
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
int regoff);
+char insn_get_code_seg_defaults(struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2c5e7081957d..a8e12bd0aecd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -584,6 +584,65 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
 }
 
 /**
+ * insn_get_code_seg_defaults() - Obtain code segment default parameters
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the default parameters of the code segment: address and operand sizes.
+ * The code segment is obtained from the selector contained in the CS register
+ * in regs. In protected mode, the default address is determined by inspecting
+ * the L and D bits of the segment descriptor. In virtual-8086 mode, the default
+ * is always two bytes for both address and operand sizes.
+ *
+ * Return: A signed 8-bit value containing the default parameters on success and
+ * -EINVAL on error.
+ */
+char insn_get_code_seg_defaults(struct pt_regs *regs)
+{
+   struct desc_struct *desc;
+   unsigned short sel;
+
+   if (v8086_mode(regs))
+   /* Address and operand size are both 16-bit. */
+   return INSN_CODE_SEG_PARAMS(2, 2);
+
+   sel = (unsigned short)regs->cs;
+
+   desc = get_desc(sel);
+   if (!desc)
+   return -EINVAL;
+
+   /*
+* The most significant byte of the Type field of the segment descriptor
+* determines whether a segment contains data or code. If this is a data
+* segment, return error.
+*/
+   if (!(desc->type & BIT(3)))
+   return -EINVAL;
+
+   switch ((desc->l << 1) | desc->d) {
+   case 0: /*
+* Legacy mode. CS.L=0, CS.D=0. Address and operand size are
+* both 16-bit.
+*/
+   return INSN_CODE_SEG_PARAMS(2, 2);
+   case 1: /*
+* Legacy mode. CS.L=0, CS.D=1. Address and operand size are
+* both 32-bit.
+*/
+   return INSN_CODE_SEG_PARAMS(4, 4);
+   case 2: /*
+* IA-32e 64-bit mode. CS.L=1, CS.D=0. Address size is 64-bit;
+* operand size is 32-bit.
+*/
+   return INSN_CODE_SEG_PARAMS(4, 8);
+   case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+   /* fall through */
+   default:
+   return -EINVAL;
+   }
+}
+
+/**
  * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
  * @insn:  Instruction structure containing the ModRM byte
  * @regs:  Structure with register values as seen when entering kernel mode
-- 
2.13.0



[PATCH v8 16/28] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 101b

2017-08-18 Thread Ricardo Neri
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software
Developer's Manual volume 2A states that when ModRM.mod is zero and
ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means
that none of the registers are used in the computation of the effective
address. A return value of -EDOM indicates callers that they should not
use the value of registers when computing the effective address for the
instruction.

In long mode, the effective address is given by the 32-bit displacement
plus the location of the next instruction. In protected mode, only the
displacement is used.

The instruction decoder takes care of obtaining the displacement.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a8e12bd0aecd..04f696c3793e 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -360,6 +360,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
switch (type) {
case REG_TYPE_RM:
regno = X86_MODRM_RM(insn->modrm.value);
+
+   /*
+* ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
+* follows the ModRM byte.
+*/
+   if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+   return -EDOM;
+
if (X86_REX_B(insn->rex_prefix.value))
regno += 8;
break;
@@ -706,10 +714,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-   if (addr_offset < 0)
-   goto out_err;
 
-   eff_addr = regs_get_register(regs, addr_offset);
+   /*
+* -EDOM means that we must ignore the address_offset.
+* In such a case, in 64-bit mode the effective address
+* is relative to the RIP of the following instruction.
+*/
+   if (addr_offset == -EDOM) {
+   if (user_64bit_mode(regs))
+   eff_addr = (long)regs->ip + insn->length;
+   else
+   eff_addr = 0;
+   } else if (addr_offset < 0) {
+   goto out_err;
+   } else {
+   eff_addr = regs_get_register(regs, addr_offset);
+   }
}
 
eff_addr += insn->displacement.value;
-- 
2.13.0



[PATCH v8 14/28] x86/insn-eval: Add utility functions to get segment descriptor base address and limit

2017-08-18 Thread Ricardo Neri
With segmentation, the base address of the segment is needed to compute a
linear address. This base address is obtained from the applicable segment
descriptor. That segment descriptor is referenced by a segment selector.
The segment selector is stored in the segment register associated with
operands in the instruction being executed or indicated in the instruction
prefixes. Thus, both a structure containing such instruction and its
prefixes as well as the register operand (specified as the offset from the
base of pt_regs) are given as inputs to the new function
insn_get_seg_base() that retrieves the base address indicated in the
segment descriptor.

The logic to obtain the segment selector is wrapped in the function
get_seg_selector() with the inputs described above. Once the selector is
known, the base address is determined. In protected mode, the selector is
used to obtain the segment descriptor and then its base address. In 64-bit
user mode, the segment base address is zero except when FS or GS are used.
In virtual-8086 mode, the base address is computed as the value of the
segment selector shifted 4 positions to the left.

In protected mode, segment limits are enforced. Thus, a function to
determine the limit of the segment is added. Segment limits are not
enforced in long or virtual-8086 mode. For the latter, addresses are
limited to 20 bits; address size will be handled when computing the
linear address.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |   2 +
 arch/x86/lib/insn-eval.c | 127 +++
 2 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7e8c9633a377..7f3c7fe72cd0 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -13,5 +13,7 @@
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 9cf2c49afc15..2c5e7081957d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -457,6 +457,133 @@ static struct desc_struct *get_desc(unsigned short sel)
 }
 
 /**
+ * insn_get_seg_base() - Obtain base address of segment descriptor.
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @insn:  Instruction structure with selector override prefixes
+ * @regoff:Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. Zero in long mode,
+ * except when FS or GS are used. In virtual-8086 mode, the segment
+ * selector shifted 4 positions to the left. -1L in case of
+ * error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+   int regoff)
+{
+   struct desc_struct *desc;
+   int seg_reg;
+   short sel;
+
+   seg_reg = resolve_seg_register(insn, regs, regoff);
+   if (seg_reg < 0)
+   return -1L;
+
+   sel = get_segment_selector(regs, seg_reg);
+   if (sel < 0)
+   return -1L;
+
+   if (v8086_mode(regs))
+   /*
+* Base is simply the segment selector shifted 4
+* positions to the left.
+*/
+   return (unsigned long)(sel << 4);
+
+   if (user_64bit_mode(regs)) {
+   /*
+* Only FS or GS will have a base address, the rest of
+* the segments' bases are forced to 0.
+*/
+   unsigned long base;
+
+   if (seg_reg == INAT_SEG_REG_FS)
+   rdmsrl(MSR_FS_BASE, base);
+   else if (seg_reg == INAT_SEG_REG_GS)
+   /*
+* swapgs was called at the kernel entry point. Thus,
+* MSR_KERNEL_GS_BASE will have the user-space GS base.
+*/
+   rdmsrl(MSR_KERNEL_GS_BASE, base);
+   else if (seg_reg != INAT_SEG_REG_IGNORE)
+   /* We should ignore the rest of segment registers. */
+   base = -1L;
+ 

[PATCH v8 04/28] uprobes/x86: Use existing definitions for segment override prefixes

2017-08-18 Thread Ricardo Neri
Rather than using hard-coded values of the segment override prefixes,
leverage the existing definitions provided in inat.h.

Suggested-by: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Masami Hiramatsu 
Cc: Denys Vlasenko 
Cc: Srikar Dronamraju 
Cc: Ravi V. Shankar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/uprobes.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 495c776de4b4..a3755d293a48 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -271,12 +271,15 @@ static bool is_prefix_bad(struct insn *insn)
int i;
 
for (i = 0; i < insn->prefixes.nbytes; i++) {
-   switch (insn->prefixes.bytes[i]) {
-   case 0x26:  /* INAT_PFX_ES   */
-   case 0x2E:  /* INAT_PFX_CS   */
-   case 0x36:  /* INAT_PFX_DS   */
-   case 0x3E:  /* INAT_PFX_SS   */
-   case 0xF0:  /* INAT_PFX_LOCK */
+   insn_attr_t attr;
+
+   attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]);
+   switch (attr) {
+   case INAT_MAKE_PREFIX(INAT_PFX_ES):
+   case INAT_MAKE_PREFIX(INAT_PFX_CS):
+   case INAT_MAKE_PREFIX(INAT_PFX_DS):
+   case INAT_MAKE_PREFIX(INAT_PFX_SS):
+   case INAT_MAKE_PREFIX(INAT_PFX_LOCK):
return true;
}
}
-- 
2.13.0



[PATCH v8 19/28] x86/insn-eval: Add wrapper function for 32 and 64-bit addresses

2017-08-18 Thread Ricardo Neri
The function insn_get_addr_ref() is capable of handling only 64-bit
addresses. A previous commit introduced a function to handle 32-bit
addresses. Invoke these two functions from a third wrapper function that
calls the appropriate routine based on the address size specified in the
instruction structure (obtained by looking at the code segment default
address size and the address override prefix, if present).

While doing this, rename the original function insn_get_addr_ref() to
the more appropriate name get_addr_ref_64(), ensure it is only used
for 64-bit addresses, and return a 64-bit error value.

Also, since 64-bit addresses are not possible in 32-bit builds, provide
a dummy function for such a case.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 53 ++--
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 6730c9ba02c5..6537b613d0b3 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -812,12 +812,25 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
return (void __user *)-1L;
 }
 
-/*
- * return the address being referenced be instruction
- * for rm=3 returning the content of the rm reg
- * for rm!=3 calculates the address using SIB and Disp
+/**
+ * get_addr_ref_64() - Obtain a 64-bit linear address
+ * @insn:  Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 64-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
  */
-void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+#ifndef CONFIG_X86_64
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
+{
+   return (void __user *)-1L;
+}
+#else
+static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs)
 {
int addr_offset, base_offset, indx_offset;
unsigned long linear_addr, seg_base_addr;
@@ -828,6 +841,9 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
insn_get_sib(insn);
sib = insn->sib.value;
 
+   if (insn->addr_bytes != 8)
+   goto out_err;
+
if (X86_MODRM_MOD(insn->modrm.value) == 3) {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
@@ -900,5 +916,30 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
return (void __user *)linear_addr;
 out_err:
-   return (void __user *)-1;
+   return (void __user *)-1L;
+}
+#endif /* CONFIG_X86_64 */
+
+/**
+ * insn_get_addr_ref() - Obtain the linear address referred by instruction
+ * @insn:  Instruction structure containing ModRM byte and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the linear address referred by the instruction's ModRM, SIB and
+ * displacement bytes, and segment base, as applicable. In protected mode,
+ * segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
+{
+   switch (insn->addr_bytes) {
+   case 4:
+   return get_addr_ref_32(insn, regs);
+   case 8:
+   return get_addr_ref_64(insn, regs);
+   default:
+   return (void __user *)-1L;
+   }
 }
-- 
2.13.0



[PATCH v8 21/28] x86/insn-eval: Add support to resolve 16-bit addressing encodings

2017-08-18 Thread Ricardo Neri
Tasks running in virtual-8086 mode, in protected mode with code segment
descriptors that specify 16-bit default address sizes via the D bit, or
using an address override prefix will use the 16-bit addressing form
encodings described in the Intel 64 and IA-32 Architectures Software
Developer's Manual Volume 2A, Section 2.1.5, Table 2-1.

16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.

A couple of functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_addr_ref_16() computes the linear
address indicated by the instructions using the value of the registers
given by ModRM and the base address of the applicable segment.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 171 +++
 1 file changed, 171 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 93a6d1f57c2d..6abe46aed6fd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -414,6 +414,78 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16() - Obtain offset of register indicated by instruction
+ * @insn:  Instruction structure containing ModRM and SIB bytes
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offset of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+int *offs1, int *offs2)
+{
+   /*
+* 16-bit addressing can use one or two registers. Specifics of
+* encodings are given in Table 2-1. "16-Bit Addressing Forms with the
+* ModR/M Byte" of the Intel Software Developer's Manual.
+*/
+   static const int regoff1[] = {
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bx),
+   };
+
+   static const int regoff2[] = {
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   };
+
+   if (!offs1 || !offs2)
+   return -EINVAL;
+
+   /* Operand is a register, use the generic function. */
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   *offs1 = insn_get_modrm_rm_off(insn, regs);
+   *offs2 = -EDOM;
+   return 0;
+   }
+
+   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+   /*
+* If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement-
+* only addressing. This means that no registers are involved in
+* computing the effective address. Thus, ensure that the first
+* register offset is invalid. The second register offset is already
+* invalid under the aforementioned conditions.
+*/
+   if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+   (X86_MODRM_RM(insn->modrm.value) == 6))
+   *offs1 = -EDOM;
+
+   return 0;
+}
+
+/**
  * get_desc() - 

[PATCH v8 21/28] x86/insn-eval: Add support to resolve 16-bit addressing encodings

2017-08-18 Thread Ricardo Neri
Tasks running in virtual-8086 mode, in protected mode with code segment
descriptors that specify 16-bit default address sizes via the
D bit, or via an address override prefix will use 16-bit addressing form
encodings as described in the Intel 64 and IA-32 Architecture Software
Developer's Manual Volume 2A Section 2.1.5, Table 2-1.

16-bit addressing encodings differ in several ways from the 32-bit/64-bit
addressing form encodings: ModRM.rm points to different registers and, in
some cases, effective addresses are indicated by the addition of the value
of two registers. Also, there is no support for SIB bytes. Thus, a
separate function is needed to parse this form of addressing.

A couple of functions are introduced. get_reg_offset_16() obtains the
offset from the base of pt_regs of the registers indicated by the ModRM
byte of the address encoding. get_addr_ref_16() computes the linear
address indicated by the instructions using the value of the registers
given by ModRM and the base address of the applicable segment.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 171 +++
 1 file changed, 171 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 93a6d1f57c2d..6abe46aed6fd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -414,6 +414,78 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }
 
 /**
+ * get_reg_offset_16() - Obtain offset of register indicated by instruction
+ * @insn:  Instruction structure containing ModRM and SIB bytes
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @offs1: Offset of the first operand register
+ * @offs2: Offset of the second operand register, if applicable
+ *
+ * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte
+ * within insn. This function is to be used with 16-bit address encodings. The
+ * offs1 and offs2 will be written with the offsets of the two registers
+ * indicated by the instruction. In cases where any of the registers is not
+ * referenced by the instruction, the value will be set to -EDOM.
+ *
+ * Return: 0 on success, -EINVAL on failure.
+ */
+static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs,
+int *offs1, int *offs2)
+{
+   /*
+* 16-bit addressing can use one or two registers. Specifics of
+* encodings are given in Table 2-1. "16-Bit Addressing Forms with the
+* ModR/M Byte" of the Intel Software Developer's Manual.
+*/
+   static const int regoff1[] = {
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, bx),
+   };
+
+   static const int regoff2[] = {
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   -EDOM,
+   };
+
+   if (!offs1 || !offs2)
+   return -EINVAL;
+
+   /* Operand is a register, use the generic function. */
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   *offs1 = insn_get_modrm_rm_off(insn, regs);
+   *offs2 = -EDOM;
+   return 0;
+   }
+
+   *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)];
+   *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)];
+
+   /*
+* If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement-
+* only addressing. This means that no registers are involved in
+* computing the effective address. Thus, ensure that the first
+* register offset is invalid. The second register offset is already
+* invalid under the aforementioned conditions.
+*/
+   if ((X86_MODRM_MOD(insn->modrm.value) == 0) &&
+   (X86_MODRM_RM(insn->modrm.value) == 6))
+   *offs1 = -EDOM;
+
+   return 0;
+}
+
+/**
  * get_desc() - Obtain address of segment descriptor
  * @sel:   Segment selector
  *
@@ -666,6 +738,103 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 }
 
 /**
+ * get_addr_ref_16() - Obtain the 16-bit address referred by instruction
+ * @insn:  Instruction structure containing ModRM byte and displacement
+ * @regs:  Structure 

[PATCH v8 20/28] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode

2017-08-18 Thread Ricardo Neri
It is possible to utilize 32-bit address encodings in virtual-8086 mode via
an address override instruction prefix. However, the range of the
effective address is still limited to [0x0000-0xffff]. In such a case,
return an error.

Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce
this limit by truncating the most significant bits of the computed linear
address.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 6537b613d0b3..93a6d1f57c2d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -801,12 +801,23 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
goto out_err;
 
/*
+* Even though 32-bit address encodings are allowed in virtual-8086
+* mode, the address range is still limited to [0x0000-0xffff].
+*/
+   if (v8086_mode(regs) && (eff_addr & ~0xffff))
+   goto out_err;
+
+   /*
 * Data type long could be 64 bits in size. Ensure that our 32-bit
 * effective address is not sign-extended when computing the linear
 * address.
 */
linear_addr = (unsigned long)(eff_addr & 0xffffffff) + seg_base_addr;
 
+   /* Limit linear address to 20 bits */
+   if (v8086_mode(regs))
+   linear_addr &= 0xfffff;
+
return (void __user *)linear_addr;
 out_err:
return (void __user *)-1L;
-- 
2.13.0



[PATCH v8 24/28] x86/umip: Force a page fault when unable to copy emulated result to user

2017-08-18 Thread Ricardo Neri
fixup_umip_exception() will be called from do_general_protection(). If the
former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV.
However, when emulation is successful but the emulated result cannot be
copied to user space memory, it is more accurate to issue a SIGSEGV with
SEGV_MAPERR with the offending address. A new function, inspired by
force_sig_info_fault(), is introduced to model the page fault.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 45 +++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index cd3201a7deca..6e38b8f5d305 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -185,6 +185,41 @@ static int emulate_umip_insn(struct insn *insn, int umip_inst,
 }
 
 /**
+ * force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR
+ * @addr:  Address that caused the signal
+ * @regs:  Register set containing the instruction pointer
+ *
+ * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is
+ * intended to be used to provide a segmentation fault when the result of the
+ * UMIP emulation could not be copied to the user space memory.
+ *
+ * Return: none
+ */
+static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs)
+{
+   siginfo_t info;
+   struct task_struct *tsk = current;
+
+   tsk->thread.cr2 = (unsigned long)addr;
+   tsk->thread.error_code  = X86_PF_USER | X86_PF_WRITE;
+   tsk->thread.trap_nr = X86_TRAP_PF;
+
+   info.si_signo   = SIGSEGV;
+   info.si_errno   = 0;
+   info.si_code= SEGV_MAPERR;
+   info.si_addr= addr;
+   force_sig_info(SIGSEGV, &info, tsk);
+
+   if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
+   return;
+
+   pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
+  tsk->comm, task_pid_nr(tsk), regs->ip,
+  regs->sp, X86_PF_USER | X86_PF_WRITE,
+  regs->ip);
+}
+
+/**
  * fixup_umip_exception() - Fixup #GP faults caused by UMIP
  * @regs:  Registers as saved when entering the #GP trap
  *
@@ -293,8 +328,14 @@ bool fixup_umip_exception(struct pt_regs *regs)
return false;
 
nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
-   if (nr_copied  > 0)
-   return false;
+   if (nr_copied  > 0) {
+   /*
+* If copy fails, send a signal and tell caller that
+* fault was fixed up.
+*/
+   force_sig_info_umip_fault(uaddr, regs);
+   return true;
+   }
}
 
/* increase IP to let the program keep going */
-- 
2.13.0



[PATCH v8 05/28] x86/mpx: Use signed variables to compute effective addresses

2017-08-18 Thread Ricardo Neri
Even though memory addresses are unsigned, the operands used to compute the
effective address do have a sign. This is true for ModRM.rm, SIB.base,
SIB.index as well as the displacement bytes. Thus, signed variables shall
be used when computing the effective address from these operands. Once the
signed effective address has been computed, it is cast to an unsigned
long to determine the linear address.

Variables are renamed to better reflect the type of address being
computed.

Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Peter Zijlstra 
Cc: Nathan Howard 
Cc: Adan Hawthorn 
Cc: Joe Perches 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/mm/mpx.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 9ceaa955d2ba..9eec98022510 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -138,8 +138,9 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  */
 static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
-   unsigned long addr, base, indx;
int addr_offset, base_offset, indx_offset;
+   unsigned long linear_addr;
+   long eff_addr, base, indx;
insn_byte_t sib;
 
insn_get_modrm(insn);
@@ -150,7 +151,8 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+
+   eff_addr = regs_get_register(regs, addr_offset);
} else {
if (insn->sib.nbytes) {
base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
@@ -163,16 +165,22 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 
base = regs_get_register(regs, base_offset);
indx = regs_get_register(regs, indx_offset);
-   addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+   eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
if (addr_offset < 0)
goto out_err;
-   addr = regs_get_register(regs, addr_offset);
+
+   eff_addr = regs_get_register(regs, addr_offset);
}
-   addr += insn->displacement.value;
+
+   eff_addr += insn->displacement.value;
}
-   return (void __user *)addr;
+
+   linear_addr = (unsigned long)eff_addr;
+
+   return (void __user *)linear_addr;
 out_err:
return (void __user *)-1;
 }
-- 
2.13.0



[PATCH v8 22/28] x86/cpufeature: Add User-Mode Instruction Prevention definitions

2017-08-18 Thread Ricardo Neri
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.

The subset of instructions comprises:

 * SGDT - Store Global Descriptor Table
 * SIDT - Store Interrupt Descriptor Table
 * SLDT - Store Local Descriptor Table
 * SMSW - Store Machine Status Word
 * STR  - Store Task Register

This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Reviewed-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/cpufeatures.h  | 1 +
 arch/x86/include/asm/disabled-features.h| 8 +++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 42bbbf0f173d..7b1aa7fc8657 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -291,6 +291,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 16 */
 #define X86_FEATURE_AVX512VBMI  (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
+#define X86_FEATURE_UMIP   (16*32+ 2) /* User Mode Instruction Protection */
 #define X86_FEATURE_PKU(16*32+ 3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c10c9128f54e..14d6d5007314 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -16,6 +16,12 @@
 # define DISABLE_MPX   (1<<(X86_FEATURE_MPX & 31))
 #endif
 
+#ifdef CONFIG_X86_INTEL_UMIP
+# define DISABLE_UMIP  0
+#else
+# define DISABLE_UMIP  (1<<(X86_FEATURE_UMIP & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME   (1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
@@ -63,7 +69,7 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
 
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index aae1f2aa7563..6ee8425b92d9 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -104,6 +104,8 @@
 #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
 #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
 #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
+#define X86_CR4_UMIP_BIT   11 /* enable UMIP support */
+#define X86_CR4_UMIP   _BITUL(X86_CR4_UMIP_BIT)
 #define X86_CR4_LA57_BIT   12 /* enable 5-level page tables */
 #define X86_CR4_LA57   _BITUL(X86_CR4_LA57_BIT)
 #define X86_CR4_VMXE_BIT   13 /* enable VMX virtualization */
-- 
2.13.0



[PATCH v8 23/28] x86: Add emulation code for UMIP instructions

2017-08-18 Thread Ricardo Neri
The User-Mode Instruction Prevention feature present in recent Intel
processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. Otherwise, a general protection
fault is issued.

Rather than relaying to user space the general protection fault caused
by the UMIP-protected instructions (in the form of a SIGSEGV signal), the
fault can be trapped and the result of such instructions emulated to provide
dummy values. This both preserves the current kernel behavior and avoids
revealing the system resources that UMIP intends to protect (i.e., the
locations of the global descriptor and interrupt descriptor tables, the
segment selectors of the local descriptor table, the value of the task
state register and the contents of the CR0 register).

This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function. Given that sldt
and str are not commonly used in programs that run on WineHQ or DOSEMU2,
they are not emulated. Also, emulation is provided only for 32-bit
processes; 64-bit processes that attempt to use the instructions that UMIP
protects will receive the SIGSEGV signal issued as a consequence of the
general protection fault.

The instructions protected by UMIP can be split in two groups. Those which
return a kernel memory address (sgdt and sidt) and those which return a
value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications such
as WineHQ rely on the result being located in the kernel memory space, not
the actual location of the table. The result is emulated as a hard-coded
value that lies close to the top of the kernel memory. The limit for the
GDT and the IDT are set to zero.

The instruction smsw is emulated to return the value that the register CR0
has at boot time, as set in head_32.S.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed to by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/umip.h |  12 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/umip.c  | 303 
 3 files changed, 316 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index ..db43f2a0d92c
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 287eac7d207f..e057e22cd0d4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_EFI)   += sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)   += umip.o
 
 obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o
 obj-$(CONFIG_FRAME_POINTER_UNWINDER)   += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index ..cd3201a7deca
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,303 @@
+/*
+ * umip.c Emulation for instruction protected by the Intel User-Mode
+ * Instruction Prevention feature
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 

[PATCH v8 23/28] x86: Add emulation code for UMIP instructions

2017-08-18 Thread Ricardo Neri
The feature User-Mode Instruction Prevention present in recent Intel
processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. Otherwise, a general protection
fault is issued.

Rather than relaying to the user space the general protection fault caused
by the UMIP-protected instructions (in the form of a SIGSEGV signal), it
can be trapped and emulate the result of such instructions to provide dummy
values. This allows to both conserve the current kernel behavior and not
reveal the system resources that UMIP intends to protect (i.e., the
locations of the global descriptor and interrupt descriptor tables, the
segment selectors of the local descriptor table, the value of the task
state register and the contents of the CR0 register).

This emulation is needed because certain applications (e.g., WineHQ and
DOSEMU2) rely on this subset of instructions to function. Given that sldt
and str are not commonly used in programs that run on WineHQ or DOSEMU2,
they are not emulated. Also, emulation is provided only for 32-bit
processes; 64-bit processes that attempt to use the instructions that UMIP
protects will receive the SIGSEGV signal issued as a consequence of the
general protection fault.

The instructions protected by UMIP can be split in two groups. Those which
return a kernel memory address (sgdt and sidt) and those which return a
value (sldt, str and smsw).

For the instructions that return a kernel memory address, applications such
as WineHQ rely on the result being located in the kernel memory space, not
the actual location of the table. The result is emulated as a hard-coded
value that lies close to the top of the kernel memory. The limit for the
GDT and the IDT are set to zero.

The instruction smsw is emulated to return the value that the CR0
register has at boot time, as set in head_32.

Care is taken to appropriately emulate the results when segmentation is
used. That is, rather than relying on USER_DS and USER_CS, the function
insn_get_addr_ref() inspects the segment descriptor pointed by the
registers in pt_regs. This ensures that we correctly obtain the segment
base address and the address and operand sizes even if the user space
application uses a local descriptor table.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/umip.h |  12 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/umip.c  | 303 
 3 files changed, 316 insertions(+)
 create mode 100644 arch/x86/include/asm/umip.h
 create mode 100644 arch/x86/kernel/umip.c

diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
new file mode 100644
index ..db43f2a0d92c
--- /dev/null
+++ b/arch/x86/include/asm/umip.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_UMIP_H
+#define _ASM_X86_UMIP_H
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_INTEL_UMIP
+bool fixup_umip_exception(struct pt_regs *regs);
+#else
+static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
+#endif  /* CONFIG_X86_INTEL_UMIP */
+#endif  /* _ASM_X86_UMIP_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 287eac7d207f..e057e22cd0d4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_EFI)   += sysfb_efi.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o
+obj-$(CONFIG_X86_INTEL_UMIP)   += umip.o
 
 obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o
 obj-$(CONFIG_FRAME_POINTER_UNWINDER)   += unwind_frame.o
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
new file mode 100644
index ..cd3201a7deca
--- /dev/null
+++ b/arch/x86/kernel/umip.c
@@ -0,0 +1,303 @@
+/*
+ * umip.c Emulation for instructions protected by the Intel User-Mode
+ * Instruction Prevention feature
+ *
+ * Copyright (c) 2017, Intel Corporation.
+ * Ricardo Neri 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/** DOC: Emulation for User-Mode Instruction Prevention (UMIP)
+ *
+ * The User-Mode Instruction Prevention feature present in recent Intel
+ * processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and str)
+ * from being executed with CPL > 0. If any of these instructions is executed
+ * from user mode, a general protection fault is issued.
+ *
+ * Rather than relaying to the user space the general protection fault caused by
+ * the UMIP-protected instructions (in the form of a SIGSEGV signal), it can be
+ * trapped and emulate 

[PATCH v8 09/28] x86/insn-eval: Do not BUG on invalid register type

2017-08-18 Thread Ricardo Neri
We are not in a critical failure path. The invalid register type is caused
when trying to decode invalid instruction bytes from a user-space program.
Thus, simply print an error message. To prevent this warning from being
abused by user space programs, use the rate-limited variant of pr_err().

Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2bb8303ba92f..3919458fecbf 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -5,6 +5,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
break;
 
default:
-   pr_err("invalid register type");
-   BUG();
-   break;
+   pr_err_ratelimited("insn: x86: invalid register type");
+   return -EINVAL;
}
 
if (regno >= nr_registers) {
-- 
2.13.0



[PATCH v8 25/28] x86: Enable User-Mode Instruction Prevention

2017-08-18 Thread Ricardo Neri
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
space comes up. Like SMAP and SMEP, it is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a userspace to be protected from. Given the similarities in relevance, it
makes sense to enable UMIP along with SMAP and SMEP.

UMIP is enabled by default. It can be disabled by adding clearcpuid=514
to the kernel parameters.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 ++
 arch/x86/kernel/cpu/common.c | 25 -
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ce3ed304288d..5c384d926937 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1801,6 +1801,16 @@ config X86_SMAP
 
  If unsure, say Y.
 
+config X86_INTEL_UMIP
+   def_bool n
+   depends on CPU_SUP_INTEL
+   prompt "Intel User Mode Instruction Prevention" if EXPERT
+   ---help---
+ The User Mode Instruction Prevention (UMIP) is a security
+ feature in newer Intel processors. If enabled, a general
+ protection fault is issued if the instructions SGDT, SLDT,
+ SIDT, SMSW and STR are executed in user mode.
+
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
def_bool n
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b95cd94ca97b..5066d7ffa55e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -348,6 +348,28 @@ static void setup_pcid(struct cpuinfo_x86 *c)
}
 }
 
+static __always_inline void setup_umip(struct cpuinfo_x86 *c)
+{
+   /* Check the boot processor, plus build option for UMIP. */
+   if (!cpu_feature_enabled(X86_FEATURE_UMIP))
+   goto out;
+
+   /* Check the current processor's cpuid bits. */
+   if (!cpu_has(c, X86_FEATURE_UMIP))
+   goto out;
+
+   cr4_set_bits(X86_CR4_UMIP);
+
+   return;
+
+out:
+   /*
+* Make sure UMIP is disabled in case it was enabled in a
+* previous boot (e.g., via kexec).
+*/
+   cr4_clear_bits(X86_CR4_UMIP);
+}
+
 /*
  * Protection Keys are not available in 32-bit mode.
  */
@@ -1158,9 +1180,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
 
-   /* Set up SMEP/SMAP */
+   /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
+   setup_umip(c);
 
/* Set up PCID */
setup_pcid(c);
-- 
2.13.0




[PATCH v8 26/28] x86/traps: Fixup general protection faults caused by UMIP

2017-08-18 Thread Ricardo Neri
If the User-Mode Instruction Prevention CPU feature is available and
enabled, a general protection fault will be issued if the instructions
sgdt, sldt, sidt, str or smsw are executed from user-mode context
(CPL > 0). If the fault was caused by any of the instructions protected
by UMIP, fixup_umip_exception() will emulate dummy results for these
instructions as follows: if running a 32-bit process, sgdt, sidt and smsw
are emulated; str and sldt are not emulated. No emulation is done for
64-bit processes.

If emulation is successful, the result is passed to the user space program
and no SIGSEGV signal is emitted.

Please note that fixup_umip_exception() also caters for the case when
the fault originated while running in virtual-8086 mode.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Liang Z. Li 
Cc: x...@kernel.org
Reviewed-by: Andy Lutomirski 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/traps.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bf54309b85da..1c1bb7992f70 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -526,6 +527,10 @@ do_general_protection(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
 
+   if (static_cpu_has(X86_FEATURE_UMIP))
+   if (user_mode(regs) && fixup_umip_exception(regs))
+   return;
+
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-- 
2.13.0




[PATCH v8 27/28] selftests/x86: Add tests for User-Mode Instruction Prevention

2017-08-18 Thread Ricardo Neri
Certain user space programs that run in virtual-8086 mode may utilize
instructions protected by the User-Mode Instruction Prevention (UMIP)
security feature present in new Intel processors: SGDT, SIDT and SMSW. In
such a case, a general protection fault is issued if UMIP is enabled. When
such a fault happens, the kernel traps it and emulates the results of
these instructions with dummy values. The purpose of this new
test is to verify whether the impacted instructions can be executed
without causing such #GP. If no #GP exceptions occur, we expect to exit
virtual-8086 mode from INT3.

The instructions protected by UMIP are executed in representative use
cases:
 a) displacement-only memory addressing
 b) register-indirect memory addressing
 c) results stored directly in operands

Unfortunately, it is not possible to check the results against a set of
expected values because no emulation will occur in systems that do not
have the UMIP feature. Instead, results are printed for verification. A
simple verification is done to ensure that results of all tests are
identical.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 73 ++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0e5ca1..130e8ad1db05 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
"int3\n\t"
"vmcode_int80:\n\t"
"int $0x80\n\t"
+   "vmcode_umip:\n\t"
+   /* addressing via displacements */
+   "smsw (2052)\n\t"
+   "sidt (2054)\n\t"
+   "sgdt (2060)\n\t"
+   /* addressing via registers */
+   "mov $2066, %bx\n\t"
+   "smsw (%bx)\n\t"
+   "mov $2068, %bx\n\t"
+   "sidt (%bx)\n\t"
+   "mov $2074, %bx\n\t"
+   "sgdt (%bx)\n\t"
+   /* register operands, only for smsw */
+   "smsw %ax\n\t"
+   "mov %ax, (2080)\n\t"
+   "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -160,6 +176,58 @@ static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
return true;
 }
 
+void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
+{
+   struct table_desc {
+   unsigned short limit;
+   unsigned long base;
+   } __attribute__((packed));
+
+   /* Initialize variables with arbitrary values */
+   struct table_desc gdt1 = { .base = 0x3c3c3c3c, .limit = 0x };
+   struct table_desc gdt2 = { .base = 0x1a1a1a1a, .limit = 0xaeae };
+   struct table_desc idt1 = { .base = 0x7b7b7b7b, .limit = 0xf1f1 };
+   struct table_desc idt2 = { .base = 0x89898989, .limit = 0x1313 };
+   unsigned short msw1 = 0x1414, msw2 = 0x2525, msw3 = 3737;
+
+   /* UMIP -- exit with INT3 unless kernel emulation did not trap #GP */
+   do_test(vm86, vmcode_umip - vmcode, VM86_TRAP, 3, "UMIP tests");
+
+   /* Results from displacement-only addressing */
+   msw1 = *(unsigned short *)(test_mem + 2052);
+   memcpy(&idt1, test_mem + 2054, sizeof(idt1));
+   memcpy(&gdt1, test_mem + 2060, sizeof(gdt1));
+
+   /* Results from register-indirect addressing */
+   msw2 = *(unsigned short *)(test_mem + 2066);
+   memcpy(&idt2, test_mem + 2068, sizeof(idt2));
+   memcpy(&gdt2, test_mem + 2074, sizeof(gdt2));
+
+   /* Results when using register operands */
+   msw3 = *(unsigned short *)(test_mem + 2080);
+
+   printf("[INFO]\tResult from SMSW:[0x%04x]\n", msw1);
+   printf("[INFO]\tResult from SIDT: limit[0x%04x]base[0x%08lx]\n",
+  idt1.limit, idt1.base);
+   printf("[INFO]\tResult from SGDT: limit[0x%04x]base[0x%08lx]\n",
+  gdt1.limit, gdt1.base);
+
+   if ((msw1 != msw2) || (msw1 != msw3))
+   printf("[FAIL]\tAll the results of SMSW should be the same.\n");
+   else
+   printf("[PASS]\tAll the results from SMSW are identical.\n");
+
+   if (memcmp(&gdt1, &gdt2, sizeof(gdt1)))
+   printf("[FAIL]\tAll the results of SGDT should be the same.\n");
+   else
+   printf("[PASS]\tAll the results from SGDT are identical.\n");

[PATCH v8 28/28] selftests/x86: Add tests for instruction str and sldt

2017-08-18 Thread Ricardo Neri
The instructions str and sldt are not recognized when running in virtual-
8086 mode and generate an invalid operand exception. These two
instructions are protected by the Intel User-Mode Instruction Prevention
(UMIP) security feature. In protected mode, if UMIP is enabled, these
instructions generate a general protection fault if called from CPL > 0.
Linux traps the general protection fault and emulates the instructions
sgdt, sidt and smsw, but not str and sldt.

These tests are added to verify that the emulation code does not emulate
these two instructions but the expected invalid operand exception is
seen.

Tests fall back to exiting with int3 in case emulation does happen.

Cc: Andy Lutomirski 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Chen Yucong 
Cc: Chris Metcalf 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Huang Rui 
Cc: Jiri Slaby 
Cc: Jonathan Corbet 
Cc: Michael S. Tsirkin 
Cc: Paul Gortmaker 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Shuah Khan 
Cc: Vlastimil Babka 
Signed-off-by: Ricardo Neri 
---
 tools/testing/selftests/x86/entry_from_vm86.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index 130e8ad1db05..b7a0c9024477 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
"smsw %ax\n\t"
"mov %ax, (2080)\n\t"
"int3\n\t"
+   "vmcode_umip_str:\n\t"
+   "str %eax\n\t"
+   "vmcode_umip_sldt:\n\t"
+   "sldt %eax\n\t"
+   "int3\n\t"
".size vmcode, . - vmcode\n\t"
"end_vmcode:\n\t"
".code32\n\t"
@@ -119,7 +124,8 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+   vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+   vmcode_umip_str[], vmcode_umip_sldt[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
printf("[FAIL]\tAll the results of SIDT should be the same.\n");
else
printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+   sethandler(SIGILL, sighandler, 0);
+   do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+   "STR instruction");
+   clearhandler(SIGILL);
+
+   sethandler(SIGILL, sighandler, 0);
+   do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+   "SLDT instruction");
+   clearhandler(SIGILL);
 }
 
 int main(void)
-- 
2.13.0




[PATCH v8 11/28] x86/insn-eval: Add utility function to identify string instructions

2017-08-18 Thread Ricardo Neri
String instructions are special because, in protected mode, the linear
address is always obtained via the ES segment register in operands that
use the (E)DI register, and via the DS segment register in operands that use
the (E)SI register. Furthermore, segment override prefixes are ignored
when calculating a linear address involving the (E)DI register; segment
override prefixes can be used when calculating linear addresses involving
the (E)SI register.

It follows that linear addresses are calculated differently for the case of
string instructions. The purpose of this utility function is to identify
such instructions for callers to determine a linear address correctly.

Note that this function only identifies string instructions; it does not
determine what segment register to use in the address computation. That is
left to callers. A subsequent commit introduces a function to determine
the segment register to use given the instruction, operands and
segment override prefixes.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index c6120e9298f5..25b2eb3c64c1 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -16,6 +16,32 @@ enum reg_type {
REG_TYPE_BASE,
 };
 
+/**
+ * is_string_insn() - Determine if instruction is a string instruction
+ * @insn:  Instruction structure containing the opcode
+ *
+ * Return: true if the instruction, determined by the opcode, is any of the
+ * string instructions as defined in the Intel Software Development manual.
+ * False otherwise.
+ */
+static bool is_string_insn(struct insn *insn)
+{
+   insn_get_opcode(insn);
+
+   /* All string instructions have a 1-byte opcode. */
+   if (insn->opcode.nbytes != 1)
+   return false;
+
+   switch (insn->opcode.bytes[0]) {
+   case 0x6c ... 0x6f: /* INS, OUTS */
+   case 0xa4 ... 0xa7: /* MOVS, CMPS */
+   case 0xaa ... 0xaf: /* STOS, LODS, SCAS */
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
  enum reg_type type)
 {
-- 
2.13.0



[PATCH v8 08/28] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file

2017-08-18 Thread Ricardo Neri
Other kernel submodules can benefit from using the utility functions
defined in mpx.c to obtain the addresses and values of operands contained
in the general purpose registers. An instance of this is the emulation code
used for instructions protected by the Intel User-Mode Instruction
Prevention feature.

Thus, these functions are relocated to a new insn-eval.c file. The reason
to not relocate these utilities into insn.c is that the latter solely
analyses instructions given by a struct insn without any knowledge of the
meaning of the values of instruction operands. This new utility,
insn-eval.c, aims to be used to resolve userspace linear addresses based on
the contents of the instruction operands as well as the contents of pt_regs
structure.

These utilities come with a separate header. This is to avoid taking insn.c
out of sync with the instruction decoders under tools/obj and tools/perf.
This also avoids adding cumbersome #ifdef's for the #include'd files
required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation
changes. The checkpatch script issues the following warning with this
commit:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
rather than BUG() or BUG_ON()
+   BUG();

This warning will be fixed in a subsequent patch.

Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  16 
 arch/x86/lib/Makefile|   2 +-
 arch/x86/lib/insn-eval.c | 163 +++
 arch/x86/mm/mpx.c| 156 +
 4 files changed, 182 insertions(+), 155 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index ..5cab1b1da84d
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a74131a12c..675d7b075fed 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index ..2bb8303ba92f
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,163 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum reg_type {
+   REG_TYPE_RM = 0,
+   REG_TYPE_INDEX,
+   REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+ enum reg_type type)
+{
+   int regno = 0;
+
+   static const int regoff[] = {
+   offsetof(struct pt_regs, ax),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, dx),
+   offsetof(struct pt_regs, bx),
+   offsetof(struct pt_regs, sp),
+   offsetof(struct pt_regs, bp),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+   offsetof(struct pt_regs, r8),
+   offsetof(struct pt_regs, r9),
+   offsetof(struct pt_regs, r10),
+   offsetof(struct pt_regs, r11),
+   offsetof(struct pt_regs, r12),
+   offsetof(struct pt_regs, r13),
+   offsetof(struct pt_regs, r14),
+   offsetof(struct pt_regs, r15),
+#endif
+   };
+   int nr_registers = ARRAY_SIZE(regoff);
+   /*
+* Don't possibly decode a 32-bit instructions as
+* reading a 64-bit-only register.
+*/
+   if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+   nr_registers -= 8;
+
+   switch (type) {
+

[PATCH v8 18/28] x86/insn-eval: Add support to resolve 32-bit address encodings

2017-08-18 Thread Ricardo Neri
32-bit and 64-bit address encodings are identical. Thus, the same logic
could be used to resolve the effective address. However, there are two key
differences: address size and enforcement of segment limits.

If running a 32-bit process on a 64-bit kernel, it is best to perform
the address calculation using 32-bit data types. In this manner hardware
is used for the arithmetic, including handling of signs and overflows.

32-bit addresses are generally used in protected mode; segment limits are
enforced in this mode. This implementation obtains the limit of the
segment associated with the instruction operands and prefixes. If the
computed address is outside the segment limits, an error is returned. It
is also possible to use 32-bit addresses in long mode and virtual-8086 mode
by using an address override prefix. In such cases, segment limits are not
enforced.

The new function get_addr_ref_32() is almost identical to the existing
function insn_get_addr_ref() (used for 64-bit addresses), except for the
differences mentioned above. For the sake of simplicity and readability,
it is better to use two separate functions.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8ae110a273de..6730c9ba02c5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -665,6 +665,153 @@ int insn_get_modrm_rm_off(struct insn *insn, struct 
pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_RM);
 }
 
+/**
+ * get_addr_ref_32() - Obtain a 32-bit linear address
+ * @insn:  Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable. If in protected
+ * mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
+{
+   int eff_addr, base, indx, addr_offset, base_offset, indx_offset;
+   unsigned long linear_addr, seg_base_addr, seg_limit, tmp;
+   insn_byte_t sib;
+
+   insn_get_modrm(insn);
+   insn_get_sib(insn);
+   sib = insn->sib.value;
+
+   if (insn->addr_bytes != 4)
+   goto out_err;
+
+   if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+   addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+   if (addr_offset < 0)
+   goto out_err;
+
+   tmp = regs_get_register(regs, addr_offset);
+   /* The 4 most significant bytes must be zero. */
+   if (tmp & ~0xffffffffL)
+   goto out_err;
+
+   eff_addr = (int)(tmp & 0xffffffff);
+
+   seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+   if (seg_base_addr == -1L)
+   goto out_err;
+
+   seg_limit = get_seg_limit(regs, insn, addr_offset);
+   } else {
+   if (insn->sib.nbytes) {
+   /*
+* Negative values in the base and index offset means
+* an error when decoding the SIB byte. Except -EDOM,
+* which means that the registers should not be used
+* in the address computation.
+*/
+   base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+   if (base_offset == -EDOM) {
+   base = 0;
+   } else if (base_offset < 0) {
+   goto out_err;
+   } else {
+   tmp = regs_get_register(regs, base_offset);
+   /* The 4 most significant bytes must be zero. */
+   if (tmp & ~0xffffffffL)
+   goto out_err;
+
+   base = (int)(tmp & 0xffffffff);
+   }
+
+   indx_offset = get_reg_offset(insn, regs, 
REG_TYPE_INDEX);
+   if (indx_offset == -EDOM) {
+   indx = 0;
+   } else if (indx_offset < 0) {
+   goto out_err;
+   } else {
+   tmp 

[PATCH v8 17/28] x86/insn-eval: Incorporate segment base in linear address computation

2017-08-18 Thread Ricardo Neri
insn_get_addr_ref() returns the effective address as defined in
section 3.7.5.1, Vol. 1 of the Intel 64 and IA-32 Architectures Software
Developer's Manual. In order to compute the linear address, we must add
to the effective address the segment base address as set in the segment
descriptor. The segment descriptor to use depends on the register used as
operand and segment override prefixes, if any.

In most cases, the segment base address will be 0 if the USER_DS/USER32_DS
segment is used or if segmentation is not used. However, the base address
is not necessarily zero if a user program defines its own segments. This
is possible by using a local descriptor table.

Since the effective address is a signed quantity, the unsigned segment
base address is saved in a separate variable and added to the final,
unsigned, effective address.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/lib/insn-eval.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 04f696c3793e..8ae110a273de 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -673,7 +673,7 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs 
*regs)
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
int addr_offset, base_offset, indx_offset;
-   unsigned long linear_addr;
+   unsigned long linear_addr, seg_base_addr;
long eff_addr, base, indx;
insn_byte_t sib;
 
@@ -687,6 +687,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs)
goto out_err;
 
eff_addr = regs_get_register(regs, addr_offset);
+
+   seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+   if (seg_base_addr == -1L)
+   goto out_err;
} else {
if (insn->sib.nbytes) {
/*
@@ -712,6 +716,11 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs)
indx = regs_get_register(regs, indx_offset);
 
eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+   seg_base_addr = insn_get_seg_base(regs, insn,
+ base_offset);
+   if (seg_base_addr == -1L)
+   goto out_err;
} else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
 
@@ -730,12 +739,17 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs)
} else {
eff_addr = regs_get_register(regs, addr_offset);
}
+
+   seg_base_addr = insn_get_seg_base(regs, insn,
+ addr_offset);
+   if (seg_base_addr == -1L)
+   goto out_err;
}
 
eff_addr += insn->displacement.value;
}
 
-   linear_addr = (unsigned long)eff_addr;
+   linear_addr = (unsigned long)eff_addr + seg_base_addr;
 
return (void __user *)linear_addr;
 out_err:
-- 
2.13.0



[PATCH v8 10/28] x86/insn-eval: Add a utility function to get register offsets

2017-08-18 Thread Ricardo Neri
The function get_reg_offset() returns the offset to the register its
argument specifies, as indicated in an enumeration of register types. Callers
of this function would need the definition of such an enumeration. This is
not necessary. Instead, add helper functions for this purpose. These functions
are useful in cases when, for instance, the caller needs to decide whether
the operand is a register or a memory location by looking at the rm part
of the ModRM byte. As of now, this is the only helper function that is
needed.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1da84d..7e8c9633a377 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,6 @@
 #include 
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3919458fecbf..c6120e9298f5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs 
*regs,
return regoff[regno];
 }
 
+/**
+ * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
+ * @insn:  Instruction structure containing the ModRM byte
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Return: The register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register and shall be ignored.
+ */
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
 /*
 * return the address being referenced by the instruction
  * for rm=3 returning the content of the rm reg
-- 
2.13.0



[PATCH v8 10/28] x86/insn-eval: Add a utility function to get register offsets

2017-08-18 Thread Ricardo Neri
The function get_reg_offset() returns the offset to the register the
argument specifies as indicated in an enumeration of type offset. Callers
of this function would need the definition of such enumeration. This is
not needed. Instead, add helper functions for this purpose. These functions
are useful in cases when, for instance, the caller needs to decide whether
the operand is a register or a memory location by looking at the rm part
of the ModRM byte. As of now, this is the only helper function that is
needed.

Cc: Dave Hansen 
Cc: Adam Buchbinder 
Cc: Colin Ian King 
Cc: Lorenzo Stoakes 
Cc: Qiaowei Ren 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Adrian Hunter 
Cc: Kees Cook 
Cc: Thomas Garnier 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Dmitry Vyukov 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1da84d..7e8c9633a377 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,6 @@
 #include 
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3919458fecbf..c6120e9298f5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs 
*regs,
return regoff[regno];
 }
 
+/**
+ * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
+ * @insn:  Instruction structure containing the ModRM byte
+ * @regs:  Structure with register values as seen when entering kernel mode
+ *
+ * Return: The register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register and shall be ignored.
+ */
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
 /*
 * Return the address being referenced by the instruction.
 * For rm=3, return the content of the rm reg.
-- 
2.13.0



Re: [PATCH] iio: accel: mma8452: Bugfix to enbale and allow different events to work parallely.

2017-08-18 Thread Harinath Nampally

This patch fixes this by detaching the event-related information from
the chip_info struct; based on channel type and event direction, the
corresponding event configuration registers are picked dynamically.
Hence multiple events can be handled in the read/write callbacks.

which chip can have which event(s)?

I am planning to add 'supported events' field in

One small point. Don't put the word bugfix in the title (and fix
spelling of enable!).  I know this is obviously a false restriction
on the driver, but it isn't that it doesn't work, it is just limited
in features without this.

Sure, thanks for letting me know.


On 08/17/2017 10:40 AM, Jonathan Cameron wrote:

This patch fixes this by detaching the event-related information from
the chip_info struct; based on channel type and event direction, the
corresponding event configuration registers are picked dynamically.
Hence multiple events can be handled in the read/write callbacks.

which chip can have which event(s)?

I am planning to add 'supported events' field in

One small point. Don't put the word bugfix in the title (and fix
spelling of enable!).  I know this is obviously a false restriction
on the driver, but it isn't that it doesn't work, it is just limited
in features without this.




