Re: 32-bit powerpc, aty128fb: vmap allocation for size 135168 failed
> Meelis Roos writes: > > > I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G > > RAM and after some time of sddm respawning and X trying to restart, > > dmesg is full of messages about vmap allocation failures. > > Did it just start happening? ie. did rc4 work? It goes back to at least 4.0 - that's the oldest kernel I had laying around precompiled. The messages about ROM signature changed somewhere between 4.0 and 4.7 (4.7 is already like 4.13) but after some time, the same vmalloc errors appear. Maybe the userspace has changed with more respawning that brings the problem out. I tried to read the code but I do not understand it yet. The warning seems to come from generic pci_map_rom() checking ROM size, and returning the rom pointer to aty128fb (it returns the resource size too but that is ignored). aty128fb starts to look at the x86 PCI ROM signature again but does not tell that the signature is missing. How come? -- Meelis Roos (mr...@linux.ee)
Re: [PATCH] X.509: Fix the buffer overflow in the utility function for OID string
On Sat, 19 Aug 2017 06:19:44 +0200, Lee, Chun-Yi wrote: > > From: Takashi Iwai > > The sprint_oid() utility function doesn't properly check the buffer > size, which causes the warning in vsnprintf() to be triggered. > For example on the v4.1 kernel: > > [ 49.612536] [ cut here ] > [ 49.612543] WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 > vsnprintf+0x5a7/0x5c0() > ... > > We can trigger this issue by injecting a maliciously crafted X.509 cert > in DER format. Just use a hex editor to change the length of the OID to > exceed the length of the SEQUENCE container. For example: > > 0:d=0 hl=4 l= 980 cons: SEQUENCE > 4:d=1 hl=4 l= 700 cons: SEQUENCE > 8:d=2 hl=2 l= 3 cons: cont [ 0 ] >10:d=3 hl=2 l= 1 prim: INTEGER :02 >13:d=2 hl=2 l= 9 prim: INTEGER :9B47FAF791E7D1E3 >24:d=2 hl=2 l= 13 cons: SEQUENCE >26:d=3 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption >37:d=3 hl=2 l= 0 prim: NULL >39:d=2 hl=2 l= 121 cons: SEQUENCE >41:d=3 hl=2 l= 22 cons: SET >43:d=4 hl=2 l= 20 cons: SEQUENCE <=== the SEQ length is 20 >45:d=5 hl=2 l= 3 prim: OBJECT :organizationName > <=== the original length is 3; change the length of the OID to exceed the > length of the SEQUENCE > > Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided a > patch to fix it by checking bufsize in sprint_oid().
> > From: Takashi Iwai > Reported-by: Pawel Wieczorkiewicz > Cc: David Howells > Cc: Rusty Russell > Cc: Takashi Iwai > Cc: Pawel Wieczorkiewicz > Signed-off-by: "Lee, Chun-Yi" I seem to have forgotten to give my sign-off: Signed-off-by: Takashi Iwai thanks, Takashi > --- > lib/oid_registry.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/oid_registry.c b/lib/oid_registry.c > index 318f382..41b9e50 100644 > --- a/lib/oid_registry.c > +++ b/lib/oid_registry.c > @@ -142,9 +142,9 @@ int sprint_oid(const void *data, size_t datasize, char > *buffer, size_t bufsize) > } > ret += count = snprintf(buffer, bufsize, ".%lu", num); > buffer += count; > - bufsize -= count; > - if (bufsize == 0) > + if (bufsize <= count) > return -ENOBUFS; > + bufsize -= count; > } > > return ret; > -- > 2.10.2 >
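The subtlety the patch exploits is that snprintf() returns the length the output *would* have required, not the number of bytes actually written, so on truncation `count` can exceed `bufsize` and the old order of operations (`bufsize -= count` before the `== 0` test) wraps the unsigned size around, letting the loop keep writing. A minimal userspace sketch of the fixed loop, with a hypothetical function name (this is a toy model, not the kernel code):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the sprint_oid() loop after the fix: emit ".<num>"
 * components into a fixed buffer.  Because snprintf() reports the
 * needed length, count can exceed bufsize when truncation happens;
 * checking bufsize <= count *before* the unsigned subtraction reports
 * -ENOBUFS instead of wrapping bufsize to a huge value. */
static int emit_oid_components(const unsigned long *nums, int n,
                               char *buffer, size_t bufsize)
{
    int ret = 0;

    for (int i = 0; i < n; i++) {
        int count = snprintf(buffer, bufsize, ".%lu", nums[i]);

        ret += count;
        buffer += count;
        if (bufsize <= (size_t)count)   /* check first, as in the patch */
            return -ENOBUFS;
        bufsize -= count;               /* only then subtract */
    }
    return ret;
}
```

With the pre-patch order, a component longer than the remaining space would make `bufsize -= count` wrap, the `bufsize == 0` test would never fire, and the next snprintf() would be handed an enormous bogus size.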
Re: [PATCH v2] sched/fair: Make PELT signal more accurate
On Fri, 2017-08-18 at 16:50 -0700, Joel Fernandes wrote: > The PELT signal (sa->load_avg and sa->util_avg) are not updated if the amount > accumulated during a single update doesn't cross a period boundary. This is > fine in cases where the amount accrued is much smaller than the size of a > single PELT window (1ms) however if the amount accrued is high then the > relative error (calculated against what the actual signal would be had we > updated the averages) can be high - as much as 2% in my testing. On plotting > signals, I found that the errors are especially high when we update just > before the period boundary is hit. These errors can be significantly reduced > if > we update the averages more often. > > In order to fix this, this patch does the average update by also checking how > much time has elapsed since the last update and updates the averages if it has > been long enough (as a threshold I chose 512us). Ok, I gotta ask: In order to fix what? What exactly does the small but existent overhead increase buy us other than an ever so slightly different chart? What is your motivation to care about a microscopic change in signal shape? -Mike
[PATCH v2] membarrier: Document scheduler barrier requirements
Document the membarrier requirement on having a full memory barrier in __schedule() after coming from user-space, before storing to rq->curr. It is provided by smp_mb__before_spinlock() in __schedule(). Document that membarrier requires a full barrier on transition from kernel thread to userspace thread, which skips the call to switch_mm(). We currently have an implicit barrier from atomic_dec_and_test() in mmdrop() that ensures this. The x86 switch_mm_irqs_off() full barrier is currently provided by many cpumask update operations as well as load_cr3(). Document that load_cr3() is providing this barrier. [ Rebased on top of linux-rcu for-mingo branch. Applies on top of "membarrier: Provide expedited private command". ] Signed-off-by: Mathieu Desnoyers CC: Peter Zijlstra CC: Paul E. McKenney CC: Boqun Feng CC: Andrew Hunter CC: Maged Michael CC: gro...@google.com CC: Avi Kivity CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Dave Watson --- arch/x86/mm/tlb.c | 3 +++ include/linux/sched/mm.h | 4 ++++ kernel/sched/core.c | 9 +++++++++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 014d07a..cd815b6 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -133,6 +133,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, * and neither LOCK nor MFENCE orders them. * Fortunately, load_cr3() is serializing and gives the * ordering guarantee we need. +* +* This full barrier is also required by the membarrier +* system call. */ load_cr3(next->pgd); diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 2b24a69..fe29d06 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -38,6 +38,10 @@ static inline void mmgrab(struct mm_struct *mm) extern void __mmdrop(struct mm_struct *); static inline void mmdrop(struct mm_struct *mm) { + /* +* The implicit full barrier implied by atomic_dec_and_test is +* required by the membarrier system call.
+*/ if (unlikely(atomic_dec_and_test(&mm->mm_count))) __mmdrop(mm); } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3f29c6a..b0f199f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2654,6 +2654,12 @@ static struct rq *finish_task_switch(struct task_struct *prev) finish_arch_post_lock_switch(); fire_sched_in_preempt_notifiers(current); + /* +* When transitioning from a kernel thread to a userspace +* thread, mmdrop()'s implicit full barrier is required by the +* membarrier system call, because the current active_mm can +* become the current mm without going through switch_mm(). +*/ if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { @@ -3295,6 +3301,9 @@ static void __sched notrace __schedule(bool preempt) * Make sure that signal_pending_state()->signal_pending() below * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE) * done by the caller to avoid the race with signal_wake_up(). +* +* The membarrier system call requires a full memory barrier +* after coming from user-space, before storing to rq->curr. */ smp_mb__before_spinlock(); rq_lock(rq, &rf); -- 1.9.1
[PATCH] X.509: Fix the buffer overflow in the utility function for OID string
From: Takashi Iwai The sprint_oid() utility function doesn't properly check the buffer size, which causes the warning in vsnprintf() to be triggered. For example on the v4.1 kernel: [ 49.612536] [ cut here ] [ 49.612543] WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0() ... We can trigger this issue by injecting a maliciously crafted X.509 cert in DER format. Just use a hex editor to change the length of the OID to exceed the length of the SEQUENCE container. For example: 0:d=0 hl=4 l= 980 cons: SEQUENCE 4:d=1 hl=4 l= 700 cons: SEQUENCE 8:d=2 hl=2 l= 3 cons: cont [ 0 ] 10:d=3 hl=2 l= 1 prim: INTEGER :02 13:d=2 hl=2 l= 9 prim: INTEGER :9B47FAF791E7D1E3 24:d=2 hl=2 l= 13 cons: SEQUENCE 26:d=3 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption 37:d=3 hl=2 l= 0 prim: NULL 39:d=2 hl=2 l= 121 cons: SEQUENCE 41:d=3 hl=2 l= 22 cons: SET 43:d=4 hl=2 l= 20 cons: SEQUENCE <=== the SEQ length is 20 45:d=5 hl=2 l= 3 prim: OBJECT :organizationName <=== the original length is 3; change the length of the OID to exceed the length of the SEQUENCE Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided a patch to fix it by checking bufsize in sprint_oid(). From: Takashi Iwai Reported-by: Pawel Wieczorkiewicz Cc: David Howells Cc: Rusty Russell Cc: Takashi Iwai Cc: Pawel Wieczorkiewicz Signed-off-by: "Lee, Chun-Yi" --- lib/oid_registry.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/oid_registry.c b/lib/oid_registry.c index 318f382..41b9e50 100644 --- a/lib/oid_registry.c +++ b/lib/oid_registry.c @@ -142,9 +142,9 @@ int sprint_oid(const void *data, size_t datasize, char *buffer, size_t bufsize) } ret += count = snprintf(buffer, bufsize, ".%lu", num); buffer += count; - bufsize -= count; - if (bufsize == 0) + if (bufsize <= count) return -ENOBUFS; + bufsize -= count; } return ret; -- 2.10.2
Re: [PATCH v2 3/3] vfio/pci: Don't probe devices that can't be reset
On Fri, 18 Aug 2017 08:57:09 -0700 David Daney wrote: > On 08/18/2017 07:12 AM, Alex Williamson wrote: > > On Fri, 18 Aug 2017 15:42:31 +0200 > > Jan Glauber wrote: > > > >> On Thu, Aug 17, 2017 at 07:00:17AM -0600, Alex Williamson wrote: > >>> On Thu, 17 Aug 2017 10:14:23 +0200 > >>> Jan Glauber wrote: > >>> > If a PCI device supports neither function-level reset, nor slot > or bus reset then refuse to probe it. A line is printed to inform > the user. > >>> > >>> But that's not what this does, this requires that the device is on a > >>> reset-able bus. This is a massive regression. With this we could no > >>> longer assign devices on the root complex or any device which doesn't > >>> return from bus reset and currently makes use of the NO_BUS_RESET flag > >>> and works happily otherwise. Full NAK. Thanks, > >> > >> Looks like I missed the slot reset check. So how about this: > >> > >> if (pci_probe_reset_slot(pdev->slot) && pci_probe_reset_bus(pdev->bus)) { > >>dev_warn(...); > >>return -ENODEV; > >> } > >> > >> Or am I still missing something here? > > > > We don't require that a device is on a reset-able bus/slot, so any > > attempt to impose that requirement means that there are devices that > > might work perfectly fine that are now excluded from assignment. The > > entire premise is unacceptable. Thanks, > > > You previously rejected the idea to silently ignore bus reset requests > on buses that do not support it. > > So this leaves us with two options: > > 1) Do nothing, and crash the kernel on systems with bad combinations of > PCIe target devices and cn88xx when vfio_pci is used. > > 2) Do something else. > > We are trying to figure out what that something else should be. The > general concept we are working on is that if vfio_pci wants to reset a > device, *and* bus reset is the only option available, *and* cn88xx, then > make vfio_pci fail. But that's not what these attempts do, they say if we can't do a bus or slot reset, fail the device probe.
The comment is trying to suggest they do something else, am I misinterpreting the actual code change? There are plenty of devices out there that don't care if bus reset doesn't work, they support FLR or PM reset or device specific reset or just deal without a reset. We can't suddenly say this new thing is a requirement and sorry if you were happily using device assignment before, but there's a slim chance you're on this platform that falls over if we attempt to do a secondary bus reset. > What is your opinion of doing that (assuming it is properly implemented)? It seems like these attempts are trying to completely turn off vfio-pci on cn88xx, do you just want it unsupported on these platforms? Should we blacklist anything where dev->bus->self is this root port? Otherwise, what's wrong with returning an error if a bus reset fails, because we should *never* silently ignore the request and pretend that it worked, perhaps even dev_warn()'ing that the platform doesn't support bus resets? Thanks, Alex
[PATCH v2 1/1] usb:xhci: update condition to select bus->sysdev from parent device
From: "Thang Q. Nguyen" For commit 4c39d4b949d3 ("usb: xhci: use bus->sysdev for DMA configuration"), sysdev points to devices known to the system firmware or hardware for DMA parameters. However, the checking logic for the parent of the system firmware/hardware device does not work in ACPI boot mode. This patch updates the condition to check this case in both DT and ACPI. Signed-off-by: Tung Nguyen Signed-off-by: Thang Q. Nguyen --- Change since v1: - Update codes to work with kernel 4.13-rc5 --- drivers/usb/host/xhci-plat.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index c04144b..1bb9729 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -187,7 +187,10 @@ static int xhci_plat_probe(struct platform_device *pdev) * 3. xhci_plat is grandchild of a pci device (dwc3-pci) */ sysdev = &pdev->dev; - if (sysdev->parent && !sysdev->of_node && sysdev->parent->of_node) + if (sysdev->parent && !is_of_node(sysdev->fwnode) && + !is_acpi_device_node(sysdev->fwnode) && + (is_of_node(sysdev->parent->fwnode) || +is_acpi_device_node(sysdev->parent->fwnode))) sysdev = sysdev->parent; #ifdef CONFIG_PCI else if (sysdev->parent && sysdev->parent->parent && -- 1.8.3.1
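The diff generalizes the parent-walk condition from "DT node only" to "any firmware node, DT or ACPI": fall back to the parent only when the device itself has no firmware node but its parent does. A userspace toy model of that selection logic (the struct and names here are made up stand-ins for `struct device` and the fwnode helpers, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the xhci-plat sysdev selection. */
struct dev_model {
    struct dev_model *parent;
    bool has_of_node;    /* stands in for is_of_node(fwnode)          */
    bool has_acpi_node;  /* stands in for is_acpi_device_node(fwnode) */
};

static bool has_fwnode(const struct dev_model *d)
{
    return d->has_of_node || d->has_acpi_node;
}

/* Walk up to the parent only when this device has no firmware node of
 * its own (neither DT nor ACPI) but the parent does - the condition
 * the patch installs, covering ACPI as well as DT boot. */
static struct dev_model *pick_sysdev(struct dev_model *dev)
{
    if (dev->parent && !has_fwnode(dev) && has_fwnode(dev->parent))
        return dev->parent;
    return dev;
}
```

The pre-patch condition tested only `of_node`, so an ACPI-described parent was never selected; in the model above, the ACPI parent case now resolves to the parent just like the DT case.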
Re: [PATCH] staging: lustre: uapi: properly include lustre_errno.h
On Fri, Aug 18, 2017 at 11:04:14PM -0400, James Simmons wrote: > The path for lustre_errno.h was not updated to the new path > for errno.c. Also the header lustre_net.h uses code found in > lustre_errno.h but doesn't include the header directly. It > is easy to forget to add the header lustre_errno.h before > including lustre_net.h so it is just easier to include > lustre_errno.h in lustre_net.h. This should resolve the > build issues seen on the ia64 platform. > > Signed-off-by: James Simmons > --- > drivers/staging/lustre/lustre/include/lustre_net.h | 1 + > drivers/staging/lustre/lustre/ptlrpc/errno.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) Merge this into the correct patch and send both patch series again please. thanks, greg k-h
Re: [f2fs-dev] [PATCH v3] f2fs: add cur_reserved_blocks to support soft block reservation
On 2017/8/18 23:09, Yunlong Song wrote: > This patch adds cur_reserved_blocks to extend reserved_blocks sysfs > interface to be soft threshold, which allows user configure it exceeding > current available user space. To ensure there is enough space for > supporting system's activation, this patch does not set the reserved space I still don't think soft block reservation could be used in the android scenario, which has users on the f2fs data partition, like system or apps, who are sensitive to free space. IMO, hard block reservation is enough. Instead, I expect soft block reservation could be used as a method of global disk quota control or enforced over-provision ratio alteration operated by an administrator, most likely in a distributed storage scenario, where a single disk's state (free space) change does affect the whole system. Thanks, > to the configured reserved_blocks value at once, instead, it safely > increase cur_reserved_blocks in dec_valid_block(,node)_count to only take > up the blocks which are just obsoleted. > > Signed-off-by: Yunlong Song > Signed-off-by: Chao Yu > --- > Documentation/ABI/testing/sysfs-fs-f2fs | 3 ++- > fs/f2fs/f2fs.h | 13 +++-- > fs/f2fs/super.c | 3 ++- > fs/f2fs/sysfs.c | 15 +-- > 4 files changed, 28 insertions(+), 6 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs > b/Documentation/ABI/testing/sysfs-fs-f2fs > index 11b7f4e..ba282ca 100644 > --- a/Documentation/ABI/testing/sysfs-fs-f2fs > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs > @@ -138,7 +138,8 @@ What: /sys/fs/f2fs//reserved_blocks > Date: June 2017 > Contact: "Chao Yu" > Description: > - Controls current reserved blocks in system. > + Controls current reserved blocks in system, the threshold > + is soft, it could exceed current available user space.
> > What: /sys/fs/f2fs//gc_urgent > Date: August 2017 > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 2f20b6b..84ccbdc 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1041,6 +1041,7 @@ struct f2fs_sb_info { > block_t discard_blks; /* discard command candidats */ > block_t last_valid_block_count; /* for recovery */ > block_t reserved_blocks;/* configurable reserved blocks > */ > + block_t cur_reserved_blocks;/* current reserved blocks */ > > u32 s_next_generation; /* for NFS support */ > > @@ -1515,7 +1516,8 @@ static inline int inc_valid_block_count(struct > f2fs_sb_info *sbi, > > spin_lock(&sbi->stat_lock); > sbi->total_valid_block_count += (block_t)(*count); > - avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks; > + avail_user_block_count = sbi->user_block_count - > + sbi->cur_reserved_blocks; > if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) { > diff = sbi->total_valid_block_count - avail_user_block_count; > *count -= diff; > @@ -1549,6 +1551,10 @@ static inline void dec_valid_block_count(struct > f2fs_sb_info *sbi, > f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count); > f2fs_bug_on(sbi, inode->i_blocks < sectors); > sbi->total_valid_block_count -= (block_t)count; > + if (sbi->reserved_blocks && > + sbi->reserved_blocks != sbi->cur_reserved_blocks) > + sbi->cur_reserved_blocks = min(sbi->reserved_blocks, > + sbi->cur_reserved_blocks + count); > spin_unlock(&sbi->stat_lock); > f2fs_i_blocks_write(inode, count, false, true); > } > @@ -1695,7 +1701,7 @@ static inline int inc_valid_node_count(struct > f2fs_sb_info *sbi, > spin_lock(&sbi->stat_lock); > > valid_block_count = sbi->total_valid_block_count + 1; > - if (unlikely(valid_block_count + sbi->reserved_blocks > > + if (unlikely(valid_block_count + sbi->cur_reserved_blocks > > sbi->user_block_count)) { > spin_unlock(&sbi->stat_lock); > goto enospc; > @@ -1738,6 +1744,9 @@ static inline void dec_valid_node_count(struct > f2fs_sb_info *sbi, > >
sbi->total_valid_node_count--; > sbi->total_valid_block_count--; > + if (sbi->reserved_blocks && > + sbi->reserved_blocks != sbi->cur_reserved_blocks) > + sbi->cur_reserved_blocks++; > > spin_unlock(&sbi->stat_lock); > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 4c1bdcb..16a805f 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -957,7 +957,7 @@ static int f2fs_statfs(struct dentry *dentry, struct > kstatfs *buf) > buf->f_blocks = total_count - start_count; >
Re: [f2fs-dev] [PATCH v3] f2fs: add cur_reserved_blocks to support soft block reservation
On 2017/8/18 23:09, Yunlong Song wrote: > This patch adds cur_reserved_blocks to extend reserved_blocks sysfs > interface to be soft threshold, which allows user configure it exceeding > current available user space. To ensure there is enough space for > supporting system's activation, this patch does not set the reserved space I still don't think soft block reservation could be used in android scenario, which has users on f2fs data partition, like system or apps, who are sensitive with free space. IMO, hard block reservation is enough. Instead, I expect soft block reservation could be used as a method of global disk quota controlling or enforced over-provision ratio alteration that is operated by administrator, most possibly in distributed storage scenario, in where single disk state (free space) changing does affect whole system. Thanks, > to the configured reserved_blocks value at once, instead, it safely > increase cur_reserved_blocks in dev_valid_block(,node)_count to only take > up the blocks which are just obsoleted. > > Signed-off-by: Yunlong Song > Signed-off-by: Chao Yu > --- > Documentation/ABI/testing/sysfs-fs-f2fs | 3 ++- > fs/f2fs/f2fs.h | 13 +++-- > fs/f2fs/super.c | 3 ++- > fs/f2fs/sysfs.c | 15 +-- > 4 files changed, 28 insertions(+), 6 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs > b/Documentation/ABI/testing/sysfs-fs-f2fs > index 11b7f4e..ba282ca 100644 > --- a/Documentation/ABI/testing/sysfs-fs-f2fs > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs > @@ -138,7 +138,8 @@ What: /sys/fs/f2fs//reserved_blocks > Date:June 2017 > Contact: "Chao Yu" > Description: > - Controls current reserved blocks in system. > + Controls current reserved blocks in system, the threshold > + is soft, it could exceed current available user space. 
> > What:/sys/fs/f2fs//gc_urgent > Date:August 2017 > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 2f20b6b..84ccbdc 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1041,6 +1041,7 @@ struct f2fs_sb_info { > block_t discard_blks; /* discard command candidats */ > block_t last_valid_block_count; /* for recovery */ > block_t reserved_blocks;/* configurable reserved blocks > */ > + block_t cur_reserved_blocks;/* current reserved blocks */ > > u32 s_next_generation; /* for NFS support */ > > @@ -1515,7 +1516,8 @@ static inline int inc_valid_block_count(struct > f2fs_sb_info *sbi, > > spin_lock(&sbi->stat_lock); > sbi->total_valid_block_count += (block_t)(*count); > - avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks; > + avail_user_block_count = sbi->user_block_count - > + sbi->cur_reserved_blocks; > if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) { > diff = sbi->total_valid_block_count - avail_user_block_count; > *count -= diff; > @@ -1549,6 +1551,10 @@ static inline void dec_valid_block_count(struct > f2fs_sb_info *sbi, > f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count); > f2fs_bug_on(sbi, inode->i_blocks < sectors); > sbi->total_valid_block_count -= (block_t)count; > + if (sbi->reserved_blocks && > + sbi->reserved_blocks != sbi->cur_reserved_blocks) > + sbi->cur_reserved_blocks = min(sbi->reserved_blocks, > + sbi->cur_reserved_blocks + count); > spin_unlock(&sbi->stat_lock); > f2fs_i_blocks_write(inode, count, false, true); > } > @@ -1695,7 +1701,7 @@ static inline int inc_valid_node_count(struct > f2fs_sb_info *sbi, > spin_lock(&sbi->stat_lock); > > valid_block_count = sbi->total_valid_block_count + 1; > - if (unlikely(valid_block_count + sbi->reserved_blocks > > + if (unlikely(valid_block_count + sbi->cur_reserved_blocks > > sbi->user_block_count)) { > spin_unlock(&sbi->stat_lock); > goto enospc; > @@ -1738,6 +1744,9 @@ static inline void dec_valid_node_count(struct > f2fs_sb_info *sbi, > >
sbi->total_valid_node_count--; > sbi->total_valid_block_count--; > + if (sbi->reserved_blocks && > + sbi->reserved_blocks != sbi->cur_reserved_blocks) > + sbi->cur_reserved_blocks++; > > spin_unlock(&sbi->stat_lock); > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 4c1bdcb..16a805f 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -957,7 +957,7 @@ static int f2fs_statfs(struct dentry *dentry, struct > kstatfs *buf) > buf->f_blocks = total_count - start_count; > buf->f_bfree = user_block_count - valid_user_blocks(sbi) +
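The patch's key accounting step — growing cur_reserved_blocks toward the configured reserved_blocks only as blocks are freed, capped with min() — can be modeled in userspace like this (a sketch of the logic in dec_valid_block_count, not the actual f2fs code; the helper name is made up):

```c
#include <assert.h>

typedef unsigned int block_t;

/* Model of the patch's adjustment: when 'freed' blocks become free,
 * grow the soft reservation toward the configured target, never past it. */
static block_t grow_soft_reservation(block_t reserved_blocks,
                                     block_t cur_reserved_blocks,
                                     block_t freed)
{
    if (reserved_blocks && reserved_blocks != cur_reserved_blocks) {
        block_t grown = cur_reserved_blocks + freed;
        /* the kernel patch uses min() here */
        return grown < reserved_blocks ? grown : reserved_blocks;
    }
    return cur_reserved_blocks;
}
```

So a freshly raised reservation only "takes up" blocks as they are obsoleted, which is the soft-threshold behavior the commit message describes.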
[PATCH] staging: lustre: uapi: properly include lustre_errno.h
The path for lustre_errno.h was not updated to the new path for errno.c. Also the header lustre_net.h uses code found in lustre_errno.h but doesn't include the header directly. It is easy to forget to add the header lustre_errno.h before including lustre_net.h so it is just easier to include lustre_errno.h in lustre_net.h. This should resolve the build issues seen on the ia64 platform. Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/include/lustre_net.h | 1 + drivers/staging/lustre/lustre/ptlrpc/errno.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h index 1ae7e83..bc18131 100644 --- a/drivers/staging/lustre/lustre/include/lustre_net.h +++ b/drivers/staging/lustre/lustre/include/lustre_net.h @@ -55,6 +55,7 @@ #include "../../include/uapi/linux/lnet/nidstr.h" #include "../../include/linux/lnet/api.h" #include "../../include/uapi/linux/lustre/lustre_idl.h" +#include "lustre_errno.h" #include "lustre_ha.h" #include "lustre_sec.h" #include "lustre_import.h" diff --git a/drivers/staging/lustre/lustre/ptlrpc/errno.c b/drivers/staging/lustre/lustre/ptlrpc/errno.c index 73f8374..faad8d8 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/errno.c +++ b/drivers/staging/lustre/lustre/ptlrpc/errno.c @@ -26,7 +26,7 @@ */ #include "../../include/linux/libcfs/libcfs.h" -#include "../include/lustre/lustre_errno.h" +#include "../include/lustre_errno.h" /* * The two translation tables below must define a one-to-one mapping between -- 1.8.3.1
Re: [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
On Wed, Aug 16, 2017 at 10:33:12PM +0530, Srinath Mannam wrote: > Hi Bjorn, > > Thank you for the feedback. > > My comments are inlined. > > On Wed, Aug 16, 2017 at 7:13 PM, Bjorn Helgaas wrote: > > On Fri, Aug 04, 2017 at 08:27:28PM +0530, Srinath Mannam wrote: > >> Concurrency issue is observed during pci enable bridge called > >> for multiple pci devices initialization in SMP system. > >> > >> Setup details: > >> - SMP system with 8 ARMv8 cores running Linux kernel(4.11). > >> - Two EPs are connected to PCIe RC through bridge as shown > >>in the below figure. > >> > >>[RC] > >> | > >> [BRIDGE] > >> | > >>--- > >> | | > >> [EP][EP] > >> > >> Issue description: > >> After PCIe enumeration completed EP driver probe function called > >> for both the devices from two CPUs simultaneously. > >> From EP probe function, pci_enable_device_mem called for both the EPs. > >> This function called pci_enable_bridge enable for all the bridges > >> recursively in the path of EP to RC. > >> > >> Inside pci_enable_bridge function, at two places concurrency issue is > >> observed. > >> > >> Place 1: > >> CPU 0: > >> 1. Done Atomic increment dev->enable_cnt > >>in pci_enable_device_flags > >> 2. Inside pci_enable_resources > >> 3. Completed pci_read_config_word(dev, PCI_COMMAND, &cmd) > >> 4. Ready to set PCI_COMMAND_MEMORY (0x2) in > >>pci_write_config_word(dev, PCI_COMMAND, cmd) > >> CPU 1: > >> 1. Check pci_is_enabled in function pci_enable_bridge > >>and it is true > >> 2. Check (!dev->is_busmaster) also true > >> 3. Gone into pci_set_master > >> 4. Completed pci_read_config_word(dev, PCI_COMMAND, &old_cmd) > >> 5. Ready to set PCI_COMMAND_MASTER (0x4) in > >>pci_write_config_word(dev, PCI_COMMAND, cmd) > >> > >> By the time of last point for both the CPUs are read value 0 and > >> ready to write 2 and 4. > >> After last point final value in PCI_COMMAND register is 4 instead of 6. > >> > >> Place 2: > >> CPU 0: > >> 1. 
Done Atomic increment dev->enable_cnt in > >>pci_enable_device_flags > >> > >> Signed-off-by: Srinath Mannam > >> --- > >> drivers/pci/pci.c | 8 ++-- > >> 1 file changed, 6 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > >> index af0cc34..12721df 100644 > >> --- a/drivers/pci/pci.c > >> +++ b/drivers/pci/pci.c > >> @@ -52,6 +52,7 @@ static void pci_pme_list_scan(struct work_struct *work); > >> static LIST_HEAD(pci_pme_list); > >> static DEFINE_MUTEX(pci_pme_list_mutex); > >> static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan); > >> +static DEFINE_MUTEX(pci_bridge_mutex); > >> > >> struct pci_pme_device { > >> struct list_head list; > >> @@ -1348,10 +1349,11 @@ static void pci_enable_bridge(struct pci_dev *dev) > >> if (bridge) > >> pci_enable_bridge(bridge); > >> > >> + mutex_lock(&pci_bridge_mutex); > >> if (pci_is_enabled(dev)) { > >> if (!dev->is_busmaster) > >> pci_set_master(dev); > >> - return; > >> + goto end; > >> } > >> > >> retval = pci_enable_device(dev); > >> @@ -1359,6 +1361,8 @@ static void pci_enable_bridge(struct pci_dev *dev) > >> dev_err(&dev->dev, "Error enabling bridge (%d), > >> continuing\n", > >> retval); > >> pci_set_master(dev); > >> +end: > >> + mutex_unlock(&pci_bridge_mutex); > > > > I think this will deadlock because we're holding pci_bridge_mutex > > while we call pci_enable_device(), which may recursively call > > pci_enable_bridge(), which would try to acquire pci_bridge_mutex > > again. My original suggestion of a mutex in the host bridge would > > have the same problem. > > This extra check "if (bridge && !pci_is_enabled(bridge))" will resolve > the deadlock in the present patch. Oh, I see. In pci_enable_bridge(X), the recursive call to enable upstream bridges happens *before* acquiring pci_bridge_mutex, so when we call pci_enable_device(X) while holding the mutex, any upstream bridges will have been already enabled. 
We recurse all the way to the top of the hierarchy, acquire the mutex, enable the bridge, release the mutex, then unwind and repeat for each bridge on the way down to the endpoint. > > We talked about using device_lock() earlier. You found some problems > > with that, and I'd like to understand them better. You said: > > > >> But the pci_enable_bridge is called in the context of the driver > >> probe function, we will have nested lock problem. > > > > The driver core does hold device_lock() while calling the driver probe > > function, in this path: > > > > device_initial_probe > > __device_attach > > device_lock(dev) # <-- lock > > __device_attach_driver > > ... > >
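The "Place 1" failure above is a classic lost update on a read-modify-write of PCI_COMMAND. Both interleavings can be sketched as plain code (a model of the scenario described in the thread, not the driver itself):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2
#define PCI_COMMAND_MASTER 0x4

/* Both CPUs read the register before either writes back, so the
 * second write discards the first CPU's bit. */
static uint16_t racy_enable(void)
{
    uint16_t reg = 0;                     /* simulated PCI_COMMAND */
    uint16_t cpu0_cmd = reg;              /* CPU 0 reads 0 */
    uint16_t cpu1_cmd = reg;              /* CPU 1 reads 0 */
    reg = cpu0_cmd | PCI_COMMAND_MEMORY;  /* CPU 0 writes 0x2 */
    reg = cpu1_cmd | PCI_COMMAND_MASTER;  /* CPU 1 writes 0x4, 0x2 lost */
    return reg;
}

/* With the whole read-modify-write serialized -- what the proposed
 * mutex provides -- both bits survive. */
static uint16_t serialized_enable(void)
{
    uint16_t reg = 0;
    reg |= PCI_COMMAND_MEMORY;
    reg |= PCI_COMMAND_MASTER;
    return reg;
}
```

The racy interleaving ends with only PCI_COMMAND_MASTER set (4 instead of 6), exactly the symptom reported.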
Re: [PATCH] dts: Make it easier to enable __symbols__ generation in dtb files
On Wed, Aug 16, 2017 at 11:32:08PM -0700, Frank Rowand wrote: > Hi Tom, > > Some nit picking on the patch comment. :-) > > > On 08/16/17 17:35, Tom Rini wrote: > > This introduces the variable DTC_EXTRA_FLAGS to allow for additional > > flags to be passed to dtc. While this can have many uses (such as > > easier testing of new warning flags) the use case I wish to spell out > > here is passing of -@ to enable __symbols__ to be included in the > > resulting dtb and thus 'stacked' overlays to be able to be used. > > Not just stacked overlays (I'm guessing by that you mean an > overlay applied on top of an overlay), but even an overlay on > top of the base dtb. Outside of the kernel, we can already have some classes of overlays that work today. "Stacked" is how Pantelis has been referring to overlays that make use of __symbols__ so that we can resolve references. I guess this ended up being referenced as plugin support in dtc.git tho? I'm not quite sure how to re-word the above to be clear that it's extending what can be used. > > When passing in -@ this will increase the size of the dtb (and resident > > memory usage) in exchange for allowing more types of overlays to be > > applied to the dtb file prior to passing it to Linux and additional > > functionality within the kernel when OF_OVERLAY is enabled and in tools > > outside of the kernel. > > Not so much additional functionality within the kernel. It is basic > functionality for any overlay to be applied in the kernel. Oh, I thought the kernel already handled using that information. I'll think how to reword the line there then. 
> > Cc: Jason Kridner > > Cc: Drew Fustini > > Cc: Robert Nelson > > Cc: Lokesh Vutla > > Cc: Frank Rowand > > Cc: Rob Herring > > Cc: Mark Rutland > > Cc: Russell King > > Cc: Masahiro Yamada > > Cc: Michal Marek > > CC: linux-kbu...@vger.kernel.org > > Cc: devicet...@vger.kernel.org > > Cc: linux-arm-ker...@lists.infradead.org > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Tom Rini > > --- > > This would be v3 of my attempt to enable __symbols__ more widely (v2, > > for people not on the CC, enabled it always, but for some boards). > > > > To be clear, this version of the patch is more flexible in that it > > allows for arbitrary flags to be passed to dtc, which can have other > > uses. But, I'm still spelling out -@ being the motivation behind this > > for clarity. > > Thank you, that is very useful info to capture in the commit history. > > > > Both Frank and Rob were uncomfortable with v2, and Frank > > suggested this approach to make it easier for users / distributions / > > etc to opt-in to this functionality. For clarity, Frank was suggesting > > (at least at first) perhaps also a tie-in CONFIG option, which I have > > not done here as I feel that's adding extra hurdles to something that > > must already be opt-in. Given that today many of the immediate uses are > > for "pre-boot" application of overlays (SoM + carrier + add-ons or EVM + > > LCD for example), I believe this is an acceptable trade-off to make > > currently. > > If I understand correctly, then you are following Rob's suggestion of > passing the DTC_EXTRA_FLAGS value in the dtb make command. > > That seems to be a reasonable approach for now. Rob did mention that > he would like to remove the current hard coding of a base DTC_FLAGS > value, where several architectures initialize it with one of: > >DTC_FLAGS ?= -p 1024 >DTC_FLAGS := -p 1024 > > If those initializations can be removed, then we could just use > DTC_FLAGS in the make command, instead of adding DTC_EXTRA_FLAGS. 
> Can you discuss that option with Rob? (Sorry Rob, I'll be off-grid > soon.) With respect to the current users of DTC_FLAGS, we could just change c6x / microblaze / openrisc / powerpc to use '+=' not '?=' or ':=' as one-line patches, from a quick test on microblaze. Or maybe I'm missing something? But we can't only pass DTC_FLAGS on the command-line directly, unless we also pass in all other valid and normal flags (ie -Wno-unit_address_vs_reg). This is in-line with how for example CFLAGS is handled in the kernel, rather than by what, based on some googling, Makefiles can be made to do, in general. Thanks! -- Tom signature.asc Description: Digital signature
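Assuming the patch lands as proposed, opting in is a single variable on the make command line (the board dtb below is only an example, and the verification step assumes the standard dtc tool is available):

```shell
# Build dtbs with __symbols__ nodes so overlays can resolve phandles
# by label; -@ is the dtc flag this thread is about.
make ARCH=arm DTC_EXTRA_FLAGS=-@ dtbs

# Sanity check: a dtb built with -@ should carry a __symbols__ node.
dtc -I dtb -O dts arch/arm/boot/dts/am335x-boneblack.dtb \
    | grep -q __symbols__ && echo "symbols present"
```

This keeps the size cost opt-in, which is the trade-off Tom describes: larger dtbs in exchange for overlays that can be applied before or after boot.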
[PATCH] iio: accel: mma8452: code improvements to handle more than one event
This driver supports multiple devices like mma8653, mma8652, mma8452, mma8453 and fxls8471. Almost all these devices have more than one event. Current driver design hardcodes the event specific information, so only one event can be supported by this driver and current design doesn't have the flexibility to add more events. This patch improves by detaching the event related information from chip_info struct, and based on channel type and event direction the corresponding event configuration registers are picked dynamically. Hence both transient and freefall events can be handled in read/write callbacks. Changes are thoroughly tested on fxls8471 device on imx6UL Eval board using iio_event_monitor user space program. After this fix both Freefall and Transient events are handled by the driver without any conflicts. Changes since v2 -> v3 -Fix typo in commit message -Replace word 'Bugfix' with 'Improvements' -Describe more accurate commit message -Replace breaks with returns -Initialise transient event threshold mask -Remove unrelated change of IIO_ACCEL channel type check in read/write event callbacks Changes since v1 -> v2 -Fix indentations -Remove unused fields in mma8452_event_regs struct -Remove redundant return statement -Remove unrelated changes like checkpatch.pl warning fixes Signed-off-by: Harinath Nampally --- drivers/iio/accel/mma8452.c | 272 +++- 1 file changed, 118 insertions(+), 154 deletions(-) diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c index eb6e3dc..8de9928 100644 --- a/drivers/iio/accel/mma8452.c +++ b/drivers/iio/accel/mma8452.c @@ -59,7 +59,9 @@ #define MMA8452_FF_MT_THS 0x17 #define MMA8452_FF_MT_THS_MASK0x7f #define MMA8452_FF_MT_COUNT0x18 +#define MMA8452_FF_MT_CHAN_SHIFT 3 #define MMA8452_TRANSIENT_CFG 0x1d +#define MMA8452_TRANSIENT_CFG_CHAN(chan) BIT(chan + 1) #define MMA8452_TRANSIENT_CFG_HPF_BYP BIT(0) #define MMA8452_TRANSIENT_CFG_ELE BIT(4) #define MMA8452_TRANSIENT_SRC 0x1e @@ -69,6 +71,7 @@ #define 
MMA8452_TRANSIENT_THS 0x1f #define MMA8452_TRANSIENT_THS_MASK GENMASK(6, 0) #define MMA8452_TRANSIENT_COUNT 0x20 +#define MMA8452_TRANSIENT_CHAN_SHIFT 1 #define MMA8452_CTRL_REG1 0x2a #define MMA8452_CTRL_ACTIVE BIT(0) #define MMA8452_CTRL_DR_MASK GENMASK(5, 3) @@ -107,6 +110,26 @@ struct mma8452_data { const struct mma_chip_info *chip_info; }; + /** + * struct mma8452_event_regs - chip specific data related to events + * @ev_cfg: event config register address + * @ev_src: event source register address + * @ev_ths: event threshold register address + * @ev_ths_mask: mask for the threshold value + * @ev_count: event count (period) register address + * + * Since not all chips supported by the driver support comparing high pass + * filtered data for events (interrupts), different interrupt sources are + * used for different chips and the relevant registers are included here. + */ +struct mma8452_event_regs { + u8 ev_cfg; + u8 ev_src; + u8 ev_ths; + u8 ev_ths_mask; + u8 ev_count; +}; + /** * struct mma_chip_info - chip specific data * @chip_id: WHO_AM_I register's value @@ -116,40 +139,12 @@ struct mma8452_data { * @mma_scales: scale factors for converting register values * to m/s^2; 3 modes: 2g, 4g, 8g; 2 integers * per mode: m/s^2 and micro m/s^2 - * @ev_cfg: event config register address - * @ev_cfg_ele: latch bit in event config register - * @ev_cfg_chan_shift: number of the bit to enable events in X - * direction; in event config register - * @ev_src: event source register address - * @ev_src_xe: bit in event source register that indicates - * an event in X direction - * @ev_src_ye: bit in event source register that indicates - * an event in Y direction - * @ev_src_ze: bit in event source register that indicates - * an event in Z direction - * @ev_ths: event threshold register address - * @ev_ths_mask: mask for the threshold value - * @ev_count: event count (period) register address - * - * Since not all chips supported by the driver support comparing high pass - *
filtered data for events (interrupts), different interrupt sources are - * used for different chips and the relevant registers are
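The register-bank indirection the patch introduces can be sketched in a few lines. This is a minimal illustration, not the driver's callback code: the enum and helper names below are hypothetical, though the register addresses mirror the driver's #defines.

```c
#include <assert.h>
#include <stdint.h>

/* Register addresses mirroring the driver's #defines (mma8452.c). */
#define MMA8452_FF_MT_CFG     0x15
#define MMA8452_FF_MT_SRC     0x16
#define MMA8452_TRANSIENT_CFG 0x1d
#define MMA8452_TRANSIENT_SRC 0x1e

/* Hypothetical stand-in for the IIO event direction. */
enum ev_dir { EV_RISING, EV_FALLING };

struct mma8452_event_regs {
	uint8_t ev_cfg;	/* event config register address */
	uint8_t ev_src;	/* event source register address */
};

static const struct mma8452_event_regs transient_regs = {
	.ev_cfg = MMA8452_TRANSIENT_CFG,
	.ev_src = MMA8452_TRANSIENT_SRC,
};

static const struct mma8452_event_regs freefall_regs = {
	.ev_cfg = MMA8452_FF_MT_CFG,
	.ev_src = MMA8452_FF_MT_SRC,
};

/*
 * Pick the register bank from the event direction instead of hardcoding
 * one bank in chip_info -- the idea behind the patch: rising maps to the
 * transient (motion) block, falling to the freefall block.
 */
static const struct mma8452_event_regs *pick_event_regs(enum ev_dir dir)
{
	return dir == EV_RISING ? &transient_regs : &freefall_regs;
}
```

With this indirection, the read/write event callbacks only need the returned struct, so adding a third event type would mean adding one more table entry rather than new hardcoded branches.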
[PATCH] serial: earlycon: Only try fdt when specify 'earlycon' exactly
When moving earlycon early_param handling to serial, the condition for enabling devicetree earlycons changed slightly. We used to do this only for a bare 'console'/'earlycon', but now also do it for 'console='/'earlycon='. Fix it by using the same condition as before. Fixes: d503187b6cc4 (of/serial: move earlycon early_param handling to serial) Signed-off-by: Jeffy Chen --- drivers/tty/serial/earlycon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c index c3651540e1ba..335933e1822c 100644 --- a/drivers/tty/serial/earlycon.c +++ b/drivers/tty/serial/earlycon.c @@ -220,7 +220,7 @@ static int __init param_setup_earlycon(char *buf) if (IS_ENABLED(CONFIG_ACPI_SPCR_TABLE)) { earlycon_init_is_deferred = true; return 0; - } else { + } else if (!buf) { return early_init_dt_scan_chosen_stdout(); } } -- 2.11.0
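The distinction this fix restores — a bare 'earlycon' (the kernel passes buf == NULL), 'earlycon=' with an empty value (buf == ""), and a full 'earlycon=uart...' value — can be modeled as follows. This is a sketch of the control flow, not the kernel function itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Models param_setup_earlycon() after the fix: only a bare "earlycon"
 * (buf == NULL) should fall back to scanning the devicetree chosen node,
 * and even then only when ACPI SPCR handling is not deferring earlycon
 * init. "earlycon=" (empty string) and "earlycon=uart8250,..." must not
 * trigger the devicetree scan.
 */
static bool should_scan_fdt(const char *buf, bool spcr_enabled)
{
	if (!buf || !buf[0]) {		/* "earlycon" or "earlycon=" */
		if (spcr_enabled)
			return false;	/* deferred to ACPI SPCR table */
		return buf == NULL;	/* only bare "earlycon" scans DT */
	}
	return false;			/* a value was given; parse it */
}
```

Before the fix, the `else` branch ran for the empty-string case too, which is exactly the behavior change the commit message describes.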
Re: [PATCH v9 2/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions
On 08/18/17 at 04:10pm, Ard Biesheuvel wrote: > On 17 August 2017 at 14:04, Baoquan He wrote: > > Thanks a lot for helping improving the patch log, Ingo! Will pay more > > attention to the description in words and paragraph partition of log. > > > >> > >> So if EFI is detected, iterate EFI memory map and pick the mirror region > >> to process for adding candidate of randomization slot. If EFI is disabled > >> or no mirror region found, still process e820 memory map. > >> > >> Signed-off-by: Baoquan He > > > Could the x86 people on cc either take these directly, or indicate > whether they are ok with this going in via the EFI tree? Thanks for looking into this, Ard. I saw Ingo has put these into tip/x86/boot. > > >> --- > >> arch/x86/boot/compressed/kaslr.c | 68 > >> ++-- > >> 1 file changed, 66 insertions(+), 2 deletions(-) > >> > >> diff --git a/arch/x86/boot/compressed/kaslr.c > >> b/arch/x86/boot/compressed/kaslr.c > >> index 99c7194f7ea6..7de23bb279ce 100644 > >> --- a/arch/x86/boot/compressed/kaslr.c > >> +++ b/arch/x86/boot/compressed/kaslr.c > >> @@ -37,7 +37,9 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> +#include > >> > >> /* Macros used by the included decompressor code below. 
*/ > >> #define STATIC > >> @@ -558,6 +560,65 @@ static void process_mem_region(struct mem_vector > >> *entry, > >> } > >> } > >> > >> +#ifdef CONFIG_EFI > >> +/* > >> + * Returns true if mirror region found (and must have been processed > >> + * for slots adding) > >> + */ > >> +static bool > >> +process_efi_entries(unsigned long minimum, unsigned long image_size) > >> +{ > >> + struct efi_info *e = &boot_params->efi_info; > >> + bool efi_mirror_found = false; > >> + struct mem_vector region; > >> + efi_memory_desc_t *md; > >> + unsigned long pmap; > >> + char *signature; > >> + u32 nr_desc; > >> + int i; > >> + > >> + signature = (char *)&e->efi_loader_signature; > >> + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) && > >> + strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) > >> + return false; > >> + > >> +#ifdef CONFIG_X86_32 > >> + /* Can't handle data above 4GB at this time */ > >> + if (e->efi_memmap_hi) { > >> + warn("EFI memmap is above 4GB, can't be handled now on > >> x86_32. EFI should be disabled.\n"); > >> + return false; > >> + } > >> + pmap = e->efi_memmap; > >> +#else > >> + pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32)); > >> +#endif > >> + > >> + nr_desc = e->efi_memmap_size / e->efi_memdesc_size; > >> + for (i = 0; i < nr_desc; i++) { > >> + md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i); > >> + if (md->attribute & EFI_MEMORY_MORE_RELIABLE) { > >> + region.start = md->phys_addr; > >> + region.size = md->num_pages << EFI_PAGE_SHIFT; > >> + process_mem_region(&region, minimum, image_size); > >> + efi_mirror_found = true; > >> + > >> + if (slot_area_index == MAX_SLOT_AREA) { > >> + debug_putstr("Aborted EFI scan (slot_areas > >> full)!\n"); > >> + break; > >> + } > >> + } > >> + } > >> + > >> + return efi_mirror_found; > >> +} > >> +#else > >> +static inline bool > >> +process_efi_entries(unsigned long minimum, unsigned long image_size) > >> +{ > >> + return false; > >> +} > >> +#endif > >> + > >> static void process_e820_entries(unsigned 
long minimum, > >>unsigned long image_size) > >> { > >> @@ -586,13 +647,16 @@ static unsigned long find_random_phys_addr(unsigned > >> long minimum, > >> { > >> /* Check if we had too many memmaps. */ > >> if (memmap_too_large) { > >> - debug_putstr("Aborted e820 scan (more than 4 memmap= > >> args)!\n"); > >> + debug_putstr("Aborted memory entries scan (more than 4 > >> memmap= args)!\n"); > >> return 0; > >> } > >> > >> /* Make sure minimum is aligned. */ > >> minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN); > >> > >> + if (process_efi_entries(minimum, image_size)) > >> + return slots_fetch_random(); > >> + > >> process_e820_entries(minimum, image_size); > >> return slots_fetch_random(); > >> } > >> @@ -652,7 +716,7 @@ void choose_random_location(unsigned long input, > >>*/ > >> min_addr = min(*output, 512UL << 20); > >> > >> - /* Walk e820 and find a random address. */ > >> + /* Walk available memory entries to find a random address. */ > >> random_addr = find_random_phys_addr(min_addr, output_size); > >> if (!random_addr) { > >> warn("Physical KASLR disabled: no suitable memory region!"); > >> -- > >> 2.5.5 > >>
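Two pieces of bit arithmetic in the quoted patch are easy to verify in isolation: recombining the EFI memmap pointer from its two 32-bit halves in boot_params, and converting an EFI descriptor's page count to bytes. The helper names below are hypothetical; the operations are the ones the patch performs:

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12	/* EFI pages are 4 KiB */

/*
 * boot_params stores the EFI memory map pointer split into two 32-bit
 * fields (efi_memmap / efi_memmap_hi); on 64-bit the patch recombines
 * them exactly like this.
 */
static uint64_t efi_memmap_ptr(uint32_t efi_memmap, uint32_t efi_memmap_hi)
{
	return (uint64_t)efi_memmap | ((uint64_t)efi_memmap_hi << 32);
}

/*
 * Region size in bytes from a descriptor's num_pages field, as in
 * "region.size = md->num_pages << EFI_PAGE_SHIFT".
 */
static uint64_t efi_region_bytes(uint64_t num_pages)
{
	return num_pages << EFI_PAGE_SHIFT;
}
```

The x86_32 special case in the patch exists because `pmap` is an `unsigned long` there: a nonzero `efi_memmap_hi` cannot be represented, so the code warns and bails out instead.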
Re: [patch v3 2/3] drivers: jtag: Add Aspeed SoC 24xx and 25xx families JTAG master driver
Hi Oleksandr, [auto build test ERROR on linus/master] [also build test ERROR on v4.13-rc5 next-20170817] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Oleksandr-Shamray/drivers-jtag-Add-JTAG-core-driver/20170818-114739 config: um-allyesconfig (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): arch/um/drivers/vde.o: In function `vde_open_real': (.text+0xfb1): warning: Using 'getgrnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/vde.o: In function `vde_open_real': (.text+0xdfc): warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/vde.o: In function `vde_open_real': (.text+0x1115): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoaddr': (.text+0xe475): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametonetaddr': (.text+0xe515): warning: Using 'getnetbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoproto': (.text+0xe735): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking arch/um/drivers/pcap.o: In function `pcap_nametoport': (.text+0xe567): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc 
version used for linking drivers/auxdisplay/img-ascii-lcd.o: In function `img_ascii_lcd_probe': drivers/auxdisplay/img-ascii-lcd.c:386: undefined reference to `devm_ioremap_resource' drivers/jtag/jtag-aspeed.o: In function `aspeed_jtag_init': >> drivers/jtag/jtag-aspeed.c:635: undefined reference to >> `devm_ioremap_resource' collect2: error: ld returned 1 exit status vim +635 drivers/jtag/jtag-aspeed.c 627 628 int aspeed_jtag_init(struct platform_device *pdev, 629 struct aspeed_jtag *aspeed_jtag) 630 { 631 struct resource *res; 632 int err; 633 634 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > 635 aspeed_jtag->reg_base = devm_ioremap_resource(aspeed_jtag->dev, > res); 636 if (IS_ERR(aspeed_jtag->reg_base)) { 637 err = -ENOMEM; 638 goto out_region; 639 } 640 641 aspeed_jtag->pclk = devm_clk_get(aspeed_jtag->dev, NULL); 642 if (IS_ERR(aspeed_jtag->pclk)) { 643 dev_err(aspeed_jtag->dev, "devm_clk_get failed\n"); 644 return PTR_ERR(aspeed_jtag->pclk); 645 } 646 647 clk_prepare_enable(aspeed_jtag->pclk); 648 649 aspeed_jtag->irq = platform_get_irq(pdev, 0); 650 if (aspeed_jtag->irq < 0) { 651 dev_err(aspeed_jtag->dev, "no irq specified\n"); 652 err = -ENOENT; 653 goto out_region; 654 } 655 656 /* Enable clock */ 657 aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_CTL_ENG_EN | 658 ASPEED_JTAG_CTL_ENG_OUT_EN, ASPEED_JTAG_CTRL); 659 aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_SW_MODE_EN | 660 ASPEED_JTAG_SW_MODE_TDIO, ASPEED_JTAG_SW); 661 662 err = devm_request_irq(aspeed_jtag->dev, aspeed_jtag->irq, 663 aspeed_jtag_interrupt, 0, 664 "aspeed-jtag", aspeed_jtag); 665 if (err) { 666 dev_err(aspeed_jtag->dev, "aspeed_jtag unable to get IRQ"); 667 goto out_region; 668 } 669 dev_dbg(&pdev->dev, "aspeed_jtag:IRQ %d.\n", aspeed_jtag->irq); 670 671 aspeed_jtag_write(aspeed_jtag, ASPEED_JTAG_ISR_INST_PAUSE | 672 ASPEED_JTAG_ISR_INST_COMPLETE | 673 ASPEED_JTAG_ISR_DATA_PAUSE | 674 ASPEED_JTAG_ISR_DATA_COMPLETE | 675 ASPEED_JTAG_ISR_INST_PAUSE_EN | 676
Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT
On 08/19/2017 01:40 AM, David Daney wrote:
> Here are several improvements and bug fixes for the MIPS eBPF JIT. The main
> change is the addition of support for JLT, JLE, JSLT and JSLE ops, that were
> recently added. Also fix WARN output when used with preemptable kernel, and
> a small cleanup/optimization in the use of BPF_OP(insn->code). I suggest
> that the whole thing go via the BPF/net-next path as there are dependencies
> on code that is not yet merged to Linus' tree.

Yes, this would be via net-next.

> Still pending are changes to reduce stack usage when the verifier can
> determine the maximum stack size.

Awesome, thanks a lot!
Re: [PATCH 1/2] iommu/amd: Fix compiler warning in copy_device_table()
On 08/19/17 at 12:40am, Joerg Roedel wrote: > From: Joerg Roedel > > This was reported by the kbuild bot. The condition in which > entry would be used uninitialized cannot happen, because > when there is no iommu this function would never be called. > But it's no fast-path, so fix the warning anyway. > > Reported-by: kbuild test robot > Signed-off-by: Joerg Roedel Thanks for fixing these code bugs, Joerg. Ack the series! Acked-by: Baoquan He > --- > drivers/iommu/amd_iommu_init.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > index c7d0325..f2023cd 100644 > --- a/drivers/iommu/amd_iommu_init.c > +++ b/drivers/iommu/amd_iommu_init.c > @@ -854,7 +854,7 @@ static int get_dev_entry_bit(u16 devid, u8 bit) > > static bool copy_device_table(void) > { > - u64 int_ctl, int_tab_len, entry, last_entry = 0; > + u64 int_ctl, int_tab_len, entry = 0, last_entry = 0; > struct dev_table_entry *old_devtb = NULL; > u32 lo, hi, devid, old_devtb_size; > phys_addr_t old_devtb_phys; > -- > 2.7.4 >
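The warning class being silenced here can be modeled in a few lines. This is a hypothetical reconstruction of the pattern, not the iommu code itself: a variable is written only inside a loop the compiler cannot prove will iterate, so -Wmaybe-uninitialized fires even though callers guarantee a non-empty table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Minimal model of the pattern the patch fixes: `entry` is assigned only
 * inside the loop body. If n == 0, `entry` would never be written, which
 * is what the compiler warns about -- even when, as in copy_device_table(),
 * the function is never called in that situation. Initializing `entry = 0`
 * (mirroring the one-line fix) silences the warning without changing
 * behavior on any real call path.
 */
static uint64_t last_nonzero(const uint64_t *tbl, size_t n)
{
	uint64_t entry = 0, last_entry = 0;	/* entry = 0 mirrors the fix */

	for (size_t i = 0; i < n; i++) {
		entry = tbl[i];
		if (entry)
			last_entry = entry;
	}
	return last_entry;
}
```

As the commit message notes, this is not a fast path, so the extra store costs nothing worth discussing; the alternative (annotating or restructuring the code) would be more intrusive than the one-character-class fix.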
Re: [GIT PULL] apparmor updates for next
On Fri, 18 Aug 2017, John Johansen wrote: > Hi James, > > Please pull these apparmor changes for next. > Pulled, thanks. -- James Morris
Re: [PATCH] selftests: timers: Fix run_destructive_tests target to handle skipped tests
On Thu, Aug 17, 2017 at 3:48 PM, Shuah Khan wrote: > When a test exits with skip exit code of 4, "make run_destructive_tests" > halts testing. Fix run_destructive_tests target to handle error exit codes. > > Reported-by: John Stultz > Signed-off-by: Shuah Khan > --- > tools/testing/selftests/timers/Makefile | 26 +- > 1 file changed, 13 insertions(+), 13 deletions(-) > > diff --git a/tools/testing/selftests/timers/Makefile > b/tools/testing/selftests/timers/Makefile > index c805ab048d26..6c1327278d5f 100644 > --- a/tools/testing/selftests/timers/Makefile > +++ b/tools/testing/selftests/timers/Makefile > @@ -13,20 +13,20 @@ TEST_GEN_PROGS_EXTENDED = alarmtimer-suspend > valid-adjtimex adjtick change_skew > > include ../lib.mk > > +define RUN_DESTRUCTIVE_TESTS > + @for TEST in $(TEST_GEN_PROGS_EXTENDED); do \ > + BASENAME_TEST=`basename $$TEST`;\ > + if [ ! -x $$BASENAME_TEST ]; then \ > + echo "selftests: Warning: file $$BASENAME_TEST is not > executable, correct this.";\ > + echo "selftests: $$BASENAME_TEST [FAIL]"; \ > + else\ > + cd `dirname $$TEST`; (./$$BASENAME_TEST && echo > "selftests: $$BASENAME_TEST [PASS]") || echo "selftests: $$BASENAME_TEST > [FAIL]"; cd -;\ > + fi; \ > + done; > +endef One more on this... you might remove rtctest_setdate from the TEST_GEN_PROGS_EXTENDED list, since it too requires arguments to test, and it wasn't a part of run_destructive_tests previously. I see Benjamin added this test not long ago, but I'm not sure exactly how he expects it to be run (just manually?). thanks -john
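For context, kselftest uses exit code 4 to mean "skipped". The Makefile fix above only keeps make from halting on a non-zero status; a runner that wants to distinguish skips from failures could classify statuses as sketched below. This is an illustration of the convention, not code from the patch:

```c
#include <assert.h>
#include <string.h>

/* kselftest convention: exit code 4 means the test was skipped. */
#define KSFT_SKIP 4

/*
 * Classify a test's exit status: 0 is a pass, KSFT_SKIP is a skip,
 * and anything else is a failure. Treating 4 as FAIL (or letting it
 * abort the whole run, as before the patch) misreports tests that
 * merely lack a prerequisite.
 */
static const char *classify(int status)
{
	if (status == 0)
		return "PASS";
	if (status == KSFT_SKIP)
		return "SKIP";
	return "FAIL";
}
```

In shell terms, this corresponds to branching on `$?` after each test instead of the bare `&& ... || ...` the quoted Makefile target uses.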
[PATCH v8 01/28] x86/mm: Relocate page fault error codes to traps.h
Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them within that file. Other portions of code might be interested in those definitions too. For instance, the User- Mode Instruction Prevention emulation code will use such definitions to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space. While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Andy Lutomirski Cc: "Kirill A. Shutemov" Cc: Josh Poimboeuf Cc: Dave Hansen Cc: Paul Gortmaker Cc: x...@kernel.org Reviewed-by: Andy Lutomirski Reviewed-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/asm/traps.h | 18 + arch/x86/mm/fault.c | 88 +--- 2 files changed, 52 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 01fd0a7f48cd..4a2e5852eacc 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -148,4 +148,22 @@ enum { X86_TRAP_IRET = 32, /* 32, IRET Exception */ }; +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 == 1: use of reserved bit detected + * bit 4 == 1: fault was an instruction fetch + * bit 5 == 1: protection keys block access + */ +enum x86_pf_error_code { + X86_PF_PROT = 1 << 0, + X86_PF_WRITE= 1 << 1, + X86_PF_USER = 1 << 2, + X86_PF_RSVD = 1 << 3, + X86_PF_INSTR= 1 << 4, + X86_PF_PK = 1 << 5, +}; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2a1fa10c6a98..dc87badd16e9 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -29,26 +29,6 @@ 
#include /* - * Page fault error code bits: - * - * bit 0 == 0: no page found 1: protection fault - * bit 1 == 0: read access 1: write access - * bit 2 == 0: kernel-mode access 1: user-mode access - * bit 3 == 1: use of reserved bit detected - * bit 4 == 1: fault was an instruction fetch - * bit 5 == 1: protection keys block access - */ -enum x86_pf_error_code { - - PF_PROT = 1 << 0, - PF_WRITE= 1 << 1, - PF_USER = 1 << 2, - PF_RSVD = 1 << 3, - PF_INSTR= 1 << 4, - PF_PK = 1 << 5, -}; - -/* * Returns 0 if mmiotrace is disabled, or if the fault is not * handled by mmiotrace: */ @@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * If it was a exec (instruction fetch) fault on NX page, then * do not ignore the fault: */ - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) return 0; instr = (void *)convert_ip_to_linear(current, regs); @@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * siginfo so userspace can discover which protection key was set * on the PTE. * - * If we get here, we know that the hardware signaled a PF_PK + * If we get here, we know that the hardware signaled a X86_PF_PK * fault and that there was a VMA once we got in the fault * handler. It does *not* guarantee that the VMA we find here * was the one that we faulted on. @@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, /* * force_sig_info_fault() is called from a number of * contexts, some of which have a VMA and some of which -* do not. The PF_PK handing happens after we have a +* do not. The X86_PF_PK handing happens after we have a * valid VMA, so we should never reach this without a * valid VMA. */ @@ -695,7 +675,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (!oops_may_print()) return; - if (error_code & PF_INSTR) { + if (error_code & X86_PF_INSTR) {
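The relocated enum can be exercised on its own. The bit values below are the ones the patch moves into traps.h; the decoder helper is illustrative (not kernel code) and shows the kind of use other subsystems gain once the definitions leave fault.c — here, testing whether an error code describes a user-mode write hitting a protection fault:

```c
#include <assert.h>

/* Error-code bits as relocated into asm/traps.h by the patch. */
enum x86_pf_error_code {
	X86_PF_PROT	= 1 << 0,	/* 0: no page found, 1: protection fault */
	X86_PF_WRITE	= 1 << 1,	/* 0: read access,   1: write access */
	X86_PF_USER	= 1 << 2,	/* 0: kernel-mode,   1: user-mode access */
	X86_PF_RSVD	= 1 << 3,	/* reserved bit detected */
	X86_PF_INSTR	= 1 << 4,	/* instruction fetch */
	X86_PF_PK	= 1 << 5,	/* protection keys block access */
};

/*
 * Illustrative decoder: a user-mode write to a present-but-protected
 * page sets PROT | WRITE | USER (== 0x7). Other bits may be set too,
 * so we mask before comparing.
 */
static int is_user_write_protection_fault(unsigned long error_code)
{
	unsigned long mask = X86_PF_PROT | X86_PF_WRITE | X86_PF_USER;

	return (error_code & mask) == mask;
}
```

Because these are plain power-of-two enumerators, the rename from PF_* to X86_PF_* is purely mechanical, which is why the patch can claim no functional change.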
[PATCH v8 01/28] x86/mm: Relocate page fault error codes to traps.h
Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them in that file. Other portions of code might be interested in those definitions too. For instance, the User-Mode Instruction Prevention emulation code will use these definitions to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space. While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Andy Lutomirski Cc: "Kirill A. Shutemov" Cc: Josh Poimboeuf Cc: Dave Hansen Cc: Paul Gortmaker Cc: x...@kernel.org Reviewed-by: Andy Lutomirski Reviewed-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/asm/traps.h | 18 + arch/x86/mm/fault.c | 88 +--- 2 files changed, 52 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 01fd0a7f48cd..4a2e5852eacc 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -148,4 +148,22 @@ enum { X86_TRAP_IRET = 32, /* 32, IRET Exception */ }; +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 == 1: use of reserved bit detected + * bit 4 == 1: fault was an instruction fetch + * bit 5 == 1: protection keys block access + */ +enum x86_pf_error_code { + X86_PF_PROT = 1 << 0, + X86_PF_WRITE= 1 << 1, + X86_PF_USER = 1 << 2, + X86_PF_RSVD = 1 << 3, + X86_PF_INSTR= 1 << 4, + X86_PF_PK = 1 << 5, +}; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2a1fa10c6a98..dc87badd16e9 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -29,26 +29,6 @@ 
#include /* - * Page fault error code bits: - * - * bit 0 == 0: no page found 1: protection fault - * bit 1 == 0: read access 1: write access - * bit 2 == 0: kernel-mode access 1: user-mode access - * bit 3 == 1: use of reserved bit detected - * bit 4 == 1: fault was an instruction fetch - * bit 5 == 1: protection keys block access - */ -enum x86_pf_error_code { - - PF_PROT = 1 << 0, - PF_WRITE= 1 << 1, - PF_USER = 1 << 2, - PF_RSVD = 1 << 3, - PF_INSTR= 1 << 4, - PF_PK = 1 << 5, -}; - -/* * Returns 0 if mmiotrace is disabled, or if the fault is not * handled by mmiotrace: */ @@ -149,7 +129,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * If it was a exec (instruction fetch) fault on NX page, then * do not ignore the fault: */ - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) return 0; instr = (void *)convert_ip_to_linear(current, regs); @@ -179,7 +159,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * siginfo so userspace can discover which protection key was set * on the PTE. * - * If we get here, we know that the hardware signaled a PF_PK + * If we get here, we know that the hardware signaled a X86_PF_PK * fault and that there was a VMA once we got in the fault * handler. It does *not* guarantee that the VMA we find here * was the one that we faulted on. @@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, /* * force_sig_info_fault() is called from a number of * contexts, some of which have a VMA and some of which -* do not. The PF_PK handing happens after we have a +* do not. The X86_PF_PK handing happens after we have a * valid VMA, so we should never reach this without a * valid VMA. 
*/ @@ -695,7 +675,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (!oops_may_print()) return; - if (error_code & PF_INSTR) { + if (error_code & X86_PF_INSTR) { unsigned int level; pgd_t *pgd; pte_t *pte; @@ -779,7 +759,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, */ if (current->thread.sig_on_uaccess_err && signal) {
[PATCH v8 06/28] x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when ModRM.mod != 11b and ModRM.rm = 100b, indexed register-indirect addressing is used. In other words, a SIB byte follows the ModRM byte. In the specific case of SIB.index = 100b, the scale*index portion of the computation of the effective address is null. To signal callers of this particular situation, get_reg_offset() can return -EDOM (-EINVAL continues to indicate an error when decoding the SIB byte). An example of this situation can be the following instruction: 8b 4c 23 80 mov -0x80(%rbx,%riz,1),%rcx ModRM: 0x4c [mod:1b][reg:1b][rm:100b] SIB: 0x23 [scale:0b][index:100b][base:11b] Displacement: 0x80 (1-byte, as per ModRM.mod = 1b) The %riz 'register' indicates a null index. In long mode, a REX prefix may be used. When a REX prefix is present, REX.X adds a fourth bit to the register selection of SIB.index. This gives the ability to refer to all 16 general-purpose registers. When REX.X is 1b and SIB.index is 100b, the index is indicated in %r12. In our example, this would look like: 42 8b 4c 23 80 mov -0x80(%rbx,%r12,1),%rcx REX: 0x42 [W:0b][R:0b][X:1b][B:0b] ModRM: 0x4c [mod:1b][reg:1b][rm:100b] SIB: 0x23 [scale:0b][.X: 1b, index:100b][.B:0b, base:11b] Displacement: 0x80 (1-byte, as per ModRM.mod = 1b) %r12 is a valid register to use in the scale*index part of the effective address computation. Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Peter Zijlstra Cc: Nathan Howard Cc: Adan Hawthorn Cc: Joe Perches Cc: Ravi V. 
Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/mm/mpx.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 9eec98022510..892aa6468805 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -110,6 +110,15 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, regno = X86_SIB_INDEX(insn->sib.value); if (X86_REX_X(insn->rex_prefix.value)) regno += 8; + + /* +* If ModRM.mod != 3 and SIB.index = 4 the scale*index +* portion of the address computation is null. This is +* true only if REX.X is 0. In such a case, the SIB index +* is used in the address computation. +*/ + if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4) + return -EDOM; break; case REG_TYPE_BASE: @@ -160,11 +169,20 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) goto out_err; indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - if (indx_offset < 0) + + /* +* A negative offset generally means a error, except +* -EDOM, which means that the contents of the register +* should not be used as index. +*/ + if (indx_offset == -EDOM) + indx = 0; + else if (indx_offset < 0) goto out_err; + else + indx = regs_get_register(regs, indx_offset); base = regs_get_register(regs, base_offset); - indx = regs_get_register(regs, indx_offset); eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { -- 2.13.0
[PATCH v8 07/28] x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that if a SIB byte is used and SIB.base is 101b and ModRM.mod is zero, then the base part of the effective address computation is null. To signal this situation, a -EDOM error is returned to tell callers to ignore the base value present in the register operand. In this scenario, a 32-bit displacement follows the SIB byte. The displacement is obtained when the instruction decoder parses the operands. Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Peter Zijlstra Cc: Nathan Howard Cc: Adan Hawthorn Cc: Joe Perches Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/mm/mpx.c | 28 +++- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 892aa6468805..53e24ca01f29 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -123,6 +123,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, case REG_TYPE_BASE: regno = X86_SIB_BASE(insn->sib.value); + /* +* If ModRM.mod is 0 and SIB.base == 5, the base of the +* register-indirect addressing is 0. In this case, a +* 32-bit displacement follows the SIB byte. +*/ + if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5) + return -EDOM; + if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break; @@ -164,17 +172,21 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) eff_addr = regs_get_register(regs, addr_offset); } else { if (insn->sib.nbytes) { + /* +* Negative values in the base and index offset means +* an error when decoding the SIB byte. Except -EDOM, +* which means that the registers should not be used +* in the address computation. 
+*/ base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); - if (base_offset < 0) + if (base_offset == -EDOM) + base = 0; + else if (base_offset < 0) goto out_err; + else + base = regs_get_register(regs, base_offset); indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX); - - /* -* A negative offset generally means a error, except -* -EDOM, which means that the contents of the register -* should not be used as index. -*/ if (indx_offset == -EDOM) indx = 0; else if (indx_offset < 0) @@ -182,8 +194,6 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) else indx = regs_get_register(regs, indx_offset); - base = regs_get_register(regs, base_offset); - eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); -- 2.13.0
[PATCH v8 02/28] x86/boot: Relocate definition of the initial state of CR0
Both head_32.S and head_64.S utilize the same value to initialize the control register CR0. Also, other parts of the kernel might want access to this initial definition (e.g., emulation code for User-Mode Instruction Prevention uses this state to provide a sane dummy value for CR0 when emulating the smsw instruction). Thus, relocate this definition to a header file from which it can be conveniently accessed. Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org Suggested-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/uapi/asm/processor-flags.h | 6 ++ arch/x86/kernel/head_32.S | 3 --- arch/x86/kernel/head_64.S | 3 --- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 185f3d10c194..aae1f2aa7563 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -151,5 +151,11 @@ #define CX86_ARR_BASE 0xc4 #define CX86_RCR_BASE 0xdc +/* + * Initial state of CR0 for head_32/64.S + */ +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ +X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ +X86_CR0_PG) #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 0332664eb158..f64059835863 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -213,9 +213,6 @@ ENTRY(startup_32_smp) #endif .Ldefault_entry: -#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ -X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ -X86_CR0_PG) movl $(CR0_STATE & ~X86_CR0_PG),%eax movl %eax,%cr0 diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 513cbb012ecc..5e1bfdd86b5b 100644 --- a/arch/x86/kernel/head_64.S 
+++ b/arch/x86/kernel/head_64.S @@ -149,9 +149,6 @@ ENTRY(secondary_startup_64) 1: wrmsr /* Make changes effective */ /* Setup cr0 */ -#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ -X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ -X86_CR0_PG) movl$CR0_STATE, %eax /* Make changes effective */ movq%rax, %cr0 -- 2.13.0
[PATCH v8 03/28] ptrace,x86: Make user_64bit_mode() available to 32-bit builds
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64 is selected. This implies that code built with CONFIG_X86_64=n cannot use it. If a piece of code needs to be built for both CONFIG_X86_64=y and CONFIG_X86_64=n and wants to use this function, it needs to wrap it in an #ifdef/#endif; potentially, in multiple places. This can be easily avoided with a single #ifdef/#endif pair within user_64bit_mode() itself. Suggested-by: Borislav Petkov Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Reviewed-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/asm/ptrace.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 91c04c8e67fa..e2afbf689309 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -135,9 +135,9 @@ static inline int v8086_mode(struct pt_regs *regs) #endif } -#ifdef CONFIG_X86_64 static inline bool user_64bit_mode(struct pt_regs *regs) { +#ifdef CONFIG_X86_64 #ifndef CONFIG_PARAVIRT /* * On non-paravirt systems, this is the only long mode CPL 3 @@ -148,8 +148,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) /* Headers are too twisted for this to go in paravirt.h. */ return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; #endif +#else /* !CONFIG_X86_64 */ + return false; +#endif } +#ifdef CONFIG_X86_64 #define current_user_stack_pointer() current_pt_regs()->sp #define compat_user_stack_pointer()current_pt_regs()->sp #endif -- 2.13.0
[PATCH v8 00/28] x86: Enable User-Mode Instruction Prevention
This is v8 of this series. The seven previous submissions can be found here [1], here [2], here [3], here [4], here [5], here [6] and here [7]. This version addresses the feedback comments from Borislav Petkov received on v7. Please see details in the change log. === What is UMIP? User-Mode Instruction Prevention (UMIP) is a security feature present in new Intel processors. If enabled, it prevents the execution of certain instructions if the Current Privilege Level (CPL) is greater than 0. If these instructions were executed while in CPL > 0, user space applications could have access to system-wide settings such as the global and local descriptor tables, the segment selectors to the current task state and the local descriptor table. Hiding these system resources reduces the tools available to craft privilege escalation attacks such as [8]. These are the instructions covered by UMIP: * SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register If any of these instructions is executed with CPL > 0, a general protection exception is issued when UMIP is enabled. === How does it impact applications? When enabled, however, UMIP will change the behavior that certain applications expect from the operating system. For instance, programs running on WineHQ and DOSEMU2 rely on some of these instructions to function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the instruction SMSW when running in virtual-8086 mode [9]. SGDT and SIDT can also be used in virtual-8086 mode. In order not to change the behavior of the system, this patchset emulates SGDT, SIDT and SMSW. This should be sufficient to not break the applications mentioned above. Regarding the two remaining instructions, STR and SLDT, the WineHQ team has shown interest in catching the general protection fault and using it as a vehicle to fix broken applications [10]. 
Furthermore, STR and SLDT can only run in protected and long modes. DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support it. Also, this should not pose a security risk as no system resources would be revealed. Instead, code running inside the KVM would only see the KVM's GDT, IDT and MSW. Please note that UMIP is always enabled for both 64-bit and 32-bit Linux builds. However, emulation of the UMIP-protected instructions is not done for 64-bit processes. 64-bit user space applications will receive the SIGSEGV signal when UMIP instructions cause a general protection fault. === How are UMIP-protected instructions emulated? UMIP is kept enabled at all times when the CONFIG_X86_INTEL_UMIP option is selected. If a general protection fault caused by the instructions protected by UMIP is detected, such a fault will be trapped and fixed up. The return values will be dummy as follows: * SGDT and SIDT return hard-coded dummy values as the base of the global descriptor and interrupt descriptor tables. These hard-coded values correspond to memory addresses that are near the end of the kernel memory map. This is also the case for virtual-8086 mode tasks. In all my experiments with 32-bit processes, the base of GDT and IDT was always a 4-byte address, even for 16-bit operands. Thus, my emulation code does the same. In all cases, the limit of the table is set to 0. * SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. That is, the following bits are enabled: CR0.0 for Protection Enable, CR0.1 for Monitor Coprocessor, CR0.4 for Extension Type, which will always be 1 in recent processors with UMIP; CR0.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment Mask and CR0.31 for Paging. As per the Intel 64 and IA-32 Architectures Software Developer's Manual, SMSW returns a 16-bit result for memory operands. 
However, when the operand is a register, the results can be up to CR0[63:0]. Since the emulation code only kicks in for 32-bit processes, we return up to CR0[31:0]. * The proposed emulation code handles faults that happen in both protected and virtual-8086 mode. * Again, STR and SLDT are not emulated. === How is this series laid out? ++ Preparatory work As per suggestions from Andy Lutomirski and Borislav Petkov, I moved the x86 page fault error codes to a header. Also, I made user_64bit_mode available to x86_32 builds. This helps to reuse code and reduce the number of #ifdef's in these patches. Borislav also suggested that uprobes should use the existing definitions in arch/x86/include/asm/inat.h instead of hard-coded values when checking instruction prefixes. I included this change in the series. ++ Fix bugs in MPX address decoder I found very useful the code for Intel MPX (Memory Protection Extensions) used to parse opcodes and the memory locations contained in the general
[PATCH v8 03/28] ptrace,x86: Make user_64bit_mode() available to 32-bit builds
In its current form, user_64bit_mode() can only be used when CONFIG_X86_64 is selected. This implies that code built with CONFIG_X86_64=n cannot use it. If a piece of code needs to be built for both CONFIG_X86_64=y and CONFIG_X86_64=n and wants to use this function, it needs to wrap it in an #ifdef/#endif; potentially, in multiple places. This can be easily avoided with a single #ifdef/#endif pair within user_64bit_mode() itself. Suggested-by: Borislav Petkov Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Reviewed-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/asm/ptrace.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 91c04c8e67fa..e2afbf689309 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -135,9 +135,9 @@ static inline int v8086_mode(struct pt_regs *regs) #endif } -#ifdef CONFIG_X86_64 static inline bool user_64bit_mode(struct pt_regs *regs) { +#ifdef CONFIG_X86_64 #ifndef CONFIG_PARAVIRT /* * On non-paravirt systems, this is the only long mode CPL 3 @@ -148,8 +148,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) /* Headers are too twisted for this to go in paravirt.h. */ return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; #endif +#else /* !CONFIG_X86_64 */ + return false; +#endif } +#ifdef CONFIG_X86_64 #define current_user_stack_pointer() current_pt_regs()->sp #define compat_user_stack_pointer()current_pt_regs()->sp #endif -- 2.13.0
[PATCH v8 00/28] x86: Enable User-Mode Instruction Prevention
This is v8 of this series. The seven previous submissions can be found here [1], here [2], here[3], here[4], here[5], here[6] and here[7]. This version addresses the feedback comments from Borislav Petkov received on v7. Please see details in the change log. === What is UMIP? User-Mode Instruction Prevention (UMIP) is a security feature present in new Intel Processors. If enabled, it prevents the execution of certain instructions if the Current Privilege Level (CPL) is greater than 0. If these instructions were executed while in CPL > 0, user space applications could have access to system-wide settings such as the global and local descriptor tables, the segment selectors to the current task state and the local descriptor table. Hiding these system resources reduces the tools available to craft privilege escalation attacks such as [8]. These are the instructions covered by UMIP: * SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register If any of these instructions is executed with CPL > 0, a general protection exception is issued when UMIP is enabled. === How does it impact applications? When enabled, However, UMIP will change the behavior that certain applications expect from the operating system. For instance, programs running on WineHQ and DOSEMU2 rely on some of these instructions to function. Stas Sergeev found that Microsoft Windows 3.1 and dos4gw use the instruction SMSW when running in virtual-8086 mode[9]. SGDT and SIDT can also be used on virtual-8086 mode. In order to not change the behavior of the system. This patchset emulates SGDT, SIDT and SMSW. This should be sufficient to not break the applications mentioned above. Regarding the two remaining instructions, STR and SLDT, the WineHQ team has shown interest catching the general protection fault and use it as a vehicle to fix broken applications[10]. 
Furthermore, STR and SLDT can only run in protected and long modes. DOSEMU2 emulates virtual-8086 mode via KVM. No applications will be broken unless DOSEMU2 decides to enable the CR4.UMIP bit in platforms that support it. Also, this should not pose a security risk as no system resouces would be revealed. Instead, code running inside the KVM would only see the KVM's GDT, IDT and MSW. Please note that UMIP is always enabled for both 64-bit and 32-bit Linux builds. However, emulation of the UMIP-protected instructions is not done for 64-bit processes. 64-bit user space applications will receive the SIGSEGV signal when UMIP instructions causes a general protection fault. === How are UMIP-protected instructions emulated? UMIP is kept enabled at all times when the CONFIG_x86_INTEL_UMIP option is selected. If a general protection fault caused by the instructions protected by UMIP is detected, such fault will be trapped and fixed-up. The return values will be dummy as follows: * SGDT and SIDT return hard-coded dummy values as the base of the global descriptor and interrupt descriptor tables. These hard-coded values correspond to memory addresses that are near the end of the kernel memory map. This is also the case for virtual-8086 mode tasks. In all my experiments with 32-bit processes, the base of GDT and IDT was always a 4-byte address, even for 16-bit operands. Thus, my emulation code does the same. In all cases, the limit of the table is set to 0. * SMSW returns the value with which the CR0 register is programmed in head_32/64.S at boot time. This is, the following bits are enabled: CR0.0 for Protection Enable, CR.1 for Monitor Coprocessor, CR.4 for Extension Type, which will always be 1 in recent processors with UMIP; CR.5 for Numeric Error, CR0.16 for Write Protect, CR0.18 for Alignment Mask and CR0.31 for Paging. As per the Intel 64 and IA-32 Architectures Software Developer's Manual, SMSW returns a 16-bit results for memory operands. 
However, when the operand is a register, the result can be up to CR0[63:0]. Since the emulation code only kicks in for 32-bit processes, we return up to CR0[31:0].

* The proposed emulation code handles faults that happen in both protected and virtual-8086 mode.

* Again, STR and SLDT are not emulated.

=== How is this series laid out?

++ Preparatory work

As per suggestions from Andy Lutomirski and Borislav Petkov, I moved the x86 page fault error codes to a header. Also, I made user_64bit_mode available to x86_32 builds. This helps to reuse code and reduce the number of #ifdef's in these patches. Borislav also suggested that uprobes use the existing definitions in arch/x86/include/asm/inat.h instead of hard-coded values when checking instruction prefixes. I included this change in the series.

++ Fix bugs in MPX address decoder

I found very useful the code for Intel MPX (Memory Protection Extensions) used to parse opcodes and the memory locations contained in the general
[PATCH v8 12/28] x86/insn-eval: Add utility functions to get segment selector
When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most cases, the segment base address will be zero, as in USER_DS/USER32_DS. However, it may be possible that a user space program defines its own segments via a local descriptor table. In such a case, the segment base address may not be zero. Thus, the segment base address is needed to correctly calculate the linear address.

If running in protected mode, the segment selector to be used when computing a linear address is either determined by segment override prefixes in the instruction or inferred from the registers involved in the computation of the effective address, in that order. Also, there are cases when the segment override prefixes shall be ignored (i.e., code segments are always selected by the CS segment register; string instructions always use the ES segment register when using the (E)DI register as operand). In long mode, segment registers are ignored, except for FS and GS. In these two cases, base addresses are obtained from the respective MSRs.

For clarity, this process can be split into three steps (and an equal number of functions): parse the segment override prefixes, if any; resolve the relevant segment register to use; and, once known, read its value to obtain the segment selector.

The method to obtain the segment selector depends on several factors. In 32-bit builds, segment selectors are saved into a pt_regs structure when switching to kernel mode. The same is also true for virtual-8086 mode. In 64-bit builds, segmentation is mostly ignored, except when running a program in 32-bit legacy mode. In this case, CS and SS can be obtained from pt_regs. DS, ES, FS and GS can be read directly from the respective segment registers.

In order to identify the segment registers, a new set of #defines is introduced. It also includes two special identifiers.
One of them indicates when the default segment register associated with instruction operands shall be used. Another one indicates that the contents of the segment register shall be ignored; this identifier is used when in long mode.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/include/asm/inat.h |  10 ++
 arch/x86/lib/insn-eval.c    | 278
 2 files changed, 288 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 02aff0867211..1c78580e58be 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -97,6 +97,16 @@
 #define INAT_MAKE_GROUP(grp)	((grp << INAT_GRP_OFFS) | INAT_MODRM)
 #define INAT_MAKE_IMM(imm)	(imm << INAT_IMM_OFFS)

+/* Identifiers for segment registers */
+#define INAT_SEG_REG_IGNORE	0
+#define INAT_SEG_REG_DEFAULT	1
+#define INAT_SEG_REG_CS		2
+#define INAT_SEG_REG_SS		3
+#define INAT_SEG_REG_DS		4
+#define INAT_SEG_REG_ES		5
+#define INAT_SEG_REG_FS		6
+#define INAT_SEG_REG_GS		7
+
 /* Attribute search APIs */
 extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
 extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 25b2eb3c64c1..86f58ce6c302 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include

 enum reg_type {
	REG_TYPE_RM = 0,
@@ -42,6 +43,283 @@ static bool is_string_insn(struct insn *insn)
	}
 }

+/**
+ * get_overridden_seg_reg() - obtain segment register to use from prefixes
+ * @insn:	Instruction structure with segment override prefixes
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @regoff:	Operand offset, in pt_regs, used to determine segment register
+ *
+ * The segment register to which an effective address refers depends on
+ * a) whether running in long mode (in such a case segment override prefixes
+ * are ignored). b) Whether segment override prefixes must be ignored for
+ * certain registers: always use CS when the register is (R|E)IP; always use
+ * ES when operand register is (E)DI with a string instruction as defined in
+ * the Intel documentation. c) If segment override prefixes are found in the
+ * instruction prefixes. d) Use the default segment register associated with
+ * the operand register.
+ *
+ * This function returns the overridden segment register to use, if any, as
+ * per the conditions described above. Please note that this function does
+ * not return the value in the segment register (i.e., the segment
[PATCH v8 13/28] x86/insn-eval: Add utility function to get segment descriptor
The segment descriptor contains information that is relevant to how linear addresses need to be computed. It contains the default size of addresses as well as the base address of the segment. Thus, given a segment selector, we ought to look at the segment descriptor to correctly calculate the linear address.

In protected mode, the segment selector might indicate a segment descriptor from either the global descriptor table or a local descriptor table. Both cases are considered in this function.

This function is a prerequisite for functions in subsequent commits that will obtain the aforementioned attributes of the segment descriptor.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 55
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 86f58ce6c302..9cf2c49afc15 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -6,9 +6,13 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
 #include
+#include
 #include

 enum reg_type {
@@ -402,6 +406,57 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 }

 /**
+ * get_desc() - Obtain address of segment descriptor
+ * @sel:	Segment selector
+ *
+ * Given a segment selector, obtain a pointer to the segment descriptor.
+ * Both global and local descriptor tables are supported.
+ *
+ * Return: pointer to segment descriptor on success. NULL on error.
+ */
+static struct desc_struct *get_desc(unsigned short sel)
+{
+	struct desc_ptr gdt_desc = {0, 0};
+	struct desc_struct *desc = NULL;
+	unsigned long desc_base;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	if ((sel & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		/* Bits [15:3] contain the index of the desired entry. */
+		sel >>= 3;
+
+		mutex_lock(&current->active_mm->context.lock);
+		/* The size of the LDT refers to the number of entries. */
+		if (!current->active_mm->context.ldt ||
+		    sel >= current->active_mm->context.ldt->nr_entries) {
+			mutex_unlock(&current->active_mm->context.lock);
+			return NULL;
+		}
+
+		desc = &current->active_mm->context.ldt->entries[sel];
+		mutex_unlock(&current->active_mm->context.lock);
+		return desc;
+	}
+#endif
+	native_store_gdt(&gdt_desc);
+
+	/*
+	 * Segment descriptors have a size of 8 bytes. Thus, the index is
+	 * multiplied by 8 to obtain the memory offset of the desired descriptor
+	 * from the base of the GDT. As bits [15:3] of the segment selector
+	 * contain the index, it can be regarded as multiplied by 8 already.
+	 * All that remains is to clear bits [2:0].
+	 */
+	desc_base = sel & ~(SEGMENT_RPL_MASK | SEGMENT_TI_MASK);
+
+	if (desc_base > gdt_desc.size)
+		return NULL;
+
+	desc = (struct desc_struct *)(gdt_desc.address + desc_base);
+	return desc;
+}
+
+/**
  * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Structure with register values as seen when entering kernel mode
-- 
2.13.0
[PATCH v8 15/28] x86/insn-eval: Add function to get default params of code segment
Obtain the default values of the address and operand sizes as specified in the D and L bits of the segment descriptor selected by the register CS. The function can be used for both protected and long modes. For virtual-8086 mode, the default address and operand sizes are always 2 bytes. The returned parameters are encoded in a signed 8-bit data type. Auxiliary macros are provided to encode and decode such values.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/include/asm/insn-eval.h |  5
 arch/x86/lib/insn-eval.c         | 59
 2 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7f3c7fe72cd0..e8c3e7cd1673 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -11,9 +11,14 @@
 #include
 #include

+#define INSN_CODE_SEG_ADDR_SZ(params) ((params >> 4) & 0xf)
+#define INSN_CODE_SEG_OPND_SZ(params) (params & 0xf)
+#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
+
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
				 int regoff);
+char insn_get_code_seg_defaults(struct pt_regs *regs);

 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2c5e7081957d..a8e12bd0aecd 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -584,6 +584,65 @@ static unsigned long get_seg_limit(struct pt_regs *regs, struct insn *insn,
 }

 /**
+ * insn_get_code_seg_defaults() - Obtain code segment default parameters
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * Obtain the default parameters of the code segment: address and operand
+ * sizes. The code segment is obtained from the selector contained in the CS
+ * register in regs. In protected mode, the defaults are determined by
+ * inspecting the L and D bits of the segment descriptor. In virtual-8086
+ * mode, the default is always two bytes for both address and operand sizes.
+ *
+ * Return: A signed 8-bit value containing the default parameters on success
+ * and -EINVAL on error.
+ */
+char insn_get_code_seg_defaults(struct pt_regs *regs)
+{
+	struct desc_struct *desc;
+	unsigned short sel;
+
+	if (v8086_mode(regs))
+		/* Address and operand size are both 16-bit. */
+		return INSN_CODE_SEG_PARAMS(2, 2);
+
+	sel = (unsigned short)regs->cs;
+
+	desc = get_desc(sel);
+	if (!desc)
+		return -EINVAL;
+
+	/*
+	 * The most significant bit of the Type field of the segment descriptor
+	 * determines whether a segment contains data or code. If this is a data
+	 * segment, return error.
+	 */
+	if (!(desc->type & BIT(3)))
+		return -EINVAL;
+
+	switch ((desc->l << 1) | desc->d) {
+	case 0:
+		/*
+		 * Legacy mode. CS.L=0, CS.D=0. Address and operand size are
+		 * both 16-bit.
+		 */
+		return INSN_CODE_SEG_PARAMS(2, 2);
+	case 1:
+		/*
+		 * Legacy mode. CS.L=0, CS.D=1. Address and operand size are
+		 * both 32-bit.
+		 */
+		return INSN_CODE_SEG_PARAMS(4, 4);
+	case 2:
+		/*
+		 * IA-32e 64-bit mode. CS.L=1, CS.D=0. Address size is 64-bit;
+		 * operand size is 32-bit.
+		 */
+		return INSN_CODE_SEG_PARAMS(4, 8);
+	case 3: /* Invalid setting. CS.L=1, CS.D=1 */
+		/* fall through */
+	default:
+		return -EINVAL;
+	}
+}
+
+/**
  * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
  * @insn:	Instruction structure containing the ModRM byte
  * @regs:	Structure with register values as seen when entering kernel mode
-- 
2.13.0
[PATCH v8 16/28] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 101b
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when ModRM.mod is zero and ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means that none of the registers are used in the computation of the effective address. A return value of -EDOM indicates to callers that they should not use the value of registers when computing the effective address for the instruction.

In long mode, the effective address is given by the 32-bit displacement plus the location of the next instruction. In protected mode, only the displacement is used. The instruction decoder takes care of obtaining the displacement.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a8e12bd0aecd..04f696c3793e 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -360,6 +360,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
	switch (type) {
	case REG_TYPE_RM:
		regno = X86_MODRM_RM(insn->modrm.value);
+
+		/*
+		 * ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement
+		 * follows the ModRM byte.
+		 */
+		if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5)
+			return -EDOM;
+
		if (X86_REX_B(insn->rex_prefix.value))
			regno += 8;
		break;
@@ -706,10 +714,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
		eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
	} else {
		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
-		if (addr_offset < 0)
-			goto out_err;
-		eff_addr = regs_get_register(regs, addr_offset);
+
+		/*
+		 * -EDOM means that we must ignore the address_offset.
+		 * In such a case, in 64-bit mode the effective address is
+		 * relative to the RIP of the following instruction.
+		 */
+		if (addr_offset == -EDOM) {
+			if (user_64bit_mode(regs))
+				eff_addr = (long)regs->ip + insn->length;
+			else
+				eff_addr = 0;
+		} else if (addr_offset < 0) {
+			goto out_err;
+		} else {
+			eff_addr = regs_get_register(regs, addr_offset);
+		}
	}

	eff_addr += insn->displacement.value;
-- 
2.13.0
[PATCH v8 14/28] x86/insn-eval: Add utility functions to get segment descriptor base address and limit
With segmentation, the base address of the segment is needed to compute a linear address. This base address is obtained from the applicable segment descriptor. Such segment descriptor is referenced from a segment selector. The segment selector is stored in the segment register associated with operands in the instruction being executed or indicated in the instruction prefixes. Thus, both a structure containing such instruction and its prefixes as well as the register operand (specified as the offset from the base of pt_regs) are given as inputs to the new function insn_get_seg_base() that retrieves the base address indicated in the segment descriptor.

The logic to obtain the segment selector is wrapped in the function get_seg_selector() with the inputs described above. Once the selector is known, the base address is determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. In 64-bit user mode, the segment base address is zero except when FS or GS are used. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left.

In protected mode, segment limits are enforced. Thus, a function to determine the limit of the segment is added. Segment limits are not enforced in long or virtual-8086 modes. For the latter, addresses are limited to 20 bits; address size will be handled when computing the linear address.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/include/asm/insn-eval.h |   2
 arch/x86/lib/insn-eval.c         | 127 +++
 2 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 7e8c9633a377..7f3c7fe72cd0 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -13,5 +13,7 @@
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff);

 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 9cf2c49afc15..2c5e7081957d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -457,6 +457,133 @@ static struct desc_struct *get_desc(unsigned short sel)
 }

 /**
+ * insn_get_seg_base() - Obtain base address of segment descriptor.
+ * @regs:	Structure with register values as seen when entering kernel mode
+ * @insn:	Instruction structure with selector override prefixes
+ * @regoff:	Operand offset, in pt_regs, of which the selector is needed
+ *
+ * Obtain the base address of the segment descriptor as indicated by either
+ * any segment override prefixes contained in insn or the default segment
+ * applicable to the register indicated by regoff. regoff is specified as the
+ * offset in bytes from the base of pt_regs.
+ *
+ * Return: In protected mode, base address of the segment. Zero in long mode,
+ * except when FS or GS are used. In virtual-8086 mode, the segment
+ * selector shifted 4 positions to the left. -1L in case of error.
+ */
+unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn,
+				int regoff)
+{
+	struct desc_struct *desc;
+	int seg_reg;
+	short sel;
+
+	seg_reg = resolve_seg_register(insn, regs, regoff);
+	if (seg_reg < 0)
+		return -1L;
+
+	sel = get_segment_selector(regs, seg_reg);
+	if (sel < 0)
+		return -1L;
+
+	if (v8086_mode(regs))
+		/*
+		 * Base is simply the segment selector shifted 4
+		 * positions to the left.
+		 */
+		return (unsigned long)(sel << 4);
+
+	if (user_64bit_mode(regs)) {
+		/*
+		 * Only FS or GS will have a base address, the rest of
+		 * the segments' bases are forced to 0.
+		 */
+		unsigned long base;
+
+		if (seg_reg == INAT_SEG_REG_FS)
+			rdmsrl(MSR_FS_BASE, base);
+		else if (seg_reg == INAT_SEG_REG_GS)
+			/*
+			 * swapgs was called at the kernel entry point.
[PATCH v8 16/28] x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 101b
Section 2.2.1.3 of the Intel 64 and IA-32 Architectures Software Developer's Manual volume 2A states that when ModRM.mod is zero and ModRM.rm is 101b, a 32-bit displacement follows the ModRM byte. This means that none of the registers are used in the computation of the effective address. A return value of -EDOM indicates to callers that they should not use the value of registers when computing the effective address for the instruction. In long mode, the effective address is given by the 32-bit displacement plus the location of the next instruction. In protected mode, only the displacement is used. The instruction decoder takes care of obtaining the displacement.

Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri

--- arch/x86/lib/insn-eval.c | 26 +++--- 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index a8e12bd0aecd..04f696c3793e 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -360,6 +360,14 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, switch (type) { case REG_TYPE_RM: regno = X86_MODRM_RM(insn->modrm.value); + + /* +* ModRM.mod == 0 and ModRM.rm == 5 means a 32-bit displacement +* follows the ModRM byte. +*/ + if (!X86_MODRM_MOD(insn->modrm.value) && regno == 5) + return -EDOM; + if (X86_REX_B(insn->rex_prefix.value)) regno += 8; break; @@ -706,10 +714,22 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); - if (addr_offset < 0) - goto out_err; - eff_addr = regs_get_register(regs, addr_offset); + /* +* -EDOM means that we must ignore the address_offset. +* In such a case, in 64-bit mode the effective address is +* relative to the RIP of the following instruction. +*/ + if (addr_offset == -EDOM) { + if (user_64bit_mode(regs)) + eff_addr = (long)regs->ip + insn->length; + else + eff_addr = 0; + } else if (addr_offset < 0) { + goto out_err; + } else { + eff_addr = regs_get_register(regs, addr_offset); + } } eff_addr += insn->displacement.value; -- 2.13.0
[PATCH v8 14/28] x86/insn-eval: Add utility functions to get segment descriptor base address and limit
With segmentation, the base address of the segment is needed to compute a linear address. This base address is obtained from the applicable segment descriptor. Such a segment descriptor is referenced by a segment selector. The segment selector is stored in the segment register associated with operands in the instruction being executed or indicated in the instruction prefixes. Thus, both a structure containing such instruction and its prefixes as well as the register operand (specified as the offset from the base of pt_regs) are given as inputs to the new function insn_get_seg_base() that retrieves the base address indicated in the segment descriptor. The logic to obtain the segment selector is wrapped in the function get_seg_selector() with the inputs described above. Once the selector is known, the base address is determined. In protected mode, the selector is used to obtain the segment descriptor and then its base address. In 64-bit user mode, the segment base address is zero except when FS or GS are used. In virtual-8086 mode, the base address is computed as the value of the segment selector shifted 4 positions to the left. In protected mode, segment limits are enforced. Thus, a function to determine the limit of the segment is added. Segment limits are not enforced in long or virtual-8086 modes. For the latter, addresses are limited to 20 bits; address size will be handled when computing the linear address.

Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V.
Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/include/asm/insn-eval.h | 2 + arch/x86/lib/insn-eval.c | 127 +++ 2 files changed, 129 insertions(+) diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h index 7e8c9633a377..7f3c7fe72cd0 100644 --- a/arch/x86/include/asm/insn-eval.h +++ b/arch/x86/include/asm/insn-eval.h @@ -13,5 +13,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs); int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs); +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, + int regoff); #endif /* _ASM_X86_INSN_EVAL_H */ diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 9cf2c49afc15..2c5e7081957d 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -457,6 +457,133 @@ static struct desc_struct *get_desc(unsigned short sel) } /** + * insn_get_seg_base() - Obtain base address of segment descriptor. + * @regs: Structure with register values as seen when entering kernel mode + * @insn: Instruction structure with selector override prefixes + * @regoff: Operand offset, in pt_regs, of which the selector is needed + * + * Obtain the base address of the segment descriptor as indicated by either + * any segment override prefixes contained in insn or the default segment + * applicable to the register indicated by regoff. regoff is specified as the + * offset in bytes from the base of pt_regs. + * + * Return: In protected mode, base address of the segment. Zero in long mode, + * except when FS or GS are used. In virtual-8086 mode, the segment + * selector shifted 4 positions to the left. -1L in case of + * error.
+ */ +unsigned long insn_get_seg_base(struct pt_regs *regs, struct insn *insn, + int regoff) +{ + struct desc_struct *desc; + int seg_reg; + short sel; + + seg_reg = resolve_seg_register(insn, regs, regoff); + if (seg_reg < 0) + return -1L; + + sel = get_segment_selector(regs, seg_reg); + if (sel < 0) + return -1L; + + if (v8086_mode(regs)) + /* +* Base is simply the segment selector shifted 4 +* positions to the left. +*/ + return (unsigned long)(sel << 4); + + if (user_64bit_mode(regs)) { + /* +* Only FS or GS will have a base address, the rest of +* the segments' bases are forced to 0. +*/ + unsigned long base; + + if (seg_reg == INAT_SEG_REG_FS) + rdmsrl(MSR_FS_BASE, base); + else if (seg_reg == INAT_SEG_REG_GS) + /* +* swapgs was called at the kernel entry point. Thus, +* MSR_KERNEL_GS_BASE will have the user-space GS base. +*/ + rdmsrl(MSR_KERNEL_GS_BASE, base); + else if (seg_reg != INAT_SEG_REG_IGNORE) + /* We should ignore the rest of segment registers. */ + base = -1L; +
[PATCH v8 04/28] uprobes/x86: Use existing definitions for segment override prefixes
Rather than using hard-coded values of the segment override prefixes, leverage the existing definitions provided in inat.h.

Suggested-by: Borislav Petkov Cc: Andy Lutomirski Cc: Andrew Morton Cc: Borislav Petkov Cc: Masami Hiramatsu Cc: Denys Vlasenko Cc: Srikar Dronamraju Cc: Ravi V. Shankar Signed-off-by: Ricardo Neri

--- arch/x86/kernel/uprobes.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 495c776de4b4..a3755d293a48 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -271,12 +271,15 @@ static bool is_prefix_bad(struct insn *insn) int i; for (i = 0; i < insn->prefixes.nbytes; i++) { - switch (insn->prefixes.bytes[i]) { - case 0x26: /* INAT_PFX_ES */ - case 0x2E: /* INAT_PFX_CS */ - case 0x36: /* INAT_PFX_DS */ - case 0x3E: /* INAT_PFX_SS */ - case 0xF0: /* INAT_PFX_LOCK */ + insn_attr_t attr; + + attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]); + switch (attr) { + case INAT_MAKE_PREFIX(INAT_PFX_ES): + case INAT_MAKE_PREFIX(INAT_PFX_CS): + case INAT_MAKE_PREFIX(INAT_PFX_DS): + case INAT_MAKE_PREFIX(INAT_PFX_SS): + case INAT_MAKE_PREFIX(INAT_PFX_LOCK): return true; } } -- 2.13.0
[PATCH v8 19/28] x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
The function insn_get_addr_ref() is capable of handling only 64-bit addresses. A previous commit introduced a function to handle 32-bit addresses. Invoke these two functions from a third wrapper function that calls the appropriate routine based on the address size specified in the instruction structure (obtained by looking at the code segment default address size and the address override prefix, if present). While doing this, rename the original function insn_get_addr_ref() to the more appropriate name get_addr_ref_64(), ensure it is only used for 64-bit addresses and returns a 64-bit error value. Also, since 64-bit addresses are not possible in 32-bit builds, provide a dummy function for such a case.

Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri

--- arch/x86/lib/insn-eval.c | 53 ++-- 1 file changed, 47 insertions(+), 6 deletions(-) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 6730c9ba02c5..6537b613d0b3 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -812,12 +812,25 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs) return (void __user *)-1L; } -/* - * return the address being referenced be instruction - * for rm=3 returning the content of the rm reg - * for rm!=3 calculates the address using SIB and Disp +/** + * get_addr_ref_64() - Obtain a 64-bit linear address + * @insn: Instruction struct with ModRM and SIB bytes and displacement + * @regs: Structure with register values as seen when entering kernel mode + * + * This function is to be used with 64-bit address encodings to obtain the + * linear memory address referred by the instruction's ModRM, SIB, + * displacement bytes and segment base address, as applicable.
+ * + * Return: linear address referenced by instruction and registers on success. + * -1L on error. */ -void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +#ifndef CONFIG_X86_64 +static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs) +{ + return (void __user *)-1L; +} +#else +static void __user *get_addr_ref_64(struct insn *insn, struct pt_regs *regs) { int addr_offset, base_offset, indx_offset; unsigned long linear_addr, seg_base_addr; @@ -828,6 +841,9 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) insn_get_sib(insn); sib = insn->sib.value; + if (insn->addr_bytes != 8) + goto out_err; + if (X86_MODRM_MOD(insn->modrm.value) == 3) { addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) @@ -900,5 +916,30 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) return (void __user *)linear_addr; out_err: - return (void __user *)-1; + return (void __user *)-1L; +} +#endif /* CONFIG_X86_64 */ + +/** + * insn_get_addr_ref() - Obtain the linear address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Structure with register values as seen when entering kernel mode + * + * Obtain the linear address referred by the instruction's ModRM, SIB and + * displacement bytes, and segment base, as applicable. In protected mode, + * segment limits are enforced. + * + * Return: linear address referenced by instruction and registers on success. + * -1L on error. + */ +void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) +{ + switch (insn->addr_bytes) { + case 4: + return get_addr_ref_32(insn, regs); + case 8: + return get_addr_ref_64(insn, regs); + default: + return (void __user *)-1L; + } } -- 2.13.0
[PATCH v8 21/28] x86/insn-eval: Add support to resolve 16-bit addressing encodings
Tasks running in virtual-8086 mode, in protected mode with code segment descriptors that specify 16-bit default address sizes via the D bit, or via an address override prefix will use 16-bit addressing form encodings as described in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A Section 2.1.5, Table 2-1. 16-bit addressing encodings differ in several ways from the 32-bit/64-bit addressing form encodings: ModRM.rm points to different registers and, in some cases, effective addresses are indicated by the addition of the value of two registers. Also, there is no support for SIB bytes. Thus, a separate function is needed to parse this form of addressing. A couple of functions are introduced. get_reg_offset_16() obtains the offset from the base of pt_regs of the registers indicated by the ModRM byte of the address encoding. get_addr_ref_16() computes the linear address indicated by the instruction using the value of the registers given by ModRM and the base address of the applicable segment.

Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V.
Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/lib/insn-eval.c | 171 +++ 1 file changed, 171 insertions(+) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 93a6d1f57c2d..6abe46aed6fd 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -414,6 +414,78 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, } /** + * get_reg_offset_16() - Obtain offset of register indicated by instruction + * @insn: Instruction structure containing ModRM and SIB bytes + * @regs: Structure with register values as seen when entering kernel mode + * @offs1: Offset of the first operand register + * @offs2: Offset of the second operand register, if applicable + * + * Obtain the offset, in pt_regs, of the registers indicated by the ModRM byte + * within insn. This function is to be used with 16-bit address encodings. The + * offs1 and offs2 will be written with the offset of the two registers + * indicated by the instruction. In cases where any of the registers is not + * referenced by the instruction, the value will be set to -EDOM. + * + * Return: 0 on success, -EINVAL on failure. + */ +static int get_reg_offset_16(struct insn *insn, struct pt_regs *regs, +int *offs1, int *offs2) +{ + /* +* 16-bit addressing can use one or two registers. Specifics of +* encodings are given in Table 2-1. "16-Bit Addressing Forms with the +* ModR/M Byte" of the Intel Software Developer's Manual.
+*/ + static const int regoff1[] = { + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bx), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, bp), + offsetof(struct pt_regs, bx), + }; + + static const int regoff2[] = { + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + offsetof(struct pt_regs, si), + offsetof(struct pt_regs, di), + -EDOM, + -EDOM, + -EDOM, + -EDOM, + }; + + if (!offs1 || !offs2) + return -EINVAL; + + /* Operand is a register, use the generic function. */ + if (X86_MODRM_MOD(insn->modrm.value) == 3) { + *offs1 = insn_get_modrm_rm_off(insn, regs); + *offs2 = -EDOM; + return 0; + } + + *offs1 = regoff1[X86_MODRM_RM(insn->modrm.value)]; + *offs2 = regoff2[X86_MODRM_RM(insn->modrm.value)]; + + /* +* If ModRM.mod is 0 and ModRM.rm is 110b, then we use displacement- +* only addressing. This means that no registers are involved in +* computing the effective address. Thus, ensure that the first +* register offset is invalid. The second register offset is already +* invalid under the aforementioned conditions. +*/ + if ((X86_MODRM_MOD(insn->modrm.value) == 0) && + (X86_MODRM_RM(insn->modrm.value) == 6)) + *offs1 = -EDOM; + + return 0; +} + +/** * get_desc() - Obtain address of segment descriptor * @sel: Segment selector * @@ -666,6 +738,103 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs) } /** + * get_addr_ref_16() - Obtain the 16-bit address referred by instruction + * @insn: Instruction structure containing ModRM byte and displacement + * @regs: Structure
[PATCH v8 20/28] x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
It is possible to utilize 32-bit address encodings in virtual-8086 mode via an address override instruction prefix. However, the range of the effective address is still limited to [0x0000-0xffff]. In such a case, return an error. Also, linear addresses in virtual-8086 mode are limited to 20 bits. Enforce such limit by truncating the most significant bytes of the computed linear address.

Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri

--- arch/x86/lib/insn-eval.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 6537b613d0b3..93a6d1f57c2d 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -801,12 +801,23 @@ static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs) goto out_err; /* +* Even though 32-bit address encodings are allowed in virtual-8086 +* mode, the address range is still limited to [0x0000-0xffff]. +*/ + if (v8086_mode(regs) && (eff_addr & ~0xffff)) + goto out_err; + + /* * Data type long could be 64 bits in size. Ensure that our 32-bit * effective address is not sign-extended when computing the linear * address. */ linear_addr = (unsigned long)(eff_addr & 0xffffffff) + seg_base_addr; + /* Limit linear address to 20 bits */ + if (v8086_mode(regs)) + linear_addr &= 0xfffff; + return (void __user *)linear_addr; out_err: return (void __user *)-1L; -- 2.13.0
[PATCH v8 24/28] x86/umip: Force a page fault when unable to copy emulated result to user
fixup_umip_exception() will be called from do_general_protection(). If the former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV. However, when emulation is successful but the emulated result cannot be copied to user space memory, it is more accurate to issue a SIGSEGV with SEGV_MAPERR with the offending address. A new function, inspired by force_sig_info_fault(), is introduced to model the page fault.

Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Signed-off-by: Ricardo Neri

--- arch/x86/kernel/umip.c | 45 +++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index cd3201a7deca..6e38b8f5d305 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -185,6 +185,41 @@ static int emulate_umip_insn(struct insn *insn, int umip_inst, } /** + * force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR + * @addr: Address that caused the signal + * @regs: Register set containing the instruction pointer + * + * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is + * intended to be used to provide a segmentation fault when the result of the + * UMIP emulation could not be copied to the user space memory.
+ * + * Return: none + */ +static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs) +{ + siginfo_t info; + struct task_struct *tsk = current; + + tsk->thread.cr2 = (unsigned long)addr; + tsk->thread.error_code = X86_PF_USER | X86_PF_WRITE; + tsk->thread.trap_nr = X86_TRAP_PF; + + info.si_signo = SIGSEGV; + info.si_errno = 0; + info.si_code = SEGV_MAPERR; + info.si_addr = addr; + force_sig_info(SIGSEGV, &info, tsk); + + if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV))) + return; + + pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n", + tsk->comm, task_pid_nr(tsk), regs->ip, + regs->sp, X86_PF_USER | X86_PF_WRITE, + regs->ip); +} + +/** * fixup_umip_exception() - Fixup #GP faults caused by UMIP + * @regs: Registers as saved when entering the #GP trap * @@ -293,8 +328,14 @@ bool fixup_umip_exception(struct pt_regs *regs) return false; nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); - if (nr_copied > 0) - return false; + if (nr_copied > 0) { + /* +* If copy fails, send a signal and tell caller that +* fault was fixed up. +*/ + force_sig_info_umip_fault(uaddr, regs); + return true; + } } /* increase IP to let the program keep going */ -- 2.13.0
[PATCH v8 05/28] x86/mpx: Use signed variables to compute effective addresses
Even though memory addresses are unsigned, the operands used to compute the effective address do have a sign. This is true for ModRM.rm, SIB.base, SIB.index as well as the displacement bytes. Thus, signed variables shall be used when computing the effective address from these operands. Once the signed effective address has been computed, it is cast to an unsigned long to determine the linear address. Variables are renamed to better reflect the type of address being computed.

Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Peter Zijlstra Cc: Nathan Howard Cc: Adan Hawthorn Cc: Joe Perches Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri

--- arch/x86/mm/mpx.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 9ceaa955d2ba..9eec98022510 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -138,8 +138,9 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, */ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) { - unsigned long addr, base, indx; int addr_offset, base_offset, indx_offset; + unsigned long linear_addr; + long eff_addr, base, indx; insn_byte_t sib; insn_get_modrm(insn); @@ -150,7 +151,8 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) goto out_err; - addr = regs_get_register(regs, addr_offset); + + eff_addr = regs_get_register(regs, addr_offset); } else { if (insn->sib.nbytes) { base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE); @@ -163,16 +165,22 @@ static void __user *mpx_get_addr_ref(struct insn *insn, struct pt_regs *regs) base = regs_get_register(regs, base_offset); indx = regs_get_register(regs, indx_offset); - addr = base + indx * (1 << X86_SIB_SCALE(sib)); + + eff_addr = base + indx * (1 << X86_SIB_SCALE(sib)); } else {
addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM); if (addr_offset < 0) goto out_err; - addr = regs_get_register(regs, addr_offset); + + eff_addr = regs_get_register(regs, addr_offset); } - addr += insn->displacement.value; + + eff_addr += insn->displacement.value; } - return (void __user *)addr; + + linear_addr = (unsigned long)eff_addr; + + return (void __user *)linear_addr; out_err: return (void __user *)-1; } -- 2.13.0
[PATCH v8 24/28] x86/umip: Force a page fault when unable to copy emulated result to user
fixup_umip_exception() will be called from do_general_protection(). If the former returns false, the latter will issue a SIGSEGV with SEND_SIG_PRIV. However, when emulation is successful but the emulated result cannot be copied to user space memory, it is more accurate to issue a SIGSEGV with SEGV_MAPERR and the offending address. A new function, inspired by force_sig_info_fault(), is introduced to model the page fault. Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/kernel/umip.c | 45 +++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index cd3201a7deca..6e38b8f5d305 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -185,6 +185,41 @@ static int emulate_umip_insn(struct insn *insn, int umip_inst, } /** + * force_sig_info_umip_fault() - Force a SIGSEGV with SEGV_MAPERR + * @addr: Address that caused the signal + * @regs: Register set containing the instruction pointer + * + * Force a SIGSEGV signal with SEGV_MAPERR as the error code. This function is + * intended to be used to provide a segmentation fault when the result of the + * UMIP emulation could not be copied to the user space memory. 
+ * + * Return: none + */ +static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs) +{ + siginfo_t info; + struct task_struct *tsk = current; + + tsk->thread.cr2 = (unsigned long)addr; + tsk->thread.error_code = X86_PF_USER | X86_PF_WRITE; + tsk->thread.trap_nr = X86_TRAP_PF; + + info.si_signo = SIGSEGV; + info.si_errno = 0; + info.si_code = SEGV_MAPERR; + info.si_addr = addr; + force_sig_info(SIGSEGV, &info, tsk); + + if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV))) + return; + + pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n", + tsk->comm, task_pid_nr(tsk), regs->ip, + regs->sp, X86_PF_USER | X86_PF_WRITE, + regs->ip); +} + +/** * fixup_umip_exception() - Fixup #GP faults caused by UMIP + * @regs: Registers as saved when entering the #GP trap + * @@ -293,8 +328,14 @@ bool fixup_umip_exception(struct pt_regs *regs) return false; nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); - if (nr_copied > 0) - return false; + if (nr_copied > 0) { + /* +* If copy fails, send a signal and tell caller that +* fault was fixed up. +*/ + force_sig_info_umip_fault(uaddr, regs); + return true; + } } /* increase IP to let the program keep going */ -- 2.13.0
[PATCH v8 22/28] x86/cpufeature: Add User-Mode Instruction Prevention definitions
User-Mode Instruction Prevention is a security feature present in new Intel processors. When enabled, it prevents the execution of a subset of instructions in user mode (CPL > 0). Attempting to execute such instructions causes a general protection exception. The subset of instructions comprises: * SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration. Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Reviewed-by: Borislav Petkov Signed-off-by: Ricardo Neri --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h| 8 +++- arch/x86/include/uapi/asm/processor-flags.h | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 42bbbf0f173d..7b1aa7fc8657 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -291,6 +291,7 @@ /* Intel-defined CPU features, CPUID level 0x0007:0 (ecx), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ #define X86_FEATURE_PKU(16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ diff --git 
a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index c10c9128f54e..14d6d5007314 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -16,6 +16,12 @@ # define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) #endif +#ifdef CONFIG_X86_INTEL_UMIP +# define DISABLE_UMIP 0 +#else +# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31)) +#endif + #ifdef CONFIG_X86_64 # define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) @@ -63,7 +69,7 @@ #define DISABLED_MASK130 #define DISABLED_MASK140 #define DISABLED_MASK150 -#define DISABLED_MASK16(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57) +#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) #define DISABLED_MASK170 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18) diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index aae1f2aa7563..6ee8425b92d9 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -104,6 +104,8 @@ #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT) #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */ #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT) +#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */ +#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT) #define X86_CR4_LA57_BIT 12 /* enable 5-level page tables */ #define X86_CR4_LA57 _BITUL(X86_CR4_LA57_BIT) #define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */ -- 2.13.0
[PATCH v8 23/28] x86: Add emulation code for UMIP instructions
The feature User-Mode Instruction Prevention present in recent Intel processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and str) from being executed with CPL > 0; attempting to execute them causes a general protection fault. Rather than relaying to the user space the general protection fault caused by the UMIP-protected instructions (in the form of a SIGSEGV signal), the fault can be trapped and the result of such instructions emulated to provide dummy values. This both preserves the current kernel behavior and avoids revealing the system resources that UMIP intends to protect (i.e., the locations of the global descriptor and interrupt descriptor tables, the segment selectors of the local descriptor table, the value of the task state register and the contents of the CR0 register). This emulation is needed because certain applications (e.g., WineHQ and DOSEMU2) rely on this subset of instructions to function. Given that sldt and str are not commonly used in programs that run on WineHQ or DOSEMU2, they are not emulated. Also, emulation is provided only for 32-bit processes; 64-bit processes that attempt to use the instructions that UMIP protects will receive the SIGSEGV signal issued as a consequence of the general protection fault. The instructions protected by UMIP can be split in two groups: those which return a kernel memory address (sgdt and sidt) and those which return a value (sldt, str and smsw). For the instructions that return a kernel memory address, applications such as WineHQ rely on the result being located in the kernel memory space, not the actual location of the table. The result is emulated as a hard-coded value that lies close to the top of the kernel memory. The limits for the GDT and the IDT are set to zero. The instruction smsw is emulated to return the value that the register CR0 has at boot time as set in head_32. Care is taken to appropriately emulate the results when segmentation is used. 
That is, rather than relying on USER_DS and USER_CS, the function insn_get_addr_ref() inspects the segment descriptor pointed by the registers in pt_regs. This ensures that we correctly obtain the segment base address and the address and operand sizes even if the user space application uses a local descriptor table. Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/include/asm/umip.h | 12 ++ arch/x86/kernel/Makefile| 1 + arch/x86/kernel/umip.c | 303 3 files changed, 316 insertions(+) create mode 100644 arch/x86/include/asm/umip.h create mode 100644 arch/x86/kernel/umip.c diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h new file mode 100644 index ..db43f2a0d92c --- /dev/null +++ b/arch/x86/include/asm/umip.h @@ -0,0 +1,12 @@ +#ifndef _ASM_X86_UMIP_H +#define _ASM_X86_UMIP_H + +#include +#include + +#ifdef CONFIG_X86_INTEL_UMIP +bool fixup_umip_exception(struct pt_regs *regs); +#else +static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; } +#endif /* CONFIG_X86_INTEL_UMIP */ +#endif /* _ASM_X86_UMIP_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 287eac7d207f..e057e22cd0d4 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -125,6 +125,7 @@ obj-$(CONFIG_EFI) += sysfb_efi.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO)+= itmt.o +obj-$(CONFIG_X86_INTEL_UMIP) += umip.o obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o obj-$(CONFIG_FRAME_POINTER_UNWINDER) += unwind_frame.o diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c new file mode 
100644 index ..cd3201a7deca --- /dev/null +++ b/arch/x86/kernel/umip.c @@ -0,0 +1,303 @@ +/* + * umip.c Emulation for instruction protected by the Intel User-Mode + * Instruction Prevention feature + * + * Copyright (c) 2017, Intel Corporation. + * Ricardo Neri + */ + +#include +#include +#include +#include +#include +#include + +/** DOC: Emulation for User-Mode Instruction Prevention (UMIP) + * + * The feature User-Mode Instruction Prevention present in recent Intel + * processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and str) + * from being executed with CPL > 0. Otherwise, a general protection fault is + * issued. + * + * Rather than relaying to the user space the general protection fault caused by + * the UMIP-protected instructions (in the form of a SIGSEGV signal), it can be + * trapped and emulate
[PATCH v8 09/28] x86/insn-eval: Do not BUG on invalid register type
We are not in a critical failure path. The invalid register type is caused when trying to decode invalid instruction bytes from a user-space program. Thus, simply print an error message. To prevent this warning from being abused by user-space programs, use the rate-limited variant of pr_err(). Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/lib/insn-eval.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 2bb8303ba92f..3919458fecbf 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -5,6 +5,7 @@ */ #include #include +#include #include #include #include @@ -85,9 +86,8 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs, break; default: - pr_err("invalid register type"); - BUG(); - break; + pr_err_ratelimited("insn: x86: invalid register type"); + return -EINVAL; } if (regno >= nr_registers) { -- 2.13.0
[PATCH v8 25/28] x86: Enable User-Mode Instruction Prevention
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a bit in %cr4. It makes sense to enable UMIP at some point while booting, before user space comes up. Like SMAP and SMEP, it is not critical to have it enabled very early during boot. This is because UMIP is relevant only when there is a userspace to be protected from. Given the similarities in relevance, it makes sense to enable UMIP along with SMAP and SMEP. UMIP is enabled by default. It can be disabled by adding clearcpuid=514 to the kernel parameters. Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- arch/x86/Kconfig | 10 ++ arch/x86/kernel/cpu/common.c | 25 - 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ce3ed304288d..5c384d926937 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1801,6 +1801,16 @@ config X86_SMAP If unsure, say Y. +config X86_INTEL_UMIP + def_bool n + depends on CPU_SUP_INTEL + prompt "Intel User Mode Instruction Prevention" if EXPERT + ---help--- + The User Mode Instruction Prevention (UMIP) is a security + feature in newer Intel processors. If enabled, a general + protection fault is issued if the instructions SGDT, SLDT, + SIDT, SMSW and STR are executed in user mode. 
+ config X86_INTEL_MPX prompt "Intel MPX (Memory Protection Extensions)" def_bool n diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index b95cd94ca97b..5066d7ffa55e 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -348,6 +348,28 @@ static void setup_pcid(struct cpuinfo_x86 *c) } } +static __always_inline void setup_umip(struct cpuinfo_x86 *c) +{ + /* Check the boot processor, plus build option for UMIP. */ + if (!cpu_feature_enabled(X86_FEATURE_UMIP)) + goto out; + + /* Check the current processor's cpuid bits. */ + if (!cpu_has(c, X86_FEATURE_UMIP)) + goto out; + + cr4_set_bits(X86_CR4_UMIP); + + return; + +out: + /* +* Make sure UMIP is disabled in case it was enabled in a +* previous boot (e.g., via kexec). +*/ + cr4_clear_bits(X86_CR4_UMIP); +} + /* * Protection Keys are not available in 32-bit mode. */ @@ -1158,9 +1180,10 @@ static void identify_cpu(struct cpuinfo_x86 *c) /* Disable the PN if appropriate */ squash_the_stupid_serial_number(c); - /* Set up SMEP/SMAP */ + /* Set up SMEP/SMAP/UMIP */ setup_smep(c); setup_smap(c); + setup_umip(c); /* Set up PCID */ setup_pcid(c); -- 2.13.0
[PATCH v8 26/28] x86/traps: Fixup general protection faults caused by UMIP
If the User-Mode Instruction Prevention CPU feature is available and enabled, a general protection fault will be issued if the instructions sgdt, sldt, sidt, str or smsw are executed from user-mode context (CPL > 0). If the fault was caused by any of the instructions protected by UMIP, fixup_umip_exception() will emulate dummy results for these instructions as follows: if running a 32-bit process, sgdt, sidt and smsw are emulated; str and sldt are not emulated. No emulation is done for 64-bit processes. If emulation is successful, the result is passed to the user space program and no SIGSEGV signal is emitted. Please note that fixup_umip_exception() also caters for the case when the fault originated while running in virtual-8086 mode. Cc: Andy Lutomirski Cc: Andrew Morton Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Fenghua Yu Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Michael S. Tsirkin Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. Shankar Cc: Shuah Khan Cc: Vlastimil Babka Cc: Tony Luck Cc: Paolo Bonzini Cc: Liang Z. Li Cc: x...@kernel.org Reviewed-by: Andy Lutomirski Signed-off-by: Ricardo Neri --- arch/x86/kernel/traps.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index bf54309b85da..1c1bb7992f70 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -65,6 +65,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -526,6 +527,10 @@ do_general_protection(struct pt_regs *regs, long error_code) RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); cond_local_irq_enable(regs); + if (static_cpu_has(X86_FEATURE_UMIP)) + if (user_mode(regs) && fixup_umip_exception(regs)) + return; + if (v8086_mode(regs)) { local_irq_enable(); handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code); -- 2.13.0
[PATCH v8 27/28] selftests/x86: Add tests for User-Mode Instruction Prevention
Certain user space programs that run in virtual-8086 mode may utilize instructions protected by the User-Mode Instruction Prevention (UMIP) security feature present in new Intel processors: SGDT, SIDT and SMSW. In such a case, a general protection fault is issued if UMIP is enabled. When such a fault happens, the kernel traps it and emulates the results of these instructions with dummy values. The purpose of this new test is to verify whether the impacted instructions can be executed without causing such a #GP. If no #GP exceptions occur, we expect to exit virtual-8086 mode from INT3.

The instructions protected by UMIP are executed in representative use cases:
 a) displacement-only memory addressing
 b) register-indirect memory addressing
 c) results stored directly in operands

Unfortunately, it is not possible to check the results against a set of expected values because no emulation will occur in systems that do not have the UMIP feature. Instead, results are printed for verification. A simple verification is done to ensure that the results of all tests are identical.

Cc: Andy Lutomirski
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chen Yucong
Cc: Chris Metcalf
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: Huang Rui
Cc: Jiri Slaby
Cc: Jonathan Corbet
Cc: Michael S. Tsirkin
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Ravi V. Shankar
Cc: Shuah Khan
Cc: Vlastimil Babka
Signed-off-by: Ricardo Neri
---
 tools/testing/selftests/x86/entry_from_vm86.c | 73 ++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index d075ea0e5ca1..130e8ad1db05 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -95,6 +95,22 @@ asm (
 	"int3\n\t"
 	"vmcode_int80:\n\t"
 	"int $0x80\n\t"
+	"vmcode_umip:\n\t"
+	/* addressing via displacements */
+	"smsw (2052)\n\t"
+	"sidt (2054)\n\t"
+	"sgdt (2060)\n\t"
+	/* addressing via registers */
+	"mov $2066, %bx\n\t"
+	"smsw (%bx)\n\t"
+	"mov $2068, %bx\n\t"
+	"sidt (%bx)\n\t"
+	"mov $2074, %bx\n\t"
+	"sgdt (%bx)\n\t"
+	/* register operands, only for smsw */
+	"smsw %ax\n\t"
+	"mov %ax, (2080)\n\t"
+	"int3\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -103,7 +119,7 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -160,6 +176,58 @@ static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
 	return true;
 }
 
+void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
+{
+	struct table_desc {
+		unsigned short limit;
+		unsigned long base;
+	} __attribute__((packed));
+
+	/* Initialize variables with arbitrary values */
+	struct table_desc gdt1 = { .base = 0x3c3c3c3c, .limit = 0x };
+	struct table_desc gdt2 = { .base = 0x1a1a1a1a, .limit = 0xaeae };
+	struct table_desc idt1 = { .base = 0x7b7b7b7b, .limit = 0xf1f1 };
+	struct table_desc idt2 = { .base = 0x89898989, .limit = 0x1313 };
+	unsigned short msw1 = 0x1414, msw2 = 0x2525, msw3 = 3737;
+
+	/* UMIP -- exit with INT3 unless kernel emulation did not trap #GP */
+	do_test(vm86, vmcode_umip - vmcode, VM86_TRAP, 3, "UMIP tests");
+
+	/* Results from displacement-only addressing */
+	msw1 = *(unsigned short *)(test_mem + 2052);
+	memcpy(&idt1, test_mem + 2054, sizeof(idt1));
+	memcpy(&gdt1, test_mem + 2060, sizeof(gdt1));
+
+	/* Results from register-indirect addressing */
+	msw2 = *(unsigned short *)(test_mem + 2066);
+	memcpy(&idt2, test_mem + 2068, sizeof(idt2));
+	memcpy(&gdt2, test_mem + 2074, sizeof(gdt2));
+
+	/* Results when using register operands */
+	msw3 = *(unsigned short *)(test_mem + 2080);
+
+	printf("[INFO]\tResult from SMSW:[0x%04x]\n", msw1);
+	printf("[INFO]\tResult from SIDT: limit[0x%04x]base[0x%08lx]\n",
+	       idt1.limit, idt1.base);
+	printf("[INFO]\tResult from SGDT: limit[0x%04x]base[0x%08lx]\n",
+	       gdt1.limit, gdt1.base);
+
+	if ((msw1 != msw2) || (msw1 != msw3))
+		printf("[FAIL]\tAll the results of SMSW should be the same.\n");
+	else
+		printf("[PASS]\tAll the results from SMSW are identical.\n");
+
+	if (memcmp(&gdt1, &gdt2, sizeof(gdt1)))
+		printf("[FAIL]\tAll the results of SGDT should be the same.\n");
+	else
+		printf("[PASS]\tAll the results from SGDT are
[PATCH v8 28/28] selftests/x86: Add tests for instruction str and sldt
The instructions str and sldt are not recognized when running in virtual-8086 mode and generate an invalid operand exception. These two instructions are protected by the Intel User-Mode Instruction Prevention (UMIP) security feature. In protected mode, if UMIP is enabled, these instructions generate a general protection fault if called from CPL > 0. Linux traps the general protection fault and emulates the instructions sgdt, sidt and smsw, but not str and sldt. These tests are added to verify that the emulation code does not emulate these two instructions and that the expected invalid operand exception is seen. The tests fall back to exiting with int3 in case emulation does happen.

Cc: Andy Lutomirski
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chen Yucong
Cc: Chris Metcalf
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: Huang Rui
Cc: Jiri Slaby
Cc: Jonathan Corbet
Cc: Michael S. Tsirkin
Cc: Paul Gortmaker
Cc: Peter Zijlstra
Cc: Ravi V. Shankar
Cc: Shuah Khan
Cc: Vlastimil Babka
Signed-off-by: Ricardo Neri
---
 tools/testing/selftests/x86/entry_from_vm86.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/entry_from_vm86.c b/tools/testing/selftests/x86/entry_from_vm86.c
index 130e8ad1db05..b7a0c9024477 100644
--- a/tools/testing/selftests/x86/entry_from_vm86.c
+++ b/tools/testing/selftests/x86/entry_from_vm86.c
@@ -111,6 +111,11 @@ asm (
 	"smsw %ax\n\t"
 	"mov %ax, (2080)\n\t"
 	"int3\n\t"
+	"vmcode_umip_str:\n\t"
+	"str %eax\n\t"
+	"vmcode_umip_sldt:\n\t"
+	"sldt %eax\n\t"
+	"int3\n\t"
 	".size vmcode, . - vmcode\n\t"
 	"end_vmcode:\n\t"
 	".code32\n\t"
@@ -119,7 +124,8 @@ asm (
 
 extern unsigned char vmcode[], end_vmcode[];
 extern unsigned char vmcode_bound[], vmcode_sysenter[], vmcode_syscall[],
-	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[];
+	vmcode_sti[], vmcode_int3[], vmcode_int80[], vmcode_umip[],
+	vmcode_umip_str[], vmcode_umip_sldt[];
 
 /* Returns false if the test was skipped. */
 static bool do_test(struct vm86plus_struct *v86, unsigned long eip,
@@ -226,6 +232,16 @@ void do_umip_tests(struct vm86plus_struct *vm86, unsigned char *test_mem)
 		printf("[FAIL]\tAll the results of SIDT should be the same.\n");
 	else
 		printf("[PASS]\tAll the results from SIDT are identical.\n");
+
+	sethandler(SIGILL, sighandler, 0);
+	do_test(vm86, vmcode_umip_str - vmcode, VM86_SIGNAL, 0,
+		"STR instruction");
+	clearhandler(SIGILL);
+
+	sethandler(SIGILL, sighandler, 0);
+	do_test(vm86, vmcode_umip_sldt - vmcode, VM86_SIGNAL, 0,
+		"SLDT instruction");
+	clearhandler(SIGILL);
 }
 
 int main(void)
-- 
2.13.0
[PATCH v8 11/28] x86/insn-eval: Add utility function to identify string instructions
String instructions are special because, in protected mode, the linear address is always obtained via the ES segment register in operands that use the (E)DI register, and via the DS segment register in operands that use the (E)SI register. Furthermore, segment override prefixes are ignored when calculating a linear address involving the (E)DI register, while segment override prefixes can be used when calculating linear addresses involving the (E)SI register. It follows that linear addresses are calculated differently for the case of string instructions. The purpose of this utility function is to identify such instructions so that callers can determine a linear address correctly.

Note that this function only identifies string instructions; it does not determine what segment register to use in the address computation. That is left to callers. A subsequent commit introduces a function to determine the segment register to use given the instruction, operands and segment override prefixes.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index c6120e9298f5..25b2eb3c64c1 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -16,6 +16,32 @@ enum reg_type {
 	REG_TYPE_BASE,
 };
 
+/**
+ * is_string_insn() - Determine if instruction is a string instruction
+ * @insn:	Instruction structure containing the opcode
+ *
+ * Return: true if the instruction, determined by the opcode, is any of the
+ * string instructions as defined in the Intel Software Development manual.
+ * False otherwise.
+ */
+static bool is_string_insn(struct insn *insn)
+{
+	insn_get_opcode(insn);
+
+	/* All string instructions have a 1-byte opcode. */
+	if (insn->opcode.nbytes != 1)
+		return false;
+
+	switch (insn->opcode.bytes[0]) {
+	case 0x6c ... 0x6f:	/* INS, OUTS */
+	case 0xa4 ... 0xa7:	/* MOVS, CMPS */
+	case 0xaa ... 0xaf:	/* STOS, LODS, SCAS */
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 			  enum reg_type type)
 {
-- 
2.13.0
[PATCH v8 08/28] x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file
Other kernel submodules can benefit from using the utility functions defined in mpx.c to obtain the addresses and values of operands contained in the general purpose registers. An instance of this is the emulation code used for instructions protected by the Intel User-Mode Instruction Prevention feature. Thus, these functions are relocated to a new insn-eval.c file.

The reason not to relocate these utilities into insn.c is that the latter solely analyses instructions given by a struct insn without any knowledge of the meaning of the values of instruction operands. This new utility insn-eval.c aims to be used to resolve userspace linear addresses based on the contents of the instruction operands as well as the contents of the pt_regs structure.

These utilities come with a separate header. This is to avoid taking insn.c out of sync from the instruction decoders under tools/obj and tools/perf. This also avoids adding cumbersome #ifdef's for the #include'd files required to decode instructions in a kernel context.

Functions are simply relocated. There are no functional or indentation changes. The checkpatch script issues the following warning with this commit:

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
+	BUG();

This warning will be fixed in a subsequent patch.

Cc: Borislav Petkov
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/include/asm/insn-eval.h |  16
 arch/x86/lib/Makefile            |   2 +-
 arch/x86/lib/insn-eval.c         | 163 +++
 arch/x86/mm/mpx.c                | 156 +
 4 files changed, 182 insertions(+), 155 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
new file mode 100644
index ..5cab1b1da84d
--- /dev/null
+++ b/arch/x86/include/asm/insn-eval.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_INSN_EVAL_H
+#define _ASM_X86_INSN_EVAL_H
+/*
+ * A collection of utility functions for x86 instruction analysis to be
+ * used in a kernel context. Useful when, for instance, making sense
+ * of the registers indicated by operands.
+ */
+
+#include
+#include
+#include
+#include
+
+void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+
+#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a74131a12c..675d7b075fed 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
new file mode 100644
index ..2bb8303ba92f
--- /dev/null
+++ b/arch/x86/lib/insn-eval.c
@@ -0,0 +1,163 @@
+/*
+ * Utility functions for x86 operand and address decoding
+ *
+ * Copyright (C) Intel Corporation 2017
+ */
+#include
+#include
+#include
+#include
+#include
+
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
+			  enum reg_type type)
+{
+	int regno = 0;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+	int nr_registers = ARRAY_SIZE(regoff);
+	/*
+	 * Don't possibly decode a 32-bit instruction as
+	 * reading a 64-bit-only register.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && !insn->x86_64)
+		nr_registers -= 8;
+
+	switch (type) {
[PATCH v8 18/28] x86/insn-eval: Add support to resolve 32-bit address encodings
32-bit and 64-bit address encodings are identical. Thus, the same logic could be used to resolve the effective address. However, there are two key differences: address size and enforcement of segment limits.

If running a 32-bit process on a 64-bit kernel, it is best to perform the address calculation using 32-bit data types. In this manner hardware is used for the arithmetic, including handling of signs and overflows.

32-bit addresses are generally used in protected mode; segment limits are enforced in this mode. This implementation obtains the limit of the segment associated with the instruction operands and prefixes. If the computed address is outside the segment limits, an error is returned. It is also possible to use 32-bit addresses in long mode and virtual-8086 mode by using an address override prefix. In such cases, segment limits are not enforced.

The new function get_addr_ref_32() is almost identical to the existing function insn_get_addr_ref() (used for 64-bit addresses), except for the differences mentioned above. For the sake of simplicity and readability, it is better to use two separate functions.

Cc: Dave Hansen
Cc: Adam Buchbinder
Cc: Colin Ian King
Cc: Lorenzo Stoakes
Cc: Qiaowei Ren
Cc: Arnaldo Carvalho de Melo
Cc: Masami Hiramatsu
Cc: Adrian Hunter
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Peter Zijlstra
Cc: Borislav Petkov
Cc: Dmitry Vyukov
Cc: Ravi V. Shankar
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8ae110a273de..6730c9ba02c5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -665,6 +665,153 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_RM);
 }
 
+/**
+ * get_addr_ref_32() - Obtain a 32-bit linear address
+ * @insn:	Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable. If in protected
+ * mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
+{
+	int eff_addr, base, indx, addr_offset, base_offset, indx_offset;
+	unsigned long linear_addr, seg_base_addr, seg_limit, tmp;
+	insn_byte_t sib;
+
+	insn_get_modrm(insn);
+	insn_get_sib(insn);
+	sib = insn->sib.value;
+
+	if (insn->addr_bytes != 4)
+		goto out_err;
+
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset < 0)
+			goto out_err;
+
+		tmp = regs_get_register(regs, addr_offset);
+		/* The 4 most significant bytes must be zero. */
+		if (tmp & ~0xffffffffL)
+			goto out_err;
+
+		eff_addr = (int)(tmp & 0xffffffff);
+
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+		if (seg_base_addr == -1L)
+			goto out_err;
+
+		seg_limit = get_seg_limit(regs, insn, addr_offset);
+	} else {
+		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
+			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+			if (base_offset == -EDOM) {
+				base = 0;
+			} else if (base_offset < 0) {
+				goto out_err;
+			} else {
+				tmp = regs_get_register(regs, base_offset);
+				/* The 4 most significant bytes must be zero. */
+				if (tmp & ~0xffffffffL)
+					goto out_err;
+
+				base = (int)(tmp & 0xffffffff);
+
[PATCH v8 18/28] x86/insn-eval: Add support to resolve 32-bit address encodings
32-bit and 64-bit address encodings are identical. Thus, the same logic could be used to resolve the effective address. However, there are two key differences: address size and enforcement of segment limits. If running a 32-bit process on a 64-bit kernel, it is best to perform the address calculation using 32-bit data types. In this manner, the hardware is used for the arithmetic, including the handling of signs and overflows. 32-bit addresses are generally used in protected mode; segment limits are enforced in this mode. This implementation obtains the limit of the segment associated with the instruction operands and prefixes. If the computed address is outside the segment limits, an error is returned. It is also possible to use 32-bit addresses in long mode and virtual-8086 mode by using an address override prefix. In such cases, segment limits are not enforced. The new function get_addr_ref_32() is almost identical to the existing function insn_get_addr_ref() (used for 64-bit addresses), except for the differences mentioned above. For the sake of simplicity and readability, it is better to use two separate functions. Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. 
Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 8ae110a273de..6730c9ba02c5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -665,6 +665,153 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 	return get_reg_offset(insn, regs, REG_TYPE_RM);
 }
 
+/**
+ * get_addr_ref_32() - Obtain a 32-bit linear address
+ * @insn:	Instruction struct with ModRM and SIB bytes and displacement
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * This function is to be used with 32-bit address encodings to obtain the
+ * linear memory address referred by the instruction's ModRM, SIB,
+ * displacement bytes and segment base address, as applicable. If in protected
+ * mode, segment limits are enforced.
+ *
+ * Return: linear address referenced by instruction and registers on success.
+ * -1L on error.
+ */
+static void __user *get_addr_ref_32(struct insn *insn, struct pt_regs *regs)
+{
+	int eff_addr, base, indx, addr_offset, base_offset, indx_offset;
+	unsigned long linear_addr, seg_base_addr, seg_limit, tmp;
+	insn_byte_t sib;
+
+	insn_get_modrm(insn);
+	insn_get_sib(insn);
+	sib = insn->sib.value;
+
+	if (insn->addr_bytes != 4)
+		goto out_err;
+
+	if (X86_MODRM_MOD(insn->modrm.value) == 3) {
+		addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
+		if (addr_offset < 0)
+			goto out_err;
+
+		tmp = regs_get_register(regs, addr_offset);
+		/* The 4 most significant bytes must be zero. */
+		if (tmp & ~0xffffffffL)
+			goto out_err;
+
+		eff_addr = (int)(tmp & 0xffffffff);
+
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+		if (seg_base_addr == -1L)
+			goto out_err;
+
+		seg_limit = get_seg_limit(regs, insn, addr_offset);
+	} else {
+		if (insn->sib.nbytes) {
+			/*
+			 * Negative values in the base and index offset means
+			 * an error when decoding the SIB byte. Except -EDOM,
+			 * which means that the registers should not be used
+			 * in the address computation.
+			 */
+			base_offset = get_reg_offset(insn, regs, REG_TYPE_BASE);
+			if (base_offset == -EDOM) {
+				base = 0;
+			} else if (base_offset < 0) {
+				goto out_err;
+			} else {
+				tmp = regs_get_register(regs, base_offset);
+				/* The 4 most significant bytes must be zero. */
+				if (tmp & ~0xffffffffL)
+					goto out_err;
+
+				base = (int)(tmp & 0xffffffff);
+			}
+
+			indx_offset = get_reg_offset(insn, regs, REG_TYPE_INDEX);
+			if (indx_offset == -EDOM) {
+				indx = 0;
+			} else if (indx_offset < 0) {
+				goto out_err;
+			} else {
+				tmp
[PATCH v8 17/28] x86/insn-eval: Incorporate segment base in linear address computation
insn_get_addr_ref() returns the effective address as defined in Section 3.7.5.1 of Vol. 1 of the Intel 64 and IA-32 Architectures Software Developer's Manual. In order to compute the linear address, we must add to the effective address the segment base address as set in the segment descriptor. The segment descriptor to use depends on the register used as operand and segment override prefixes, if any. In most cases, the segment base address will be 0 if the USER_DS/USER32_DS segment is used or if segmentation is not used. However, the base address is not necessarily zero if a user program defines its own segments. This is possible by using a local descriptor table. Since the effective address is a signed quantity, the unsigned segment base address is saved in a separate variable and added to the final, unsigned, effective address. Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. 
Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri
---
 arch/x86/lib/insn-eval.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 04f696c3793e..8ae110a273de 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -673,7 +673,7 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 {
 	int addr_offset, base_offset, indx_offset;
-	unsigned long linear_addr;
+	unsigned long linear_addr, seg_base_addr;
 	long eff_addr, base, indx;
 	insn_byte_t sib;
@@ -687,6 +687,10 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			goto out_err;
 
 		eff_addr = regs_get_register(regs, addr_offset);
+
+		seg_base_addr = insn_get_seg_base(regs, insn, addr_offset);
+		if (seg_base_addr == -1L)
+			goto out_err;
 	} else {
 		if (insn->sib.nbytes) {
 			/*
@@ -712,6 +716,11 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 				indx = regs_get_register(regs, indx_offset);
 
 			eff_addr = base + indx * (1 << X86_SIB_SCALE(sib));
+
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							 base_offset);
+			if (seg_base_addr == -1L)
+				goto out_err;
 		} else {
 			addr_offset = get_reg_offset(insn, regs, REG_TYPE_RM);
@@ -730,12 +739,17 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 			} else {
 				eff_addr = regs_get_register(regs, addr_offset);
 			}
+
+			seg_base_addr = insn_get_seg_base(regs, insn,
+							 addr_offset);
+			if (seg_base_addr == -1L)
+				goto out_err;
 		}
 
 		eff_addr += insn->displacement.value;
 	}
 
-	linear_addr = (unsigned long)eff_addr;
+	linear_addr = (unsigned long)eff_addr + seg_base_addr;
 
 	return (void __user *)linear_addr;
 
 out_err:
-- 
2.13.0
[PATCH v8 10/28] x86/insn-eval: Add a utility function to get register offsets
The function get_reg_offset() returns the offset to the register the argument specifies, as indicated in an enumeration of type offset. Callers of this function would need the definition of such an enumeration. This is not needed. Instead, add helper functions for this purpose. These functions are useful in cases when, for instance, the caller needs to decide whether the operand is a register or a memory location by looking at the r/m part of the ModRM byte. As of now, this is the only helper function that is needed. Cc: Dave Hansen Cc: Adam Buchbinder Cc: Colin Ian King Cc: Lorenzo Stoakes Cc: Qiaowei Ren Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: Kees Cook Cc: Thomas Garnier Cc: Peter Zijlstra Cc: Borislav Petkov Cc: Dmitry Vyukov Cc: Ravi V. Shankar Cc: x...@kernel.org Signed-off-by: Ricardo Neri
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c         | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 5cab1b1da84d..7e8c9633a377 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -12,5 +12,6 @@
 #include
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 3919458fecbf..c6120e9298f5 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -97,6 +97,21 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
 	return regoff[regno];
 }
 
+/**
+ * insn_get_modrm_rm_off() - Obtain register in r/m part of ModRM byte
+ * @insn:	Instruction structure containing the ModRM byte
+ * @regs:	Structure with register values as seen when entering kernel mode
+ *
+ * Return: The register indicated by the r/m part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs. In specific
+ * cases, the returned value can be -EDOM to indicate that the particular value
+ * of ModRM does not refer to a register and shall be ignored.
+ */
+int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs)
+{
+	return get_reg_offset(insn, regs, REG_TYPE_RM);
+}
+
 /*
  * return the address being referenced be instruction
  * for rm=3 returning the content of the rm reg
  */
-- 
2.13.0
Re: [PATCH] iio: accel: mma8452: Bugfix to enbale and allow different events to work parallely.
This patch fixes it by detaching the event-related information from the chip_info struct; based on channel type and event direction, the corresponding event configuration registers are picked dynamically. Hence multiple events can be handled in the read/write callbacks.

which chip can have which event(s)? I am planning to add 'supported events' field in

One small point. Don't put the word bugfix in the title (and fix the spelling of enable!). I know this is obviously a false restriction on the driver, but it doesn't not work, it is just limited in features without this.

Sure, thanks for letting me know.