Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 11:35 AM, Russell King - ARM Linuxwrote: > On Wed, Jul 19, 2017 at 10:20:35AM -0700, Thomas Garnier wrote: >> On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linux >> wrote: >> > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: >> > Probably best to revert. I stopped looking at these patches during >> > the discussion, as the discussion seemed to be mainly around other >> > architectures, and I thought we had ARM settled. >> > >> > Looking at this patch now, there's several things I'm not happy with. >> > >> > The effect of adding a the new TIF flag for FSCHECK amongst the other >> > flags is that we end up overflowing the 8-bit constant, and have to >> > split the tests, meaning more instructions in the return path. Eg: >> > >> > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> > + tst r1, #_TIF_SYSCALL_WORK >> > + bne fast_work_pending >> > + tst r1, #_TIF_WORK_MASK >> > bne fast_work_pending >> > >> > should be written: >> > >> > tst r1, #_TIF_SYSCALL_WORK >> > tsteq r1, #_TIF_WORK_MASK >> > bne fast_work_pending >> > >> > and: >> > >> > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> > + tst r1, #_TIF_SYSCALL_WORK >> > + bne fast_work_pending >> > + tst r1, #_TIF_WORK_MASK >> > >> > should be: >> > >> > tst r1, #_TIF_SYSCALL_WORK >> > tsteq r1, #_TIF_WORK_MASK >> > >> > There's no need for extra branches. >> > >> > Now, the next issue is that I don't think this TIF-flag approach is >> > good for ARM - alignment faults can happen any time due to misaligned >> > packets in the networking code, and we really don't want to be doing >> > this check in a place that we can loop. >> > >> > My original suggestion for ARM was to do the address limit check after >> > all work had been processed, with interrupts disabled (so no >> > possibility of this kind of loop happening.) However, that seems to >> > have been replaced with this TIF approach, which is going to cause >> > loops - I suspect if the probes code is enabled, this will suffer >> > the same problem. Remember, the various probes stuff can walk >> > userspace stacks, which means they'll be using set_fs(). >> > >> > I don't see why we've ended up with this (imho) sub-standard TIF-flag >> > approach, and I think it's going to be very problematical. >> > >> > Can we please go back to the approach I suggested back in March for >> > ARM that doesn't suffer from this problem? >> >> During the extensive thread discussion, Linus asked to move away from >> architecture specific changes to this work flag system. I am glad to >> fix the assembly as you asked on a separate patch. > > Well, for the record, I don't think you've got to the bottom of the > "infinite loop" potential of Linus' approach. > > Eg, perf will likely trigger this same issue. Eg, perf record -a -g > will attempt to record the callchain both in kernel space and userspace > each time a perf interrupt happens. If the perf interrupt frequency is > sufficiently high that we have multiple interrupts during the execution > of do_work_pending() and its called functions, then that will turn this > into an infinite loop yet again. Do you think it applies to the patch I just sent? The other approach is to check at the entrance, ignore _TIF_FSCHECK on the loop and clear it on exit. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. -- Thomas
Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 11:35 AM, Russell King - ARM Linux wrote: > On Wed, Jul 19, 2017 at 10:20:35AM -0700, Thomas Garnier wrote: >> On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linux >> wrote: >> > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: >> > Probably best to revert. I stopped looking at these patches during >> > the discussion, as the discussion seemed to be mainly around other >> > architectures, and I thought we had ARM settled. >> > >> > Looking at this patch now, there's several things I'm not happy with. >> > >> > The effect of adding a the new TIF flag for FSCHECK amongst the other >> > flags is that we end up overflowing the 8-bit constant, and have to >> > split the tests, meaning more instructions in the return path. Eg: >> > >> > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> > + tst r1, #_TIF_SYSCALL_WORK >> > + bne fast_work_pending >> > + tst r1, #_TIF_WORK_MASK >> > bne fast_work_pending >> > >> > should be written: >> > >> > tst r1, #_TIF_SYSCALL_WORK >> > tsteq r1, #_TIF_WORK_MASK >> > bne fast_work_pending >> > >> > and: >> > >> > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> > + tst r1, #_TIF_SYSCALL_WORK >> > + bne fast_work_pending >> > + tst r1, #_TIF_WORK_MASK >> > >> > should be: >> > >> > tst r1, #_TIF_SYSCALL_WORK >> > tsteq r1, #_TIF_WORK_MASK >> > >> > There's no need for extra branches. >> > >> > Now, the next issue is that I don't think this TIF-flag approach is >> > good for ARM - alignment faults can happen any time due to misaligned >> > packets in the networking code, and we really don't want to be doing >> > this check in a place that we can loop. >> > >> > My original suggestion for ARM was to do the address limit check after >> > all work had been processed, with interrupts disabled (so no >> > possibility of this kind of loop happening.) However, that seems to >> > have been replaced with this TIF approach, which is going to cause >> > loops - I suspect if the probes code is enabled, this will suffer >> > the same problem. Remember, the various probes stuff can walk >> > userspace stacks, which means they'll be using set_fs(). >> > >> > I don't see why we've ended up with this (imho) sub-standard TIF-flag >> > approach, and I think it's going to be very problematical. >> > >> > Can we please go back to the approach I suggested back in March for >> > ARM that doesn't suffer from this problem? >> >> During the extensive thread discussion, Linus asked to move away from >> architecture specific changes to this work flag system. I am glad to >> fix the assembly as you asked on a separate patch. > > Well, for the record, I don't think you've got to the bottom of the > "infinite loop" potential of Linus' approach. > > Eg, perf will likely trigger this same issue. Eg, perf record -a -g > will attempt to record the callchain both in kernel space and userspace > each time a perf interrupt happens. If the perf interrupt frequency is > sufficiently high that we have multiple interrupts during the execution > of do_work_pending() and its called functions, then that will turn this > into an infinite loop yet again. Do you think it applies to the patch I just sent? The other approach is to check at the entrance, ignore _TIF_FSCHECK on the loop and clear it on exit. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. -- Thomas
Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 10:20:35AM -0700, Thomas Garnier wrote: > On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linux >wrote: > > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: > > Probably best to revert. I stopped looking at these patches during > > the discussion, as the discussion seemed to be mainly around other > > architectures, and I thought we had ARM settled. > > > > Looking at this patch now, there's several things I'm not happy with. > > > > The effect of adding a the new TIF flag for FSCHECK amongst the other > > flags is that we end up overflowing the 8-bit constant, and have to > > split the tests, meaning more instructions in the return path. Eg: > > > > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > + tst r1, #_TIF_SYSCALL_WORK > > + bne fast_work_pending > > + tst r1, #_TIF_WORK_MASK > > bne fast_work_pending > > > > should be written: > > > > tst r1, #_TIF_SYSCALL_WORK > > tsteq r1, #_TIF_WORK_MASK > > bne fast_work_pending > > > > and: > > > > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > + tst r1, #_TIF_SYSCALL_WORK > > + bne fast_work_pending > > + tst r1, #_TIF_WORK_MASK > > > > should be: > > > > tst r1, #_TIF_SYSCALL_WORK > > tsteq r1, #_TIF_WORK_MASK > > > > There's no need for extra branches. > > > > Now, the next issue is that I don't think this TIF-flag approach is > > good for ARM - alignment faults can happen any time due to misaligned > > packets in the networking code, and we really don't want to be doing > > this check in a place that we can loop. > > > > My original suggestion for ARM was to do the address limit check after > > all work had been processed, with interrupts disabled (so no > > possibility of this kind of loop happening.) However, that seems to > > have been replaced with this TIF approach, which is going to cause > > loops - I suspect if the probes code is enabled, this will suffer > > the same problem. Remember, the various probes stuff can walk > > userspace stacks, which means they'll be using set_fs(). > > > > I don't see why we've ended up with this (imho) sub-standard TIF-flag > > approach, and I think it's going to be very problematical. > > > > Can we please go back to the approach I suggested back in March for > > ARM that doesn't suffer from this problem? > > During the extensive thread discussion, Linus asked to move away from > architecture specific changes to this work flag system. I am glad to > fix the assembly as you asked on a separate patch. Well, for the record, I don't think you've got to the bottom of the "infinite loop" potential of Linus' approach. Eg, perf will likely trigger this same issue. Eg, perf record -a -g will attempt to record the callchain both in kernel space and userspace each time a perf interrupt happens. If the perf interrupt frequency is sufficiently high that we have multiple interrupts during the execution of do_work_pending() and its called functions, then that will turn this into an infinite loop yet again. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 10:20:35AM -0700, Thomas Garnier wrote: > On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linux > wrote: > > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: > > Probably best to revert. I stopped looking at these patches during > > the discussion, as the discussion seemed to be mainly around other > > architectures, and I thought we had ARM settled. > > > > Looking at this patch now, there's several things I'm not happy with. > > > > The effect of adding a the new TIF flag for FSCHECK amongst the other > > flags is that we end up overflowing the 8-bit constant, and have to > > split the tests, meaning more instructions in the return path. Eg: > > > > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > + tst r1, #_TIF_SYSCALL_WORK > > + bne fast_work_pending > > + tst r1, #_TIF_WORK_MASK > > bne fast_work_pending > > > > should be written: > > > > tst r1, #_TIF_SYSCALL_WORK > > tsteq r1, #_TIF_WORK_MASK > > bne fast_work_pending > > > > and: > > > > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > + tst r1, #_TIF_SYSCALL_WORK > > + bne fast_work_pending > > + tst r1, #_TIF_WORK_MASK > > > > should be: > > > > tst r1, #_TIF_SYSCALL_WORK > > tsteq r1, #_TIF_WORK_MASK > > > > There's no need for extra branches. > > > > Now, the next issue is that I don't think this TIF-flag approach is > > good for ARM - alignment faults can happen any time due to misaligned > > packets in the networking code, and we really don't want to be doing > > this check in a place that we can loop. > > > > My original suggestion for ARM was to do the address limit check after > > all work had been processed, with interrupts disabled (so no > > possibility of this kind of loop happening.) However, that seems to > > have been replaced with this TIF approach, which is going to cause > > loops - I suspect if the probes code is enabled, this will suffer > > the same problem. Remember, the various probes stuff can walk > > userspace stacks, which means they'll be using set_fs(). > > > > I don't see why we've ended up with this (imho) sub-standard TIF-flag > > approach, and I think it's going to be very problematical. > > > > Can we please go back to the approach I suggested back in March for > > ARM that doesn't suffer from this problem? > > During the extensive thread discussion, Linus asked to move away from > architecture specific changes to this work flag system. I am glad to > fix the assembly as you asked on a separate patch. Well, for the record, I don't think you've got to the bottom of the "infinite loop" potential of Linus' approach. Eg, perf will likely trigger this same issue. Eg, perf record -a -g will attempt to record the callchain both in kernel space and userspace each time a perf interrupt happens. If the perf interrupt frequency is sufficiently high that we have multiple interrupts during the execution of do_work_pending() and its called functions, then that will turn this into an infinite loop yet again. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linuxwrote: > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: >> On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: >> > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez >> > wrote: >> > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: >> > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez >> > > > wrote: >> > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> > > > > > >> > > > > > Ensure the address limit is a user-mode segment before returning to >> > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and >> > > > > > elevate privileges [1]. >> > > > > > >> > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on >> > > > > > return. In the slow path, the address limit is checked to be >> > > > > > USER_DS if >> > > > > > needed. >> > > > > > >> > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting >> > > > > > _TIF_SYSCALL_WORK >> > > > > > for arm instruction immediate support. The global work mask is too >> > > > > > big >> > > > > > to used on a single instruction so adapt ret_fast_syscall. >> > > > > > >> > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, >> > > > > > unsigned int thread_flags, int syscall) >> > > > > >* Update the trace code with the current status. >> > > > > >*/ >> > > > > > trace_hardirqs_off(); >> > > > > > + >> > > > > > + /* Check valid user FS if needed */ >> > > > > > + addr_limit_user_check(); >> > > > > > + >> > > > > > do { >> > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > > > schedule(); >> > > > > This patch made it's way into linux-next next-20170717 and it seems >> > > > > to >> > > > > cause hangs when booting some boards over NFS (found via bisection). >> > > > > I >> > > > > don't know exactly what determines the issue but I can reproduce >> > > > > hangs >> > > > > if even if I just boot with init=/bin/bash and do stuff like >> > > > > >> > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! >> > > > > >> > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state >> > > > > spinning in do_work_pending, so maybe there is a potential infinite >> > > > > loop here? >> > > > > >> > > > > The addr_limit_user_check at the start of do_work_pending will check >> > > > > for TIF_FSCHECK once and clear it but the function loops while >> > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again >> > > > > then >> > > > > the loop will never terminate. Does this make sense? >> > > > >> > > > Yes, it does. Thanks for looking into this. >> > > > >> > > > Can you try this change? >> > > > >> > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c >> > > > index 3a48b54c6405..bc6ad7789568 100644 >> > > > --- a/arch/arm/kernel/signal.c >> > > > +++ b/arch/arm/kernel/signal.c >> > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > > int thread_flags, int syscall) >> > > > */ >> > > > trace_hardirqs_off(); >> > > > >> > > > - /* Check valid user FS if needed */ >> > > > - addr_limit_user_check(); >> > > > - >> > > > do { >> > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > schedule(); >> > > > + } else if (thread_flags & _TIF_FSCHECK) { >> > > > + addr_limit_user_check(); >> > > > } else { >> > > > if (unlikely(!user_mode(regs))) >> > > > return 0; >> > > This does seem to work, it no longer hangs on boot in my setup. This is >> > > obviously only a very superficial test. >> > > >> > > The new location of this check seems weird, it's not clear why it >> > > should be on an else path. Perhaps it should be moved to right before >> > > where current_thread_info()->flags is fetched again? >> >> > I was hitting bug when I tried that.I think that's because you >> > basically let the signal handler do pending work before you check the >> > flag, that's not a good idea. >> >> > > If the purpose is hardening against buggy kernel code doing bad set_fs >> > > calls shouldn't this flag also be checked before looking at >> > > TIF_NEED_RESCHED and calling schedule()? >> > I am not sure to be honest. I expected schedule to only schedule the >> > processor to another task which would be fine given only the current >> > task have a bogus fs. I will put it first in case there is an edge >> > case scenario I missed. >> > >> > What do you think? Let me know and I will look at changes all >> > architectures and testing them. >> >> I don't know and I'd rather not guess on security issues. It's better >> if someone else reviews the code. >> >> Unless there is a very quick fix maybe this series should be removed or >> reverted from linux-next? A diagnosis of "system calls can sometimes >> hang on return" seems serious even for
Re: [kernel-hardening] Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 10:06 AM, Russell King - ARM Linux wrote: > On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: >> On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: >> > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez >> > wrote: >> > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: >> > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez >> > > > wrote: >> > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> > > > > > >> > > > > > Ensure the address limit is a user-mode segment before returning to >> > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and >> > > > > > elevate privileges [1]. >> > > > > > >> > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on >> > > > > > return. In the slow path, the address limit is checked to be >> > > > > > USER_DS if >> > > > > > needed. >> > > > > > >> > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting >> > > > > > _TIF_SYSCALL_WORK >> > > > > > for arm instruction immediate support. The global work mask is too >> > > > > > big >> > > > > > to used on a single instruction so adapt ret_fast_syscall. >> > > > > > >> > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, >> > > > > > unsigned int thread_flags, int syscall) >> > > > > >* Update the trace code with the current status. >> > > > > >*/ >> > > > > > trace_hardirqs_off(); >> > > > > > + >> > > > > > + /* Check valid user FS if needed */ >> > > > > > + addr_limit_user_check(); >> > > > > > + >> > > > > > do { >> > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > > > schedule(); >> > > > > This patch made it's way into linux-next next-20170717 and it seems >> > > > > to >> > > > > cause hangs when booting some boards over NFS (found via bisection). >> > > > > I >> > > > > don't know exactly what determines the issue but I can reproduce >> > > > > hangs >> > > > > if even if I just boot with init=/bin/bash and do stuff like >> > > > > >> > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! >> > > > > >> > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state >> > > > > spinning in do_work_pending, so maybe there is a potential infinite >> > > > > loop here? >> > > > > >> > > > > The addr_limit_user_check at the start of do_work_pending will check >> > > > > for TIF_FSCHECK once and clear it but the function loops while >> > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again >> > > > > then >> > > > > the loop will never terminate. Does this make sense? >> > > > >> > > > Yes, it does. Thanks for looking into this. >> > > > >> > > > Can you try this change? >> > > > >> > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c >> > > > index 3a48b54c6405..bc6ad7789568 100644 >> > > > --- a/arch/arm/kernel/signal.c >> > > > +++ b/arch/arm/kernel/signal.c >> > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > > int thread_flags, int syscall) >> > > > */ >> > > > trace_hardirqs_off(); >> > > > >> > > > - /* Check valid user FS if needed */ >> > > > - addr_limit_user_check(); >> > > > - >> > > > do { >> > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > schedule(); >> > > > + } else if (thread_flags & _TIF_FSCHECK) { >> > > > + addr_limit_user_check(); >> > > > } else { >> > > > if (unlikely(!user_mode(regs))) >> > > > return 0; >> > > This does seem to work, it no longer hangs on boot in my setup. This is >> > > obviously only a very superficial test. >> > > >> > > The new location of this check seems weird, it's not clear why it >> > > should be on an else path. Perhaps it should be moved to right before >> > > where current_thread_info()->flags is fetched again? >> >> > I was hitting bug when I tried that.I think that's because you >> > basically let the signal handler do pending work before you check the >> > flag, that's not a good idea. >> >> > > If the purpose is hardening against buggy kernel code doing bad set_fs >> > > calls shouldn't this flag also be checked before looking at >> > > TIF_NEED_RESCHED and calling schedule()? >> > I am not sure to be honest. I expected schedule to only schedule the >> > processor to another task which would be fine given only the current >> > task have a bogus fs. I will put it first in case there is an edge >> > case scenario I missed. >> > >> > What do you think? Let me know and I will look at changes all >> > architectures and testing them. >> >> I don't know and I'd rather not guess on security issues. It's better >> if someone else reviews the code. >> >> Unless there is a very quick fix maybe this series should be removed or >> reverted from linux-next? A diagnosis of "system calls can sometimes >> hang on return" seems serious even for linux-next. Since it happens >> very rarely in most setups I can easily imagine
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: > On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: > > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez> > wrote: > > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > > > wrote: > > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > > > elevate privileges [1]. > > > > > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > > > return. In the slow path, the address limit is checked to be > > > > > > USER_DS if > > > > > > needed. > > > > > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting > > > > > > _TIF_SYSCALL_WORK > > > > > > for arm instruction immediate support. The global work mask is too > > > > > > big > > > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > > > int thread_flags, int syscall) > > > > > > * Update the trace code with the current status. > > > > > > */ > > > > > > trace_hardirqs_off(); > > > > > > + > > > > > > + /* Check valid user FS if needed */ > > > > > > + addr_limit_user_check(); > > > > > > + > > > > > > do { > > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > > > schedule(); > > > > > This patch made it's way into linux-next next-20170717 and it seems to > > > > > cause hangs when booting some boards over NFS (found via bisection). I > > > > > don't know exactly what determines the issue but I can reproduce hangs > > > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > > > spinning in do_work_pending, so maybe there is a potential infinite > > > > > loop here? > > > > > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > > > for TIF_FSCHECK once and clear it but the function loops while > > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again > > > > > then > > > > > the loop will never terminate. Does this make sense? > > > > > > > > Yes, it does. Thanks for looking into this. > > > > > > > > Can you try this change? > > > > > > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c > > > > index 3a48b54c6405..bc6ad7789568 100644 > > > > --- a/arch/arm/kernel/signal.c > > > > +++ b/arch/arm/kernel/signal.c > > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > int thread_flags, int syscall) > > > > */ > > > > trace_hardirqs_off(); > > > > > > > > - /* Check valid user FS if needed */ > > > > - addr_limit_user_check(); > > > > - > > > > do { > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > schedule(); > > > > + } else if (thread_flags & _TIF_FSCHECK) { > > > > + addr_limit_user_check(); > > > > } else { > > > > if (unlikely(!user_mode(regs))) > > > > return 0; > > > This does seem to work, it no longer hangs on boot in my setup. This is > > > obviously only a very superficial test. > > > > > > The new location of this check seems weird, it's not clear why it > > > should be on an else path. Perhaps it should be moved to right before > > > where current_thread_info()->flags is fetched again? > > > I was hitting bug when I tried that.I think that's because you > > basically let the signal handler do pending work before you check the > > flag, that's not a good idea. > > > > If the purpose is hardening against buggy kernel code doing bad set_fs > > > calls shouldn't this flag also be checked before looking at > > > TIF_NEED_RESCHED and calling schedule()? > > I am not sure to be honest. I expected schedule to only schedule the > > processor to another task which would be fine given only the current > > task have a bogus fs. I will put it first in case there is an edge > > case scenario I missed. > > > > What do you think? Let me know and I will look at changes all > > architectures and testing them. > > I don't know and I'd rather not guess on security issues. It's better > if someone else reviews the code. > > Unless there is a very quick fix maybe this series should be removed or > reverted from linux-next? A diagnosis of "system calls can sometimes > hang on return" seems serious even for linux-next. Since it happens > very rarely in most setups I can easily imagine somebody spending a lot > of time digging at this. Probably best to revert. I stopped looking at these patches during the discussion, as the
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 05:58:20PM +0300, Leonard Crestez wrote: > On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: > > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez > > wrote: > > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > > > wrote: > > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > > > elevate privileges [1]. > > > > > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > > > return. In the slow path, the address limit is checked to be > > > > > > USER_DS if > > > > > > needed. > > > > > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting > > > > > > _TIF_SYSCALL_WORK > > > > > > for arm instruction immediate support. The global work mask is too > > > > > > big > > > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > > > int thread_flags, int syscall) > > > > > > * Update the trace code with the current status. > > > > > > */ > > > > > > trace_hardirqs_off(); > > > > > > + > > > > > > + /* Check valid user FS if needed */ > > > > > > + addr_limit_user_check(); > > > > > > + > > > > > > do { > > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > > > schedule(); > > > > > This patch made it's way into linux-next next-20170717 and it seems to > > > > > cause hangs when booting some boards over NFS (found via bisection). I > > > > > don't know exactly what determines the issue but I can reproduce hangs > > > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > > > spinning in do_work_pending, so maybe there is a potential infinite > > > > > loop here? > > > > > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > > > for TIF_FSCHECK once and clear it but the function loops while > > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again > > > > > then > > > > > the loop will never terminate. Does this make sense? > > > > > > > > Yes, it does. Thanks for looking into this. > > > > > > > > Can you try this change? > > > > > > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c > > > > index 3a48b54c6405..bc6ad7789568 100644 > > > > --- a/arch/arm/kernel/signal.c > > > > +++ b/arch/arm/kernel/signal.c > > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > int thread_flags, int syscall) > > > > */ > > > > trace_hardirqs_off(); > > > > > > > > - /* Check valid user FS if needed */ > > > > - addr_limit_user_check(); > > > > - > > > > do { > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > schedule(); > > > > + } else if (thread_flags & _TIF_FSCHECK) { > > > > + addr_limit_user_check(); > > > > } else { > > > > if (unlikely(!user_mode(regs))) > > > > return 0; > > > This does seem to work, it no longer hangs on boot in my setup. This is > > > obviously only a very superficial test. > > > > > > The new location of this check seems weird, it's not clear why it > > > should be on an else path. Perhaps it should be moved to right before > > > where current_thread_info()->flags is fetched again? > > > I was hitting bug when I tried that.I think that's because you > > basically let the signal handler do pending work before you check the > > flag, that's not a good idea. > > > > If the purpose is hardening against buggy kernel code doing bad set_fs > > > calls shouldn't this flag also be checked before looking at > > > TIF_NEED_RESCHED and calling schedule()? > > I am not sure to be honest. I expected schedule to only schedule the > > processor to another task which would be fine given only the current > > task have a bogus fs. I will put it first in case there is an edge > > case scenario I missed. > > > > What do you think? Let me know and I will look at changes all > > architectures and testing them. > > I don't know and I'd rather not guess on security issues. It's better > if someone else reviews the code. > > Unless there is a very quick fix maybe this series should be removed or > reverted from linux-next? A diagnosis of "system calls can sometimes > hang on return" seems serious even for linux-next. Since it happens > very rarely in most setups I can easily imagine somebody spending a lot > of time digging at this. Probably best to revert. I stopped looking at these patches during the discussion, as the discussion seemed to be mainly around other
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 7:58 AM, Leonard Crestezwrote: > On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: >> On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez >> wrote: >> > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: >> > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez >> > > wrote: >> > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> > > > > >> > > > > Ensure the address limit is a user-mode segment before returning to >> > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and >> > > > > elevate privileges [1]. >> > > > > >> > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on >> > > > > return. In the slow path, the address limit is checked to be USER_DS >> > > > > if >> > > > > needed. >> > > > > >> > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting >> > > > > _TIF_SYSCALL_WORK >> > > > > for arm instruction immediate support. The global work mask is too >> > > > > big >> > > > > to used on a single instruction so adapt ret_fast_syscall. >> > > > > >> > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > > > int thread_flags, int syscall) >> > > > >* Update the trace code with the current status. >> > > > >*/ >> > > > > trace_hardirqs_off(); >> > > > > + >> > > > > + /* Check valid user FS if needed */ >> > > > > + addr_limit_user_check(); >> > > > > + >> > > > > do { >> > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > > schedule(); >> > > > This patch made it's way into linux-next next-20170717 and it seems to >> > > > cause hangs when booting some boards over NFS (found via bisection). I >> > > > don't know exactly what determines the issue but I can reproduce hangs >> > > > if even if I just boot with init=/bin/bash and do stuff like >> > > > >> > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! >> > > > >> > > > When this happens sysrq-t shows a sleep task hung in the 'R' state >> > > > spinning in do_work_pending, so maybe there is a potential infinite >> > > > loop here? >> > > > >> > > > The addr_limit_user_check at the start of do_work_pending will check >> > > > for TIF_FSCHECK once and clear it but the function loops while >> > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then >> > > > the loop will never terminate. Does this make sense? >> > > >> > > Yes, it does. Thanks for looking into this. >> > > >> > > Can you try this change? >> > > >> > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c >> > > index 3a48b54c6405..bc6ad7789568 100644 >> > > --- a/arch/arm/kernel/signal.c >> > > +++ b/arch/arm/kernel/signal.c >> > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > int thread_flags, int syscall) >> > > */ >> > > trace_hardirqs_off(); >> > > >> > > - /* Check valid user FS if needed */ >> > > - addr_limit_user_check(); >> > > - >> > > do { >> > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > schedule(); >> > > + } else if (thread_flags & _TIF_FSCHECK) { >> > > + addr_limit_user_check(); >> > > } else { >> > > if (unlikely(!user_mode(regs))) >> > > return 0; >> > This does seem to work, it no longer hangs on boot in my setup. This is >> > obviously only a very superficial test. >> > >> > The new location of this check seems weird, it's not clear why it >> > should be on an else path. Perhaps it should be moved to right before >> > where current_thread_info()->flags is fetched again? > >> I was hitting bug when I tried that.I think that's because you >> basically let the signal handler do pending work before you check the >> flag, that's not a good idea. > >> > If the purpose is hardening against buggy kernel code doing bad set_fs >> > calls shouldn't this flag also be checked before looking at >> > TIF_NEED_RESCHED and calling schedule()? >> I am not sure to be honest. I expected schedule to only schedule the >> processor to another task which would be fine given only the current >> task have a bogus fs. I will put it first in case there is an edge >> case scenario I missed. >> >> What do you think? Let me know and I will look at changes all >> architectures and testing them. > > I don't know and I'd rather not guess on security issues. It's better > if someone else reviews the code. > > Unless there is a very quick fix maybe this series should be removed or > reverted from linux-next? A diagnosis of "system calls can sometimes > hang on return" seems serious even for linux-next. Since it happens > very rarely in most setups I can easily imagine somebody spending a lot > of time digging at this. I will send fixes for each architecture in the meantime. > > -- > Regards, > Leonard -- Thomas
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jul 19, 2017 at 7:58 AM, Leonard Crestez wrote: > On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: >> On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez >> wrote: >> > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: >> > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez >> > > wrote: >> > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> > > > > >> > > > > Ensure the address limit is a user-mode segment before returning to >> > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and >> > > > > elevate privileges [1]. >> > > > > >> > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on >> > > > > return. In the slow path, the address limit is checked to be USER_DS >> > > > > if >> > > > > needed. >> > > > > >> > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting >> > > > > _TIF_SYSCALL_WORK >> > > > > for arm instruction immediate support. The global work mask is too >> > > > > big >> > > > > to used on a single instruction so adapt ret_fast_syscall. >> > > > > >> > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > > > int thread_flags, int syscall) >> > > > >* Update the trace code with the current status. >> > > > >*/ >> > > > > trace_hardirqs_off(); >> > > > > + >> > > > > + /* Check valid user FS if needed */ >> > > > > + addr_limit_user_check(); >> > > > > + >> > > > > do { >> > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > > > schedule(); >> > > > This patch made it's way into linux-next next-20170717 and it seems to >> > > > cause hangs when booting some boards over NFS (found via bisection). I >> > > > don't know exactly what determines the issue but I can reproduce hangs >> > > > if even if I just boot with init=/bin/bash and do stuff like >> > > > >> > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! >> > > > >> > > > When this happens sysrq-t shows a sleep task hung in the 'R' state >> > > > spinning in do_work_pending, so maybe there is a potential infinite >> > > > loop here? >> > > > >> > > > The addr_limit_user_check at the start of do_work_pending will check >> > > > for TIF_FSCHECK once and clear it but the function loops while >> > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then >> > > > the loop will never terminate. Does this make sense? >> > > >> > > Yes, it does. Thanks for looking into this. >> > > >> > > Can you try this change? >> > > >> > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c >> > > index 3a48b54c6405..bc6ad7789568 100644 >> > > --- a/arch/arm/kernel/signal.c >> > > +++ b/arch/arm/kernel/signal.c >> > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned >> > > int thread_flags, int syscall) >> > > */ >> > > trace_hardirqs_off(); >> > > >> > > - /* Check valid user FS if needed */ >> > > - addr_limit_user_check(); >> > > - >> > > do { >> > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { >> > > schedule(); >> > > + } else if (thread_flags & _TIF_FSCHECK) { >> > > + addr_limit_user_check(); >> > > } else { >> > > if (unlikely(!user_mode(regs))) >> > > return 0; >> > This does seem to work, it no longer hangs on boot in my setup. This is >> > obviously only a very superficial test. >> > >> > The new location of this check seems weird, it's not clear why it >> > should be on an else path. Perhaps it should be moved to right before >> > where current_thread_info()->flags is fetched again? > >> I was hitting bug when I tried that.I think that's because you >> basically let the signal handler do pending work before you check the >> flag, that's not a good idea. > >> > If the purpose is hardening against buggy kernel code doing bad set_fs >> > calls shouldn't this flag also be checked before looking at >> > TIF_NEED_RESCHED and calling schedule()? >> I am not sure to be honest. I expected schedule to only schedule the >> processor to another task which would be fine given only the current >> task have a bogus fs. I will put it first in case there is an edge >> case scenario I missed. >> >> What do you think? Let me know and I will look at changes all >> architectures and testing them. > > I don't know and I'd rather not guess on security issues. It's better > if someone else reviews the code. > > Unless there is a very quick fix maybe this series should be removed or > reverted from linux-next? A diagnosis of "system calls can sometimes > hang on return" seems serious even for linux-next. Since it happens > very rarely in most setups I can easily imagine somebody spending a lot > of time digging at this. I will send fixes for each architecture in the meantime. > > -- > Regards, > Leonard -- Thomas
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez> wrote: > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > > wrote: > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > > elevate privileges [1]. > > > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > > return. In the slow path, the address limit is checked to be USER_DS > > > > > if > > > > > needed. > > > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting > > > > > _TIF_SYSCALL_WORK > > > > > for arm instruction immediate support. The global work mask is too big > > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > > int thread_flags, int syscall) > > > > > * Update the trace code with the current status. > > > > > */ > > > > > trace_hardirqs_off(); > > > > > + > > > > > + /* Check valid user FS if needed */ > > > > > + addr_limit_user_check(); > > > > > + > > > > > do { > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > > schedule(); > > > > This patch made it's way into linux-next next-20170717 and it seems to > > > > cause hangs when booting some boards over NFS (found via bisection). I > > > > don't know exactly what determines the issue but I can reproduce hangs > > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > > spinning in do_work_pending, so maybe there is a potential infinite > > > > loop here? > > > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > > for TIF_FSCHECK once and clear it but the function loops while > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > > > the loop will never terminate. Does this make sense? > > > > > > Yes, it does. Thanks for looking into this. > > > > > > Can you try this change? > > > > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c > > > index 3a48b54c6405..bc6ad7789568 100644 > > > --- a/arch/arm/kernel/signal.c > > > +++ b/arch/arm/kernel/signal.c > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned > > > int thread_flags, int syscall) > > > */ > > > trace_hardirqs_off(); > > > > > > - /* Check valid user FS if needed */ > > > - addr_limit_user_check(); > > > - > > > do { > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > schedule(); > > > + } else if (thread_flags & _TIF_FSCHECK) { > > > + addr_limit_user_check(); > > > } else { > > > if (unlikely(!user_mode(regs))) > > > return 0; > > This does seem to work, it no longer hangs on boot in my setup. This is > > obviously only a very superficial test. > > > > The new location of this check seems weird, it's not clear why it > > should be on an else path. Perhaps it should be moved to right before > > where current_thread_info()->flags is fetched again? > I was hitting bug when I tried that.I think that's because you > basically let the signal handler do pending work before you check the > flag, that's not a good idea. > > If the purpose is hardening against buggy kernel code doing bad set_fs > > calls shouldn't this flag also be checked before looking at > > TIF_NEED_RESCHED and calling schedule()? > I am not sure to be honest. I expected schedule to only schedule the > processor to another task which would be fine given only the current > task have a bogus fs. I will put it first in case there is an edge > case scenario I missed. > > What do you think? Let me know and I will look at changes all > architectures and testing them. I don't know and I'd rather not guess on security issues. It's better if someone else reviews the code. Unless there is a very quick fix maybe this series should be removed or reverted from linux-next? A diagnosis of "system calls can sometimes hang on return" seems serious even for linux-next. Since it happens very rarely in most setups I can easily imagine somebody spending a lot of time digging at this. -- Regards, Leonard
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, 2017-07-18 at 12:04 -0700, Thomas Garnier wrote: > On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez > wrote: > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > > wrote: > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > > elevate privileges [1]. > > > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > > return. In the slow path, the address limit is checked to be USER_DS > > > > > if > > > > > needed. > > > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting > > > > > _TIF_SYSCALL_WORK > > > > > for arm instruction immediate support. The global work mask is too big > > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned > > > > > int thread_flags, int syscall) > > > > > * Update the trace code with the current status. > > > > > */ > > > > > trace_hardirqs_off(); > > > > > + > > > > > + /* Check valid user FS if needed */ > > > > > + addr_limit_user_check(); > > > > > + > > > > > do { > > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > > schedule(); > > > > This patch made it's way into linux-next next-20170717 and it seems to > > > > cause hangs when booting some boards over NFS (found via bisection). I > > > > don't know exactly what determines the issue but I can reproduce hangs > > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > > spinning in do_work_pending, so maybe there is a potential infinite > > > > loop here? > > > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > > for TIF_FSCHECK once and clear it but the function loops while > > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > > > the loop will never terminate. Does this make sense? > > > > > > Yes, it does. Thanks for looking into this. > > > > > > Can you try this change? > > > > > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c > > > index 3a48b54c6405..bc6ad7789568 100644 > > > --- a/arch/arm/kernel/signal.c > > > +++ b/arch/arm/kernel/signal.c > > > @@ -573,12 +573,11 @@ do_work_pending(struct pt_regs *regs, unsigned > > > int thread_flags, int syscall) > > > */ > > > trace_hardirqs_off(); > > > > > > - /* Check valid user FS if needed */ > > > - addr_limit_user_check(); > > > - > > > do { > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > schedule(); > > > + } else if (thread_flags & _TIF_FSCHECK) { > > > + addr_limit_user_check(); > > > } else { > > > if (unlikely(!user_mode(regs))) > > > return 0; > > This does seem to work, it no longer hangs on boot in my setup. This is > > obviously only a very superficial test. > > > > The new location of this check seems weird, it's not clear why it > > should be on an else path. Perhaps it should be moved to right before > > where current_thread_info()->flags is fetched again? > I was hitting bug when I tried that.I think that's because you > basically let the signal handler do pending work before you check the > flag, that's not a good idea. > > If the purpose is hardening against buggy kernel code doing bad set_fs > > calls shouldn't this flag also be checked before looking at > > TIF_NEED_RESCHED and calling schedule()? > I am not sure to be honest. I expected schedule to only schedule the > processor to another task which would be fine given only the current > task have a bogus fs. I will put it first in case there is an edge > case scenario I missed. > > What do you think? Let me know and I will look at changes all > architectures and testing them. I don't know and I'd rather not guess on security issues. It's better if someone else reviews the code. Unless there is a very quick fix maybe this series should be removed or reverted from linux-next? A diagnosis of "system calls can sometimes hang on return" seems serious even for linux-next. Since it happens very rarely in most setups I can easily imagine somebody spending a lot of time digging at this. -- Regards, Leonard
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestezwrote: > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > wrote: > > > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > elevate privileges [1]. > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > return. In the slow path, the address limit is checked to be USER_DS if > > > > needed. > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > > > > for arm instruction immediate support. The global work mask is too big > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned int > > > > thread_flags, int syscall) > > > >* Update the trace code with the current status. > > > >*/ > > > > trace_hardirqs_off(); > > > > + > > > > + /* Check valid user FS if needed */ > > > > + addr_limit_user_check(); > > > > + > > > > do { > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > schedule(); > > > This patch made it's way into linux-next next-20170717 and it seems to > > > cause hangs when booting some boards over NFS (found via bisection). I > > > don't know exactly what determines the issue but I can reproduce hangs > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > spinning in do_work_pending, so maybe there is a potential infinite > > > loop here? > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > for TIF_FSCHECK once and clear it but the function loops while > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > > the loop will never terminate. Does this make sense? > > > Yes, it does. Thanks for looking into this. > > > > I added some instrumentation to check if TIF_FSCHECK can show up during > > > the do_work_pending loop and the answer seems to be yes. I also tried > > > to get a stack with a set_fs call from inside do_work_pending and got > > > the following: > > > > > > [ 227.582402] CPU: 0 PID: 829 Comm: sleep Not tainted > > > 4.12.0-01057-g93af8f7-dirty #332 > > > [ 227.590171] Hardware name: Freescale i.MX6 SoloLite (Device Tree) > > > [ 227.596275] Backtrace: > > > [ 227.598754] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) > > > [ 227.606339] r7: r6:60070113 r5: r4:c105a958 > > > [ 227.612016] [] (show_stack) from [] (dump_stack+0xb4/0xe8) > > > [ 227.619258] [] (dump_stack) from [] (mydbg_set_fs+0x40/0x48) > > > [ 227.626671] r9:c08cf35c r8:ee1cda7c r7:ee1e3dce r6:bf00 > > > r5: r4:e000 > > > [ 227.634433] [] (mydbg_set_fs) from [] (__probe_kernel_read+0x44/0xd0) > > > [ 227.642629] [] (__probe_kernel_read) from [] (do_alignment+0x8c/0x75c) > > > [ 227.650909] r10:ef085000 r9:c08cf35c r8:0001 r7:ee1e3dce > > > r6:c011b84c r5:ee1cdbe0 > > > [ 227.658748] r4: r3: > > > [ 227.662338] [] (do_alignment) from [] (do_DataAbort+0x40/0xc0) > > > [ 227.669921] r10:ef085000 r9:ee1cc000 r8:ee1cdbe0 r7:ee1e3dce > > > r6:c011b84c r5:0001 > > > [ 227.677760] r4:c100dd3c > > > [ 227.680308] [] (do_DataAbort) from [] (__dabt_svc+0x64/0xa0) > > > [ 227.687714] Exception stack(0xee1cdbe0 to 0xee1cdc28) > > > [ 227.692780] dbe0: 9064a8c0 ee1e3de2 d82727d8 ee1b20c0 > > > ee1e3dce ef08572c > > > [ 227.700971] dc00: c0bb2034 c10c75ea ef085000 ee1cdc74 ee1cdc00 > > > ee1cdc30 c01761a8 c08cf35c > > > [ 227.709158] dc20: 40070113 > > > [ 227.712661] r8:c0bb2034 r7:ee1cdc14 r6: r5:40070113 > > > r4:c08cf35c > > > [ 227.719382] [] (inet_gro_receive) from [] (dev_gro_receive+0x2f0/0x618) > > > [ 227.727746] r10:ef085000 r9:0001 r8: r7:ef085710 > > > r6:c1008b88 r5:ee1b20c0 > > > [ 227.735585] r4:c1009f78 > > > [ 227.738132] [] (dev_gro_receive) from [] (napi_gro_receive+0x78/0x1f4) > > > [ 227.746410] r10:ef085000 r9:0001 r8:c10d15ec r7:c100792c > > > r6:ef085710 r5:c10c744e > > > [ 227.754249] r4:ee1b20c0 > > > [ 227.756801] [] (napi_gro_receive) from [] > > > (fec_enet_rx_napi+0x39c/0x988) > > > [ 227.765253] r9:0001 r8:f0c8a960 r7: r6: > > > r5:ef086000 r4:ee1b20c0 > > > [ 227.773010] [] (fec_enet_rx_napi) from [] (net_rx_action+0x21c/0x474) > > > [ 227.781201] r10:ee1cdd78 r9:c0fa7b80 r8:ef7dab80 r7:012c > > > r6:0040 r5:0001 > > > [ 227.789039] r4:ef085710 > > > [ 227.791593] [] (net_rx_action) from []
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jul 18, 2017 at 10:18 AM, Leonard Crestez wrote: > > On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > > wrote: > > > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > > > Ensure the address limit is a user-mode segment before returning to > > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > > elevate privileges [1]. > > > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > > return. In the slow path, the address limit is checked to be USER_DS if > > > > needed. > > > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > > > > for arm instruction immediate support. The global work mask is too big > > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned int > > > > thread_flags, int syscall) > > > >* Update the trace code with the current status. > > > >*/ > > > > trace_hardirqs_off(); > > > > + > > > > + /* Check valid user FS if needed */ > > > > + addr_limit_user_check(); > > > > + > > > > do { > > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > > schedule(); > > > This patch made it's way into linux-next next-20170717 and it seems to > > > cause hangs when booting some boards over NFS (found via bisection). I > > > don't know exactly what determines the issue but I can reproduce hangs > > > if even if I just boot with init=/bin/bash and do stuff like > > > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > > spinning in do_work_pending, so maybe there is a potential infinite > > > loop here? > > > > > > The addr_limit_user_check at the start of do_work_pending will check > > > for TIF_FSCHECK once and clear it but the function loops while > > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > > the loop will never terminate. Does this make sense? > > > Yes, it does. Thanks for looking into this. > > > > I added some instrumentation to check if TIF_FSCHECK can show up during > > > the do_work_pending loop and the answer seems to be yes. I also tried > > > to get a stack with a set_fs call from inside do_work_pending and got > > > the following: > > > > > > [ 227.582402] CPU: 0 PID: 829 Comm: sleep Not tainted > > > 4.12.0-01057-g93af8f7-dirty #332 > > > [ 227.590171] Hardware name: Freescale i.MX6 SoloLite (Device Tree) > > > [ 227.596275] Backtrace: > > > [ 227.598754] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) > > > [ 227.606339] r7: r6:60070113 r5: r4:c105a958 > > > [ 227.612016] [] (show_stack) from [] (dump_stack+0xb4/0xe8) > > > [ 227.619258] [] (dump_stack) from [] (mydbg_set_fs+0x40/0x48) > > > [ 227.626671] r9:c08cf35c r8:ee1cda7c r7:ee1e3dce r6:bf00 > > > r5: r4:e000 > > > [ 227.634433] [] (mydbg_set_fs) from [] (__probe_kernel_read+0x44/0xd0) > > > [ 227.642629] [] (__probe_kernel_read) from [] (do_alignment+0x8c/0x75c) > > > [ 227.650909] r10:ef085000 r9:c08cf35c r8:0001 r7:ee1e3dce > > > r6:c011b84c r5:ee1cdbe0 > > > [ 227.658748] r4: r3: > > > [ 227.662338] [] (do_alignment) from [] (do_DataAbort+0x40/0xc0) > > > [ 227.669921] r10:ef085000 r9:ee1cc000 r8:ee1cdbe0 r7:ee1e3dce > > > r6:c011b84c r5:0001 > > > [ 227.677760] r4:c100dd3c > > > [ 227.680308] [] (do_DataAbort) from [] (__dabt_svc+0x64/0xa0) > > > [ 227.687714] Exception stack(0xee1cdbe0 to 0xee1cdc28) > > > [ 227.692780] dbe0: 9064a8c0 ee1e3de2 d82727d8 ee1b20c0 > > > ee1e3dce ef08572c > > > [ 227.700971] dc00: c0bb2034 c10c75ea ef085000 ee1cdc74 ee1cdc00 > > > ee1cdc30 c01761a8 c08cf35c > > > [ 227.709158] dc20: 40070113 > > > [ 227.712661] r8:c0bb2034 r7:ee1cdc14 r6: r5:40070113 > > > r4:c08cf35c > > > [ 227.719382] [] (inet_gro_receive) from [] (dev_gro_receive+0x2f0/0x618) > > > [ 227.727746] r10:ef085000 r9:0001 r8: r7:ef085710 > > > r6:c1008b88 r5:ee1b20c0 > > > [ 227.735585] r4:c1009f78 > > > [ 227.738132] [] (dev_gro_receive) from [] (napi_gro_receive+0x78/0x1f4) > > > [ 227.746410] r10:ef085000 r9:0001 r8:c10d15ec r7:c100792c > > > r6:ef085710 r5:c10c744e > > > [ 227.754249] r4:ee1b20c0 > > > [ 227.756801] [] (napi_gro_receive) from [] > > > (fec_enet_rx_napi+0x39c/0x988) > > > [ 227.765253] r9:0001 r8:f0c8a960 r7: r6: > > > r5:ef086000 r4:ee1b20c0 > > > [ 227.773010] [] (fec_enet_rx_napi) from [] (net_rx_action+0x21c/0x474) > > > [ 227.781201] r10:ee1cdd78 r9:c0fa7b80 r8:ef7dab80 r7:012c > > > r6:0040 r5:0001 > > > [ 227.789039] r4:ef085710 > > > [ 227.791593] [] (net_rx_action) from [] (__do_softirq+0x158/0x534) > > > [ 227.799437]
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez> wrote: > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > Ensure the address limit is a user-mode segment before returning to > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > elevate privileges [1]. > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > return. In the slow path, the address limit is checked to be USER_DS if > > > needed. > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > > > for arm instruction immediate support. The global work mask is too big > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned int > > > thread_flags, int syscall) > > > * Update the trace code with the current status. > > > */ > > > trace_hardirqs_off(); > > > + > > > + /* Check valid user FS if needed */ > > > + addr_limit_user_check(); > > > + > > > do { > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > schedule(); > > This patch made it's way into linux-next next-20170717 and it seems to > > cause hangs when booting some boards over NFS (found via bisection). I > > don't know exactly what determines the issue but I can reproduce hangs > > if even if I just boot with init=/bin/bash and do stuff like > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > spinning in do_work_pending, so maybe there is a potential infinite > > loop here? > > > > The addr_limit_user_check at the start of do_work_pending will check > > for TIF_FSCHECK once and clear it but the function loops while > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > the loop will never terminate. Does this make sense? > Yes, it does. Thanks for looking into this. > > I added some instrumentation to check if TIF_FSCHECK can show up during > > the do_work_pending loop and the answer seems to be yes. I also tried > > to get a stack with a set_fs call from inside do_work_pending and got > > the following: > > > > [ 227.582402] CPU: 0 PID: 829 Comm: sleep Not tainted > > 4.12.0-01057-g93af8f7-dirty #332 > > [ 227.590171] Hardware name: Freescale i.MX6 SoloLite (Device Tree) > > [ 227.596275] Backtrace: > > [ 227.598754] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) > > [ 227.606339] r7: r6:60070113 r5: r4:c105a958 > > [ 227.612016] [] (show_stack) from [] (dump_stack+0xb4/0xe8) > > [ 227.619258] [] (dump_stack) from [] (mydbg_set_fs+0x40/0x48) > > [ 227.626671] r9:c08cf35c r8:ee1cda7c r7:ee1e3dce r6:bf00 r5: > > r4:e000 > > [ 227.634433] [] (mydbg_set_fs) from [] (__probe_kernel_read+0x44/0xd0) > > [ 227.642629] [] (__probe_kernel_read) from [] (do_alignment+0x8c/0x75c) > > [ 227.650909] r10:ef085000 r9:c08cf35c r8:0001 r7:ee1e3dce > > r6:c011b84c r5:ee1cdbe0 > > [ 227.658748] r4: r3: > > [ 227.662338] [] (do_alignment) from [] (do_DataAbort+0x40/0xc0) > > [ 227.669921] r10:ef085000 r9:ee1cc000 r8:ee1cdbe0 r7:ee1e3dce > > r6:c011b84c r5:0001 > > [ 227.677760] r4:c100dd3c > > [ 227.680308] [] (do_DataAbort) from [] (__dabt_svc+0x64/0xa0) > > [ 227.687714] Exception stack(0xee1cdbe0 to 0xee1cdc28) > > [ 227.692780] dbe0: 9064a8c0 ee1e3de2 d82727d8 ee1b20c0 ee1e3dce > > ef08572c > > [ 227.700971] dc00: c0bb2034 c10c75ea ef085000 ee1cdc74 ee1cdc00 ee1cdc30 > > c01761a8 c08cf35c > > [ 227.709158] dc20: 40070113 > > [ 227.712661] r8:c0bb2034 r7:ee1cdc14 r6: r5:40070113 r4:c08cf35c > > [ 227.719382] [] (inet_gro_receive) from [] (dev_gro_receive+0x2f0/0x618) > > [ 227.727746] r10:ef085000 r9:0001 r8: r7:ef085710 > > r6:c1008b88 r5:ee1b20c0 > > [ 227.735585] r4:c1009f78 > > [ 227.738132] [] (dev_gro_receive) from [] (napi_gro_receive+0x78/0x1f4) > > [ 227.746410] r10:ef085000 r9:0001 r8:c10d15ec r7:c100792c > > r6:ef085710 r5:c10c744e > > [ 227.754249] r4:ee1b20c0 > > [ 227.756801] [] (napi_gro_receive) from [] (fec_enet_rx_napi+0x39c/0x988) > > [ 227.765253] r9:0001 r8:f0c8a960 r7: r6: r5:ef086000 > > r4:ee1b20c0 > > [ 227.773010] [] (fec_enet_rx_napi) from [] (net_rx_action+0x21c/0x474) > > [ 227.781201] r10:ee1cdd78 r9:c0fa7b80 r8:ef7dab80 r7:012c > > r6:0040 r5:0001 > > [ 227.789039] r4:ef085710 > > [ 227.791593] [] (net_rx_action) from [] (__do_softirq+0x158/0x534) > > [ 227.799437] r10:0008 r9:ee1cc000 r8:c10ce568 r7:c100792c > > r6:c10247bd r5:0003 > > [ 227.807275] r4:c100208c > > [ 227.809824] [] (__do_softirq) from [] (irq_exit+0xec/0x168) > > [ 227.817147] r10:c1007ea0 r9:ef010400 r8:0001
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, 2017-07-18 at 09:04 -0700, Thomas Garnier wrote: > On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez > wrote: > > > > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > > > > > > Ensure the address limit is a user-mode segment before returning to > > > user-mode. Otherwise a process can corrupt kernel-mode memory and > > > elevate privileges [1]. > > > > > > The set_fs function sets the TIF_SETFS flag to force a slow path on > > > return. In the slow path, the address limit is checked to be USER_DS if > > > needed. > > > > > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > > > for arm instruction immediate support. The global work mask is too big > > > to used on a single instruction so adapt ret_fast_syscall. > > > > > > @@ -571,6 +572,10 @@ do_work_pending(struct pt_regs *regs, unsigned int > > > thread_flags, int syscall) > > > * Update the trace code with the current status. > > > */ > > > trace_hardirqs_off(); > > > + > > > + /* Check valid user FS if needed */ > > > + addr_limit_user_check(); > > > + > > > do { > > > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > > > schedule(); > > This patch made it's way into linux-next next-20170717 and it seems to > > cause hangs when booting some boards over NFS (found via bisection). I > > don't know exactly what determines the issue but I can reproduce hangs > > if even if I just boot with init=/bin/bash and do stuff like > > > > # sleep 1 & sleep 1 & sleep 1 & wait; wait; wait; echo done! > > > > When this happens sysrq-t shows a sleep task hung in the 'R' state > > spinning in do_work_pending, so maybe there is a potential infinite > > loop here? > > > > The addr_limit_user_check at the start of do_work_pending will check > > for TIF_FSCHECK once and clear it but the function loops while > > (thread_flags & _TIF_WORK_MASK), so it if TIF_FSCHECK is set again then > > the loop will never terminate. Does this make sense? > Yes, it does. Thanks for looking into this. > > I added some instrumentation to check if TIF_FSCHECK can show up during > > the do_work_pending loop and the answer seems to be yes. I also tried > > to get a stack with a set_fs call from inside do_work_pending and got > > the following: > > > > [ 227.582402] CPU: 0 PID: 829 Comm: sleep Not tainted > > 4.12.0-01057-g93af8f7-dirty #332 > > [ 227.590171] Hardware name: Freescale i.MX6 SoloLite (Device Tree) > > [ 227.596275] Backtrace: > > [ 227.598754] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) > > [ 227.606339] r7: r6:60070113 r5: r4:c105a958 > > [ 227.612016] [] (show_stack) from [] (dump_stack+0xb4/0xe8) > > [ 227.619258] [] (dump_stack) from [] (mydbg_set_fs+0x40/0x48) > > [ 227.626671] r9:c08cf35c r8:ee1cda7c r7:ee1e3dce r6:bf00 r5: > > r4:e000 > > [ 227.634433] [] (mydbg_set_fs) from [] (__probe_kernel_read+0x44/0xd0) > > [ 227.642629] [] (__probe_kernel_read) from [] (do_alignment+0x8c/0x75c) > > [ 227.650909] r10:ef085000 r9:c08cf35c r8:0001 r7:ee1e3dce > > r6:c011b84c r5:ee1cdbe0 > > [ 227.658748] r4: r3: > > [ 227.662338] [] (do_alignment) from [] (do_DataAbort+0x40/0xc0) > > [ 227.669921] r10:ef085000 r9:ee1cc000 r8:ee1cdbe0 r7:ee1e3dce > > r6:c011b84c r5:0001 > > [ 227.677760] r4:c100dd3c > > [ 227.680308] [] (do_DataAbort) from [] (__dabt_svc+0x64/0xa0) > > [ 227.687714] Exception stack(0xee1cdbe0 to 0xee1cdc28) > > [ 227.692780] dbe0: 9064a8c0 ee1e3de2 d82727d8 ee1b20c0 ee1e3dce > > ef08572c > > [ 227.700971] dc00: c0bb2034 c10c75ea ef085000 ee1cdc74 ee1cdc00 ee1cdc30 > > c01761a8 c08cf35c > > [ 227.709158] dc20: 40070113 > > [ 227.712661] r8:c0bb2034 r7:ee1cdc14 r6: r5:40070113 r4:c08cf35c > > [ 227.719382] [] (inet_gro_receive) from [] (dev_gro_receive+0x2f0/0x618) > > [ 227.727746] r10:ef085000 r9:0001 r8: r7:ef085710 > > r6:c1008b88 r5:ee1b20c0 > > [ 227.735585] r4:c1009f78 > > [ 227.738132] [] (dev_gro_receive) from [] (napi_gro_receive+0x78/0x1f4) > > [ 227.746410] r10:ef085000 r9:0001 r8:c10d15ec r7:c100792c > > r6:ef085710 r5:c10c744e > > [ 227.754249] r4:ee1b20c0 > > [ 227.756801] [] (napi_gro_receive) from [] (fec_enet_rx_napi+0x39c/0x988) > > [ 227.765253] r9:0001 r8:f0c8a960 r7: r6: r5:ef086000 > > r4:ee1b20c0 > > [ 227.773010] [] (fec_enet_rx_napi) from [] (net_rx_action+0x21c/0x474) > > [ 227.781201] r10:ee1cdd78 r9:c0fa7b80 r8:ef7dab80 r7:012c > > r6:0040 r5:0001 > > [ 227.789039] r4:ef085710 > > [ 227.791593] [] (net_rx_action) from [] (__do_softirq+0x158/0x534) > > [ 227.799437] r10:0008 r9:ee1cc000 r8:c10ce568 r7:c100792c > > r6:c10247bd r5:0003 > > [ 227.807275] r4:c100208c > > [ 227.809824] [] (__do_softirq) from [] (irq_exit+0xec/0x168) > > [ 227.817147] r10:c1007ea0 r9:ef010400 r8:0001 r7: > >
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestezwrote: > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> Ensure the address limit is a user-mode segment before returning to >> user-mode. Otherwise a process can corrupt kernel-mode memory and >> elevate privileges [1]. >> >> The set_fs function sets the TIF_SETFS flag to force a slow path on >> return. In the slow path, the address limit is checked to be USER_DS if >> needed. >> >> The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK >> for arm instruction immediate support. The global work mask is too big >> to used on a single instruction so adapt ret_fast_syscall. >> >> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >> >> Signed-off-by: Thomas Garnier >> --- >> v10 redesigns the change to use work flags on set_fs as recommended by >> Linus and agreed by others. >> >> Based on next-20170609 >> --- >> arch/arm/include/asm/thread_info.h | 15 +-- >> arch/arm/include/asm/uaccess.h | 2 ++ >> arch/arm/kernel/entry-common.S | 9 +++-- >> arch/arm/kernel/signal.c | 5 + >> 4 files changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/include/asm/thread_info.h >> b/arch/arm/include/asm/thread_info.h >> index 776757d1604a..1d468b527b7b 100644 >> --- a/arch/arm/include/asm/thread_info.h >> +++ b/arch/arm/include/asm/thread_info.h >> @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ >> #define TIF_NOTIFY_RESUME2 /* callback before returning to user */ >> #define TIF_UPROBE 3 /* breakpointed or singlestepping */ >> -#define TIF_SYSCALL_TRACE4 /* syscall trace active */ >> -#define TIF_SYSCALL_AUDIT5 /* syscall auditing active */ >> -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint >> instrumentation */ >> -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ >> +#define TIF_FSCHECK 4 /* Check FS is USER_DS on return */ >> +#define TIF_SYSCALL_TRACE5 /* syscall trace active */ >> +#define TIF_SYSCALL_AUDIT6 /* syscall auditing active */ >> +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint >> instrumentation */ >> +#define TIF_SECCOMP 8 /* seccomp syscall filtering active */ >> >> #define TIF_NOHZ 12 /* in adaptive nohz mode */ >> #define TIF_USING_IWMMXT 17 >> @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define _TIF_NEED_RESCHED(1 << TIF_NEED_RESCHED) >> #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) >> #define _TIF_UPROBE (1 << TIF_UPROBE) >> +#define _TIF_FSCHECK (1 << TIF_FSCHECK) >> #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) >> #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) >> #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) >> @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> /* >> * Change these and you break ASM code in entry-common.S >> */ >> -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | >> \ >> - _TIF_NOTIFY_RESUME | _TIF_UPROBE) >> +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | >> \ >> + _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ >> + _TIF_FSCHECK) >> >> #endif /* __KERNEL__ */ >> #endif /* __ASM_ARM_THREAD_INFO_H */ >> diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h >> index 2577405d082d..6cc882223e34 100644 >> --- a/arch/arm/include/asm/uaccess.h >> +++ b/arch/arm/include/asm/uaccess.h >> @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) >> { >> current_thread_info()->addr_limit = fs; >> modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); >> + /* On user-mode return, check fs is correct */ >> + set_thread_flag(TIF_FSCHECK); >> } >> >> #define segment_eq(a, b) ((a) == (b)) >> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S >> index eb5cd77bf1d8..e33c32d56193 100644 >> --- a/arch/arm/kernel/entry-common.S >> +++ b/arch/arm/kernel/entry-common.S >> @@ -41,7 +41,9 @@ ret_fast_syscall: >> UNWIND(.cantunwind ) >> disable_irq_notrace @ disable interrupts >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> + tst r1, #_TIF_SYSCALL_WORK >> + bne fast_work_pending >> + tst r1, #_TIF_WORK_MASK >> bne fast_work_pending >> >> /* perform architecture specific actions before user return */ >> @@ -67,12 +69,15 @@ ret_fast_syscall: >> str r0, [sp, #S_R0 +
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jul 18, 2017 at 7:36 AM, Leonard Crestez wrote: > On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: >> Ensure the address limit is a user-mode segment before returning to >> user-mode. Otherwise a process can corrupt kernel-mode memory and >> elevate privileges [1]. >> >> The set_fs function sets the TIF_SETFS flag to force a slow path on >> return. In the slow path, the address limit is checked to be USER_DS if >> needed. >> >> The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK >> for arm instruction immediate support. The global work mask is too big >> to used on a single instruction so adapt ret_fast_syscall. >> >> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >> >> Signed-off-by: Thomas Garnier >> --- >> v10 redesigns the change to use work flags on set_fs as recommended by >> Linus and agreed by others. >> >> Based on next-20170609 >> --- >> arch/arm/include/asm/thread_info.h | 15 +-- >> arch/arm/include/asm/uaccess.h | 2 ++ >> arch/arm/kernel/entry-common.S | 9 +++-- >> arch/arm/kernel/signal.c | 5 + >> 4 files changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/include/asm/thread_info.h >> b/arch/arm/include/asm/thread_info.h >> index 776757d1604a..1d468b527b7b 100644 >> --- a/arch/arm/include/asm/thread_info.h >> +++ b/arch/arm/include/asm/thread_info.h >> @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ >> #define TIF_NOTIFY_RESUME2 /* callback before returning to user */ >> #define TIF_UPROBE 3 /* breakpointed or singlestepping */ >> -#define TIF_SYSCALL_TRACE4 /* syscall trace active */ >> -#define TIF_SYSCALL_AUDIT5 /* syscall auditing active */ >> -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint >> instrumentation */ >> -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ >> +#define TIF_FSCHECK 4 /* Check FS is USER_DS on return */ >> +#define TIF_SYSCALL_TRACE5 /* syscall trace active */ >> +#define TIF_SYSCALL_AUDIT6 /* syscall auditing active */ >> +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint >> instrumentation */ >> +#define TIF_SECCOMP 8 /* seccomp syscall filtering active */ >> >> #define TIF_NOHZ 12 /* in adaptive nohz mode */ >> #define TIF_USING_IWMMXT 17 >> @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define _TIF_NEED_RESCHED(1 << TIF_NEED_RESCHED) >> #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) >> #define _TIF_UPROBE (1 << TIF_UPROBE) >> +#define _TIF_FSCHECK (1 << TIF_FSCHECK) >> #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) >> #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) >> #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) >> @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> /* >> * Change these and you break ASM code in entry-common.S >> */ >> -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | >> \ >> - _TIF_NOTIFY_RESUME | _TIF_UPROBE) >> +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | >> \ >> + _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ >> + _TIF_FSCHECK) >> >> #endif /* __KERNEL__ */ >> #endif /* __ASM_ARM_THREAD_INFO_H */ >> diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h >> index 2577405d082d..6cc882223e34 100644 >> --- a/arch/arm/include/asm/uaccess.h >> +++ b/arch/arm/include/asm/uaccess.h >> @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) >> { >> current_thread_info()->addr_limit = fs; >> modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); >> + /* On user-mode return, check fs is correct */ >> + set_thread_flag(TIF_FSCHECK); >> } >> >> #define segment_eq(a, b) ((a) == (b)) >> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S >> index eb5cd77bf1d8..e33c32d56193 100644 >> --- a/arch/arm/kernel/entry-common.S >> +++ b/arch/arm/kernel/entry-common.S >> @@ -41,7 +41,9 @@ ret_fast_syscall: >> UNWIND(.cantunwind ) >> disable_irq_notrace @ disable interrupts >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> + tst r1, #_TIF_SYSCALL_WORK >> + bne fast_work_pending >> + tst r1, #_TIF_WORK_MASK >> bne fast_work_pending >> >> /* perform architecture specific actions before user return */ >> @@ -67,12 +69,15 @@ ret_fast_syscall: >> str r0, [sp, #S_R0 + S_OFF]!@ save returned r0 >>
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > Ensure the address limit is a user-mode segment before returning to > user-mode. Otherwise a process can corrupt kernel-mode memory and > elevate privileges [1]. > > The set_fs function sets the TIF_SETFS flag to force a slow path on > return. In the slow path, the address limit is checked to be USER_DS if > needed. > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > for arm instruction immediate support. The global work mask is too big > to used on a single instruction so adapt ret_fast_syscall. > > [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 > > Signed-off-by: Thomas Garnier> --- > v10 redesigns the change to use work flags on set_fs as recommended by > Linus and agreed by others. > > Based on next-20170609 > --- > arch/arm/include/asm/thread_info.h | 15 +-- > arch/arm/include/asm/uaccess.h | 2 ++ > arch/arm/kernel/entry-common.S | 9 +++-- > arch/arm/kernel/signal.c | 5 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/arm/include/asm/thread_info.h > b/arch/arm/include/asm/thread_info.h > index 776757d1604a..1d468b527b7b 100644 > --- a/arch/arm/include/asm/thread_info.h > +++ b/arch/arm/include/asm/thread_info.h > @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ > #define TIF_NOTIFY_RESUME2 /* callback before returning to user */ > #define TIF_UPROBE 3 /* breakpointed or singlestepping */ > -#define TIF_SYSCALL_TRACE4 /* syscall trace active */ > -#define TIF_SYSCALL_AUDIT5 /* syscall auditing active */ > -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint > instrumentation */ > -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ > +#define TIF_FSCHECK 4 /* Check FS is USER_DS on return */ > +#define TIF_SYSCALL_TRACE5 /* syscall trace active */ > +#define TIF_SYSCALL_AUDIT6 /* syscall auditing active */ > +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint > instrumentation */ > +#define TIF_SECCOMP 8 /* seccomp syscall filtering active */ > > #define TIF_NOHZ 12 /* in adaptive nohz mode */ > #define TIF_USING_IWMMXT 17 > @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define _TIF_NEED_RESCHED(1 << TIF_NEED_RESCHED) > #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) > #define _TIF_UPROBE (1 << TIF_UPROBE) > +#define _TIF_FSCHECK (1 << TIF_FSCHECK) > #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) > #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) > #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) > @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > /* > * Change these and you break ASM code in entry-common.S > */ > -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > - _TIF_NOTIFY_RESUME | _TIF_UPROBE) > +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | > \ > + _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ > + _TIF_FSCHECK) > > #endif /* __KERNEL__ */ > #endif /* __ASM_ARM_THREAD_INFO_H */ > diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h > index 2577405d082d..6cc882223e34 100644 > --- a/arch/arm/include/asm/uaccess.h > +++ b/arch/arm/include/asm/uaccess.h > @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) > { > current_thread_info()->addr_limit = fs; > modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); > + /* On user-mode return, check fs is correct */ > + set_thread_flag(TIF_FSCHECK); > } > > #define segment_eq(a, b) ((a) == (b)) > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S > index eb5cd77bf1d8..e33c32d56193 100644 > --- a/arch/arm/kernel/entry-common.S > +++ b/arch/arm/kernel/entry-common.S > @@ -41,7 +41,9 @@ ret_fast_syscall: > UNWIND(.cantunwind ) > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > + tst r1, #_TIF_SYSCALL_WORK > + bne fast_work_pending > + tst r1, #_TIF_WORK_MASK > bne fast_work_pending > > /* perform architecture specific actions before user return */ > @@ -67,12 +69,15 @@ ret_fast_syscall: > str r0, [sp, #S_R0 + S_OFF]!@ save returned r0 > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, 2017-06-14 at 18:12 -0700, Thomas Garnier wrote: > Ensure the address limit is a user-mode segment before returning to > user-mode. Otherwise a process can corrupt kernel-mode memory and > elevate privileges [1]. > > The set_fs function sets the TIF_SETFS flag to force a slow path on > return. In the slow path, the address limit is checked to be USER_DS if > needed. > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > for arm instruction immediate support. The global work mask is too big > to used on a single instruction so adapt ret_fast_syscall. > > [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 > > Signed-off-by: Thomas Garnier > --- > v10 redesigns the change to use work flags on set_fs as recommended by > Linus and agreed by others. > > Based on next-20170609 > --- > arch/arm/include/asm/thread_info.h | 15 +-- > arch/arm/include/asm/uaccess.h | 2 ++ > arch/arm/kernel/entry-common.S | 9 +++-- > arch/arm/kernel/signal.c | 5 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/arm/include/asm/thread_info.h > b/arch/arm/include/asm/thread_info.h > index 776757d1604a..1d468b527b7b 100644 > --- a/arch/arm/include/asm/thread_info.h > +++ b/arch/arm/include/asm/thread_info.h > @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ > #define TIF_NOTIFY_RESUME2 /* callback before returning to user */ > #define TIF_UPROBE 3 /* breakpointed or singlestepping */ > -#define TIF_SYSCALL_TRACE4 /* syscall trace active */ > -#define TIF_SYSCALL_AUDIT5 /* syscall auditing active */ > -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint > instrumentation */ > -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ > +#define TIF_FSCHECK 4 /* Check FS is USER_DS on return */ > +#define TIF_SYSCALL_TRACE5 /* syscall trace active */ > +#define TIF_SYSCALL_AUDIT6 /* syscall auditing active */ > +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint > instrumentation */ > +#define TIF_SECCOMP 8 /* seccomp syscall filtering active */ > > #define TIF_NOHZ 12 /* in adaptive nohz mode */ > #define TIF_USING_IWMMXT 17 > @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define _TIF_NEED_RESCHED(1 << TIF_NEED_RESCHED) > #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) > #define _TIF_UPROBE (1 << TIF_UPROBE) > +#define _TIF_FSCHECK (1 << TIF_FSCHECK) > #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) > #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) > #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT) > @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > /* > * Change these and you break ASM code in entry-common.S > */ > -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > - _TIF_NOTIFY_RESUME | _TIF_UPROBE) > +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | > \ > + _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ > + _TIF_FSCHECK) > > #endif /* __KERNEL__ */ > #endif /* __ASM_ARM_THREAD_INFO_H */ > diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h > index 2577405d082d..6cc882223e34 100644 > --- a/arch/arm/include/asm/uaccess.h > +++ b/arch/arm/include/asm/uaccess.h > @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) > { > current_thread_info()->addr_limit = fs; > modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); > + /* On user-mode return, check fs is correct */ > + set_thread_flag(TIF_FSCHECK); > } > > #define segment_eq(a, b) ((a) == (b)) > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S > index eb5cd77bf1d8..e33c32d56193 100644 > --- a/arch/arm/kernel/entry-common.S > +++ b/arch/arm/kernel/entry-common.S > @@ -41,7 +41,9 @@ ret_fast_syscall: > UNWIND(.cantunwind ) > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > + tst r1, #_TIF_SYSCALL_WORK > + bne fast_work_pending > + tst r1, #_TIF_WORK_MASK > bne fast_work_pending > > /* perform architecture specific actions before user return */ > @@ -67,12 +69,15 @@ ret_fast_syscall: > str r0, [sp, #S_R0 + S_OFF]!@ save returned r0 > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing > - tst
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jun 20, 2017 at 01:31:14PM -0700, Thomas Garnier wrote: > On Tue, Jun 20, 2017 at 1:18 PM, Kees Cookwrote: > > On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnier wrote: > >> diff --git a/arch/arm/kernel/entry-common.S > >> b/arch/arm/kernel/entry-common.S > >> index eb5cd77bf1d8..e33c32d56193 100644 > >> --- a/arch/arm/kernel/entry-common.S > >> +++ b/arch/arm/kernel/entry-common.S > >> @@ -41,7 +41,9 @@ ret_fast_syscall: > >> UNWIND(.cantunwind) > >> disable_irq_notrace @ disable interrupts > >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall > >> tracing > >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > >> + tst r1, #_TIF_SYSCALL_WORK > >> + bne fast_work_pending > >> + tst r1, #_TIF_WORK_MASK > > > > (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle > > and each BNE is 1 cycle (when not taken). So: > > > > mov32 r2, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > tst r1, r2 > > bne fast_work_pending > > > > is 4 cycles and tst, bne, tst, bne is also 4 cycles. Would mov32 be > > more readable (since it keeps the flags together)? > > I guess it would be more readable. Any opinion from the arm folks? The mov32 sequence is probably better, but statically attributing cycles on a per instruction basis is pretty futile on modern CPUs. Will
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jun 20, 2017 at 01:31:14PM -0700, Thomas Garnier wrote: > On Tue, Jun 20, 2017 at 1:18 PM, Kees Cook wrote: > > On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnier wrote: > >> diff --git a/arch/arm/kernel/entry-common.S > >> b/arch/arm/kernel/entry-common.S > >> index eb5cd77bf1d8..e33c32d56193 100644 > >> --- a/arch/arm/kernel/entry-common.S > >> +++ b/arch/arm/kernel/entry-common.S > >> @@ -41,7 +41,9 @@ ret_fast_syscall: > >> UNWIND(.cantunwind) > >> disable_irq_notrace @ disable interrupts > >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall > >> tracing > >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > >> + tst r1, #_TIF_SYSCALL_WORK > >> + bne fast_work_pending > >> + tst r1, #_TIF_WORK_MASK > > > > (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle > > and each BNE is 1 cycle (when not taken). So: > > > > mov32 r2, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > > tst r1, r2 > > bne fast_work_pending > > > > is 4 cycles and tst, bne, tst, bne is also 4 cycles. Would mov32 be > > more readable (since it keeps the flags together)? > > I guess it would be more readable. Any opinion from the arm folks? The mov32 sequence is probably better, but statically attributing cycles on a per instruction basis is pretty futile on modern CPUs. Will
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jun 20, 2017 at 1:18 PM, Kees Cookwrote: > On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnier wrote: >> Ensure the address limit is a user-mode segment before returning to >> user-mode. Otherwise a process can corrupt kernel-mode memory and >> elevate privileges [1]. >> >> The set_fs function sets the TIF_SETFS flag to force a slow path on >> return. In the slow path, the address limit is checked to be USER_DS if >> needed. >> >> The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK >> for arm instruction immediate support. The global work mask is too big >> to used on a single instruction so adapt ret_fast_syscall. >> >> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >> >> Signed-off-by: Thomas Garnier >> --- >> v10 redesigns the change to use work flags on set_fs as recommended by >> Linus and agreed by others. >> >> Based on next-20170609 >> --- >> arch/arm/include/asm/thread_info.h | 15 +-- >> arch/arm/include/asm/uaccess.h | 2 ++ >> arch/arm/kernel/entry-common.S | 9 +++-- >> arch/arm/kernel/signal.c | 5 + >> 4 files changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/include/asm/thread_info.h >> b/arch/arm/include/asm/thread_info.h >> index 776757d1604a..1d468b527b7b 100644 >> --- a/arch/arm/include/asm/thread_info.h >> +++ b/arch/arm/include/asm/thread_info.h >> @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ >> #define TIF_NOTIFY_RESUME 2 /* callback before returning to user >> */ >> #define TIF_UPROBE 3 /* breakpointed or singlestepping */ >> -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ >> -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ >> -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint >> instrumentation */ >> -#define TIF_SECCOMP7 /* seccomp syscall filtering active >> */ >> +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ >> +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ >> +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ >> +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint >> instrumentation */ >> +#define TIF_SECCOMP8 /* seccomp syscall filtering active >> */ >> >> #define TIF_NOHZ 12 /* in adaptive nohz mode */ >> #define TIF_USING_IWMMXT 17 >> @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) >> #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) >> #define _TIF_UPROBE(1 << TIF_UPROBE) >> +#define _TIF_FSCHECK (1 << TIF_FSCHECK) >> #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) >> #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) >> #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) >> @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> /* >> * Change these and you break ASM code in entry-common.S >> */ >> -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ >> -_TIF_NOTIFY_RESUME | _TIF_UPROBE) >> +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ >> +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ >> +_TIF_FSCHECK) >> >> #endif /* __KERNEL__ */ >> #endif /* __ASM_ARM_THREAD_INFO_H */ >> diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h >> index 2577405d082d..6cc882223e34 100644 >> --- a/arch/arm/include/asm/uaccess.h >> +++ b/arch/arm/include/asm/uaccess.h >> @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) >> { >> current_thread_info()->addr_limit = fs; >> modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); >> + /* On user-mode return, check fs is correct */ >> + set_thread_flag(TIF_FSCHECK); >> } >> >> #define segment_eq(a, b) ((a) == (b)) >> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S >> index eb5cd77bf1d8..e33c32d56193 100644 >> --- a/arch/arm/kernel/entry-common.S >> +++ b/arch/arm/kernel/entry-common.S >> @@ -41,7 +41,9 @@ ret_fast_syscall: >> UNWIND(.cantunwind) >> disable_irq_notrace @ disable interrupts >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall >> tracing >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> + tst r1, #_TIF_SYSCALL_WORK >> + bne fast_work_pending >> + tst r1, #_TIF_WORK_MASK > > (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle > and each BNE is 1 cycle (when not taken). So:
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Tue, Jun 20, 2017 at 1:18 PM, Kees Cook wrote: > On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnier wrote: >> Ensure the address limit is a user-mode segment before returning to >> user-mode. Otherwise a process can corrupt kernel-mode memory and >> elevate privileges [1]. >> >> The set_fs function sets the TIF_SETFS flag to force a slow path on >> return. In the slow path, the address limit is checked to be USER_DS if >> needed. >> >> The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK >> for arm instruction immediate support. The global work mask is too big >> to used on a single instruction so adapt ret_fast_syscall. >> >> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >> >> Signed-off-by: Thomas Garnier >> --- >> v10 redesigns the change to use work flags on set_fs as recommended by >> Linus and agreed by others. >> >> Based on next-20170609 >> --- >> arch/arm/include/asm/thread_info.h | 15 +-- >> arch/arm/include/asm/uaccess.h | 2 ++ >> arch/arm/kernel/entry-common.S | 9 +++-- >> arch/arm/kernel/signal.c | 5 + >> 4 files changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/include/asm/thread_info.h >> b/arch/arm/include/asm/thread_info.h >> index 776757d1604a..1d468b527b7b 100644 >> --- a/arch/arm/include/asm/thread_info.h >> +++ b/arch/arm/include/asm/thread_info.h >> @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ >> #define TIF_NOTIFY_RESUME 2 /* callback before returning to user >> */ >> #define TIF_UPROBE 3 /* breakpointed or singlestepping */ >> -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ >> -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ >> -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint >> instrumentation */ >> -#define TIF_SECCOMP7 /* seccomp syscall filtering active >> */ >> +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ >> +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ >> +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ >> +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint >> instrumentation */ >> +#define TIF_SECCOMP8 /* seccomp syscall filtering active >> */ >> >> #define TIF_NOHZ 12 /* in adaptive nohz mode */ >> #define TIF_USING_IWMMXT 17 >> @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) >> #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) >> #define _TIF_UPROBE(1 << TIF_UPROBE) >> +#define _TIF_FSCHECK (1 << TIF_FSCHECK) >> #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) >> #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) >> #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) >> @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp >> __user *, >> /* >> * Change these and you break ASM code in entry-common.S >> */ >> -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ >> -_TIF_NOTIFY_RESUME | _TIF_UPROBE) >> +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ >> +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ >> +_TIF_FSCHECK) >> >> #endif /* __KERNEL__ */ >> #endif /* __ASM_ARM_THREAD_INFO_H */ >> diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h >> index 2577405d082d..6cc882223e34 100644 >> --- a/arch/arm/include/asm/uaccess.h >> +++ b/arch/arm/include/asm/uaccess.h >> @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) >> { >> current_thread_info()->addr_limit = fs; >> modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); >> + /* On user-mode return, check fs is correct */ >> + set_thread_flag(TIF_FSCHECK); >> } >> >> #define segment_eq(a, b) ((a) == (b)) >> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S >> index eb5cd77bf1d8..e33c32d56193 100644 >> --- a/arch/arm/kernel/entry-common.S >> +++ b/arch/arm/kernel/entry-common.S >> @@ -41,7 +41,9 @@ ret_fast_syscall: >> UNWIND(.cantunwind) >> disable_irq_notrace @ disable interrupts >> ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall >> tracing >> - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK >> + tst r1, #_TIF_SYSCALL_WORK >> + bne fast_work_pending >> + tst r1, #_TIF_WORK_MASK > > (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle > and each BNE is 1 cycle (when not taken). So: > > mov32 r2, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > tst r1, r2 >
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnierwrote: > Ensure the address limit is a user-mode segment before returning to > user-mode. Otherwise a process can corrupt kernel-mode memory and > elevate privileges [1]. > > The set_fs function sets the TIF_SETFS flag to force a slow path on > return. In the slow path, the address limit is checked to be USER_DS if > needed. > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > for arm instruction immediate support. The global work mask is too big > to used on a single instruction so adapt ret_fast_syscall. > > [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 > > Signed-off-by: Thomas Garnier > --- > v10 redesigns the change to use work flags on set_fs as recommended by > Linus and agreed by others. > > Based on next-20170609 > --- > arch/arm/include/asm/thread_info.h | 15 +-- > arch/arm/include/asm/uaccess.h | 2 ++ > arch/arm/kernel/entry-common.S | 9 +++-- > arch/arm/kernel/signal.c | 5 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/arm/include/asm/thread_info.h > b/arch/arm/include/asm/thread_info.h > index 776757d1604a..1d468b527b7b 100644 > --- a/arch/arm/include/asm/thread_info.h > +++ b/arch/arm/include/asm/thread_info.h > @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ > #define TIF_NOTIFY_RESUME 2 /* callback before returning to user > */ > #define TIF_UPROBE 3 /* breakpointed or singlestepping */ > -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ > -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ > -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation > */ > -#define TIF_SECCOMP7 /* seccomp syscall filtering active */ > +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ > +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ > +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ > +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint instrumentation > */ > +#define TIF_SECCOMP8 /* seccomp syscall filtering active */ > > #define TIF_NOHZ 12 /* in adaptive nohz mode */ > #define TIF_USING_IWMMXT 17 > @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) > #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) > #define _TIF_UPROBE(1 << TIF_UPROBE) > +#define _TIF_FSCHECK (1 << TIF_FSCHECK) > #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) > #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) > #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) > @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > /* > * Change these and you break ASM code in entry-common.S > */ > -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > -_TIF_NOTIFY_RESUME | _TIF_UPROBE) > +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ > +_TIF_FSCHECK) > > #endif /* __KERNEL__ */ > #endif /* __ASM_ARM_THREAD_INFO_H */ > diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h > index 2577405d082d..6cc882223e34 100644 > --- a/arch/arm/include/asm/uaccess.h > +++ b/arch/arm/include/asm/uaccess.h > @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) > { > current_thread_info()->addr_limit = fs; > modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); > + /* On user-mode return, check fs is correct */ > + set_thread_flag(TIF_FSCHECK); > } > > #define segment_eq(a, b) ((a) == (b)) > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S > index eb5cd77bf1d8..e33c32d56193 100644 > --- a/arch/arm/kernel/entry-common.S > +++ b/arch/arm/kernel/entry-common.S > @@ -41,7 +41,9 @@ ret_fast_syscall: > UNWIND(.cantunwind) > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > + tst r1, #_TIF_SYSCALL_WORK > + bne fast_work_pending > + tst r1, #_TIF_WORK_MASK (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle and each BNE is 1 cycle (when not taken). So: mov32 r2, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK tst r1, r2 bne fast_work_pending is 4 cycles and tst, bne, tst, bne is also 4 cycles. Would mov32 be more readable (since it keeps the flags
Re: [PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
On Wed, Jun 14, 2017 at 6:12 PM, Thomas Garnier wrote: > Ensure the address limit is a user-mode segment before returning to > user-mode. Otherwise a process can corrupt kernel-mode memory and > elevate privileges [1]. > > The set_fs function sets the TIF_SETFS flag to force a slow path on > return. In the slow path, the address limit is checked to be USER_DS if > needed. > > The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK > for arm instruction immediate support. The global work mask is too big > to used on a single instruction so adapt ret_fast_syscall. > > [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 > > Signed-off-by: Thomas Garnier > --- > v10 redesigns the change to use work flags on set_fs as recommended by > Linus and agreed by others. > > Based on next-20170609 > --- > arch/arm/include/asm/thread_info.h | 15 +-- > arch/arm/include/asm/uaccess.h | 2 ++ > arch/arm/kernel/entry-common.S | 9 +++-- > arch/arm/kernel/signal.c | 5 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/arm/include/asm/thread_info.h > b/arch/arm/include/asm/thread_info.h > index 776757d1604a..1d468b527b7b 100644 > --- a/arch/arm/include/asm/thread_info.h > +++ b/arch/arm/include/asm/thread_info.h > @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ > #define TIF_NOTIFY_RESUME 2 /* callback before returning to user > */ > #define TIF_UPROBE 3 /* breakpointed or singlestepping */ > -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ > -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ > -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation > */ > -#define TIF_SECCOMP7 /* seccomp syscall filtering active */ > +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ > +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ > +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ > +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint instrumentation > */ > +#define TIF_SECCOMP8 /* seccomp syscall filtering active */ > > #define TIF_NOHZ 12 /* in adaptive nohz mode */ > #define TIF_USING_IWMMXT 17 > @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) > #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) > #define _TIF_UPROBE(1 << TIF_UPROBE) > +#define _TIF_FSCHECK (1 << TIF_FSCHECK) > #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) > #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) > #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) > @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp > __user *, > /* > * Change these and you break ASM code in entry-common.S > */ > -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > -_TIF_NOTIFY_RESUME | _TIF_UPROBE) > +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ > +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ > +_TIF_FSCHECK) > > #endif /* __KERNEL__ */ > #endif /* __ASM_ARM_THREAD_INFO_H */ > diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h > index 2577405d082d..6cc882223e34 100644 > --- a/arch/arm/include/asm/uaccess.h > +++ b/arch/arm/include/asm/uaccess.h > @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) > { > current_thread_info()->addr_limit = fs; > modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); > + /* On user-mode return, check fs is correct */ > + set_thread_flag(TIF_FSCHECK); > } > > #define segment_eq(a, b) ((a) == (b)) > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S > index eb5cd77bf1d8..e33c32d56193 100644 > --- a/arch/arm/kernel/entry-common.S > +++ b/arch/arm/kernel/entry-common.S > @@ -41,7 +41,9 @@ ret_fast_syscall: > UNWIND(.cantunwind) > disable_irq_notrace @ disable interrupts > ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing > - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK > + tst r1, #_TIF_SYSCALL_WORK > + bne fast_work_pending > + tst r1, #_TIF_WORK_MASK (IIUC) MOV32 is 2 cycles (MOVW, MOVT), and each TST above is 1 cycle and each BNE is 1 cycle (when not taken). So: mov32 r2, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK tst r1, r2 bne fast_work_pending is 4 cycles and tst, bne, tst, bne is also 4 cycles. Would mov32 be more readable (since it keeps the flags together)? -Kees -- Kees Cook Pixel
[PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
Ensure the address limit is a user-mode segment before returning to user-mode. Otherwise a process can corrupt kernel-mode memory and elevate privileges [1]. The set_fs function sets the TIF_SETFS flag to force a slow path on return. In the slow path, the address limit is checked to be USER_DS if needed. The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK for arm instruction immediate support. The global work mask is too big to used on a single instruction so adapt ret_fast_syscall. [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Signed-off-by: Thomas Garnier--- v10 redesigns the change to use work flags on set_fs as recommended by Linus and agreed by others. Based on next-20170609 --- arch/arm/include/asm/thread_info.h | 15 +-- arch/arm/include/asm/uaccess.h | 2 ++ arch/arm/kernel/entry-common.S | 9 +++-- arch/arm/kernel/signal.c | 5 + 4 files changed, 23 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h index 776757d1604a..1d468b527b7b 100644 --- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ #define TIF_UPROBE 3 /* breakpointed or singlestepping */ -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ -#define TIF_SECCOMP7 /* seccomp syscall filtering active */ +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint instrumentation */ +#define TIF_SECCOMP8 /* seccomp syscall filtering active */ #define TIF_NOHZ 12 /* in adaptive nohz mode */ #define TIF_USING_IWMMXT 17 @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_UPROBE(1 << TIF_UPROBE) +#define _TIF_FSCHECK (1 << TIF_FSCHECK) #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, /* * Change these and you break ASM code in entry-common.S */ -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ -_TIF_NOTIFY_RESUME | _TIF_UPROBE) +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ +_TIF_FSCHECK) #endif /* __KERNEL__ */ #endif /* __ASM_ARM_THREAD_INFO_H */ diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 2577405d082d..6cc882223e34 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) { current_thread_info()->addr_limit = fs; modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); + /* On user-mode return, check fs is correct */ + set_thread_flag(TIF_FSCHECK); } #define segment_eq(a, b) ((a) == (b)) diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index eb5cd77bf1d8..e33c32d56193 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -41,7 +41,9 @@ ret_fast_syscall: UNWIND(.cantunwind) disable_irq_notrace @ disable interrupts ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #_TIF_SYSCALL_WORK + bne fast_work_pending + tst r1, #_TIF_WORK_MASK bne fast_work_pending /* perform architecture specific actions before user return */ @@ -67,12 +69,15 @@ ret_fast_syscall: str r0, [sp, #S_R0 + S_OFF]!@ save returned r0 disable_irq_notrace @ disable interrupts ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #_TIF_SYSCALL_WORK + bne fast_work_pending + tst r1, #_TIF_WORK_MASK beq no_work_pending UNWIND(.fnend
[PATCH v10 2/3] arm/syscalls: Check address limit on user-mode return
Ensure the address limit is a user-mode segment before returning to user-mode. Otherwise a process can corrupt kernel-mode memory and elevate privileges [1]. The set_fs function sets the TIF_SETFS flag to force a slow path on return. In the slow path, the address limit is checked to be USER_DS if needed. The TIF_SETFS flag is added to _TIF_WORK_MASK shifting _TIF_SYSCALL_WORK for arm instruction immediate support. The global work mask is too big to used on a single instruction so adapt ret_fast_syscall. [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Signed-off-by: Thomas Garnier --- v10 redesigns the change to use work flags on set_fs as recommended by Linus and agreed by others. Based on next-20170609 --- arch/arm/include/asm/thread_info.h | 15 +-- arch/arm/include/asm/uaccess.h | 2 ++ arch/arm/kernel/entry-common.S | 9 +++-- arch/arm/kernel/signal.c | 5 + 4 files changed, 23 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h index 776757d1604a..1d468b527b7b 100644 --- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -139,10 +139,11 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ #define TIF_UPROBE 3 /* breakpointed or singlestepping */ -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ -#define TIF_SECCOMP7 /* seccomp syscall filtering active */ +#define TIF_FSCHECK4 /* Check FS is USER_DS on return */ +#define TIF_SYSCALL_TRACE 5 /* syscall trace active */ +#define TIF_SYSCALL_AUDIT 6 /* syscall auditing active */ +#define TIF_SYSCALL_TRACEPOINT 7 /* syscall tracepoint instrumentation */ +#define TIF_SECCOMP8 /* seccomp syscall filtering active */ #define TIF_NOHZ 12 /* in adaptive nohz mode */ #define TIF_USING_IWMMXT 17 @@ -153,6 +154,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_UPROBE(1 << TIF_UPROBE) +#define _TIF_FSCHECK (1 << TIF_FSCHECK) #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT) @@ -166,8 +168,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *, /* * Change these and you break ASM code in entry-common.S */ -#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ -_TIF_NOTIFY_RESUME | _TIF_UPROBE) +#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ +_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ +_TIF_FSCHECK) #endif /* __KERNEL__ */ #endif /* __ASM_ARM_THREAD_INFO_H */ diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 2577405d082d..6cc882223e34 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -77,6 +77,8 @@ static inline void set_fs(mm_segment_t fs) { current_thread_info()->addr_limit = fs; modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER); + /* On user-mode return, check fs is correct */ + set_thread_flag(TIF_FSCHECK); } #define segment_eq(a, b) ((a) == (b)) diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index eb5cd77bf1d8..e33c32d56193 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -41,7 +41,9 @@ ret_fast_syscall: UNWIND(.cantunwind) disable_irq_notrace @ disable interrupts ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #_TIF_SYSCALL_WORK + bne fast_work_pending + tst r1, #_TIF_WORK_MASK bne fast_work_pending /* perform architecture specific actions before user return */ @@ -67,12 +69,15 @@ ret_fast_syscall: str r0, [sp, #S_R0 + S_OFF]!@ save returned r0 disable_irq_notrace @ disable interrupts ldr r1, [tsk, #TI_FLAGS]@ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #_TIF_SYSCALL_WORK + bne fast_work_pending + tst r1, #_TIF_WORK_MASK beq no_work_pending UNWIND(.fnend )