Re: [PATCH v13 00/12] support "task_isolation" mode

2016-08-11 Thread Peter Zijlstra
On Fri, Jul 22, 2016 at 08:50:44AM -0400, Chris Metcalf wrote:
> On 7/21/2016 10:20 PM, Christoph Lameter wrote:
> >On Thu, 21 Jul 2016, Chris Metcalf wrote:
> >>On 7/20/2016 10:04 PM, Christoph Lameter wrote:
> >>unstable, and then scheduling work to safely remove that timer.
> >>I haven't looked at this code before (in kernel/time/clocksource.c
> >>under CONFIG_CLOCKSOURCE_WATCHDOG) since the timers on
> >>arm64 and tile aren't unstable.  Is it possible to boot your machine
> >>with a stable clocksource?
> >It already has a stable clocksource. Sorry but that was one of the criteria
> >for the server when we ordered them. Could this be clock adjustments?
> 
> We probably need to get clock folks to jump in on this thread!

Boot with: tsc=reliable, this disables the watchdog.

We (sadly) have to have this thing running on most x86 because TSC, even
if initially stable, can do weird things once it's running.

We have seen:

 - SMI
 - hotplug
 - suspend
 - multi-socket

mess up the TSC, even if it was deemed 'good' at boot time.

If you _know_ your TSC to be solid, boot with tsc=reliable and be happy.
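
To see which clocksource a machine is actually using before deciding on
tsc=reliable, the active one can be read from sysfs. This is a minimal
sketch, assuming the standard sysfs path; the helper names
read_first_line() and clocksource_is_tsc() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Standard sysfs location of the clocksource the kernel is currently using. */
#define CURRENT_CLOCKSOURCE \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of `path` into `buf`, stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the named clocksource is the TSC, 0 otherwise. */
static int clocksource_is_tsc(const char *name)
{
    return strcmp(name, "tsc") == 0;
}
```

Calling read_first_line(CURRENT_CLOCKSOURCE, buf, sizeof(buf)) and passing
the result to clocksource_is_tsc() tells you whether the TSC is still in
use or the kernel has already fallen back to another clocksource.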
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-27 Thread Christoph Lameter

We tested this with 4.7-rc7 and aside from the issue with
clocksource_watchdog() this is working fine.

Tested-by: Christoph Lameter 



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-25 Thread Christoph Lameter
On Fri, 22 Jul 2016, Chris Metcalf wrote:

> > It already has a stable clocksource. Sorry but that was one of the criteria
> > for the server when we ordered them. Could this be clock adjustments?
>
> We probably need to get clock folks to jump in on this thread!

Guess so. I will have a look at this when I get some time again.

> Maybe it's disabling some built-in unstable clock just as part of
> falling back to using the better, stable clock that you also have?
> So maybe there's a way of just disabling that clocksource from the
> get-go instead of having it be marked unstable later.

This is a standard Dell server. No clocksources are marked as unstable as
far as I can tell.

> If you run the test again after this storm of unstable marking, does
> it all happen again?  Or is it a persistent state in the kernel?

This happens anytime we try to run with prctl().

I hope to get some more detail once I get some time to look at this. But
this is likely an x86 specific problem.



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-22 Thread Chris Metcalf

On 7/21/2016 10:20 PM, Christoph Lameter wrote:
> On Thu, 21 Jul 2016, Chris Metcalf wrote:
>> On 7/20/2016 10:04 PM, Christoph Lameter wrote:
>> unstable, and then scheduling work to safely remove that timer.
>> I haven't looked at this code before (in kernel/time/clocksource.c
>> under CONFIG_CLOCKSOURCE_WATCHDOG) since the timers on
>> arm64 and tile aren't unstable.  Is it possible to boot your machine
>> with a stable clocksource?
>
> It already has a stable clocksource. Sorry but that was one of the criteria
> for the server when we ordered them. Could this be clock adjustments?


We probably need to get clock folks to jump in on this thread!

Maybe it's disabling some built-in unstable clock just as part of
falling back to using the better, stable clock that you also have?
So maybe there's a way of just disabling that clocksource from the
get-go instead of having it be marked unstable later.

If you run the test again after this storm of unstable marking, does
it all happen again?  Or is it a persistent state in the kernel?
If so, maybe you can just arrange to get to that state before starting
your application's task-isolation code.

Or, if you think it's clock adjustments, perhaps running your test with
ntpd disabled would make it work better?

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-21 Thread Christoph Lameter

On Thu, 21 Jul 2016, Chris Metcalf wrote:
> On 7/20/2016 10:04 PM, Christoph Lameter wrote:

> unstable, and then scheduling work to safely remove that timer.
> I haven't looked at this code before (in kernel/time/clocksource.c
> under CONFIG_CLOCKSOURCE_WATCHDOG) since the timers on
> arm64 and tile aren't unstable.  Is it possible to boot your machine
> with a stable clocksource?

It already has a stable clocksource. Sorry but that was one of the criteria
for the server when we ordered them. Could this be clock adjustments?



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-21 Thread Chris Metcalf

On 7/20/2016 10:04 PM, Christoph Lameter wrote:
> We are trying to test the patchset on x86 and are getting strange
> backtraces and aborts. It seems that the cpu before the cpu we are running
> on creates an irq_work event that causes a latency event on the next cpu.
>
> This is weird. Is there a new round robin IPI feature in the kernel that I
> am not aware of?


This seems to be from your clocksource declaring itself to be
unstable, and then scheduling work to safely remove that timer.
I haven't looked at this code before (in kernel/time/clocksource.c
under CONFIG_CLOCKSOURCE_WATCHDOG) since the timers on
arm64 and tile aren't unstable.  Is it possible to boot your machine
with a stable clocksource?
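
The "round robin" pattern in the trace (cpu 12 kicking cpu 13) is
consistent with the watchdog timer re-arming itself on the next online
cpu each time it fires, which is what the clocksource_watchdog() ->
add_timer_on() frames suggest. A userspace model of that selection, as a
sketch only: next_online_cpu() is a hypothetical name, and the 64-bit
mask stands in for the kernel's cpu_online_mask.

```c
#include <stdint.h>

/*
 * Return the next online CPU after `cpu` in `online_mask` (bit n set means
 * CPU n is online), wrapping around to the lowest online CPU.
 * Returns -1 if the mask is empty. Models at most 64 CPUs; `cpu` must be
 * in the range 0..63.
 */
static int next_online_cpu(int cpu, uint64_t online_mask)
{
    if (online_mask == 0)
        return -1;
    for (int i = 1; i <= 64; i++) {
        int candidate = (cpu + i) % 64;
        if (online_mask & (UINT64_C(1) << candidate))
            return candidate;
    }
    return -1; /* not reached for a non-empty mask */
}
```

With sixteen cpus online, a timer that fires on cpu 12 picks cpu 13 next,
so a task isolated on cpu 13 gets hit once per trip around the ring.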







--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-20 Thread Christoph Lameter
We are trying to test the patchset on x86 and are getting strange
backtraces and aborts. It seems that the cpu before the cpu we are running
on creates an irq_work event that causes a latency event on the next cpu.

This is weird. Is there a new round robin IPI feature in the kernel that I
am not aware of?

Backtraces from dmesg:

[  956.603223] latencytest/7928: task_isolation mode lost due to irq_work
[  956.610817] cpu 12: irq_work violating task isolation for latencytest/7928 on cpu 13
[  956.619985] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
[  956.628765] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
[  956.637642]  0086 ce6735c7b39e7b81 88103e783d00 8134f6ff
[  956.646739]  88102c50d700 000d 88103e783d28 811986f4
[  956.655828]  88102c50d700 88203cf97f80 000d 88103e783d68
[  956.664924] Call Trace:
[  956.667945]  [] dump_stack+0x63/0x84
[  956.674740]  [] task_isolation_debug_task+0xb4/0xd0
[  956.682229]  [] _task_isolation_debug+0x83/0xc0
[  956.689331]  [] irq_work_queue_on+0x9c/0x120
[  956.696142]  [] tick_nohz_full_kick_cpu+0x44/0x50
[  956.703438]  [] wake_up_nohz_cpu+0x99/0x110
[  956.710150]  [] internal_add_timer+0x71/0xb0
[  956.716959]  [] add_timer_on+0xbb/0x140
[  956.723283]  [] clocksource_watchdog+0x230/0x300
[  956.730480]  [] ? __clocksource_unstable.isra.2+0x40/0x40
[  956.738555]  [] call_timer_fn+0x35/0x120
[  956.744973]  [] ? __clocksource_unstable.isra.2+0x40/0x40
[  956.753046]  [] run_timer_softirq+0x23c/0x2f0
[  956.759952]  [] __do_softirq+0xd7/0x2c5
[  956.766272]  [] irq_exit+0xf5/0x100
[  956.772209]  [] smp_apic_timer_interrupt+0x42/0x50
[  956.779600]  [] apic_timer_interrupt+0x8c/0xa0
[  956.786602]  [] ? poll_idle+0x40/0x80
[  956.793490]  [] cpuidle_enter_state+0x9c/0x260
[  956.800498]  [] cpuidle_enter+0x17/0x20
[  956.806810]  [] cpu_startup_entry+0x2b7/0x3a0
[  956.813717]  [] start_secondary+0x15c/0x1a0
[ 1036.601758] cpu 12: irq_work violating task isolation for latencytest/8447 on cpu 13
[ 1036.610922] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
[ 1036.619692] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
[ 1036.628551]  0086 ce6735c7b39e7b81 88103e783d00 8134f6ff
[ 1036.637648]  88102dca 000d 88103e783d28 811986f4
[ 1036.646741]  88102dca 88203cf97f80 000d 88103e783d68
[ 1036.655833] Call Trace:
[ 1036.658852]  [] dump_stack+0x63/0x84
[ 1036.665649]  [] task_isolation_debug_task+0xb4/0xd0
[ 1036.673136]  [] _task_isolation_debug+0x83/0xc0
[ 1036.680237]  [] irq_work_queue_on+0x9c/0x120
[ 1036.687091]  [] tick_nohz_full_kick_cpu+0x44/0x50
[ 1036.694388]  [] wake_up_nohz_cpu+0x99/0x110
[ 1036.701089]  [] internal_add_timer+0x71/0xb0
[ 1036.707896]  [] add_timer_on+0xbb/0x140
[ 1036.714210]  [] clocksource_watchdog+0x230/0x300
[ 1036.721411]  [] ? __clocksource_unstable.isra.2+0x40/0x40
[ 1036.729478]  [] call_timer_fn+0x35/0x120
[ 1036.735899]  [] ? __clocksource_unstable.isra.2+0x40/0x40
[ 1036.743970]  [] run_timer_softirq+0x23c/0x2f0
[ 1036.750878]  [] __do_softirq+0xd7/0x2c5
[ 1036.757199]  [] irq_exit+0xf5/0x100
[ 1036.763132]  [] smp_apic_timer_interrupt+0x42/0x50
[ 1036.770520]  [] apic_timer_interrupt+0x8c/0xa0
[ 1036.777520]  [] ? poll_idle+0x40/0x80
[ 1036.784410]  [] cpuidle_enter_state+0x9c/0x260
[ 1036.791413]  [] cpuidle_enter+0x17/0x20
[ 1036.797734]  [] cpu_startup_entry+0x2b7/0x3a0
[ 1036.804641]  [] start_secondary+0x15c/0x1a0




Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-18 Thread Chris Metcalf

On 7/18/2016 6:11 PM, Andy Lutomirski wrote:
>>> As an example, enough vmalloc/vfree activity will eventually cause
>>> flush_tlb_kernel_range to be called and *boom*, there goes your shiny
>>> production dataplane application.
>>
>> Well, that's actually a refinement that I did not inflict on this patch
>> series.
>
> Submit it separately, perhaps?
>
> The "kill the process if it goofs" thing while there are known goofs
> in the kernel, apparently with patches written but unsent, seems
> questionable.


Sure, that's a good idea.

I think what I will plan to do is, once the patch series is accepted into
some tree, return to this piece.  I'll have to go back and look at the internal
Tilera version of this code, since we have diverged quite a ways from that
in the 13 versions of the patch series, but my memory is that the kernel TLB
flush management was the only substantial piece of additional code not in
the initial batch of changes.  The extra requirement is the need to have a
hook very early on in the kernel entry path that you can hook in all paths;
arm64 has the ct_user_exit macro and tile has the finish_interrupt_save macro,
but I'm not sure there's something equivalent on x86 to catch all entries.

It's worth noting, though, that the typical target application for task isolation
(at least in our experience) is a pretty dedicated machine, with the primary
application running in task isolation mode almost all of the time, and so
you are generally in pretty good control of all aspects of the system, including
whether or not you are generating kernel TLB flushes from your non task
isolation cores.  So I would argue the kernel TLB flush management piece is
an improvement to, not a requirement for, the main patch series.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-18 Thread Andy Lutomirski
On Thu, Jul 14, 2016 at 2:22 PM, Chris Metcalf wrote:
> On 7/14/2016 5:03 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jul 14, 2016 at 1:48 PM, Chris Metcalf wrote:
>>>
>>> Here is a respin of the task-isolation patch set.  This primarily
>>> reflects feedback from Frederic and Peter Z.
>>
>> I still think this is the wrong approach, at least at this point.  The
>> first step should be to instrument things if necessary and fix the
>> obvious cases where the kernel gets entered asynchronously.
>
>
> Note, however, that the task_isolation_debug mode is a very convenient
> way of discovering what is going on when things do go wrong for task
> isolation.
>
>> Only once
>> there's a credible reason to believe it can work well should any form
>> of strictness be applied.
>
>
> I'm not sure what criteria you need for this, though.  Certainly we've been
> shipping our version of task isolation to customers since 2008, and there
> are quite a few customer applications in production that are working well.
> I'd argue that's a credible reason.
>
>> As an example, enough vmalloc/vfree activity will eventually cause
>> flush_tlb_kernel_range to be called and *boom*, there goes your shiny
>> production dataplane application.
>
>
> Well, that's actually a refinement that I did not inflict on this patch
> series.

Submit it separately, perhaps?

The "kill the process if it goofs" thing while there are known goofs
in the kernel, apparently with patches written but unsent, seems
questionable.


Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-17 Thread Christoph Lameter
On Thu, 14 Jul 2016, Andy Lutomirski wrote:

> As an example, enough vmalloc/vfree activity will eventually cause
> flush_tlb_kernel_range to be called and *boom*, there goes your shiny
> production dataplane application.  Once virtually mapped kernel stacks
> happen, the frequency with which this happens will only increase.

But then vmalloc/vfree activity is not to be expected if user space only is
running. Since the kernel is not active and this affects kernel address
space only it could be deferred. Such events will cause OS activity that
causes a number of high latency events but then the system will quiet down
again.

> On very brief inspection, __kmem_cache_shutdown will be a problem on
> some workloads as well.

These are all corner cases that can be worked on over time if they are
significant. The main issue here is to reduce the obvious and relatively
frequent causes for ticks and allow easier detection of events that cause
tick activity.



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-14 Thread Chris Metcalf

On 7/14/2016 5:03 PM, Andy Lutomirski wrote:

> On Thu, Jul 14, 2016 at 1:48 PM, Chris Metcalf wrote:
>> Here is a respin of the task-isolation patch set.  This primarily
>> reflects feedback from Frederic and Peter Z.
>
> I still think this is the wrong approach, at least at this point.  The
> first step should be to instrument things if necessary and fix the
> obvious cases where the kernel gets entered asynchronously.


Note, however, that the task_isolation_debug mode is a very convenient
way of discovering what is going on when things do go wrong for task isolation.


> Only once
> there's a credible reason to believe it can work well should any form
> of strictness be applied.


I'm not sure what criteria you need for this, though.  Certainly we've been
shipping our version of task isolation to customers since 2008, and there
are quite a few customer applications in production that are working well.
I'd argue that's a credible reason.


> As an example, enough vmalloc/vfree activity will eventually cause
> flush_tlb_kernel_range to be called and *boom*, there goes your shiny
> production dataplane application.


Well, that's actually a refinement that I did not inflict on this patch series.

In our code base, we have a hook for kernel TLB flushes that defers such
flushes for cores that are running in userspace, because, after all, they
don't yet care about such flushes.  Instead, we atomically set a flag that
is checked on entry to the kernel, and that causes the TLB flush to occur
at that point.
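
A rough userspace model of that deferral, using C11 atomics. This is a
sketch under stated assumptions, not the Tilera code: the state bits,
struct cpu_state, and the three function names are all hypothetical, and
a counter stands in for the real local TLB flush.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state bits; not taken from any kernel source. */
#define CPU_IN_USER        1u
#define CPU_FLUSH_PENDING  2u

struct cpu_state {
    atomic_uint bits;
    unsigned flushes_done;   /* counts local flushes, for illustration only */
};

/*
 * Called by a remote CPU that needs `cs` to flush its kernel TLB entries.
 * Returns true if the flush was deferred (target is in userspace), false if
 * the caller must interrupt the target because it is running in the kernel.
 */
static bool request_flush(struct cpu_state *cs)
{
    unsigned old = atomic_load(&cs->bits);
    while (old & CPU_IN_USER) {
        /* On failure `old` is reloaded and the loop re-checks IN_USER. */
        if (atomic_compare_exchange_weak(&cs->bits, &old,
                                         old | CPU_FLUSH_PENDING))
            return true;
    }
    return false;
}

/* Called on the target CPU at kernel entry, before using kernel mappings. */
static void kernel_entry(struct cpu_state *cs)
{
    unsigned old = atomic_exchange(&cs->bits, 0);
    if (old & CPU_FLUSH_PENDING)
        cs->flushes_done++;   /* stand-in for the real local TLB flush */
}

/* Called on the target CPU just before it returns to userspace. */
static void kernel_exit(struct cpu_state *cs)
{
    atomic_store(&cs->bits, CPU_IN_USER);
}
```

Keeping "in userspace" and "flush pending" in one atomic word means a
requester can never set the pending bit after the target has already
passed its entry check: either the compare-and-swap lands while the
target is still in userspace, or it fails and the requester falls back to
an interrupt.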


> On very brief inspection, __kmem_cache_shutdown will be a problem on
> some workloads as well.


That looks like it should be amenable to a version of the same fix I pushed
upstream in 5fbc461636c32efd ("mm: make lru_add_drain_all() selective").
You would basically check which cores have non-empty caches, and only
interrupt those cores.  For extra credit, you empty the cache on your local cpu
when you are entering task isolation mode.  Now you don't get interrupted.
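
The "only interrupt the cores that need it" idea can be sketched as a mask
computation. This is an illustration, not the code from that commit:
cpus_needing_drain() and the per-cpu count array are hypothetical
stand-ins for whatever per-cpu cache state the real code inspects.

```c
#include <stdint.h>

/*
 * Given per-CPU counts of cached entries, return a bitmask of the CPUs that
 * actually have something to drain. CPUs with empty caches are left out, so
 * they are never interrupted. Models at most 64 CPUs.
 */
static uint64_t cpus_needing_drain(const unsigned *cached, int ncpus)
{
    uint64_t mask = 0;
    for (int cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
        if (cached[cpu] != 0)
            mask |= UINT64_C(1) << cpu;
    }
    return mask;
}
```

A task-isolation cpu that emptied its own cache on the way into isolation
has a count of zero, so its bit stays clear and it is skipped.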

To be fair, I've never seen this particular path cause an interruption.  And I
think this speaks to the fact that there really can't be a black and white
decision about when you have removed enough possible interrupt paths.
It really does depend on what else is running on your machine in addition
to the task isolation code, and that will vary from application to application.
And, as the kernel evolves, new ways of interrupting task isolation cores
will get added and need to be dealt with.  There really isn't a perfect time
you can wait for and then declare that all the asynchronous entry cases
have been dealt with and now things are safe for task isolation.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com



Re: [PATCH v13 00/12] support "task_isolation" mode

2016-07-14 Thread Andy Lutomirski
On Thu, Jul 14, 2016 at 1:48 PM, Chris Metcalf wrote:
> Here is a respin of the task-isolation patch set.  This primarily
> reflects feedback from Frederic and Peter Z.

I still think this is the wrong approach, at least at this point.  The
first step should be to instrument things if necessary and fix the
obvious cases where the kernel gets entered asynchronously.  Only once
there's a credible reason to believe it can work well should any form
of strictness be applied.

As an example, enough vmalloc/vfree activity will eventually cause
flush_tlb_kernel_range to be called and *boom*, there goes your shiny
production dataplane application.  Once virtually mapped kernel stacks
happen, the frequency with which this happens will only increase.

On very brief inspection, __kmem_cache_shutdown will be a problem on
some workloads as well.

--Andy


[PATCH v13 00/12] support "task_isolation" mode

2016-07-14 Thread Chris Metcalf
Here is a respin of the task-isolation patch set.  This primarily
reflects feedback from Frederic and Peter Z.

Changes since v12:

- Rebased on v4.7-rc7.

- New default "strict" model for task isolation - tasks exit the
  kernel from the initial prctl() to userspace, and can only legally
  exit by calling prctl() again to turn off isolation.  Any other
  kernel entry results in a SIGKILL by default.

- New optional "relaxed" mode, where the application can receive some
  signal other than SIGKILL, or no signal at all, when it re-enters
  the kernel.  Since by default task isolation is now strict, there is
  no longer an additional "STRICT" mode, but rather a new "NOSIG" mode
  that builds on top of the "USERSIG" support for setting a signal
  other than SIGKILL to be delivered to the process.  The "NOSIG" mode
  also relaxes the required criteria for entering task isolation mode;
  we just issue a warning if the affinity isn't set right, and we
  don't fail with EAGAIN if the kernel isn't ready to stop the tick.

  Running your task-isolation application in this "NOSIG" mode is also
  necessary when debugging, since otherwise hitting breakpoints, etc.,
  will cause a fatal signal to be sent to the process.

  Frederic has suggested we might want to defer this functionality
  until later, but (in addition to the debuggability aspect) there is
  some thought that it might be useful for e.g. HPC, so I have just
  broken out the additional semantics into a single separate patch at
  the end of the series.

- Function naming has been changed and comments have been added to try
  to clarify the role of the task-isolation reporting on kernel
  entries that do NOT cause signals.  This hopefully clarifies why we
  only invoke the renamed task_isolation_quiet_exception() in a few
  places, since all the other places generate signals anyway. [PeterZ]

- The task_isolation_debug() call now has an inline piece that checks
  to see if the target is a task_isolation cpu before actually
  calling. [PeterZ]

- In _task_isolation_debug(), we use the new task_struct_trylock()
  call that is in linux-next now; for now I just have a static copy of
  the function, which I will switch to using the version from
  linux-next in the next rebasing. [PeterZ]

- We now pass a string describing the interrupt up from
  task_isolation_debug() so there is more information on where the
  interrupt came from beyond just the stack backtrace. [PeterZ]

- I added task_isolation_debug() hooks to smp_sched_reschedule() on
  x86, which was missing before, and removed the hooks in the tile
  send_IPI_*() routines, since there were already hooks in the
  callers.  Likewise I moved the hook for arm64 from the generic
  smp_cross_call() routine to the only caller that wasn't already
  hooked, smp_send_reschedule().  The commit message clarifies the
  rationale for where hooks are placed.

- I moved the page fault reporting so that it only reports in the case
  that we are not also sending a SIGSEGV/SIGBUS, for consistency with
  other uses of task_isolation_quiet_exception().
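
The signal behavior of the strict, "USERSIG", and "NOSIG" modes described
in the change list above can be modeled as a small decision function. This
is a sketch of the semantics only: the enum and function names are
hypothetical and are not the prctl() flag names used by the patches.

```c
#include <signal.h>

/* Hypothetical names for the three behaviors described in the change list. */
enum isolation_mode {
    ISOL_STRICT,   /* default: any unexpected kernel entry is fatal */
    ISOL_USERSIG,  /* deliver an application-chosen signal instead */
    ISOL_NOSIG     /* deliver nothing; also relaxes the entry checks */
};

/*
 * Return the signal delivered when an isolated task re-enters the kernel
 * other than through the prctl() call that turns isolation off, or 0 if no
 * signal is delivered. `user_sig` is only consulted for ISOL_USERSIG.
 */
static int signal_on_kernel_entry(enum isolation_mode mode, int user_sig)
{
    switch (mode) {
    case ISOL_USERSIG:
        return user_sig;
    case ISOL_NOSIG:
        return 0;
    case ISOL_STRICT:
    default:
        return SIGKILL;
    }
}
```

This also shows why debugging needs the no-signal variant: a breakpoint is
a kernel entry, and under the default it would map to SIGKILL.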

The previous (v12) patch series is here:

https://lkml.kernel.org/g/1459877922-15512-1-git-send-email-cmetc...@mellanox.com

This version of the patch series has been tested on arm64 and tilegx,
and build-tested on x86.

It remains true that the 1 Hz tick needs to be disabled for this
patch series to be able to achieve its primary goal of enabling
truly tick-free operation, but that is ongoing orthogonal work.
Frederic, do you have a sense of what is left to be done there?
I can certainly try to contribute to that effort as well.

The series is available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git dataplane

Chris Metcalf (12):
  vmstat: add quiet_vmstat_sync function
  vmstat: add vmstat_idle function
  lru_add_drain_all: factor out lru_add_drain_needed
  task_isolation: add initial support
  task_isolation: track asynchronous interrupts
  arch/x86: enable task isolation functionality
  arm64: factor work_pending state machine to C
  arch/arm64: enable task isolation functionality
  arch/tile: enable task isolation functionality
  arm, tile: turn off timer tick for oneshot_stopped state
  task_isolation: support CONFIG_TASK_ISOLATION_ALL
  task_isolation: add user-settable notification signal

 Documentation/kernel-parameters.txt  |  16 ++
 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/thread_info.h |   5 +-
 arch/arm64/kernel/entry.S            |  12 +-
 arch/arm64/kernel/ptrace.c           |  15 +-
 arch/arm64/kernel/signal.c           |  42 +++-
 arch/arm64/kernel/smp.c              |   2 +
 arch/arm64/mm/fault.c                |   8 +-
 arch/tile/Kconfig                    |   1 +
 arch/tile/include/asm/thread_info.h  |   4 +-
 arch/tile/kernel/process.c           |   9 +
 arch/tile/kernel/ptrace.c            |   7 +
 arch/tile/kernel/single_step.c       |   7 +
 arch/tile/kernel/smp.c