Re: [tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity

2016-03-01 Thread Frederic Weisbecker
2016-02-29 16:31 GMT+01:00 Frederic Weisbecker :
> 2016-02-29 12:18 GMT+01:00 tip-bot for Rik van Riel :
>> Commit-ID:  ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
>> Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
>> Author: Rik van Riel 
>> AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500
>> Committer:  Ingo Molnar 
>> CommitDate: Mon, 29 Feb 2016 09:53:10 +0100
>>
>> sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
>>
>> When profiling syscall overhead on nohz-full kernels,
>> after removing __acct_update_integrals() from the profile,
>> native_sched_clock() remains as the top CPU user. This can be
>> reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
>>
>> This will reduce timing accuracy on nohz_full CPUs to jiffy
>> based sampling, just like on normal CPUs. It results in
>> totally removing native_sched_clock from the profile, and
>> significantly speeding up the syscall entry and exit path,
>> as well as irq entry and exit, and KVM guest entry & exit.
>>
>> Additionally, only call the more expensive functions (and
>> advance the seqlock) when jiffies actually changed.
>>
>> This code relies on another CPU advancing jiffies when the
>> system is busy. On a nohz_full system, this is done by a
>> housekeeping CPU.
>>
>> A microbenchmark calling an invalid syscall number 10 million
>> times in a row speeds up an additional 30% over the numbers
>> with just the previous patches, for a total speedup of about
>> 40% over 4.4 and 4.5-rc1.
>>
>> Run times for the microbenchmark:
>>
>>  4.4                        3.8 seconds
>>  4.5-rc1                    3.7 seconds
>>  4.5-rc1 + first patch      3.3 seconds
>>  4.5-rc1 + first 3 patches  3.1 seconds
>>  4.5-rc1 + all patches      2.3 seconds
>>
>> A non-NOHZ_FULL cpu (not the housekeeping CPU):
>>
>>  all kernels                1.86 seconds
>>
>> Signed-off-by: Rik van Riel 
>> Signed-off-by: Peter Zijlstra (Intel) 
>> Reviewed-by: Thomas Gleixner 
>> Cc: Linus Torvalds 
>> Cc: Mike Galbraith 
>> Cc: Peter Zijlstra 
>> Cc: cl...@redhat.com
>> Cc: eric.duma...@gmail.com
>> Cc: fweis...@gmail.com
>
> It seems the tip bot doesn't parse the Cc tags correctly, as I wasn't
> cc'ed on this commit.
>
> Also, I wish I had had a chance to test and ack this patch before it
> got applied. I guess I should have mentioned that I was on vacation
> for the last few weeks.
>
> I'm going to run it through some tests.

OK, I ran some simple tests (kernel loops, user loops) and it seems to
account the cputime accurately. The kernel loop consists of brk()
calls (borrowed from an old test from Steve), and it also accounted the
small fragments of user time spent between syscalls; a sketch of that
kind of loop follows below.
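
For illustration, a minimal reconstruction of such a kernel-time loop
might look like this (an assumed sketch, not Steve's original test;
SYS_brk with a zero argument just queries the current program break):

/*
 * Hypothetical kernel-loop microbenchmark: each iteration is a cheap
 * syscall, so nearly all of the run time should be accounted as
 * system time, with only small fragments of user time in between.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/times.h>

int main(void)
{
	struct tms t;
	long i, hz = sysconf(_SC_CLK_TCK);

	for (i = 0; i < 10 * 1000 * 1000; i++)
		syscall(SYS_brk, 0);	/* kernel-time loop body */

	times(&t);	/* utime/stime as accounted by the kernel */
	printf("user %.2fs  sys %.2fs\n",
	       (double)t.tms_utime / hz, (double)t.tms_stime / hz);
	return 0;
}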

So it looks like a very nice improvement, thanks a lot Rik!


Re: [tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity

2016-02-29 Thread Frederic Weisbecker
2016-02-29 12:18 GMT+01:00 tip-bot for Rik van Riel :
> Commit-ID:  ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
> Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
> Author: Rik van Riel 
> AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500
> Committer:  Ingo Molnar 
> CommitDate: Mon, 29 Feb 2016 09:53:10 +0100
>
> sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
>
> When profiling syscall overhead on nohz-full kernels,
> after removing __acct_update_integrals() from the profile,
> native_sched_clock() remains as the top CPU user. This can be
> reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
>
> This will reduce timing accuracy on nohz_full CPUs to jiffy
> based sampling, just like on normal CPUs. It results in
> totally removing native_sched_clock from the profile, and
> significantly speeding up the syscall entry and exit path,
> as well as irq entry and exit, and KVM guest entry & exit.
>
> Additionally, only call the more expensive functions (and
> advance the seqlock) when jiffies actually changed.
>
> This code relies on another CPU advancing jiffies when the
> system is busy. On a nohz_full system, this is done by a
> housekeeping CPU.
>
> A microbenchmark calling an invalid syscall number 10 million
> times in a row speeds up an additional 30% over the numbers
> with just the previous patches, for a total speedup of about
> 40% over 4.4 and 4.5-rc1.
>
> Run times for the microbenchmark:
>
>  4.4                        3.8 seconds
>  4.5-rc1                    3.7 seconds
>  4.5-rc1 + first patch      3.3 seconds
>  4.5-rc1 + first 3 patches  3.1 seconds
>  4.5-rc1 + all patches      2.3 seconds
>
> A non-NOHZ_FULL cpu (not the housekeeping CPU):
>
>  all kernels                1.86 seconds
>
> Signed-off-by: Rik van Riel 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Thomas Gleixner 
> Cc: Linus Torvalds 
> Cc: Mike Galbraith 
> Cc: Peter Zijlstra 
> Cc: cl...@redhat.com
> Cc: eric.duma...@gmail.com
> Cc: fweis...@gmail.com

It seems the tip bot doesn't parse the Cc tags correctly, as I wasn't
cc'ed on this commit.

Also, I wish I had had a chance to test and ack this patch before it
got applied. I guess I should have mentioned that I was on vacation
for the last few weeks.

I'm going to run it through some tests.

Thanks.


[tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity

2016-02-29 Thread tip-bot for Rik van Riel
Commit-ID:  ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
Author: Rik van Riel 
AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500
Committer:  Ingo Molnar 
CommitDate: Mon, 29 Feb 2016 09:53:10 +0100

sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity

When profiling syscall overhead on nohz-full kernels,
after removing __acct_update_integrals() from the profile,
native_sched_clock() remains as the top CPU user. This can be
reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.

This will reduce timing accuracy on nohz_full CPUs to jiffy
based sampling, just like on normal CPUs. It results in
totally removing native_sched_clock from the profile, and
significantly speeding up the syscall entry and exit path,
as well as irq entry and exit, and KVM guest entry & exit.

Additionally, only call the more expensive functions (and
advance the seqlock) when jiffies actually changed.

This code relies on another CPU advancing jiffies when the
system is busy. On a nohz_full system, this is done by a
housekeeping CPU.
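
For reference, a typical setup of this kind (an illustrative
configuration, not part of this patch) isolates the adaptive-tick CPUs
on the kernel command line and leaves CPU 0 as the housekeeping CPU
that keeps jiffies advancing:

    # illustrative boot parameters: CPUs 1-7 run tickless,
    # CPU 0 stays a housekeeping CPU and updates jiffies
    nohz_full=1-7 rcu_nocbs=1-7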

A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.

Run times for the microbenchmark:

 4.4                        3.8 seconds
 4.5-rc1                    3.7 seconds
 4.5-rc1 + first patch      3.3 seconds
 4.5-rc1 + first 3 patches  3.1 seconds
 4.5-rc1 + all patches      2.3 seconds

A non-NOHZ_FULL cpu (not the housekeeping CPU):

 all kernels                1.86 seconds
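
A sketch of a microbenchmark of this shape (a reconstruction under
assumptions, not the program that produced the numbers above; an
invalid syscall number fails with ENOSYS, so it exercises only the
syscall entry and exit paths):

/* Hypothetical microbenchmark: time 10 million invalid syscalls. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timespec a, b;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (i = 0; i < 10 * 1000 * 1000; i++)
		syscall(-1);		/* invalid syscall number */
	clock_gettime(CLOCK_MONOTONIC, &b);

	printf("%.2f seconds\n", (b.tv_sec - a.tv_sec) +
				 (b.tv_nsec - a.tv_nsec) / 1e9);
	return 0;
}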

Signed-off-by: Rik van Riel 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: cl...@redhat.com
Cc: eric.duma...@gmail.com
Cc: fweis...@gmail.com
Cc: l...@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-5-git-send-email-r...@redhat.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cputime.c | 39 +++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b2ab2ff..01d9898 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -668,26 +668,25 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static unsigned long long vtime_delta(struct task_struct *tsk)
+static cputime_t vtime_delta(struct task_struct *tsk)
 {
-	unsigned long long clock;
+	unsigned long now = READ_ONCE(jiffies);
 
-	clock = local_clock();
-	if (clock < tsk->vtime_snap)
+	if (time_before(now, (unsigned long)tsk->vtime_snap))
 		return 0;
 
-	return clock - tsk->vtime_snap;
+	return jiffies_to_cputime(now - tsk->vtime_snap);
 }
 
 static cputime_t get_vtime_delta(struct task_struct *tsk)
 {
-	unsigned long long delta = vtime_delta(tsk);
+	unsigned long now = READ_ONCE(jiffies);
+	unsigned long delta = now - tsk->vtime_snap;
 
 	WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
-	tsk->vtime_snap += delta;
+	tsk->vtime_snap = now;
 
-	/* CHECKME: always safe to convert nsecs to cputime? */
-	return nsecs_to_cputime(delta);
+	return jiffies_to_cputime(delta);
 }
 
 static void __vtime_account_system(struct task_struct *tsk)
@@ -699,6 +698,9 @@ static void __vtime_account_system(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
+	if (!vtime_delta(tsk))
+		return;
+
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	__vtime_account_system(tsk);
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -707,7 +709,8 @@ void vtime_account_system(struct task_struct *tsk)
 void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_delta(tsk))
+		__vtime_account_system(tsk);
 	if (context_tracking_in_user())
 		tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
@@ -718,16 +721,19 @@ void vtime_account_user(struct task_struct *tsk)
 	cputime_t delta_cpu;
 
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	delta_cpu = get_vtime_delta(tsk);
 	tsk->vtime_snap_whence = VTIME_SYS;
-	account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	if (vtime_delta(tsk)) {
+		delta_cpu = get_vtime_delta(tsk);
+		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+	}
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
 
 void vtime_user_enter(struct task_struct *tsk)
 {
 	write_seqcount_begin(&tsk->vtime_seqcount);
-	__vtime_account_system(tsk);
+	if (vtime_delta(tsk))
+		__vtime_account_system(tsk);
 	tsk->vtime_snap_whence = VTIME_USER;
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
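
For context on why eliding the seqcount write helps, the state updated
above is read through the usual seqcount retry loop; here is a
simplified sketch of that reader side (modeled on the kernel's
seqcount API; the function name is hypothetical, not the exact
cputime.c reader):

/*
 * Simplified reader-side sketch for vtime_seqcount: retry until a
 * snapshot is observed without a concurrent writer in the middle.
 */
static cputime_t vtime_delta_read(struct task_struct *tsk)
{
	unsigned int seq;
	cputime_t delta;

	do {
		seq = read_seqcount_begin(&tsk->vtime_seqcount);
		delta = vtime_delta(tsk);	/* consistent snapshot */
	} while (read_seqcount_retry(&tsk->vtime_seqcount, seq));

	return delta;
}

Skipping write_seqcount_begin()/end() when jiffies has not moved means
the writer touches the sequence counter far less often on the syscall
fast path, so readers like this also retry less.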