[PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
From: Anton Blanchard Early on in do_page_fault() we call store_updates_sp(), regardless of the type of exception. For an instruction miss this doesn't make sense, because we only use this information to detect if a data miss is the result of a stack expansion instruction or not. Worse still, it results in a data miss within every userspace instruction miss handler, because we try and load the very instruction we are about to install a pte for! A simple exec microbenchmark runs 6% faster on POWER8 with this fix: int main(int argc, char *argv[]) { unsigned long left = atol(argv[1]); char leftstr[16]; if (left-- == 0) return 0; sprintf(leftstr, "%ld", left); execlp(argv[0], argv[0], leftstr, NULL); perror("exec failed\n"); return 0; } Signed-off-by: Anton Blanchard --- arch/powerpc/mm/fault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index fd6484fc2fa9..3a7d580fdc59 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, * can result in fault, which will cause a deadlock when called with * mmap_sem held */ - if (user_mode(regs)) + if (!is_exec && user_mode(regs)) store_update_sp = store_updates_sp(regs); if (user_mode(regs)) -- 2.11.0
[PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
From: Anton Blanchard Early on in do_page_fault() we call store_updates_sp(), regardless of the type of exception. For an instruction miss this doesn't make sense, because we only use this information to detect if a data miss is the result of a stack expansion instruction or not. Worse still, it results in a data miss within every userspace instruction miss handler, because we try and load the very instruction we are about to install a pte for! A simple exec microbenchmark runs 6% faster on POWER8 with this fix: #include #include #include int main(int argc, char *argv[]) { unsigned long left = atol(argv[1]); char leftstr[16]; if (left-- == 0) return 0; sprintf(leftstr, "%ld", left); execlp(argv[0], argv[0], leftstr, NULL); perror("exec failed\n"); return 0; } Pass the number of iterations on the command line (eg 1) and time how long it takes to execute. Signed-off-by: Anton Blanchard --- arch/powerpc/mm/fault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index fd6484fc2fa9..3a7d580fdc59 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, * can result in fault, which will cause a deadlock when called with * mmap_sem held */ - if (user_mode(regs)) + if (!is_exec && user_mode(regs)) store_update_sp = store_updates_sp(regs); if (user_mode(regs)) -- 2.11.0
Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
Anton Blanchard a écrit : From: Anton Blanchard Early on in do_page_fault() we call store_updates_sp(), regardless of the type of exception. For an instruction miss this doesn't make sense, because we only use this information to detect if a data miss is the result of a stack expansion instruction or not. Worse still, it results in a data miss within every userspace instruction miss handler, because we try and load the very instruction we are about to install a pte for! A simple exec microbenchmark runs 6% faster on POWER8 with this fix: #include #include #include int main(int argc, char *argv[]) { unsigned long left = atol(argv[1]); char leftstr[16]; if (left-- == 0) return 0; sprintf(leftstr, "%ld", left); execlp(argv[0], argv[0], leftstr, NULL); perror("exec failed\n"); return 0; } Pass the number of iterations on the command line (eg 1) and time how long it takes to execute. Signed-off-by: Anton Blanchard --- arch/powerpc/mm/fault.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index fd6484fc2fa9..3a7d580fdc59 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address, * can result in fault, which will cause a deadlock when called with * mmap_sem held */ - if (user_mode(regs)) + if (!is_exec && user_mode(regs)) Shouldn't it also check 'is_write' ? If it is a store, is_write should be set, shouldn't it ? Christophe store_update_sp = store_updates_sp(regs); if (user_mode(regs)) -- 2.11.0
Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
Hi Christophe, > > - if (user_mode(regs)) > > + if (!is_exec && user_mode(regs)) > > Shouldn't it also check 'is_write' ? > If it is a store, is_write should be set, shouldn't it ? Thanks, Ben had the same suggestion. I'll add that further optimisation in a subsequent patch. Anton
Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
Hi Anton, Le 04/04/2017 à 00:00, Anton Blanchard a écrit : Hi Christophe, - if (user_mode(regs)) + if (!is_exec && user_mode(regs)) Shouldn't it also check 'is_write' ? If it is a store, is_write should be set, shouldn't it ? Thanks, Ben had the same suggestion. I'll add that further optimisation in a subsequent patch. Anton For your information, I made some benchmark test using 'perf stat' with your app on MPC8321 and MPC885, and I got the following results: MPC8321 before the change: Performance counter stats for './fault 1000' (10 runs): 4491.971466 cpu-clock (msec) ( +- 0.03% ) 47386 faults ( +- 0.02% ) 4.727864465 seconds time elapsed ( +- 0.17% ) MPC8321 after your change: Performance counter stats for './fault 1000' (10 runs): 4278.738845 cpu-clock (msec) ( +- 0.02% ) 35181 faults ( +- 0.02% ) 4.504443891 seconds time elapsed ( +- 0.19% ) MPC8321 after changing !is_exec by is_write Performance counter stats for './fault 1000' (10 runs): 4268.187261 cpu-clock (msec) ( +- 0.03% ) 35181 faults ( +- 0.01% ) 4.489207922 seconds time elapsed ( +- 0.20% ) MPC885 before the change: Performance counter stats for './fault 500' (10 runs): 726605854 cpu-cycles ( +- 0.03% ) 176067 dTLB-load-misses ( +- 0.08% ) 52722 iTLB-load-misses ( +- 0.01% ) 25718 faults ( +- 0.03% ) 5.795924654 seconds time elapsed ( +- 0.14% ) MPC885 after your change: Performance counter stats for './fault 500' (10 runs): 711233251 cpu-cycles ( +- 0.04% ) 152462 dTLB-load-misses ( +- 0.09% ) 52715 iTLB-load-misses ( +- 0.01% ) 19611 faults ( +- 0.02% ) 5.673784606 seconds time elapsed ( +- 0.14% ) MPC885 after changing !is_exec by is_write Performance counter stats for './fault 500' (10 runs): 710904083 cpu-cycles ( +- 0.05% ) 147162 dTLB-load-misses ( +- 0.06% ) 52716 iTLB-load-misses ( +- 0.01% ) 19610 faults ( +- 0.02% ) 5.672091139 seconds time elapsed ( +- 0.15% ) Christophe
Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss
Christophe LEROY writes: > Hi Anton, > > Le 04/04/2017 à 00:00, Anton Blanchard a écrit : >> Hi Christophe, >> - if (user_mode(regs)) + if (!is_exec && user_mode(regs)) >>> >>> Shouldn't it also check 'is_write' ? >>> If it is a store, is_write should be set, shouldn't it ? >> >> Thanks, Ben had the same suggestion. I'll add that further optimisation >> in a subsequent patch. >> > > For your information, I made some benchmark test using 'perf stat' with > your app on MPC8321 and MPC885, and I got the following results: MPC8321: before47386 faults after 35181 faults-12205 is_write 35181 faults-12205 So that's good. MPC885: before: 176067 dTLB-load-misses 52722 iTLB-load-misses 25718 faults after: 152462 dTLB-load-misses-23605 52715 iTLB-load-misses-7 19611 faults -6107 is_write:147162 dTLB-load-misses-28905 52716 iTLB-load-misses-6 19610 faults -6108 Also good, and shows that is_write idea would be even better. cheers