On 8/12/21 6:19 PM, Michael Ellerman wrote:
"Puvichakravarthy Ramachandran" <puvichakravar...@in.ibm.com> writes:
With a shared mapping, even though we are unmapping a large range, the kernel
will force a TLB flush with the ptl lock held to avoid the race mentioned in
commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory
freeing parts"). This results in the kernel issuing a high number of TLB
flushes even for a large range. This can be improved by making sure the kernel
switches to a PID-based flush when unmapping a 2M range (with the 64K base
page size, a 2M range is 32 pages, which the lowered ceiling of 32 now catches).

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
---
  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index aefc100d79a7..21d0f098e43b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
   * invalidating a full PID, so it has a far lower threshold to change from
   * individual page flushes to full-pid flushes.
   */
-static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;

  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
@@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
       if (fullmm)
               flush_pid = true;
       else if (type == FLUSH_TYPE_GLOBAL)
-             flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+             flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
       else
               flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
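
To make the threshold change concrete, here is a small userspace sketch of
the FLUSH_TYPE_GLOBAL decision above (not the kernel code itself; the 64K
base page size and the names are assumptions, based on 2M = 32 x 64K):

#include <stdbool.h>
#include <stdio.h>

/* mirrors tlb_single_page_flush_ceiling after the patch (was 33) */
#define SINGLE_PAGE_FLUSH_CEILING	32UL
#define PAGE_SIZE_64K			(64UL * 1024)

/* sketch of the FLUSH_TYPE_GLOBAL branch in __radix__flush_tlb_range() */
static bool use_pid_flush(unsigned long nr_pages)
{
	/* '>=' (was '>'): a 2M unmap (32 pages) now takes the PID flush */
	return nr_pages >= SINGLE_PAGE_FLUSH_CEILING;
}

int main(void)
{
	unsigned long nr_pages = (2UL * 1024 * 1024) / PAGE_SIZE_64K; /* 32 */

	printf("2M unmap -> PID flush? %d\n", use_pid_flush(nr_pages)); /* 1 */
	return 0;
}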

Additional details on the test environment: this was tested on a 2-node /
8-socket Power10 system. The LPAR had 105 cores and spanned all sockets.

# perf stat -I 1000 -a -e cycles,instructions -e "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1 -t 1
  Rate of work: = 176
#           time             counts unit events
      1.029206442         4198594519      cycles
      1.029206442         2458254252      instructions              # 0.59 insn per cycle
      1.029206442         3004031488      PM_EXEC_STALL
      1.029206442         1798186036      PM_EXEC_STALL_TLBIE
  Rate of work: = 181
      2.054288539         4183883450      cycles
      2.054288539         2472178171      instructions              # 0.59 insn per cycle
      2.054288539         3014609313      PM_EXEC_STALL
      2.054288539         1797851642      PM_EXEC_STALL_TLBIE
  Rate of work: = 180
      3.078306883         4171250717      cycles
      3.078306883         2468341094      instructions              # 0.59 insn per cycle
      3.078306883         2993036205      PM_EXEC_STALL
      3.078306883         1798181890      PM_EXEC_STALL_TLBIE
.
.

# cat /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling
34

# echo 32 > /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling

# perf stat -I 1000 -a -e cycles,instructions -e "{cpu/config=0x030008,name=PM_EXEC_STALL/}" -e "{cpu/config=0x02E01C,name=PM_EXEC_STALL_TLBIE/}" ./tlbie -i 10 -c 1 -t 1
  Rate of work: = 313
#           time             counts unit events
      1.030310506         4206071143      cycles
      1.030310506         4314716958      instructions              # 1.03 insn per cycle
      1.030310506         2157762167      PM_EXEC_STALL
      1.030310506          110825573      PM_EXEC_STALL_TLBIE
  Rate of work: = 322
      2.056034068         4331745630      cycles
      2.056034068         4531658304      instructions              # 1.05 insn per cycle
      2.056034068         2288971361      PM_EXEC_STALL
      2.056034068          111267927      PM_EXEC_STALL_TLBIE
  Rate of work: = 321
      3.081216434         4327050349      cycles
      3.081216434         4379679508      instructions              # 1.01 insn per cycle
      3.081216434         2252602550      PM_EXEC_STALL
      3.081216434          110974887      PM_EXEC_STALL_TLBIE


What is the tlbie test actually doing?

Does it do anything to measure the cost of refilling after the full mm flush?



That is essentially

for (i = 0; i < iterations; i++) {
        addr = shmat(shmid, NULL, 0);
        fillshm(addr);          /* touch every page in the segment */
        shmdt(addr);
}

for a 256MB range. So it is not really a fair benchmark, because it doesn't take into account the impact of throwing away the full-PID translation. But even then, aren't the TLBIE stalls an important data point?
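
For reference, a minimal self-contained version of that loop (the segment
size, iteration count, and fillshm() body here are illustrative; this is
not the actual test source):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE	(256UL * 1024 * 1024)	/* 256MB, as above */

int main(void)
{
	int shmid = shmget(IPC_PRIVATE, SHM_SIZE, IPC_CREAT | 0600);
	if (shmid < 0) {
		perror("shmget");
		return 1;
	}

	for (int i = 0; i < 10; i++) {		/* -i 10 in the runs above */
		char *addr = shmat(shmid, NULL, 0);
		if (addr == (void *)-1) {
			perror("shmat");
			break;
		}
		memset(addr, 1, SHM_SIZE);	/* fillshm(): fault in every page */
		shmdt(addr);			/* unmap -> the flush path in the patch */
	}

	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}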

-aneesh

