"Aneesh Kumar K.V" <aneesh.ku...@linux.vnet.ibm.com> writes:
> I looked at the perf data and with the test, we are doing a larger
> number of hash faults and then around 10k flush_hash_range calls. Can
> the small improvement in numbers be due to the fact that we are not
> storing the slot number when doing an insert now? Also, in the flush
> path we are now not using real_pte_t.

With THP disabled I am finding the below.

Without patch:

    35.62%  a.out  [kernel.vmlinux]  [k] clear_user_page
     8.54%  a.out  [kernel.vmlinux]  [k] __lock_acquire
     3.86%  a.out  [kernel.vmlinux]  [k] native_flush_hash_range
     3.38%  a.out  [kernel.vmlinux]  [k] save_context_stack
     2.98%  a.out  a.out             [.] main
     2.59%  a.out  [kernel.vmlinux]  [k] lock_acquire
     2.29%  a.out  [kernel.vmlinux]  [k] mark_lock
     2.23%  a.out  [kernel.vmlinux]  [k] native_hpte_insert
     1.87%  a.out  [kernel.vmlinux]  [k] get_mem_cgroup_from_mm
     1.71%  a.out  [kernel.vmlinux]  [k] rcu_lockdep_current_cpu_online
     1.68%  a.out  [kernel.vmlinux]  [k] lock_release
     1.47%  a.out  [kernel.vmlinux]  [k] __handle_mm_fault
     1.41%  a.out  [kernel.vmlinux]  [k] validate_sp

With patch:

    35.40%  a.out  [kernel.vmlinux]  [k] clear_user_page
     8.82%  a.out  [kernel.vmlinux]  [k] __lock_acquire
     3.66%  a.out  a.out             [.] main
     3.49%  a.out  [kernel.vmlinux]  [k] save_context_stack
     2.77%  a.out  [kernel.vmlinux]  [k] lock_acquire
     2.45%  a.out  [kernel.vmlinux]  [k] mark_lock
     1.80%  a.out  [kernel.vmlinux]  [k] get_mem_cgroup_from_mm
     1.80%  a.out  [kernel.vmlinux]  [k] native_hpte_insert
     1.79%  a.out  [kernel.vmlinux]  [k] rcu_lockdep_current_cpu_online
     1.78%  a.out  [kernel.vmlinux]  [k] lock_release
     1.73%  a.out  [kernel.vmlinux]  [k] native_flush_hash_range
     1.53%  a.out  [kernel.vmlinux]  [k] __handle_mm_fault

That is, we are now spending less time in native_flush_hash_range
(3.86% without the patch vs. 1.73% with it).

-aneesh