> On Mar 12, 2018, at 3:47 PM, Song Liu <songliubrav...@fb.com> wrote:
>
>> On Mar 12, 2018, at 2:31 PM, Alexei Starovoitov <a...@fb.com> wrote:
>>
>> On 3/12/18 2:12 PM, Song Liu wrote:
>>>
>>>> On Mar 12, 2018, at 2:00 PM, Alexei Starovoitov <a...@fb.com> wrote:
>>>>
>>>> On 3/12/18 1:39 PM, Song Liu wrote:
>>>>> + page = find_get_page(vma->vm_file->f_mapping, 0);
>>>>
>>>> did you test it with config_debug_atomic_sleep ?
>>>> it should have complained...
>>>
>>> Yeah, I have CONFIG_DEBUG_ATOMIC_SLEEP=y.
>>>
>>> I think find_get_page() will not sleep. The variation find_get_page_flags()
>>> may sleep with flag FGP_CREAT.
>>
>> I see. gfp_mask == 0 and no locks. should work indeed.
>> curious how perf report looks like for heavy bpf_get_stackid() usage?
>
> I modified samples/bpf/sampleip to only call bpf_get_stackid(). The following
> is captured with bpf_get_stackid() called at 10k Hz. stressapptest is running
> with 16 threads on a system with 56 cores.
>
> Samples: 1M of event 'cycles:pp', Event count (approx.): 628092326243
> Overhead  Command        Shared Object     Symbol
> + 51.61%  stressapptest  stressapptest     [.] AdlerMemcpyC
> - 20.82%  stressapptest  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
>    - queued_spin_lock_slowpath
>       - 20.80% pcpu_freelist_pop
>            bpf_get_stackid
>            bpf_get_stackid_tp
>          - 0x590c
>               16.12% AdlerMemcpyC
>               4.50% OsLayer::CpuStressWorkload
> + 14.36%  stressapptest  stressapptest     [.] OsLayer::CpuStressWorkload
> -  8.74%  stressapptest  [kernel.vmlinux]  [k] _raw_spin_lock
>    - _raw_spin_lock
>       - 8.73% bpf_get_stackid
>            bpf_get_stackid_tp
>          + 0x590c
> -  0.67%  stressapptest  [kernel.vmlinux]  [k] pcpu_freelist_pop
>    - pcpu_freelist_pop
>       - 0.67% bpf_get_stackid
>            bpf_get_stackid_tp
>          + 0x590c
>
> Seems lock contention is the dominating overhead here. This should be the same
> for original stackmap.
>
> Song
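The 56-core call graph above attributes most of the spinning to pcpu_freelist_pop(). As a rough illustration of how a per-CPU freelist can still contend, here is a simplified userspace model. It assumes the freelist is organized roughly like kernel/bpf/percpu_freelist.c of this era: one list and one spinlock per CPU, with pop falling back to the other CPUs' lists when the local one is empty. All names here (struct pcpu_fl, fl_pop(), NR_CPUS_MODEL, spin_lock(), and so on) are hypothetical, and a C11 atomic_flag stands in for raw_spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

/* Hypothetical stand-ins for the kernel's per-CPU freelist types. */
struct fl_node {
	struct fl_node *next;
};

struct fl_head {
	struct fl_node *first;
	atomic_flag lock; /* one spinlock per CPU list */
};

struct pcpu_fl {
	struct fl_head heads[NR_CPUS_MODEL];
};

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* busy-wait until the holder releases the lock */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void fl_init(struct pcpu_fl *s)
{
	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		s->heads[i].first = NULL;
		atomic_flag_clear(&s->heads[i].lock);
	}
}

/* Push a node onto the given CPU's list (LIFO). */
static void fl_push(struct pcpu_fl *s, int cpu, struct fl_node *n)
{
	struct fl_head *h = &s->heads[cpu];

	spin_lock(&h->lock);
	n->next = h->first;
	h->first = n;
	spin_unlock(&h->lock);
}

/*
 * Pop from the local CPU's list first; if it is empty, walk the other
 * CPUs' lists in order, taking each list's lock in turn.
 */
static struct fl_node *fl_pop(struct pcpu_fl *s, int this_cpu)
{
	int cpu = this_cpu;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS_MODEL);
	do {
		struct fl_head *h = &s->heads[cpu];
		struct fl_node *n;

		spin_lock(&h->lock);
		n = h->first;
		if (n)
			h->first = n->next;
		spin_unlock(&h->lock);
		if (n)
			return n;
		cpu = (cpu + 1) % NR_CPUS_MODEL;
	} while (cpu != this_cpu);

	return NULL; /* every list was empty */
}
```

In this model, a CPU whose own list is empty takes the lock of each other CPU's list in turn until it finds a node, so many CPUs popping at a high rate can end up spinning on the same remote locks.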
Samples: 172K of event 'cycles:pp', Event count (approx.): 102311012653
Overhead  Command        Shared Object     Symbol
  78.84%  stressapptest  stressapptest     [.] AdlerMemcpyC
   8.78%  stressapptest  stressapptest     [.] OsLayer::CpuStressWorkload
   3.14%  stressapptest  [kernel.vmlinux]  [k] _raw_spin_lock
   2.56%  stressapptest  stressapptest     [.] WorkerThread::FillPage
   0.45%  stressapptest  [kernel.vmlinux]  [k] perf_callchain_user
   0.37%  stressapptest  [kernel.vmlinux]  [.] native_irq_return_iret
   0.31%  stressapptest  [kernel.vmlinux]  [k] clear_page_erms
   0.29%  stressapptest  [kernel.vmlinux]  [k] pcpu_freelist_pop
   0.27%  stressapptest  stressapptest     [.] CalculateAdlerChecksum
   0.25%  stressapptest  [kernel.vmlinux]  [k] bpf_get_stackid
   0.22%  swapper        [kernel.vmlinux]  [k] poll_idle

This perf output was taken with stressapptest running on 4 cores.
bpf_get_stackid() and pcpu_freelist_pop() together account for about 0.54%
of CPU.

Song
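The 0.54% figure is the sum of the pcpu_freelist_pop and bpf_get_stackid rows in the 4-core report. A small C sketch of that bookkeeping follows; the rows are a subset copied from the report, and the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A subset of rows from the 4-core perf report (overhead in percent). */
struct perf_row {
	double overhead;
	const char *symbol;
};

static const struct perf_row rows[] = {
	{ 78.84, "AdlerMemcpyC" },
	{  8.78, "OsLayer::CpuStressWorkload" },
	{  3.14, "_raw_spin_lock" },
	{  0.29, "pcpu_freelist_pop" },
	{  0.25, "bpf_get_stackid" },
};

/* Sum the overhead of every row whose symbol appears in `wanted`. */
static double combined_overhead(const struct perf_row *r, size_t nr,
				const char *const *wanted, size_t nwanted)
{
	double total = 0.0;

	assert(r != NULL && wanted != NULL);
	for (size_t i = 0; i < nr; i++)
		for (size_t j = 0; j < nwanted; j++)
			if (strcmp(r[i].symbol, wanted[j]) == 0)
				total += r[i].overhead;
	return total;
}
```

Calling it with { "bpf_get_stackid", "pcpu_freelist_pop" } gives about 0.54. It counts only those two rows and does not attribute any part of the _raw_spin_lock row to them.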