Hi Andi,

On 28.08.2018 11:59, Jiri Olsa wrote:
> On Mon, Aug 27, 2018 at 08:03:21PM +0300, Alexey Budankov wrote:
>>
>> Currently in record mode the tool implements trace writing serially.
>> The algorithm loops over mapped per-cpu data buffers and stores ready
>> data chunks into a trace file using write() system call.
>>
>> At some circumstances the kernel may lack free space in a buffer
>> because the other buffer's half is not yet written to disk due to
>> some other buffer's data writing by the tool at the moment.
>>
>> Thus serial trace writing implementation may cause the kernel
>> to lose profiling data and that is what is observed when profiling
>> highly parallel CPU bound workloads on machines with big number
>> of cores.
>>
>> Experiment with profiling matrix multiplication code executing 128
>> threads on Intel Xeon Phi (KNM) with 272 cores, like below,
>> demonstrates data loss metrics value of 98%:
>>
>> /usr/bin/time perf record -o /tmp/perf-ser.data -a -N -B -T -R -g \
>>     --call-graph dwarf,1024 --user-regs=IP,SP,BP \
>>     --switch-events -e cycles,instructions,ref-cycles,software/period=1,name=cs,config=0x3/Duk -- \
>>     matrix.gcc
>>
>> Data loss metrics is the ratio lost_time/elapsed_time where
>> lost_time is the sum of time intervals containing PERF_RECORD_LOST
>> records and elapsed_time is the elapsed application run time
>> under profiling.
>
> I like the idea and I think it's good direction to go, but could
> you please share some from perf stat or whatever you used to measure
> the new performance?
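
For reference, the data loss metric quoted above (lost_time/elapsed_time) can be sketched as below. This is only an illustration of the ratio, not code from perf or from the patch set: the Interval type, its field names and the sample numbers are hypothetical, and how the run is split into intervals is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Interval:
    """Hypothetical time slice of the profiled run, in seconds."""
    start: float
    end: float
    has_lost_record: bool  # True if a PERF_RECORD_LOST record falls in this slice


def data_loss_ratio(intervals: Iterable[Interval], elapsed_time: float) -> float:
    """Return lost_time / elapsed_time.

    lost_time is the summed length of the intervals that contain at least
    one PERF_RECORD_LOST record; elapsed_time is the application run time
    under profiling.
    """
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    lost_time = sum(iv.end - iv.start for iv in intervals if iv.has_lost_record)
    return lost_time / elapsed_time


# Made-up sample: 2 s out of a 4 s run contain lost records.
sample = [
    Interval(0.0, 1.0, False),
    Interval(1.0, 3.0, True),
    Interval(3.0, 4.0, False),
]
ratio = data_loss_ratio(sample, elapsed_time=4.0)
```

With the made-up sample the ratio is 0.5, i.e. half of the run time falls into intervals with lost records.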

Is it ok to share VTune GUI screenshots I sent you the last time to
demonstrate the advantage of AIO trace streaming?

Thanks,
Alexey

>
> thanks,
> jirka
>