On Thu, Mar 26, 2015 at 03:37:29PM -0600, David Ahern wrote: > On 3/26/15 3:11 PM, Don Zickus wrote: > >Sorry for drawing this out. Originally the performance still seemed off. > >But as we split the patch up to see where the perf impact was, the problem > >seemed to have disappeared. So we are testing the original patch again. > > > >The only difference now is we were playing with the -BN option in perf based > >on your changelog, just because we never used it before. :-) > > I was beyond surprised that you were measuring a 50% hit with the > first patch. As mentioned in a previous response it only adds the > processing of 3 additional lines to the already opened and read > /proc/pid/status file. So, when I wrote this second version I wanted > to make sure we are only measuring the impact of this change. The > /proc/pid/status files are read on startup of the record -- before > any samples are taken. > > The intent of '-e cpu-clock -F 1000 -- usleep 1' is to avoid any > samples since we don't care about them. Really the -a should be > dropped as well -- no need to open per-cpu events. > > -B impacts processing done at the end of the run: > > builin-record.c, __cmd_record(): > > if (!rec->no_buildid) > process_buildids(rec); > > and -N says don't copying anything to ~/.debug. All together it > tries to focus the measurement to /proc walking. > > > > >One last test without the -BN option and if that looks fine, then we have no > >objections. Again sorry for dragging this out. I will let you know > >tomorrow EST. > > no problem; appreciate the heads up.
I talked with Joe on my way out the door yesterday and he confirmed, just removing -BN from our test showed a performance hit with your patch. With the -BN option, there is no performance hit and we are perfectly fine with your patch. So, I guess I am confused how the -BN and your patch could change behaviour. Just to re-iterate what we did, Joe kicked off a specJBB run and he did 20 captures of two runs (one with the unpatched binary and one with a pached binary). for i in {1..20} do time perf.unpatched mem record -a -e cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10 time perf.patched mem record -a -e cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10 done then we repeat the above test but with -BN in both runs. We compare the log sizes to make sure they are similar for the random snapshots and compare the times. With the -BN option, the times are generally within +/- 0.5 seconds of each. Without the -BN option the patched perf binary is generally +20-40 seconds slower. However, based on your description above about what the -BN option does, I am scratching my head about our results. Thoughts? Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/