On Tue, Oct 06, 2020 at 02:40:32PM +0900, namhy...@kernel.org wrote:
> On Tue, Oct 06, 2020 at 11:39:49AM +0900, namhy...@kernel.org wrote:
> > > > On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> > > > > Below measures time and memory usage during the perf inject and
> > > > > report using ~190MB data file.
> > > > >
> > > > > Before:
> > > > >   perf inject: 11.09 s,  382148 KB
> > > > >   perf report:  8.05 s,  397440 KB
> > > > >
> > > > > After:
> > > > >   perf inject: 16.24 s,   83376 KB
> > > > >   perf report:  7.96 s,  216184 KB
> > > > >
> > > > > As you can see, it used 2x memory of the input size.  I guess it's
> > > > > because it needs to keep the copy for the whole input.  But I don't
> > > > > understand why processing time of perf inject increased..
> >
> > Measuring it with time shows:
> >
> >           before     after
> >   real   11.309s    17.040s
> >   user    8.084s    13.940s
> >   sys     6.535s     6.732s
> >
> > So it's user space to make the difference.  I've run perf record on
> > both (with cycles:U) and the dominant function is same: queue_event.
> > (46.98% vs 65.87%)
> >
> > It seems the flushing the queue makes more overhead on sorting.
>
> So I suspect the cache-miss ratio affects the performance.  With
> flushing, data is processed in the middle and all the entries are
> reused after flush so it would invalidate all the cache lines
> occasionally.
>
> This is the perf stat result:
>
> * Before
>
>      7,167,414,019      L1-dcache-loads
>        337,471,761      L1-dcache-read-misses     #    4.71% of all L1-dcache hits
>
>       11.011224671 seconds time elapsed
>
>
> * After
>
>      7,075,556,792      L1-dcache-loads
>        771,810,388      L1-dcache-read-misses     #   10.91% of all L1-dcache hits
>
>       17.015901863 seconds time elapsed
>
>
> Hmm.. it's a memory & time trade-off then.  Maybe we need a switch to
> select which one?
I'd keep the faster one ;-)

so the one before

jirka
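[Editor's note: the miss rates quoted in the thread can be recomputed from the raw counters, which confirms the cache-behavior argument. A minimal Python sketch, with the numbers copied verbatim from the `perf stat` output above:]

```python
# L1-dcache counters from the quoted perf stat runs (before vs. after).
before = {"loads": 7_167_414_019, "misses": 337_471_761}
after = {"loads": 7_075_556_792, "misses": 771_810_388}

for name, c in (("before", before), ("after", after)):
    # Miss rate as a percentage of all L1-dcache loads.
    ratio = 100.0 * c["misses"] / c["loads"]
    print(f"{name}: {ratio:.2f}% L1-dcache miss rate")
# Matches the quoted 4.71% and 10.91%: loads are nearly identical,
# so the ~2.3x jump in misses is what accounts for the extra user time.
```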