On Wed, Oct 14, 2020 at 9:09 PM Alexey Budankov
<alexey.budan...@linux.intel.com> wrote:
>
> Hi,
>
> On 14.10.2020 13:52, Namhyung Kim wrote:
> > Hi,
> >
> > On Mon, Oct 12, 2020 at 6:01 PM Alexey Budankov
> > <alexey.budan...@linux.intel.com> wrote:
> >>
> >>
> >> Write trace data into per mmap trace files located
> >> at data directory. Streaming thread adjusts its affinity
> >> according to mask of the buffer being processed.
> >>
> >> Signed-off-by: Alexey Budankov <alexey.budan...@linux.intel.com>
> >> ---
> > [SNIP]
> >> @@ -1184,8 +1203,12 @@ static int record__mmap_read_evlist(struct record
> >> *rec, struct evlist *evlist,
> >>                 /*
> >>                  * Mark the round finished in case we wrote
> >>                  * at least one event.
> >> +                *
> >> +                * No need for round events in directory mode,
> >> +                * because per-cpu maps and files have data
> >> +                * sorted by kernel.
> >>                  */
> >> -               if (bytes_written != rec->bytes_written)
> >> +               if (!record__threads_enabled(rec) && bytes_written !=
> >> rec->bytes_written)
> >>                         rc = record__write(rec, NULL, &finished_round_event,
> >> sizeof(finished_round_event));
> >
> > This means it needs to keep all events in the ordered events queue
> > when perf report processes the data, right?
>
> Looks so.

Maybe it's not related to this directly.  But we need to think about
how to make perf report faster and more efficient as well.

In my previous attempt, I separated samples from other events to be
in different mmaps so they were saved to different files (or in a
separate part of the data file).

And perf report processes the meta events (FORK/MMAP/...) first to
construct the system image and then processes samples with
multi-threads.  Once it has the image, it could bypass the ordered
events queue entirely.

Thanks
Namhyung
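
P.S. To make the two-pass idea above a bit more concrete, here is a rough
sketch. It is not perf code: the event layout, struct image, image_build(),
image_find() and samples_resolve() are all hypothetical names made up for
illustration, and only MMAP is modeled among the meta events. It also
assumes each map records the time it appeared, so that a sample can be
matched against the image without first being merged into a time-ordered
queue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event kinds; not the real perf event types. */
enum ev_type { EV_FORK, EV_MMAP, EV_SAMPLE };

struct event {
	enum ev_type type;
	uint64_t time;
	int pid;
	uint64_t addr;	/* EV_MMAP: start address, EV_SAMPLE: ip */
	uint64_t len;	/* EV_MMAP only */
};

#define MAX_MAPS 64

struct map {
	int pid;
	uint64_t start, end;
	uint64_t time;	/* when the mapping appeared */
};

/* The "system image": here just a flat table of mappings. */
struct image {
	struct map maps[MAX_MAPS];
	size_t nr_maps;
};

/* Pass 1: look at meta events only and build the image. */
static int image_build(struct image *img, const struct event *ev, size_t n)
{
	img->nr_maps = 0;
	for (size_t i = 0; i < n; i++) {
		if (ev[i].type != EV_MMAP)
			continue;
		if (img->nr_maps == MAX_MAPS)
			return -1;
		struct map *m = &img->maps[img->nr_maps++];
		m->pid = ev[i].pid;
		m->start = ev[i].addr;
		m->end = ev[i].addr + ev[i].len;
		m->time = ev[i].time;
	}
	return 0;
}

/* Find the most recent map of @pid covering @ip that existed at @time. */
static const struct map *image_find(const struct image *img, int pid,
				    uint64_t ip, uint64_t time)
{
	const struct map *best = NULL;

	for (size_t i = 0; i < img->nr_maps; i++) {
		const struct map *m = &img->maps[i];

		if (m->pid != pid || ip < m->start || ip >= m->end)
			continue;
		if (m->time > time)
			continue;
		if (!best || m->time > best->time)
			best = m;
	}
	return best;
}

/*
 * Pass 2: resolve samples in whatever order they are stored.  The image
 * is only read here, so each per-cpu file could be handed to its own
 * thread.  Returns the number of samples that hit a known map.
 */
static size_t samples_resolve(const struct image *img,
			      const struct event *ev, size_t n)
{
	size_t resolved = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].type != EV_SAMPLE)
			continue;
		if (image_find(img, ev[i].pid, ev[i].addr, ev[i].time))
			resolved++;
	}
	return resolved;
}
```

The point of the sketch is only that pass 2 never needs the samples to be
sorted against the meta events; whether a timestamped lookup like this is
enough for the real FORK/MMAP/... handling is an open question.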