Hi Kan,

I built a small test case for you to demonstrate the issue for code and data.
Compile the test program and then do:

For text:
$ perf record ./mmap
$ perf report -D | fgrep MMAP2

The test program mmaps 2 pages, unmaps the second, and remaps 1 page over
the freed space. If you look at the MMAP2 records, you will not be able to
reconstruct what happened, and perf will get confused should it try to
symbolize from the address range.

With text:
PERF_RECORD_MMAP2 5937/5937: [0x400000(0x1000) @ 0 08:01 400938 824817672]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000 00:00 0 0]: rwxp //anon
                             ^^^^^^^^^^^^^^^^^^^^^^^^
                             captures the whole VMA but not the mapping change in user space

For data:
$ perf record -d ./mmap
$ perf report -D | fgrep MMAP2

With data:
PERF_RECORD_MMAP2 6430/6430: [0x400000(0x1000) @ 0 08:01 400938 3278843184]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000 00:00 0 0]: rw-p //anon

Same test case with data. Perf will think the entire 2 pages have been
replaced when in fact only the second has.
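To make the ambiguity concrete, here is a rough sketch of a tool-side map list, assuming the tool applies the rule "a later record overrides whatever it overlaps" and trims the older entry around the new range. This is a simplified model written for illustration, not perf's actual code; `struct map_list`, `map_list_apply()` and `map_list_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 16

/* One tracked address range and the id of the record that installed it. */
struct map_entry {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	int record;
};

struct map_list {
	struct map_entry e[MAX_MAPS];
	size_t n;
};

static void push(struct map_entry *out, size_t *k, struct map_entry m)
{
	assert(*k < MAX_MAPS);
	out[(*k)++] = m;
}

/*
 * Apply one mmap record covering [start, start + len).
 * Older entries keep only the parts the new range does not cover
 * (assumed behavior: the latest record wins on overlap).
 */
void map_list_apply(struct map_list *ml, uint64_t start, uint64_t len, int record)
{
	struct map_entry out[MAX_MAPS];
	struct map_entry fresh = { start, start + len, record };
	size_t i, k = 0;

	assert(len > 0);
	for (i = 0; i < ml->n; i++) {
		struct map_entry old = ml->e[i];

		if (old.end <= fresh.start || fresh.end <= old.start) {
			push(out, &k, old);	/* no overlap: keep as is */
			continue;
		}
		if (old.start < fresh.start) {	/* keep the uncovered head */
			struct map_entry head = old;
			head.end = fresh.start;
			push(out, &k, head);
		}
		if (old.end > fresh.end) {	/* keep the uncovered tail */
			struct map_entry tail = old;
			tail.start = fresh.end;
			push(out, &k, tail);
		}
	}
	push(out, &k, fresh);
	for (i = 0; i < k; i++)
		ml->e[i] = out[i];
	ml->n = k;
}

/* Return the record id that owns addr, or -1 if addr is not tracked. */
int map_list_lookup(const struct map_list *ml, uint64_t addr)
{
	size_t i;

	for (i = 0; i < ml->n; i++)
		if (addr >= ml->e[i].start && addr < ml->e[i].end)
			return ml->e[i].record;
	return -1;
}
```

Feeding this model the two identical 0x2000-byte records from the trace above attributes both pages to the second record. Feeding it a second record that covers only the remapped page (0x1000 bytes at start + 0x1000) keeps the first page attributed to the first record, which is what actually happened in user space.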

I believe the problem is likely to impact data and jitted code caches.

#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <err.h>
#include <getopt.h>

int main(int argc, char **argv)
{
	void *addr1, *addr2;
	size_t pgsz = sysconf(_SC_PAGESIZE);
	int n = 2;
	int ret;
	int c, mode = 0;

	while ((c = getopt(argc, argv, "hd")) != -1) {
		switch (c) {
		case 'h':
			printf("[-h]\tget this help\n");
			/* help text matches what the code below does: -d adds PROT_EXEC */
			printf("[-d]\tuse executable mmaps (PROT_EXEC)\n");
			return 0;
		case 'd':
			mode = PROT_EXEC;
			break;
		default:
			errx(1, "unknown option");
		}
	}

	/* default to data */
	if (mode == 0)
		mode = PROT_WRITE;

	/*
	 * mmap 2 contiguous pages
	 */
	addr1 = mmap(NULL, n * pgsz, PROT_READ | mode,
		     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr1 == (void *)MAP_FAILED)
		err(1, "mmap 1 failed");

	printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);

	/*
	 * unmap only the second page
	 */
	ret = munmap(addr1 + pgsz, pgsz);
	if (ret == -1)
		err(1, "munmap failed");

	/*
	 * mmap 1 page at the location of the unmapped page (should reuse
	 * virtual space).
	 * This creates a contiguous region built from two mmaps and
	 * potentially two different sources, especially with jitted runtimes.
	 */
	addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ | PROT_WRITE | mode,
		     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	printf("addr2=%p\n", addr2);

	if (addr2 == (void *)MAP_FAILED)
		err(1, "mmap 2 failed");

	if (addr2 != (addr1 + pgsz))
		errx(1, "wrong mmap2 address");

	sleep(1);

	return 0;
}

On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan <kan.li...@linux.intel.com> wrote:
>
>
> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> > The need for this new record type extends beyond physical address
> > conversions and PEBS. A long while ago, someone reported issues with
> > symbolization related to perf lacking munmap tracking. It had to do
> > with vma merging. I think the sequence of mmaps was as follows in the
> > problematic case:
> > 1. addr1 = mmap(8192);
> > 2. munmap(addr1 + 4096, 4096)
> > 3. addr2 = mmap(addr1+4096, 4096)
> >
> > If successful, that yields addr2 = addr1 + 4096 (could also get the
> > same without forcing the address).
> >
> > In that case, if I recall correctly, the vma for 1st mapping (now at
> > 4k) and that of the 2nd mapping (4k) get merged into a single 8k vma
> > and this is what perf_events will record for PERF_RECORD_MMAP.
> > On the perf tool side, it is assumed that if two timestamped mappings
> > overlap then, the latter overrides the former. In this case, perf
> > would lose the mapping of the first 4kb and assume all symbols come
> > from 2nd mapping. Hopefully I got the scenario right. If so, then
> > you'd need PERF_RECORD_UNMAP to disambiguate assuming the perf tool
> > is modified accordingly.
> >
>
> Hi Stephane and Peter,
>
> I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying
> to understand the problematic case.
>
> It looks like the issue can only be triggered by perf inject --jit.
> Because it can inject extra MMAP events.
> As my understanding, Linux kernel only try to merge VMAs if they are
> both from anon or they are both from the same file. --jit breaks the
> rule, and makes the merged VMA partly from anon, partly from file.
> Now, there is a new MMAP event which range covers the modified VMA.
> Without the help of MUNMAP event, perf tool have no idea if the new one
> is a newly merged VMA (modified VMA + a new VMA) or a brand new VMA.
> Current code just simply overwrite the modified VMAs. The VMA
> information which --jit injected may be lost. The symbolization may be
> lost as well.
>
> Except --jit, the VMAs information should be consistent between kernel
> and perf tools. We shouldn't observe the problem. MUNMAP event is not
> needed.
>
> Is my understanding correct?
>
> Do you have a test case for the problem?
>
> Thanks,
> Kan
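
On the question of whether the kernel merges the two anonymous mappings: one way to check from user space, sketched here under the assumption of a Linux system with /proc mounted, is to count how many /proc/self/maps entries overlap the two-page range after the mmap/munmap/mmap sequence. `count_vmas()` and `remap_and_count()` are hypothetical helper names, not part of perf or the test program above.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the /proc/self/maps entries that overlap [start, end).
 * Returns -1 if the file cannot be opened.
 */
int count_vmas(unsigned long start, unsigned long end)
{
	FILE *fp = fopen("/proc/self/maps", "r");
	char line[512];
	int count = 0;

	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		unsigned long lo, hi;

		/* each entry starts with "<start>-<end>" in hex */
		if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && lo < end && start < hi)
			count++;

		/* skip the rest of an over-long line so it is not parsed as an entry */
		while (!strchr(line, '\n') && fgets(line, sizeof(line), fp))
			;
	}
	fclose(fp);
	return count;
}

/*
 * Replay the sequence from the test case with data mappings and report
 * how many VMAs back the two pages afterwards. Returns -1 on any failure.
 */
int remap_and_count(void)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	char *addr1, *addr2;
	int n;

	addr1 = mmap(NULL, 2 * pgsz, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr1 == MAP_FAILED)
		return -1;

	if (munmap(addr1 + pgsz, pgsz) == -1)
		return -1;

	addr2 = mmap(addr1 + pgsz, pgsz, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr2 != addr1 + pgsz)	/* also covers MAP_FAILED */
		return -1;

	n = count_vmas((unsigned long)addr1, (unsigned long)addr1 + 2 * pgsz);
	munmap(addr1, 2 * pgsz);
	return n;
}
```

A return value of 1 from `remap_and_count()` would mean the kernel presents the two pages as a single VMA, which is consistent with the 0x2000-byte MMAP2 records in the trace; a value of 2 would mean they stayed separate.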