On Mon, 25 Jan 2021 at 11:52, Chris <oi...@bsdos.info> wrote:
> On 2021-01-24 23:10, Joshua M. Clulow via openindiana-discuss wrote:
> >     # dtrace -x stackframes=100 -n '
> >         profile-997 /arg0/ { @[stack()] = count(); }
> >         tick-60s { exit(0); }' -o out.kern_stacks
> >
> OK I simply created an sh script (DTTRACE) with the contents above and
> fired it off as; sudo ./DTRACE &
> followed by; ls -Cla /usr/include
> which created: out.kern_stacks (attached).
>
> > That will capture the stack of what's running in the kernel (if the
> > kernel is running at the time) on each CPU, 997 times per second, for
> > 60 seconds.  While that's running, kick off the "time ls" again.  Take
> > the "out.kern_stacks" file and pass it through the flame graph
> > generator; e.g., something like:
> >
> >     $ ./stackcollapse.pl out.kern_stacks | ./flamegraph.pl > output.svg
> The results of the above are attached as: out_kern_stacks.svg
> I has somehow expected a longer spike on the graph, as the output of
> ls -Cla /usr/include took the same ~20 seconds to finish writing to the
> screen as before.

That's great!  Thank you.  I expected a bit more as well, but I think
I can see what's happening.  It looks like the "nvidia" driver is
closed source and built in a way that doesn't correctly maintain the
frame pointer so DTrace is not able to walk up the stack past that
point.  On a machine that isn't using the nvidia driver, it looks more
like...

    gfx_private`bitmap_cons_display()
    gfx_private`do_gfx_ioctl+0x272
    gfx_private`gfxp_fb_ioctl+0x63
    vgatext`vgatext_ioctl+0xc0
    genunix`cdev_ioctl+0x2b
    genunix`ldi_ioctl+0x89
    tem`tems_display_layered+0x37
    tem`tems_safe_display+0x2d
    tem`tem_safe_pix_cls_range+0x152
    tem`tem_safe_pix_cls+0x4d
    tem`tem_safe_clear_chars+0xb0
    tem`tem_safe_scroll+0xdc
    tem`tem_safe_lf+0xbd
    tem`tem_safe_control+0x18d
    tem`tem_safe_parse+0x53
    tem`tem_safe_input_byte+0x109
    tem`tem_safe_terminal_emulate+0x84
    tem`tem_write+0x73
    wc`wcuwsrv+0xc7
    genunix`runservice+0x49
    genunix`queue_service+0x41

It looks like one would only get to bitmap_cons_display() by making a
VIS_CONSDISPLAY ioctl(), perhaps via tems_display_layered().  This
routine ends up copying memory around, basically.  That it's doing it
100% of the time on one CPU seems like the obvious bottleneck here.

It'd be good to know, perhaps, at what _rate_ calls to
bitmap_cons_display() are being made.  You could try something like:

    dtrace -q -n '
        bitmap_cons_display:return { @ = count(); }
        tick-1s {
            printf("%Y ", walltimestamp);
            printa("%@d", @);
            printf("\n");
            trunc(@);
        }'

I ran that on my system, and then did "echo a >/dev/wscons"
simultaneously and was able to count 91 firings...

    2021 Jan 25 15:28:46
    2021 Jan 25 15:28:47
    2021 Jan 25 15:28:48 91
    2021 Jan 25 15:28:49
    ...

Another thing that would be interesting to know is: if you disable the
nvidia driver completely, is performance better?  Because you're not
currently using X11, I don't believe you technically need it.  I think
you could try, at the boot loader, hitting escape to get to the "ok"
prompt and then...

    set disable-nvidia=true
    boot

It should hopefully then give you a WARNING about the "nvidia" module
being disabled at boot.  Hopefully performance is at least different,
if not better, if you do that.

Cheers.

--
Joshua M. Clulow
http://blog.sysmgr.org
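
To compare the bitmap_cons_display() call rate between runs (for example, with and without the nvidia driver), the per-second output of the counting script above could be summarised with something like the following. This is only a minimal sketch: it assumes the output has been saved as text with one line per second in the form shown above (a timestamp, optionally followed by a count), and the names parse_rates and summarise are hypothetical helpers, not part of DTrace or any existing tool.

```python
import re

# One line per tick: "YYYY Mon DD HH:MM:SS", optionally followed by a count.
# This format is an assumption based on the sample output in the message.
LINE_RE = re.compile(
    r"^(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})\s*(\d+)?\s*$"
)


def parse_rates(text):
    """Return a list of (timestamp, count) pairs; a missing count means 0."""
    rates = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything unrecognised
        rates.append((match.group(1), int(match.group(2) or 0)))
    return rates


def summarise(rates):
    """Return the number of seconds seen, the total calls and the peak rate."""
    counts = [count for _, count in rates]
    return {
        "seconds": len(counts),
        "total": sum(counts),
        "peak": max(counts, default=0),
    }


# Sample taken from the output quoted in the message.
sample = """\
2021 Jan 25 15:28:46
2021 Jan 25 15:28:47
2021 Jan 25 15:28:48 91
2021 Jan 25 15:28:49
"""

rates = parse_rates(sample)
print(summarise(rates))  # {'seconds': 4, 'total': 91, 'peak': 91}
```

Seconds with no firings are kept as zero rather than dropped, so a sustained rate and a single burst remain distinguishable in the summary.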