https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99105
--- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #5)
> > > So it effectively replaces gcov's own buffered I/O by stdio. First I am
> > > not sure how safe it is (as we had a lot of fun about using malloc)
> >
> > Why is it not safe? We use filesystem locking for the .gcda files.
>
> Because user apps may do funny things with stdio, just as they do with
> malloc. The less library stuff we rely on, the less likely we are to hit
> problems. So I am not sure whether simply fixing our I/O isn't a better
> approach, but I do not know.

Sure. With the patch, we don't rely on any glibc feature. We just use
default read/write I/O (which uses buffering internally).

> > > also it adds a dependency on stdio that is not necessarily a good idea
> > > for embedded targets. Not sure how often it is used there.
> >
> > It was motivated by PR97834. Well, I think it's better to rely on a
> > system C library as it provides a faster implementation of buffered I/O.
> >
> > For embedded targets, I plan to implement hooks that can be used
> > instead of I/O:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559342.html
> >
> > > But why is glibc stdio more effective? Is it because our buffer size
> > > of 1k is way too small (as it seems, judging from the profile that is
> > > dominated by fread calls rather than open/lock/close)?
> >
> > It behaved the same on my machine, but the impact on BSD was more
> > significant.
>
> Clang training seems to be a good extreme testcase and not that hard to
> set up. It is a relatively large testsuite and streaming clearly
> dominates everything else.

Sure, I'll set it up.

> Profile also seems quite clear that read dominates other syscall
> overhead.
>
> > I'm planning to collect more detailed statistics about why many small
> > I/Os are slower.
> From the perf, it seems that simply the syscall overhead plays an
> important role (about 20% on the kernel side, plus 9% on the glibc
> side), followed by some stupidity of the openSUSE setup: AppArmor and
> btrfs.

Yes, that's pretty obvious from the profile.

> > In the case of Clang, I would expect 100s (or even 1000s) of object
> > files. During a profiling run (using all cores), I would expect each
> > run to take 100ms (or even seconds), so waiting for the file lock of
> > an object file's .gcda should not block it much.
>
> 2727 gcda files, 44MB overall, 4MB xz compressed tar file.
> I am actually surprised that the file count is quite small. Firefox has
> more...

To be honest, that's a very small total file size. I would expect these
files to definitely live in the page cache. What type of disk do you use?