Jul 17, 2023, 07:18 by r...@remlab.net:

> On Sunday, 16 July 2023 at 23.32.21 EEST, Lynne wrote:
>
>> Introducing additional overhead in the form of a dereference is a point
>> where instability can creep in. Can you guarantee that a context will
>> always remain in L1D cache,
>
> L1D is not involved here. In version 2, the pointers are cached locally.
>
>> as opposed to just reading the raw CPU timing
>> directly where that's supported.
>
> Of course not. Raw CPU timing is subject to noise from interrupts (and
> whatever those interrupts trigger). And that's not just theoretical: I've
> experienced it, and it sucks. Raw CPU timing is much noisier than Linux perf.
>
> And because it has also been proven vastly insecure, it has been disabled on
> Arm for a long time, and is now being disabled on RISC-V too.
>
>> > But I still argue that that is, either way, completely negligible compared
>> > to the *existing* overhead. Each loop makes 4 system calls, and each
>> > of those system calls requires a direct call (to the PLT) and an indirect
>> > branch (from the GOT). If you have a problem with the two additional
>> > function calls, then you can't be using Linux perf in the first place.
>>
>> You don't ever want to use Linux perf in the first place; it's second class.
>
> No, it isn't. The interface is more involved than just reading a CSR, and
> sure, I'd prefer the simple interface that RDCYCLE offers, all other things
> being equal. But other things are not equal. Linux perf is in fact *more*
> accurate, by virtue of not *wrongly* counting other things. And it does not
> threaten the security of the entire system, so it will work inside a rented
> VM or an unprivileged process.
Threaten? This is a development tool first and foremost. If anyone doesn't
want to use rdcycle, they can use Linux perf; it still works, with or without
the patch.

>> I don't think it's worth changing the direct inlining we had before. You're
>> not interested in whether or not the same exact code is run between
>> platforms,
>
> Err, I am definitely interested in doing exactly that. I don't want to have
> to reconfigure and recompile all of FFmpeg just to switch between Linux perf
> and the raw cycle counter. On the contrary, I *do* want to compare
> performance between vendors once the hardware is available.

That's a weak reason to compromise the accuracy of a development tool.

>> just that the code that's measuring timing is as efficient and
>> low overhead as possible.
>
> Of course not. Low overhead is irrelevant here. The measurement overhead is
> known and is subtracted. What we need is stable, reproducible overhead and
> accurate measurements.

Which is what the TSC or its equivalent gets you. It's noisy, but that's
because it's better and higher accuracy than having to round-trip through the
kernel.

> And that's assuming the stuff works at all. You can argue that we should use
> the Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks, all you want. PMU
> access will just throw a SIGILL and end the checkasm process with zero
> measurements. The rest of the industry wants to use system calls, for
> informed reasons. I don't think you, or even the whole FFmpeg project, can
> win that argument against the OS and CPU vendors.

Either way, I don't agree with this patch; I am not accepting it.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".