On 2019/3/7 10:02 PM, David Sterba wrote:
> On Wed, Mar 06, 2019 at 02:19:04PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/perf_tree_lock
>> It is based on the v5.0-rc7 tag.
>>
>> Although we have ftrace/perf for various kinds of performance
>> analysis, in most cases the granularity is too small, flooding users
>> with data.
>>
>> This RFC patchset provides a btrfs-specific performance profiler.
>> It measures the duration of certain functions and accumulates the
>> results.
>>
>> The result is exposed through a read-only sysfs interface,
>> /sys/fs/btrfs/<FSID>/profiler.
>>
>> The content of that file is generated when read, so users have full
>> control over the sample resolution.
>>
>> Example content can be found in the last patch.
>>
>> One example using the interface to profile fsstress can be found here:
>> https://docs.google.com/spreadsheets/d/1BVng8hqyyxFWPQF_1N0cpwiCA6R3SXtDTHmRqo8qyvo/edit?usp=sharing
>>
>> The test script can be found here:
>> https://gist.github.com/adam900710/ca47b9a8d4b8db7168b261b6fba71ff1
>>
>> The interesting results from the graph are:
>> - Concurrency on the fs tree is only high for the initial 25 seconds.
>>   My initial expectation was that the hotness of the fs tree would be
>>   more or less stable, so this looks pretty interesting.
>>
>> - The extent tree gets more concurrency after 25 seconds.
>>   This again breaks my expectation, as writes to the extent tree
>>   should only be triggered by delayed refs. So there is something
>>   interesting here too.
>>
>> - The root tree is pretty cold.
>>   Since the test only touches the fs tree, the root tree is expected
>>   to be less contended.
>>
>> - There is some minor load on the other trees.
>>   My guess is that it comes from the csum tree.
>>
>> Although the patchset is relatively small, there are some design
>> points that need extra comments before the patchset grows larger and
>> larger:
>>
>> - How should this profiler get enabled?
>>   Should this feature be enabled by a mount option or a kernel config
>>   option, or should it just run in all kernel builds?
>>   Currently the overhead should be pretty small, but it will grow
>>   with each new piece of telemetry.
>>
>> - Design of the interface.
>>   Is this a valid usage of sysfs or an abuse? And can the content be
>>   improved for both humans and programs?
>
> Well, most of that is answered by 'figure out how to use tracepoints
> and perf for that'.
>
> If there were not a whole subsystem, actively maintained and
> documented, implementing something like that might help, but that's
> not the case.
>
> I see what you were able to understand from the results, but it's more
> like a custom analysis tool that should not be merged as-is. It brings
> a whole new interface, and that's always hard to get right, with all
> the mistakes ahead that somebody has probably solved already.
>
> It would be good to have a list of the limitations of perf you see,
> and we can find a solution ourselves or ask elsewhere.
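For reference, the per-call duration part of the suggestion above can already be obtained from the stock function_graph tracer, without any new interface. A minimal sketch, assuming root and tracefs mounted at /sys/kernel/debug/tracing (the script reports "skipped" where tracefs is unavailable; the 5-second window and output path are arbitrary choices for illustration):

```shell
# Sketch: capture per-call durations of the btrfs locking functions
# with the in-kernel function_graph tracer.
T=${TRACEFS:-/sys/kernel/debug/tracing}
if [ -w "$T/current_tracer" ]; then
    echo nop > "$T/current_tracer"
    # Only graph the two lock functions we care about.
    echo btrfs_tree_lock > "$T/set_graph_function"
    echo btrfs_tree_read_lock >> "$T/set_graph_function"
    echo function_graph > "$T/current_tracer"
    sleep 5                                  # sampling window
    cp "$T/trace" /tmp/btrfs-lock-trace.txt  # per-call durations
    echo nop > "$T/current_tracer"           # restore
    echo > "$T/set_graph_function"
    status=traced
else
    status=skipped
fi
echo "$status"
```

As the thread notes, this gives durations but no classification: the trace records which function ran and for how long, not which tree (eb owner) the lock belonged to.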
Adding the linux-perf-users mailing list to Cc.

I should have mentioned the problems with ftrace (or with my perf
skills) in the cover letter.

The biggest problem is the conflict between getting detailed function
execution durations and classifying them.

For the tree lock case, we can indeed use the function_graph tracer to
get the execution duration of btrfs_tree_read_lock() and
btrfs_tree_lock(), but that's all; we can't really do much
classification. If we just use trace events, then we can't get the
execution duration.

Even if we can get both (my poor perf/ftrace skills must be blamed), it
may take some more time to turn the data into an easy-to-read graph.
We need to classify the results by eb owner, then account the execution
duration against the timestamps.

If perf users could provide a better solution, I'm pretty willing to
learn some new tricks. (Considering my perf knowledge mostly comes from
an LWN article, it wouldn't be a surprise if some perf black magic beat
my hack.)

As for the interface, that's why I sent the patchset early, to get some
feedback. I'm really a bad interface designer.

Thanks,
Qu
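As a rough offline sketch of the "account the execution duration" half of the problem described above: the per-call durations in a function_graph trace can be summed per function with awk. The trace lines below are made-up samples in the tracer's leaf-call output format (CPU, duration, function); a real run would feed the captured trace file instead:

```shell
# Sum per-function duration (in us) from function_graph leaf-call
# lines of the form:  " 2)   1.234 us   |  btrfs_tree_read_lock();"
sum_durations() {
    awk '/ us .*\(\);/ {
        # duration is field 2; function name is the last field
        # with the trailing "();" stripped
        f = $NF; sub(/\(\);/, "", f)
        total[f] += $2
    }
    END { for (f in total) printf "%s %.3f\n", f, total[f] }'
}

# Hypothetical sample input; real input would be the saved trace file.
sum_durations <<'EOF'
 2)   1.234 us   |  btrfs_tree_read_lock();
 2)   0.766 us   |  btrfs_tree_read_lock();
 3)   2.000 us   |  btrfs_tree_lock();
EOF
```

This only classifies by function name; classifying by eb owner, as the thread discusses, would additionally need a trace event that records the owner, which function_graph alone does not provide.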