Bugzilla 41004 calls for a more -Os-friendly algorithm for BB Reorder, and I'm hoping I can meet this challenge. But I need your help!
If you have any existing ideas or thoughts that could help me reach a sensible heuristic sooner, please post them to this list. In the meantime, here's my progress so far:

My first thought was to limit BB Reorder to hot functions, as identified by gcov, so that we get the maximum execution-time benefit for the minimum code size impact. Based on the benchmarking I've done so far, this looks to be a winner, but it only works when profile data is available.

Without profile data, we face two main problems:

1) We cannot easily estimate how many times we will execute our more efficient code layout, so we can't make an accurate trade-off against the code size increase.

2) The traces that BB Reorder constructs will be based on static branch prediction, rather than true dynamic flows of execution, so the new layout may not be the best one in practice.

We can address #1 by tagging functions as hot (using attributes), but that may not always be possible, and it does not guarantee a minimal code size increase, which is the main aim of this work.

I'm not sure how #2 can be addressed, so I'm planning to sidestep it completely. The real problem isn't the performance pay-off but the code size increase that usually comes with each new layout of a function that BB Reorder makes.

My current plan is to characterise a function within find_traces() (looking at things like the number of traces, edge probabilities and frequencies, etc.) and only call connect_traces() to effect the reordering if these characteristics suggest minimal code size disruption and/or maximum performance pay-off.

Thanks for reading and I look forward to your input!

Cheers,
Ian