On Oct 22, 2014, at 4:24 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote: >> Hi, >> >> This patch improves register pressure scheduling (both >> SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number >> of available registers. >> >> At the moment the scheduler does not account for spills in the prologues and >> restores in the epilogue, which occur from use of call-used registers. The >> current state is, essentially, optimized for case when there is a hot loop >> inside the function, and the loop executes significantly more often than the >> prologue/epilogue. However, on the opposite end, we have a case when the >> function is just a single non-cyclic basic block, which executes just as >> often as prologue / epilogue, so spills in the prologue hurt performance as >> much as spills in the basic block itself. In such a case the scheduler >> should throttle-down on the number of available registers and try to not go >> beyond call-clobbered registers. >> >> The patch uses basic block frequencies to balance the cost of using >> call-used registers for intermediate cases between the two above extremes. >> >> The motivation for this patch was a floating-point testcase on >> arm-linux-gnueabihf (ARM is one of the few targets that use register >> pressure scheduling by default). >> >> A "thanks" goes to Richard good discussion of the problem and suggestions on >> the approach to fix it. >> >> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really >> exercises the patch), and cross-tested on arm-linux-gnueabihf and >> aarch64-linux-gnu. >> >> OK to apply? >> > It is a pretty interesting idea for heuristic, Maxim. > > But I don't understand the following loop: > > + for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i) > + if (call_used_regs[i]) > + for (c = 0; c < ira_pressure_classes_num; ++c) > + { > + enum reg_class cl = ira_pressure_classes[c]; > + if (ira_class_hard_regs[cl][i]) > + ++call_used_regs_num[cl]; > > > ira_class_hard_regs[cl] is array containing hard registers belonging to > class CL. So if GENERAL_REGS consists of hard regs 0..3, 12..15, the > array will contain 8 elements 0..3, 12..15. The array size is defined > by ira_class_hard_regs_num[cl]. So the index is order number of hard > reg in the class (starting from 0) but not hard register number itself. > Also the pressure classes never intersect so you can stop the inner loop > when you find class to which hard reg belongs to. Thanks for spotting this. Indeed, this is a bug, but it still happened to correctly calculate numbers of call-used register for ARM (where I debugged the implementation). > > I believe you should rewrite the code and get performance results again > to get an approval. Sebastian, could you run the geekbench again to make sure you see same performance numbers? > You also missed the changelog. > The changelog was in the separate file. Also attached here with the fixed patch. Bootstrapped on x86_64-linux-gnu. Bootstrap and regtest on arm-linux-gnueabihf is in progress. -- Maxim Kuvyrkov www.linaro.org
0001-sched_class_reg_num.ChangeLog
Description: Binary data
0001-sched_class_reg_num.patch
Description: Binary data