Hi,
For very small loops (<= 6 insns), it is worthwhile to unroll 4
times: they run faster, with less latency and better cache usage.
Examples of such loops:
  while (i) a[--i] = NULL;
  while (p < e) *d++ = *p++;
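To illustrate what unrolling by 4 means for the second loop, here is a hand-written sketch. It is not part of the patch and not the code the RTL unroller actually emits; the function names and the `int` element type are assumptions made for the example.

```c
/* Original form: one element copied per iteration.  */
static void
copy_simple (int *d, const int *p, const int *e)
{
  while (p < e)
    *d++ = *p++;
}

/* Hand-unrolled by 4: the main loop does four copies per branch,
   and a remainder loop handles the last 0-3 elements.  */
static void
copy_unrolled4 (int *d, const int *p, const int *e)
{
  while (e - p >= 4)
    {
      d[0] = p[0];
      d[1] = p[1];
      d[2] = p[2];
      d[3] = p[3];
      d += 4;
      p += 4;
    }
  while (p < e)
    *d++ = *p++;
}
```

Both functions copy the same elements; the unrolled one executes roughly a quarter of the loop branches for long inputs.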
With this enhancement, we could see some performance improvement
for some workloads (e.g. SPEC2017).
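For reference, the thresholds used by the patched hook can be modelled outside of GCC roughly as below. This is only a sketch: `unroll_adjust_model` is a made-up name, `ninsns` stands in for `loop->ninsns`, `only_small` stands in for `unroll_only_small_loops`, and `MIN` is defined locally rather than taken from GCC's headers.

```c
/* Local stand-in for GCC's MIN macro.  */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical standalone model of the hook.  NINSNS stands in for
   loop->ninsns and ONLY_SMALL for unroll_only_small_loops.  */
static unsigned
unroll_adjust_model (unsigned nunroll, unsigned ninsns, int only_small)
{
  if (only_small)
    {
      /* Very small loops: allow up to 4 copies.  */
      if (ninsns <= 6)
        return MIN (4, nunroll);
      /* Small loops: allow up to 2 copies.  */
      if (ninsns <= 10)
        return MIN (2, nunroll);
      /* Anything larger is not unrolled.  */
      return 0;
    }
  return nunroll;
}
```

So with the flag set, a 5-insn loop with nunroll == 8 would be capped at 4, a 10-insn loop at 2, and an 11-insn loop would not be unrolled at all.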
Bootstrap and regtest pass on powerpc64le. Ok for trunk?
BR,
Jiufu Guo
2020-07-13 Jiufu Guo
* config/rs6000/rs6000.c (rs6000_loop_unroll_adjust): Refine hook.
---
gcc/config/rs6000/rs6000.c | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 58f5d780603..06844fdba57 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5135,16 +5135,15 @@ rs6000_destroy_cost_data (void *data)
static unsigned
rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
{
- if (unroll_only_small_loops)
+ if (unroll_only_small_loops)
{
- /* TODO: This is hardcoded to 10 right now. It can be refined, for
-example we may want to unroll very small loops more times (4 perhaps).
-We also should use a PARAM for this. */
+ /* TODO: These are hardcoded values for now; tunable PARAM(s) may be added later. */
+ if (loop->ninsns <= 6)
+ return MIN (4, nunroll);
if (loop->ninsns <= 10)
return MIN (2, nunroll);
- else
- return 0;
+
+ return 0;
}
return nunroll;
--
2.25.1