On 5/7/24 3:24 PM, Palmer Dabbelt wrote:

@@ -529,6 +536,7 @@ static const struct riscv_tune_param generic_ooo_tune_info = {
    4,                                          /* fmv_cost */
    false,                                      /* slow_unaligned_access */
    false,                                      /* use_divmod_expansion */
+  false,                                       /* overlap_op_by_pieces */

IMO we should turn this on for the generic OOO tuning -- the benchmarks
say it's not faster for the T-Head OOO cores, but we were all so
surprised to find that I don't think we even fully trust the benchmarks.
I'd assume OOO cores are faster with the overlapping stores, so we
should just lean into it and let vendors say something if that's the
wrong assumption.

Several factors likely come into play here (branch prediction, OOO properties, write combining, etc.).
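
For readers unfamiliar with the term, here is a rough C sketch of what "overlapping" by-pieces operations mean for a fixed-size 7-byte copy. The function names copy7_pieces and copy7_overlap are made up for illustration and are not part of the patch. The code GCC actually emits depends on the target and tuning. Note that the overlapping form relies on an unaligned 4-byte access at offset 3, which is why it only makes sense when unaligned accesses are cheap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: copy 7 bytes as three non-overlapping
   pieces of 4 + 2 + 1 bytes (three loads, three stores). */
static void copy7_pieces(unsigned char *dst, const unsigned char *src)
{
    uint32_t a;
    uint16_t b;

    memcpy(&a, src, 4);        /* bytes 0..3 */
    memcpy(&b, src + 4, 2);    /* bytes 4..5 */
    memcpy(dst, &a, 4);
    memcpy(dst + 4, &b, 2);
    dst[6] = src[6];           /* byte 6 */
}

/* Hypothetical illustration: copy 7 bytes as two 4-byte pieces that
   overlap at byte 3 (two loads, two stores, the second pair unaligned). */
static void copy7_overlap(unsigned char *dst, const unsigned char *src)
{
    uint32_t lo, hi;

    memcpy(&lo, src, 4);       /* bytes 0..3 */
    memcpy(&hi, src + 3, 4);   /* bytes 3..6, overlapping byte 3 */
    memcpy(dst, &lo, 4);
    memcpy(dst + 3, &hi, 4);   /* byte 3 is written twice with the same value */
}
```

Both functions produce the same result. The overlapping version trades one redundant byte write for fewer memory operations, and whether that is a win on a given core is what the benchmarks are meant to settle.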

But sure, I don't think that'd be terribly controversial. I can go ahead and make that change now given its triviality.

Jeff


