Together with the combine.c patch already posted (but still a WIP), all coremark performance regressions compared to cc0 are gone for CRIS. Unfortunately, when I looked further, I found some issues when running gcc.c-torture/execute/arith-rand.c and arith-rand-ll.c, both in the functions of those tests and in the target-specific library division code (umod, udiv, div).
What remains after this series is a fix for reorg.c, for (IIRC) fill_slots_from_thread, to do a "filter_flags" approach like the one in 33c2207d3fda for fill_simple_delay_slots. Apparently cc0 targets still get better treatment there: with TARGET_FLAGS_REGNUM, there's a delay slot left unfilled, seen in e.g. random_bitstring. Heads-up to Eric, who did that in 2015.

brgds, H-P