On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>>>> I assume that this is not right way for fixing such simple performance >>>> anomaly since we need to do redundant work - combine load to >>>> conditional and then split it back in peephole2? Does it look >>>> reasonable? Why we should produce non-efficient instrucction that must >>>> be splitted later? >>> >>> Well, either don't allow this instruction variant from the start, or allow >>> the extra freedom for register allocation this creates. It doesn't make >>> sense to just reject it being generated by combine - that doesn't address >>> when it materializes in another way. >> >> Please check the attached patch, it implements this limitation in a correct >> way: >> - keeps memory operands for -Os or cold parts of the executable >> - doesn't increase register pressure >> - handles all situations where memory operand can propagate into RTX >> >> Yuri, can you please check if this patch fixes the performance problem for >> you? >> >> BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP >> target macro and conditionalize new peephole2 patterns on it. > > Looks good to me. I believe optimize_insn_for_speed_p () > only works reliable during RTL expansion as it relies on > crtl->maybe_hot_insn_p to be better than function granular. I'm quite sure > split does not adjust this (it probably should, as those predicates are > definitely the correct ones to use), via rtl_profile_for_bb (). I > think passes that > do not adjust this get what is left over by previous passes (instead of the > default). > > Honza, I think the pass manager should call default_rtl_profile () before each > RTL pass to avoid this, no? Please note that we have plenty of existing peephole2s that use optimize_insn_for_speed_p predicate. It is assumed to work ... Uros.