https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #13 from YunQiang Su <syq at gcc dot gnu.org> ---
I try to insert 
        li      $3, 500
        li      $5, 500
between SLL/BGEZ  and LUI+AND/BNE.

The later is still some faster on Loongson 3A4000.

I notice something like this in 74K's software manual:

The 74K core’s ALU is pipelined. Some ALU instructions complete the operation
and bypass the results in this cycle. These instructions are referred to as
single-cycle ops and they include all logical instructions (AND, ANDI, OR, ORI,
XOR, XORI, LUI), some shift instructions (SLL sa<=8, SRL 31<=sa<=25), and some
arithmetic instructions (ADD rt=0, ADDU rt=0, SLT, SLTI, SLTU, SLTIU, SEH, SEB,
ZEH, ZEB). In addition, add instructions (ADD, ADDU, ADDI, ADDIU) complete the
operation and bypass results to the ALU pipe in this cycle.


I guess it means that if sa>8, SLL may be some slow.
On Loongson 3A4000, the value seems to be 20/21. It may means that we should be
care about for 64bit.

Can you have a test on XBurst 1?

Reply via email to