I have a question about the forwarding logic.

This is my test code.

 for (i = 0; i < 10000; i++){
    asm volatile(
                 "nop;" // signature for identifying the test code region
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "nop;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 "addq %rbx, %rax;"
                 : /* no outputs */
                 : /* no inputs */
                 : "rax", "cc" /* the adds modify %rax and the flags */
                 );
  }
  printf("TEST DONE");

In most CPU designs, if I run consecutive add instructions that depend on
each other, I expect the IPC to be 1. But Marss86 shows a different
result: it commits one add instruction every two cycles (IPC is 1/2).
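
To make the expectation concrete, here is a minimal back-of-the-envelope
sketch in C. It only does the arithmetic for a chain of dependent adds; the
helper names chain_cycles and chain_ipc are hypothetical and are not part of
Marss86, and the model ignores the nops and the loop overhead.

```c
#include <assert.h>

/* Hypothetical helper, not part of Marss86: cycles needed to issue a chain
 * of n_adds dependent adds when each add can issue `issue_to_issue` cycles
 * after the add it depends on. */
int chain_cycles(int n_adds, int issue_to_issue)
{
    assert(n_adds >= 0 && issue_to_issue >= 1);
    return n_adds * issue_to_issue;
}

/* Adds issued per cycle over the whole chain. */
double chain_ipc(int n_adds, int issue_to_issue)
{
    assert(n_adds > 0);
    return (double)n_adds / chain_cycles(n_adds, issue_to_issue);
}
```

For the 18 adds in the loop body, chain_cycles(18, 1) gives 18 cycles
(IPC 1, what I expect), while chain_cycles(18, 2) gives 36 cycles (IPC 1/2,
what I observe).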

Also, there are two different forwarding latencies in this scenario.
1) add i     : [Dispatch] | [I & C]  | [transfer] |
   add i + 1 : [Dispatch] | [Bubble] | [Bubble]   | [Issue]
(I & C means issue and complete.)
This case occurs when two dependent add instructions enter the issue queue
in the same cycle. Here, add i + 1 has to wait for the forwarding signal,
because when it is dispatched, add i's result is not yet in the bypass
state (see the dispatch function). Thus, it can be issued only two cycles
after add i is issued.

2) add i     : [Dispatch] | [I & C]    | [transfer] |
   add i + 1 : [Rename]   | [Dispatch] | [Issue]    |
In this case, add i + 1 can be issued in the cycle right after add i is
issued.
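
The two timelines above can be summarized in a small C model. This is only
my reading of the observed behavior, not code from Marss86: the function
names and the rule "an add issues one cycle after its dispatch" are
assumptions taken from the diagrams, with cycle 0 being the cycle in which
add i is dispatched.

```c
#include <assert.h>

/* Hypothetical model of the observed timelines; not Marss86 code.
 * Assumption from the diagrams: the producer (add i) issues and completes
 * one cycle after it is dispatched. */
int producer_issue_cycle(int producer_dispatch)
{
    return producer_dispatch + 1;
}

/* Cycle in which the dependent add (add i + 1) is issued. */
int consumer_issue_cycle(int producer_dispatch, int consumer_dispatch)
{
    assert(consumer_dispatch >= producer_dispatch);
    int p_issue = producer_issue_cycle(producer_dispatch);

    if (consumer_dispatch < p_issue) {
        /* Case 1: the consumer was dispatched before the producer issued,
         * so it waits for the forwarding signal and loses a cycle. */
        return p_issue + 2;
    }
    /* Case 2: the producer has already issued when the consumer is
     * dispatched, so the consumer issues one cycle after its dispatch. */
    return consumer_dispatch + 1;
}
```

With add i dispatched in cycle 0, consumer_issue_cycle(0, 0) returns 3
(two cycles after add i issues in cycle 1), and consumer_issue_cycle(0, 1)
returns 2 (the cycle right after add i issues).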

I think dependent add instructions should be issued back to back, one per
cycle.

Is this behavior the intended design of Marss?

Thanks,
- Hanhwi
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel