I have a question about the forwarding logic.
This is my test code:
for (i = 0; i < 10000; i++){
asm volatile(
"nop;" // signature for identifying the test code region
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"nop;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
"addq %rbx, %rax;"
);
}
printf("TEST DONE");
In most CPU designs, when I run consecutive add instructions that
depend on each other, I expect the IPC to be 1. Marss86 shows a
different result: it commits one add instruction every two cycles
(IPC is 1/2).
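
To make the expectation concrete, here is a rough sketch of the arithmetic. It is a toy model of my own, not Marss86 code, and `issue_gap` is a hypothetical parameter meaning "cycles between a producer issuing and its dependent consumer issuing". Since rax carries the dependency across loop iterations, the 18 adds form one long chain, and the nops are ignored because they are independent.

```c
#include <assert.h>

/* Toy model (not Marss86 code): steady-state cycles per loop iteration
 * spent on a chain of n_adds dependent adds, when each consumer issues
 * issue_gap cycles after its producer issues. */
long chain_cycles(int n_adds, int issue_gap) {
    assert(n_adds > 0 && issue_gap > 0);
    return (long)n_adds * issue_gap;
}

/* IPC of the add chain alone, ignoring the independent nops. */
double chain_ipc(int n_adds, int issue_gap) {
    return (double)n_adds / (double)chain_cycles(n_adds, issue_gap);
}
```

With the 18 adds of the test loop, `chain_ipc(18, 1)` gives 1.0 (what I expect) and `chain_ipc(18, 2)` gives 0.5 (what I observe).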
Also, there are two different forwarding latencies in this scenario.
1) add i     : [Dispatch] | [I & C]  | [transfer] |
   add i + 1 : [Dispatch] | [Bubble] | [Bubble]   | [Issue]
   (I & C means issue and complete)
This case occurs when two dependent add instructions enter the issue
queue in the same cycle. Here, add i + 1 has to wait for the forwarding
signal, because add i's result is not yet in the bypass state when
add i + 1 is dispatched (see the dispatch function). As a result, it can
only be issued two cycles after add i is issued.
2) add i     : [Dispatch] | [I & C]    | [transfer] |
   add i + 1 : [Rename]   | [Dispatch] | [Issue]    |
In this case, add i + 1 can be issued in the cycle right after add i is issued.
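
The two cases above can be summarized with a small sketch. Again, this is only my reading of the behavior expressed as a toy model, not the actual Marss86 code: the function name, the cycle numbering, and the fixed two-cycle wakeup delay are assumptions taken from the diagrams.

```c
#include <assert.h>

/* Toy model (not Marss86 code) of when a dependent consumer issues.
 * producer_issue    : cycle in which the producer issues and completes
 * consumer_dispatch : cycle in which the consumer is dispatched
 * Assumption: a consumer dispatched before the producer issues must
 * wait for the forwarding signal, which arrives two cycles after the
 * producer issues; otherwise it sees the operand as ready at dispatch. */
int consumer_issue_cycle(int producer_issue, int consumer_dispatch) {
    assert(producer_issue >= 0 && consumer_dispatch >= 0);

    /* An instruction cannot issue in the cycle it is dispatched. */
    int earliest = consumer_dispatch + 1;

    if (consumer_dispatch >= producer_issue) {
        /* Case 2: the result is already in the bypass state at dispatch. */
        return earliest;
    }

    /* Case 1: wait for the forwarding signal after the transfer stage. */
    int wakeup = producer_issue + 2;
    return wakeup > earliest ? wakeup : earliest;
}
```

Numbering the cycles from 1 as in the diagrams, case 1 is `consumer_issue_cycle(2, 1)`, which returns 4, and case 2 is `consumer_issue_cycle(2, 2)`, which returns 3.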
I think dependent add instructions should be able to issue in
back-to-back cycles in both cases.
Is this the intended design of Marss?
Thanks,
- Hanhwi