Yep,

I've thought about the need for a fully pipelined fetch as well.  However, my
current approach is to fake longer instruction-cache latencies by leaving the
cache delay at 1 cycle and making up for it with additional "fetchToDecode"
delay.  This keeps the front-end latency and the branch mispredict penalty the
same (for branches resolved at decode as well as at execute).

I haven't yet seen a case where adding this latency later, instead of
modeling a real instruction-cache latency, makes much of a
difference.
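In config terms, the workaround might look like the sketch below. This is an illustration, not a drop-in script: it assumes the O3 model's `fetchToDecodeDelay` parameter and picks a 3-cycle target hit latency purely as an example; exact class and parameter names vary by gem5 version.

```python
# Hypothetical sketch of the workaround described above.
# Target to approximate: a 3-cycle i-cache hit latency.
from m5.objects import DerivO3CPU

cpu = DerivO3CPU()

# Leave the i-cache hit latency at its 1-cycle default, and push the
# remaining 2 cycles into the fetch-to-decode stage delay instead.
# Total front-end latency (and thus the branch mispredict penalty,
# whether the branch resolves at decode or at execute) comes out the
# same as with a real 3-cycle i-cache.
cpu.fetchToDecodeDelay = 1 + 2
```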



On Tue, Aug 26, 2014 at 11:32 AM, Amin Farmahini via gem5-users <
gem5-users@gem5.org> wrote:

> Hi,
>
> Looking at the code for the fetch unit in O3, I realized that the fetch
> unit does not take advantage of non-blocking i-caches: it does not
> initiate a new i-cache request while it is waiting for an i-cache
> response. Since the O3 fetch unit does not pipeline i-cache requests,
> fetch throughput drops significantly when the i-cache hit latency is
> more than 1 cycle. I expected the fetch unit to be able to initiate a
> new i-cache request each cycle (based on the BTB target or the next
> sequential fetch address) even while waiting for earlier i-cache
> responses. Any thoughts on this?
>
> I understand a large fetch buffer can mitigate this to some degree...
>
> Thanks,
> Amin
>
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
