On Wed, Dec 7, 2011 at 12:58 PM, Furat Afram <[email protected]> wrote:

> The other tricky part is how the simple cache will mirror the
> simulated cache hierarchy.
> One way to solve both problems is to warm up only the LLC.
>
> That's a good way to minimize the cold startup issues.

Regarding cache line state, we can put all cache lines into the 'shared'
state. Every line will then be readable, but a write will first have to
perform a coherence action, so this method will not violate any coherence
protocol.
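
A rough sketch of the 'shared' state idea, written in Python only to show
the logic. The names here (LineState, seed_as_shared,
needs_coherence_action) are hypothetical and are not part of MARSS or QEMU,
and the three-state enum is a simplification of a real protocol:

```python
from enum import Enum


class LineState(Enum):
    """Simplified line states; a real protocol (MESI/MOESI) has more."""
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


def seed_as_shared(warm_tags):
    """Install every tag recorded during warm-up in the SHARED state."""
    return {tag: LineState.SHARED for tag in warm_tags}


def needs_coherence_action(cache, tag, is_write):
    """Return True if the access cannot be served from the local line alone."""
    state = cache.get(tag, LineState.INVALID)
    if state is LineState.INVALID:
        return True   # plain miss: the line has to be fetched
    if is_write and state is LineState.SHARED:
        return True   # write to a shared line: an upgrade is required first
    return False      # read hit, or write hit on an already-modified line
```

With the cache seeded this way, reads hit right away, while the first write
to each line still goes through the normal upgrade path, so no line is ever
written without the protocol being consulted.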

Implementing such a scheme would require changes to QEMU's memory access
routines.  I am not sure how much time it would take or how much it would
affect the final results.  To keep things simple, we can first run a basic
experiment in which we collect the cache miss rate every 10k cycles using
the timed stats feature and find out how long it takes to fill the caches.
Once we find that threshold, we can add a config option that resets all
stats after a specific instruction count, which removes the cold startup
issues.
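
One possible way to pick the threshold from the per-window miss rates is
sketched below in Python. This is only an illustration: warmup_windows, the
tail length, and the 10% tolerance are my assumptions and are not existing
MARSS options, and the input is assumed to be a plain list with one miss
rate per 10k-cycle window.

```python
def warmup_windows(miss_rates, tail=5, tolerance=0.10):
    """Return the number of leading windows to treat as warm-up.

    The steady-state miss rate is estimated as the mean of the last `tail`
    windows. Warm-up ends right after the last window whose miss rate is
    more than `tolerance` above that estimate. If the final window is still
    above the limit, the whole run is reported as warm-up.
    """
    if len(miss_rates) < tail:
        raise ValueError("need at least `tail` windows of data")
    steady = sum(miss_rates[-tail:]) / tail
    limit = steady * (1.0 + tolerance)
    # Walk backwards to find the last window that is still above the limit.
    for i in range(len(miss_rates) - 1, -1, -1):
        if miss_rates[i] > limit:
            return i + 1
    return 0


# Made-up miss rates, one per 10k-cycle window, for illustration only.
rates = [0.90, 0.60, 0.30, 0.11, 0.10, 0.10, 0.09, 0.10, 0.10]
warmup_cycles = warmup_windows(rates) * 10_000
```

The result is in cycles, so it would still have to be translated into an
instruction count before it could be used for the stats reset.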

- Avadh

> On Wed, Dec 7, 2011 at 10:54 AM, DRAM Ninjas <[email protected]> wrote:
> > Greetings all,
> >
> > I was thinking the other day that it would be very nice to have a way to
> > warm up caches before an ROI starts. I just wanted any input about how
> > hard it would be to implement. I think maybe it wouldn't be that hard:
> >
> > - First, add a ptlcall to trigger a warmup operation -- this would
> >   probably be done at the shell via a small script like 'start_sim'
> >   before an ROI workload is launched
> > - Then, instrument the qemu load/store functions to run a simple cache
> >   model -- we probably don't have to do full coherence and all that --
> >   just sort of fill in tags as requests are made. This wouldn't be
> >   *that* accurate, but at least it would approximate what the caches
> >   might have in them so that there aren't tons of false compulsory
> >   misses at the beginning of an ROI.
> > - Finally, when a ptlcall_switch_to_sim is issued -- copy the tags from
> >   the simple cache model state into the full simulation cache model and
> >   begin simulating as usual. The real trick here would be if the simple
> >   cache model doesn't do coherence, to figure out what the states of the
> >   cache lines should be (exclusive, shared, owned?).
> >
> > Is this a silly idea? Is it worth it to try to implement or will it just
> > be a waste of time since the beginning behavior will be transient and
> > won't last very long?
> >
> > -Paul
> >
> >
> >
> >
> >
> > _______________________________________________
> > http://www.marss86.org
> > Marss86-Devel mailing list
> > [email protected]
> > https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
> >