Hi guys,

Sorry for the delay in responding.  I spent last night on the couch in 3
sweatshirts and under 3 fleece blankets, generally feeling like a shit
sandwich and not looking at my email.  Fortunately, today I feel a bit more
like a normal human being, so let me page this in.

The point of the _numCpus parameter is to give the cache some way of
knowing how many sharers it has, since in a lot of shared-cache scenarios
knowing things like hits/misses per thread is important.  The +1 does indeed
have to do with devices.  IIRC, always doing the +1 made things annoying:
you'd suddenly go from a singleton stat to a vector stat, and in SE mode the
output would look ugly and blow up because you'd always have a blank slot
and vector stats have a much bigger "space" overhead in the stats file.  Yet
to have things be accurate in FS you needed that extra slot to put stats
from devices in.
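
To make the bookkeeping concrete, here is a toy sketch of the slot layout
(purely an illustration of the arithmetic, not actual M5 code or real stat
names):

# One slot per sharing CPU, plus one slot for device accesses when
# running full system; that extra slot is what the +1 pays for.
num_cpus = 4
full_system = True                       # i.e. buildEnv["FULL_SYSTEM"]
num_slots = num_cpus + (1 if full_system else 0)
labels = ['cpu%d' % i for i in range(num_cpus)]
if full_system:
    labels.append('device')
# num_slots == 5, labels == ['cpu0', 'cpu1', 'cpu2', 'cpu3', 'device'];
# in SE mode the same arithmetic gives 4 slots and no permanently blank slot.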

As for the maxThreadsPerCPU init variable, it has something to do with SMT
and seems pretty broken in concept (it assumes a single CPU and always
instantiates a vector of the maximum SMT width), but I didn't change any
stats that didn't seem relevant to differentiating between sharing threads.

WRT Korey's suggestion, it is possible to "explore" the hierarchy to look
for how many CPUs there are, but I don't think that's exactly the right
thing to do.  I've done something like that before for another cache-sharing
study, and M5 is so modular that you'd have to do a lot of configuration in
the python scripts anyway to indicate who is sharing what with whom,
register that with some common object, and make connections to that object.
For example, the only way caches currently connect to anything else is
through their ports.  Not only would you need to add a facility for the
cache to call out to, say, System and ask how many sharers there are, you'd
also need facilities to register with System how many sharers there are, and
you'd need to be able to differentiate between different levels.  E.g., if
you had 4 private L1s, 4 private L2s, and 1 shared L3, walking through and
"discovering" how many CPUs exist in the system will not tell you anything
about how they are hooked up together, and you'd still need a way in the
configuration scripts to disambiguate that from, say, 4 private L1s, 2
shared L2s, and 1 shared L3.
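
Just to sketch what I mean about the scripts having to spell out who shares
what, here's a rough illustration (all the names here are made-up stand-ins,
not the real config objects or options):

# Hypothetical sketch: the script already knows the topology, so it can
# hand each cache its sharer count directly.  A tiny stand-in class keeps
# the example self-contained; in a real script these would be the cache
# SimObjects created in the configuration files.
class FakeCache(object):
    def __init__(self, num_cpus):
        self.num_cpus = num_cpus

np = 4                                                 # i.e. options.np
l1s = [FakeCache(num_cpus=1) for _ in range(np)]       # 4 private L1s, one sharer each
l2s = [FakeCache(num_cpus=np // 2) for _ in range(2)]  # 2 L2s, each shared by half the CPUs
l3 = FakeCache(num_cpus=np)                            # 1 L3 shared by all CPUs
# "Discovering" the 4 CPUs tells you none of this; only the script knows
# whether the L2s are private or shared, so it has to say so explicitly.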

I think the right thing to do is to set _numCpus appropriately from the
configuration scripts.  But instead of doing it the way it is done now,
which is L3cache.num_cpus = options.np, do something like (width here being
the number of hardware threads per CPU):

num_cpus = options.np
if SMT:
  num_cpus = options.np*width      # one stat slot per hardware thread
if buildEnv["FULL_SYSTEM"]:
  num_cpus = num_cpus+1            # extra slot for device accesses

L3cache.num_cpus = num_cpus

I can make this change if people agree.

Lisa


On Tue, Apr 19, 2011 at 8:53 AM, Korey Sewell <ksew...@umich.edu> wrote:

> It'd be good for Lisa to chime in here to comment on the exact meaning
> of _numCpus and how that value is intended to be set.
>
> If you don't care about threads per CPU (and just the raw "# of CPUs"
> value), then I'd defer to Lisa for figuring out what the right vector
> lengths are so that they are consistent :)
>
> However, since "_numCpus" seems to be a parameter given to the
> BaseCache, the burden is on the script writer to figure out how many
> CPUs a cache is shared by, rather than automatically going through the
> cache hierarchy and figuring it out. The latter seems like the
> "ultimately right" thing to do.
>
> Lastly, I'm not sure how maxThreadsPerCPU plays into this but if you
> needed to get that threads value on a perCPU basis, then the suggested
> CPU search method would also work well.
>
> On Tue, Apr 19, 2011 at 11:35 AM, Steve Reinhardt <ste...@gmail.com>
> wrote:
> > Lisa should verify, but I think that's what _numCpus is.
> >
> > On Tue, Apr 19, 2011 at 8:01 AM, Korey Sewell <ksew...@umich.edu> wrote:
> >> Hey Steve,
> >> I don't disagree with your solution (at least in the interim), but
> >> wouldn't the right solution be to have the actual CPUs pass how many
> >> hardware threads they have to the caches?
> >>
> >> The actual "regStats" happens after all the CPUs are instantiated and
> >> ports are connected (right?), so it would seem that since different
> >> CPUs can have a different number of threads per CPU, the caches
> >> should just use the port interface to ask "how many threads?".  Then,
> >> the caches down the hierarchy would just sum those thread counts up
> >> for each of their Stat vectors.
> >>
> >> I imagine a similar method to how the "M5 classic" figured out address
> >> ranges and snoop ports would work fine.
> >>
> >> Do people think that's fine or am I missing the point here?
> >>
> >>> On Tue, Apr 19, 2011 at 10:42 AM, Steve Reinhardt <ste...@gmail.com> wrote:
> >>> Looks like it's Lisa's fault ;-)
> >>>
> >>> http://repo.m5sim.org/m5/diff/ab05e20dc4a7/src/mem/cache/base.cc
> >>>
> >>> I think Nate's point is that all the stats vector lengths should be
> >>> changed to _numCpus or _numCpus+1 instead of maxThreadsPerCpu to be
> >>> consistent.
> >>>
> >>> We should also either (1) always do _numCpus+1 even though the extra
> >>> "device" slot is unnecessary for SE mode or (2) have a single #ifdef
> >>> to set a local var to one or the other and use that consistently
> >>> rather than having #ifdefs all over the place.  I'd lean toward #2
> >>> just to keep the output a little cleaner in SE mode.
> >>>
> >>> Does that make sense, Lisa?
> >>>
> >>> Steve
> >>>
> >>>> On Mon, Apr 18, 2011 at 3:58 PM, nathan binkert <n...@binkert.org> wrote:
> >>>> Yes, but all arithmetic between vectors is elementwise, so they need
> >>>> to be the same length if used in a formula. Total miss latency needs
> >>>> to have the same vector length as total misses.
> >>>>
> >>>> Nate
> >>>>
> >>>>> On Mon, Apr 18, 2011 at 2:09 PM, Lisa Hsu <h...@eecs.umich.edu> wrote:
> >>>>> I'm not sure I understand what the problem is either.  Can different
> >>>>> VectorStats not have different lengths?
> >>>>>
> >>>>> Lisa
> >>>>>
> >>>>> On Mon, Apr 18, 2011 at 11:43 AM, Gabriel Michael Black <
> >>>>> gbl...@eecs.umich.edu> wrote:
> >>>>>
> >>>>>> My first reaction is "let's fix it", but I don't really understand the
> >>>>>> problem or the impact of changing things. Anything serious?
> >>>>>>
> >>>>>> Gabe
> >>>>>>
> >>>>>>
> >>>>>> Quoting nathan binkert <n...@binkert.org>:
> >>>>>>
> >>>>>>> I'm trying to get my python stats stuff committed and I found a bug in
> >>>>>>> the classic cache stats.  Look in src/mem/cache/base.cc.  The
> >>>>>>> VectorStats have several different lengths "_numCpus + 1", "_numCpus",
> >>>>>>> or "maxThreadsPerCPU".
> >>>>>>>
> >>>>>>> The fact that this works in the current stats package is lucky.  I can
> >>>>>>> be bug compatible, but I think we should fix this instead.
> >>>>>>>
> >>>>>>>  Nate
> >>>>>>
> >>>>>>
> >>
> >>
> >>
> >> --
> >> - Korey
> >
>
>
>
> --
> - Korey
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
