I think I've figured this out.  This max_in_port_rank value is computed
using a Python class variable on the InPortDeclAST class; it's initialized
to zero, then every time a port is declared it updates the max rank as
necessary.  The problem arises because the SLICC parser is not invoked as a
separate process, but is just imported into the scons instance of the
Python interpreter.  Thus when you build binaries encompassing multiple
protocols in a single scons invocation (as you do for regressions), you
actually end up with the max value across all the protocols that have been
parsed thus far.  Throw in some concurrency with '-j', and the value is
not just dependent on the set of protocols you're building, it's also
non-deterministic.  If you're only compiling one protocol, you get the
right answer, and it's stable, which is why most people probably don't
notice this.
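
To make the failure mode concrete, here is a minimal sketch of the pattern.
It is not the actual SLICC code: the class InPortDecl, the function
parse_protocol, and the rank lists are all made up for illustration.  It
only shows how a class variable that is never reset carries state from one
protocol to the next when both are parsed in the same interpreter.

```python
class InPortDecl:
    """Hypothetical stand-in for SLICC's InPortDeclAST (illustration only)."""

    # Class variable: one copy shared by every instance in this interpreter,
    # so it is never reset between protocols.
    max_port_rank = 0

    def __init__(self, rank):
        self.rank = rank
        # Each declaration bumps the shared maximum if needed.
        InPortDecl.max_port_rank = max(InPortDecl.max_port_rank, rank)


def parse_protocol(ranks):
    """Pretend to parse one protocol and return the max rank it would emit."""
    for rank in ranks:
        InPortDecl(rank)
    return InPortDecl.max_port_rank


# Two made-up protocols parsed back to back in the same process.
first = parse_protocol([0, 1, 2, 3, 4, 5, 6])  # its own max rank is 6
second = parse_protocol([0, 1])                # its own max rank is 1
print(first, second)  # the second value is inherited from the first protocol
```

In this sketch the second protocol reports the first protocol's maximum
rather than its own, and swapping the parse order changes the result.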

Since it's only setting the max size for the vector of message buffers,
having the value be too large doesn't impact the correctness of the
simulation either.  The real problem is just that it's causing spurious
rebuilds (and regression re-runs).

It seems like there are several ways to fix this:

1. Invoke SLICC as a separate process for each protocol.
2. Find a different way to calculate the max rank that localizes it to a
specific protocol.
3. Tweak Ruby to not need this value.

Initially #1 seemed like the obvious path to me, but looking at where
m_max_in_port_rank is used (only in two places in AbstractController.cc), I
don't think #3 would be that bad either.  Basically this value seems to be
used to move a couple of loops up into AbstractController by guaranteeing
that those loops cover the worst-case bounds of any derived class.  It
seems like it wouldn't be that hard to pass the actual in-port rank for a
derived class up into AbstractController via a constructor argument
instead, and then we wouldn't even need this whole calculation.  Does that
make sense?  Any other opinions?
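
For comparison, option 2 might look roughly like the sketch below.  Again
this is not existing SLICC code: ProtocolContext, declare_in_port, and
parse_protocol are hypothetical names, and the point is only that the
maximum lives on a per-protocol object instead of on a class.

```python
class ProtocolContext:
    """Hypothetical per-protocol state; one instance per parsed protocol."""

    def __init__(self, name):
        self.name = name
        # Instance attribute: starts at zero for every protocol.
        self.max_port_rank = 0

    def declare_in_port(self, rank):
        """Record one in-port declaration for this protocol only."""
        self.max_port_rank = max(self.max_port_rank, rank)


def parse_protocol(name, ranks):
    """Pretend to parse one protocol using its own fresh context."""
    ctx = ProtocolContext(name)
    for rank in ranks:
        ctx.declare_in_port(rank)
    return ctx.max_port_rank


# The same two made-up protocols as before, parsed in one process.
big = parse_protocol("big", [0, 1, 2, 3, 4, 5, 6])
small = parse_protocol("small", [0, 1])
print(big, small)  # each protocol now reports its own maximum
```

With the state scoped this way the result for each protocol no longer
depends on what else was parsed earlier or in what order.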

Steve



On Sun, Sep 29, 2013 at 8:52 PM, Steve Reinhardt <ste...@gmail.com> wrote:

> Also, note that this m_max_in_port_rank assignment value is the *only*
> change in all of the generated Ruby files. This line changes in more than
> one, maybe all, of the *_Controller.cc files, but the rest of those files
> and all of the other files are identical.  Thus I'd say it's pretty clearly
> the same protocol.
>
> Steve
>
>
> On Sun, Sep 29, 2013 at 8:49 PM, Steve Reinhardt <ste...@gmail.com> wrote:
>
>> Absolutely, I didn't touch a thing between builds when this happened.
>> Note that this is just the plain "X86" build, so it's using whatever the
>> default protocol is.  I think I saw the same thing with some of the other
>> builds with specific Ruby protocols though (like ALPHA_MESI_CMP_directory)
>> so I don't think it's specific to any one protocol.
>>
>> Steve
>>
>>
>> On Sun, Sep 29, 2013 at 8:43 PM, Nilay <ni...@cs.wisc.edu> wrote:
>>
>>> Steve, are you sure the coherence protocol was the same across all the
>>> runs?
>>>
>>> --
>>> Nilay
>>>
>>>
>>>
>>> On 2013-09-29 20:05, Steve Reinhardt wrote:
>>>
>>>> In the process of updating stats, I've had scons recompile a few Ruby
>>>> controller .cc files and then rebuild the gem5 binary in cases where
>>>> nothing else has changed, forcing you to re-run tests unnecessarily,
>>>> which is a huge pain particularly when you're running the long tests.
>>>>
>>>> I know that the generated Ruby files get re-generated every time scons
>>>> is run so that it can deduce what the output files are, but since scons
>>>> is signature-based, re-generating an identical file should not cause a
>>>> rebuild.  I figured out that the files really were changing, and
>>>> captured a bunch of diffs like this:
>>>>
>>>> --- X86/mem/protocol/DMA_Controller.cc.orig     2013-09-29 17:29:26.529556643 -0700
>>>> +++ X86/mem/protocol/DMA_Controller.cc  2013-09-29 17:29:39.457556581 -0700
>>>> @@ -50,7 +50,7 @@
>>>>      : AbstractController(p)
>>>>  {
>>>>      m_name = "DMA";
>>>> -    m_max_in_port_rank = 6;
>>>> +    m_max_in_port_rank = 1;
>>>>      m_dma_sequencer_ptr = p->dma_sequencer;
>>>>      m_request_latency = p->request_latency;
>>>>      m_dma_sequencer_ptr->setController(this);
>>>>
>>>>
>>>> Note that this happened across all ISAs, and affected not just
>>>> DMA_Controller.cc but several other *_Controller.cc files also --
>>>> Directory_Controller.cc, L1Cache_Controller.cc, and (where applicable)
>>>> L2Cache_Controller.cc.  In this one particular run of scons, all the
>>>> files changed m_max_in_port_rank from 6 to 1, though on another run I
>>>> saw the value change from 6 to 4.  (I think it just happened to be 6 the
>>>> time I remembered to copy the files for diffing.)
>>>>
>>>> So basically slicc seems to be spitting out nondeterministic output.
>>>> I'll guess that whatever this value is, it is being generated from an
>>>> uninitialized value somewhere.  I don't know slicc well enough to track
>>>> this down easily, and I've already spent far too much of the weekend
>>>> wrestling with stats updates and other issues, so I'm not going to
>>>> pursue it further right now.  It would be really great if someone could
>>>> follow up on this though.
>>>>
>>>> I don't recall this happening before, so I wouldn't be surprised if it's
>>>> the result of some change that occurred in the last several months.
>>>>
>>>> Thanks,
>>>>
>>>> Steve
>>>> _______________________________________________
>>>> gem5-dev mailing list
>>>> gem5-dev@gem5.org
>>>> http://m5sim.org/mailman/listinfo/gem5-dev
>>>>
>>>
>>
>>
>