Hey Brad,

Would you have a look at the changeset that Timothy pushed:
https://gem5-review.googlesource.com/c/public/gem5/+/21161.

There are a few differences from Steve's implementation:
1) Wow! I had forgotten how terrible the reviewboard interface was!
2) Steve's patch made many changes beyond just what happens on aliasing,
including completely changing the interface to the makeRequest function.
3) Steve's patch issues all of the requests separately, causing L1 latency
(or worse) for each coalesced request. This also causes requests to buffer
at the sequencer. The new patch instead simply responds to all of the
coalesced requests in the same tick. While there's still some buffering in
the sequencer, since all buffered requests are handled at the same time,
the buffer won't grow.
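To make point 3 concrete, here's a minimal sketch of the idea (the names here are hypothetical, not the actual gem5 code, and the 64-byte line size is an assumption): only the first request per cache line is issued to the L1, and when the line's data arrives every coalesced waiter is serviced in the same tick, so the buffer drains rather than grows.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical request record; gem5's real type is SequencerRequest.
struct Request {
    Addr byteAddr;
    bool serviced = false;
};

// Align a byte address down to its cache line (64-byte lines assumed).
static Addr lineOf(Addr a) { return a & ~Addr(63); }

struct Coalescer {
    // line address -> requests waiting on that line
    std::map<Addr, std::vector<Request*>> table;

    // Queue a request. Only the first request per line is issued to the
    // L1; later ones coalesce onto the existing entry instead of failing.
    bool issue(Request* r) {
        auto& waiters = table[lineOf(r->byteAddr)];
        waiters.push_back(r);
        return waiters.size() == 1;  // true: issued; false: coalesced
    }

    // When the line's data arrives, service every coalesced request in
    // the same tick, so the buffer drains instead of growing.
    void fill(Addr line) {
        for (Request* r : table[line]) {
            r->serviced = true;
        }
        table.erase(line);
    }
};
```

The key contrast with the old behavior is in fill(): one cache fill completes all waiters at once rather than replaying each one through the L1.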

Cheers,
Jason

On Tue, Sep 24, 2019 at 9:48 AM Beckmann, Brad <brad.beckm...@amd.com>
wrote:

> Hi Timothy,
>
> As Jason said, this is a known issue.  In fact, we tried to fix it many
> years ago in the public tree, but we had difficulty getting the patch
> approved and eventually abandoned the effort.
>
> http://reviews.gem5.org/r/2276/
>
> We still have this patch applied to our internal tree and it works quite
> well.  Another key change along the same lines is splitting the store
> address request from the store data request.  Unfortunately, that patch
> has never made it out of our internal tree, and we need to find someone
> within AMD to maintain it before we can push it publicly.
>
> There are a few things to keep in mind when thinking about how to improve
> the CPU/GPU to Ruby interface.  The Sequencer and Coalescer implement
> protocol agnostic logic, such as address coalescing and request tracking.
> We could move this logic into the L1 cache controllers, but that would
> require duplicating the work in each controller and further complicating
> the already complex state transitions.  Furthermore, the generated
> protocol state machines only operate on cache-line aligned addresses, but
> all gem5 CPU and GPU core models send byte-aligned addresses to their ports.
> Thus the Sequencer and Coalescer are in charge of managing the byte to
> cache-line address conversion.
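To illustrate that byte-to-line conversion, here is a minimal sketch (the helper names and the 64-byte line size are assumptions for the example; Ruby has its own equivalent helpers):

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Assumes 64-byte (2^6) cache lines; gem5 configures this per system.
constexpr unsigned kBlockSizeBits = 6;

// Mask off the low bits to get the cache-line-aligned address that the
// generated protocol state machines operate on.
Addr makeLineAddress(Addr byteAddr) {
    return (byteAddr >> kBlockSizeBits) << kBlockSizeBits;
}

// The offset within the line, needed when copying the requested bytes
// back out of the fetched cache line.
Addr getOffset(Addr byteAddr) {
    return byteAddr & ((Addr(1) << kBlockSizeBits) - 1);
}
```

So a byte-aligned request for 0x1234 would be tracked under line 0x1200 with offset 0x34, which is the bookkeeping the Sequencer and Coalescer do on behalf of the controllers.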
>
> I hope this helps.  Let us know how you want to proceed.
>
> Thanks,
>
> Brad
>
>
>
> -----Original Message-----
> From: gem5-dev <gem5-dev-boun...@gem5.org> On Behalf Of Jason Lowe-Power
> Sent: Tuesday, September 24, 2019 8:11 AM
> To: gem5 Developer List <gem5-dev@gem5.org>
> Subject: Re: [gem5-dev] Ruby Sequencer starving O3 core?
>
> Hi Timothy,
>
> The short answer is that this is a quasi-known issue. The interface
> between the core and Ruby needs to be improved. (It's on the roadmap!
> Though, no one is actively working on it.)
>
> I could be wrong myself, but I believe you're correct that Ruby cannot
> handle multiple loads to the same cache block. I believe that in previous
> incarnations of the simulator the coalescing into cache blocks happened
> in the LSQ. However, the classic caches assume this happens in the cache,
> creating a mismatch between Ruby and the classic caches.
>
> I'm not sure what the best fix for this is. Unless it's a small change, we
> should probably discuss the design with Brad and Tony before putting
> significant effort into coding.
>
> Cheers,
> Jason
>
> On Sun, Sep 22, 2019 at 3:20 PM Timothy Hayes <timothy.ha...@arm.com>
> wrote:
>
> > I'm experimenting with various O3 configurations combined with Ruby's
> > MESI_Three_Level memory subsystem. I notice that it's very challenging
> > to provide the core with more memory bandwidth. For typical/realistic
> > O3/Ruby/memory parameters, a single core struggles to achieve 3000
> > MB/s in STREAM. If I max out all the parameters of the O3 core, Ruby,
> > the NoC and provide a lot of memory bandwidth, STREAM just about
> > reaches 6000 MB/s. I believe this should be much higher. I've found
> > one possible explanation for this behaviour.
> >
> > The Ruby Sequencer receives memory requests from the core via the
> > function Sequencer::insertRequest(PacketPtr pkt, RubyRequestType
> request_type). This function determines whether there are requests to
> the same cache line and--if there are--returns without enqueuing the
> memory request. This also happens with load requests when there is
> already an outstanding load request to the same cache line.
> >
> > RequestTable::value_type default_entry(line_addr,
> >     (SequencerRequest*) NULL);
> > pair<RequestTable::iterator, bool> r =
> >     m_readRequestTable.insert(default_entry);
> >
> > if (r.second) {
> >     /* snip */
> > } else {
> >     // There is an outstanding read request for the cache line
> >     m_load_waiting_on_load++;
> >     return RequestStatus_Aliased;
> > }
> >
> > This eventually returns to the LSQ, which interprets the Aliased
> > RequestStatus as the cache controller being blocked.
> >
> > bool LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
> > {
> >     if (!lsq->cacheBlocked() &&
> >         lsq->cachePortAvailable(isLoad)) {
> >         if (!dcachePort->sendTimingReq(data_pkt)) {
> >             ret = false;
> >             cache_got_blocked = true;
> >         }
> >     }
> >     if (cache_got_blocked) {
> >         lsq->cacheBlocked(true);
> >         ++lsqCacheBlocked;
> >     }
> > }
> >
> > If the code is generating many load requests to contiguous memory,
> > e.g. in STREAM, won't the cache get blocked extremely frequently?
> > Would this explain why it's so difficult to get the core to consume
> > more bandwidth?
> >
> > I'm happy to go ahead and fix/improve this, but I wanted to check
> > first that I'm not missing something--can Ruby handle multiple
> > outstanding loads to the same cache line without blocking the cache?
> >
> >
> > --
> >
> > Timothy Hayes
> >
> > Senior Research Engineer
> >
> > Arm Research
> >
> > Phone: +44-1223405170
> >
> > timothy.ha...@arm.com
> >
> > _______________________________________________
> > gem5-dev mailing list
> > gem5-dev@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev