Hello again folks. I'm tempted to write up a doc which describes my
thoughts on this, but for the sake of expediency I'm just going to send a
"quick" email for now. (edit: This got pretty long. Maybe I should have put
it in a doc. Too late now, maybe later?)

TLM DMI socket API:

When a TLM initiator sends a transaction which eventually gets serviced by
a target, there is an attribute in the generic payload which says whether
or not the value accessed *could* have been accessed through DMI if that
had been requested. At least for some uses, this acts as a gating
mechanism: if the target doesn't say yes here, the initiator won't bother
to ask using the heavier-weight mechanism.

Then there's a get_direct_mem_ptr method which also accepts a transaction,
and effectively returns whether there's a DMI-able region which corresponds
to that transaction and a descriptor of that region.

Finally, there's an invalidate_direct_mem_ptr method which basically
broadcasts backwards to let anybody who may have a live DMI descriptor
know that some region is going away and they should throw away their
descriptor. Unlike get_direct_mem_ptr, which travels like an access with
one destination, this propagates back to potentially multiple initiators
who could have retrieved a DMI descriptor at some point.
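
For reference, here's roughly what those three hooks look like on the
target side using the standard TLM-2.0 convenience sockets. The memory
array, addresses, and module name are made up purely for illustration:

#include <cstring>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

// Minimal TLM-2.0 target illustrating the three DMI hooks described above.
struct SimpleMem : sc_core::sc_module
{
    tlm_utils::simple_target_socket<SimpleMem> socket;
    unsigned char storage[0x1000]; // illustrative backing store

    SC_CTOR(SimpleMem) : socket("socket")
    {
        socket.register_b_transport(this, &SimpleMem::b_transport);
        socket.register_get_direct_mem_ptr(this,
                                           &SimpleMem::get_direct_mem_ptr);
    }

    void b_transport(tlm::tlm_generic_payload &trans, sc_core::sc_time &delay)
    {
        // Bounds checks, byte enables, and latency modeling omitted.
        unsigned char *mem = storage + trans.get_address();
        if (trans.is_read())
            std::memcpy(trans.get_data_ptr(), mem, trans.get_data_length());
        else if (trans.is_write())
            std::memcpy(mem, trans.get_data_ptr(), trans.get_data_length());

        // Hint to the initiator that a DMI request here would succeed.
        trans.set_dmi_allowed(true);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }

    bool get_direct_mem_ptr(tlm::tlm_generic_payload &trans, tlm::tlm_dmi &dmi)
    {
        // Describe the DMI-able region that covers this transaction.
        dmi.set_dmi_ptr(storage);
        dmi.set_start_address(0x0);
        dmi.set_end_address(sizeof(storage) - 1);
        dmi.allow_read_write();
        return true;
    }

    void drop_region()
    {
        // Broadcast backwards so initiators discard stale descriptors.
        socket->invalidate_direct_mem_ptr(0x0, sizeof(storage) - 1);
    }
};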


Proposed gem5 mechanism:

There would be a new pair of protocol functions added to the master/slave
memory port pairs, sendAtomicBackdoor/recvAtomicBackdoor. Only the atomic
mechanism is being extended to support backdoor accesses, at least for
right now, because you'd likely use atomic mode and DMI/backdoors under
the same circumstances, ie when you want to go fast and can sacrifice
accuracy. The sendAtomicBackdoor function would just call the
recvAtomicBackdoor function on its peer. The recvAtomicBackdoor function
would be virtual, and its default implementation would be to call vanilla
recvAtomic.

The signature would be the same as regular sendAtomic/recvAtomic, except
that it also takes a reference to a pointer to a memory backdoor
descriptor. I use the term backdoor instead of re-using DMI to avoid them
being seen as equivalent but confusingly different, and because the term
backdoor is already being used in gem5. The exact name is up for debate.
If the target which services the atomic request can support backdoor
accesses, and no link in the chain (a cache, for instance) would stop
working if it were circumvented, then that pointer is set to point at a
backdoor descriptor. If not, it's left set to nullptr.
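
To make that concrete, here's a rough sketch of the shape I have in mind.
Every name and stand-in type here is hypothetical, just to illustrate the
interface, not a final design:

#include <cstdint>

// Hypothetical sketch only; Tick and Packet stand in for the real gem5
// types so this is self-contained.
using Tick = uint64_t;
struct Packet;
using PacketPtr = Packet *;
class MemBackdoor; // the descriptor type, sketched further below

class SlavePort
{
  public:
    virtual ~SlavePort() {}

    // The existing atomic entry point.
    virtual Tick recvAtomic(PacketPtr pkt) = 0;

    // The new entry point. The default implementation ignores the backdoor
    // argument and falls back to plain recvAtomic, so ports that don't
    // know about backdoors keep working unchanged.
    virtual Tick
    recvAtomicBackdoor(PacketPtr pkt, MemBackdoor *&backdoor)
    {
        return recvAtomic(pkt);
    }
};

class MasterPort
{
  public:
    MasterPort(SlavePort *peer) : peer(peer) {}

    // Just forwards to the peer, the same as sendAtomic does today.
    Tick
    sendAtomicBackdoor(PacketPtr pkt, MemBackdoor *&backdoor)
    {
        return peer->recvAtomicBackdoor(pkt, backdoor);
    }

  private:
    SlavePort *peer;
};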

The backdoor descriptor itself has a few basic properties (still to be
nailed down) which essentially correspond to the DMI ones, ie start and
end address, a pointer to the data, and access privileges. It also has a
list of callbacks, one for each entity that has received and kept a copy
of the descriptor one way or another. These are registered by the caller
when it successfully receives the descriptor, a step it can skip if it's,
for instance, just checking whether a backdoor exists but not actually
storing it.
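
A sketch of what that descriptor might look like, including the callback
registration and the invalidation walk described a couple of paragraphs
down. Again, all names here are placeholders:

#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch of the backdoor descriptor.
class MemBackdoor
{
  public:
    typedef std::function<void(const MemBackdoor &)> CbFunction;

    // Roughly the same properties as a TLM DMI descriptor.
    uint64_t start = 0, end = 0;   // address range the backdoor covers
    uint8_t *ptr = nullptr;        // host pointer backing that range
    bool readable = false, writeable = false;

    // Anyone who keeps a copy of this descriptor registers a callback so
    // they can be told when it goes away. Callers who only want to know
    // whether a backdoor exists can skip this step.
    void
    addInvalidationCallback(CbFunction cb)
    {
        invalidationCallbacks.push_back(std::move(cb));
    }

    // The owner calls this when the region is going away: every registered
    // callback runs, and then the descriptor can simply be thrown away.
    void
    invalidate()
    {
        for (auto &cb : invalidationCallbacks)
            cb(*this);
        invalidationCallbacks.clear();
    }

  private:
    std::vector<CbFunction> invalidationCallbacks;
};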

Optionally, there could also be a getMemoryBackdoors function which would
accept a start and end address (or AddrRange) and collect a vector of
backdoor descriptors for the whole system. This could be used when setting
up, for instance, KVM, so that you don't have to manually figure out where
the memory is in order to add it into KVM. This is largely separate
from/an extension of the other mechanism and can be added later.
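
As a sketch, the bulk query could be as simple as a declaration like this
(hypothetical names, with an AddrRange stand-in just for illustration):

#include <cstdint>
#include <vector>

// Placeholder stand-ins so the sketch is self-contained.
struct AddrRange { uint64_t start; uint64_t end; };
class MemBackdoor;

// Hypothetical bulk query: walk the memory system and collect every
// backdoor descriptor overlapping the given range, e.g. to register guest
// RAM with KVM up front instead of discovering it access by access.
std::vector<MemBackdoor *> getMemoryBackdoors(const AddrRange &range);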

If a backdoor needs to be invalidated, the owner of the backdoor just needs
to go through and call all the callbacks, and when it's done it can throw
away the backdoor.


Translation between the two:

TLM -> gem5

When a transaction comes in from TLM, the gem5 side of the bridge would
call sendAtomicBackdoor. That will trickle through to something on the
gem5 side, which will reply and optionally pass back a pointer to the
backdoor descriptor. The assumption is that if something is backdoor-able,
then it will have already set up that descriptor and will just hand out
pointers to it as necessary. If the backdoor pointer is set, then the
dmi_allowed attribute will be set. Otherwise the bridge acts as before.
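
Roughly, and with all bridge/helper names hypothetical and the
payload-to-packet conversion elided, that path might look like:

#include <cstdint>
#include <systemc>
#include <tlm>

// Stand-ins for gem5 types so the sketch is self-contained; every name
// here is hypothetical.
using Tick = uint64_t;
struct Packet;
using PacketPtr = Packet *;
struct MemBackdoor;
struct MasterPort
{
    Tick sendAtomicBackdoor(PacketPtr pkt, MemBackdoor *&backdoor);
};

struct TlmToGem5Bridge
{
    MasterPort masterPort;

    // Payload-to-packet conversion elided.
    PacketPtr payloadToPacket(tlm::tlm_generic_payload &trans);

    void
    b_transport(tlm::tlm_generic_payload &trans, sc_core::sc_time &delay)
    {
        MemBackdoor *backdoor = nullptr;
        Tick latency =
            masterPort.sendAtomicBackdoor(payloadToPacket(trans), backdoor);

        // If something downstream handed back a backdoor descriptor, tell
        // the TLM initiator that a DMI request would succeed.
        if (backdoor)
            trans.set_dmi_allowed(true);

        // Time conversion simplified; assumes ticks match sc_time's
        // resolution.
        delay += sc_core::sc_time::from_value(latency);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};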

When get_direct_mem_ptr is called, the bridge will also call
sendAtomicBackdoor, but in that case it will set the NO_ACCESS flag in the
request it creates (or similar) to indicate that the packet shouldn't do
anything when it gets where it's going. When the result comes back, the
backdoor descriptor is used to set properties in the DMI object which is
returned to the caller, or false is returned if no backdoor is found. The
bridge will keep the backdoor around and will install a callback in it
which will prompt it to call invalidate_direct_mem_ptr if the backdoor is
invalidated.
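
The get_direct_mem_ptr path might then look something like this, with the
same caveats (the probe packet helper and the invalidation hook are
assumptions, redeclared here so the sketch stands on its own):

#include <cstdint>
#include <functional>
#include <vector>
#include <tlm>

// Stand-ins for gem5 types and the hypothetical backdoor descriptor.
using Tick = uint64_t;
struct Packet;
using PacketPtr = Packet *;
struct MemBackdoor
{
    uint64_t start, end;
    unsigned char *ptr;
    bool readable, writeable;
    void addInvalidationCallback(std::function<void(const MemBackdoor &)> cb);
};
struct MasterPort
{
    Tick sendAtomicBackdoor(PacketPtr pkt, MemBackdoor *&backdoor);
};

struct TlmToGem5Bridge
{
    MasterPort masterPort;
    std::vector<MemBackdoor *> backdoors; // backdoors handed out as DMI

    // Builds a packet whose request carries the NO_ACCESS flag (or
    // similar) so nothing actually reads or writes memory; elided.
    PacketPtr makeProbePacket(tlm::tlm_generic_payload &trans);

    // Would call invalidate_direct_mem_ptr on the target socket; elided.
    void invalidateDmi(uint64_t start, uint64_t end);

    bool
    get_direct_mem_ptr(tlm::tlm_generic_payload &trans, tlm::tlm_dmi &dmi)
    {
        MemBackdoor *backdoor = nullptr;
        masterPort.sendAtomicBackdoor(makeProbePacket(trans), backdoor);
        if (!backdoor)
            return false;

        // Copy the backdoor's properties into the DMI descriptor.
        dmi.set_dmi_ptr(backdoor->ptr);
        dmi.set_start_address(backdoor->start);
        dmi.set_end_address(backdoor->end);
        if (backdoor->readable && backdoor->writeable)
            dmi.allow_read_write();
        else if (backdoor->readable)
            dmi.allow_read();
        else if (backdoor->writeable)
            dmi.allow_write();

        // Keep the backdoor around, and arrange to send an invalidation
        // back into TLM if the backdoor itself ever goes away.
        backdoors.push_back(backdoor);
        backdoor->addInvalidationCallback(
            [this](const MemBackdoor &bd) { invalidateDmi(bd.start, bd.end); });
        return true;
    }
};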

gem5 -> TLM

When a gem5 request comes in via recvAtomicBackdoor, the bridge will
potentially make two calls. First it will use the normal transmission
mechanism to perform the access. If the target indicates DMI is possible,
then the bridge will use get_direct_mem_ptr to get a DMI data blob, which
it will use to construct a backdoor descriptor that it stores and sets the
backdoor pointer to. Either way, recvAtomicBackdoor will then return.
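
A sketch of that path, with hypothetical names and the packet/payload and
time conversions simplified:

#include <cstdint>
#include <vector>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>

// Stand-ins for gem5 types; all names here are hypothetical.
using Tick = uint64_t;
struct Packet;
using PacketPtr = Packet *;
struct MemBackdoor
{
    uint64_t start = 0, end = 0;
    unsigned char *ptr = nullptr;
    bool readable = false, writeable = false;
};

struct Gem5ToTlmBridge : sc_core::sc_module
{
    tlm_utils::simple_initiator_socket<Gem5ToTlmBridge> socket;
    std::vector<MemBackdoor *> backdoors;

    SC_CTOR(Gem5ToTlmBridge) : socket("socket") {}

    // Packet-to-payload conversion elided.
    tlm::tlm_generic_payload &packetToPayload(PacketPtr pkt);

    Tick
    recvAtomicBackdoor(PacketPtr pkt, MemBackdoor *&backdoor)
    {
        tlm::tlm_generic_payload &trans = packetToPayload(pkt);
        sc_core::sc_time delay = sc_core::SC_ZERO_TIME;

        // Normal access first, over the usual TLM path.
        socket->b_transport(trans, delay);

        // If the target hinted that DMI would work, go get the descriptor
        // and wrap it in a backdoor that the gem5 side can keep.
        if (trans.is_dmi_allowed()) {
            tlm::tlm_dmi dmi;
            if (socket->get_direct_mem_ptr(trans, dmi)) {
                MemBackdoor *bd = new MemBackdoor;
                bd->ptr = dmi.get_dmi_ptr();
                bd->start = dmi.get_start_address();
                bd->end = dmi.get_end_address();
                bd->readable = dmi.is_read_allowed();
                bd->writeable = dmi.is_write_allowed();
                backdoors.push_back(bd);
                backdoor = bd;
            }
        }

        // Time conversion simplified.
        return delay.value();
    }
};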

Alternatively, if there's already a backdoor which covers that access for
some reason, the bridge could just take advantage of it to do the access
without forwarding anything on.

When a call to invalidate_direct_mem_ptr comes in, the bridge will look
through the backdoors it's accumulated. For each one that overlaps at all,
it will be invalidated using the procedure described above, calling the
callbacks and then deleting the backdoor.
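
And the invalidation path might look something like this (the backdoor
stand-in is redeclared here so the sketch stands on its own):

#include <cstdint>
#include <functional>
#include <vector>
#include <systemc>

// Stand-in for the hypothetical backdoor descriptor, with the invalidation
// callback walk described earlier.
struct MemBackdoor
{
    uint64_t start = 0, end = 0;
    std::vector<std::function<void(const MemBackdoor &)>> callbacks;

    void
    invalidate()
    {
        for (auto &cb : callbacks)
            cb(*this);
        callbacks.clear();
    }
};

struct Gem5ToTlmBridge
{
    std::vector<MemBackdoor *> backdoors;

    // Hooked up to the initiator socket's backward interface.
    void
    invalidate_direct_mem_ptr(sc_dt::uint64 start, sc_dt::uint64 end)
    {
        for (auto it = backdoors.begin(); it != backdoors.end();) {
            MemBackdoor *bd = *it;
            if (bd->end >= start && bd->start <= end) {
                // Overlaps the invalidated range: run the callbacks so
                // everyone holding this backdoor drops it, then delete it.
                bd->invalidate();
                delete bd;
                it = backdoors.erase(it);
            } else {
                ++it;
            }
        }
    }
};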


Benefits:

I think there are a couple of performance benefits to doing things this
way. First, at least on the gem5 side, there aren't a lot of temporary
objects being set up and initialized and/or copied around. Second, the
initiator of an access only has to ask whether a backdoor is possible when
it actually cares, rather than that being checked on every access. And if
a memory system participant doesn't care or know about DMI or the new
mechanism, everything automatically and transparently falls back to the
non-backdoor-aware mechanisms.


Downsides:

The new calls are not automatically transported by interconnect components
like bridges and buses, so some porting work is necessary for this to be
useful. It also expands the memory port API. Backdoor callbacks can't
gracefully disappear before the backdoor they track goes away, although
they could have a flag set which makes them do nothing and go away
harmlessly.

Gabe

On Thu, Mar 14, 2019 at 4:18 PM Gabe Black <gabebl...@google.com> wrote:

> I think gem5 would benefit from it for the same reason SystemC simulations
> do, namely speeding up simulations when doing fast forwarding (perhaps with
> a binary translating CPU... hypothetically...). It would also be very nice
> to enable software development for not-yet-existing hardware that has gem5
> models available. Then gem5 users could do both software development and
> performance evaluation in parallel and only have to build models once. This
> is very nice when large bodies of software need to be written to support a
> bit of hardware, for instance if they have large complex drivers, need
> application level support, etc. It would also be great not to have to wait
> for hours for Android to boot to get to the interesting part of a
> simulation or when debugging at a guest software level.
>
> Gabe
>
> On Thu, Mar 14, 2019 at 6:41 AM Dr. Matthias Jung <jun...@eit.uni-kl.de>
> wrote:
>
>> Hi Gabe,
>>
>> one of the main reasons for DMI is to speed up simulations, similar to
>> temporal decoupling in TLM LT or debug transport, for instance to speed
>> up boot-loading. AFAIK DMI is mainly used in virtual platforms that target
>> software development and not hardware architecture design space
>> explorations, because you skip interconnects, caches etc. In commercial
>> tools you can just switch on or switch off DMI and therefore you can have a
>> nice trade-off between speed and accuracy by using the same models.
>>
>> Since gem5 is mainly there for computer-system architecture research, I’m
>> not sure if the DMI feature is really required. From a TLM2 perspective,
>> even if a TLM target model included in gem5 offers DMI, the gem5 core model
>> (initiator) does not have to use it, right? Do you have any concrete use
>> case where you could exploit DMI?
>>
>> For KVM: maybe somebody with KVM experience should comment that.
>>
>> Best regards,
>> Matthias
>>
>> > Am 28.02.2019 um 06:13 schrieb Gabe Black <gabebl...@google.com>:
>> >
>> > Hi folks. TLM is a communication protocol/mechanism built on top of
>> > systemc. It supports a mechanism called DMI which stands for direct
>> > memory interface. The idea is that an entity sending a request into
>> > the system can ask if the target can give it a pointer it can use to
>> > directly access that memory in the future. The target, if it supports
>> > that sort of thing, returns a descriptor which describes a region of
>> > memory that can be accessed in that way. If that needs to be
>> > invalidated in the future, then there's another mechanism the target
>> > can use to communicate back to the sender telling it to throw away
>> > that descriptor.
>> >
>> > The way this mechanism is implemented in TLM is a bit less than ideal
>> > since every request has a field that says whether the requester wants
>> > to know about DMI, and so the target has to perform an extra check on
>> > all the requests in case someone is asking, when that's useful to
>> > communicate only a very small fraction of the time, perhaps only once
>> > during an entire simulation.
>> >
>> > Aside from that though, this mechanism has some nice properties.
>> > First, it avoids having to globally identify what a memory is or
>> > where it is for a particular simulation. A memory is just a thing on
>> > the other end of a request that may let you get at it directly if you
>> > ask nicely. Also, if there's something in the way that would get
>> > messed up if you skipped over it, say a cache, it can block those
>> > requests from getting through to targets. This could be useful for
>> > KVM for instance, when it's collecting regions to act as RAM for the
>> > virtual machine.
>> >
>> > I haven't fully figured out a good way to avoid the check-every-time
>> > problem of the systemc mechanism, and ideally whatever I/we come up
>> > with will be compatible enough to be bridged effectively, but I'm
>> > thinking some sort of explicit additional call like getAddrRanges
>> > which would propagate through the hierarchy at specific points,
>> > either to a specific address or as a broadcast.
>> >
>> > I know some folks have looked at gem5's memory system protocol and
>> > systemc's TLM before, for instance either to try making gem5 use TLM
>> > natively, or for the systemc TLM bridges. What do you think about
>> > adding this sort of mechanism to gem5? Are there any pitfalls to
>> > avoid, known issues to figure out, suggested avenues to explore, etc?
>> > Please let me know. This is likely something I'm going to want to
>> > pursue in the next few weeks.
>> >
>> > Gabe