On Mon, Aug 9, 2021 at 1:02 PM Giacomo Travaglini <
giacomo.travagl...@arm.com> wrote:

> Hi Gabe,
>
> > -----Original Message-----
> > From: Gabe Black via gem5-dev <gem5-dev@gem5.org>
> > Sent: 09 August 2021 11:02
> > To: gem5 Developer List <gem5-dev@gem5.org>
> > Cc: Gabe Black <gabe.bl...@gmail.com>
> > Subject: [gem5-dev] Re: overview/documentation/tests for vector register
> > related stuff?
> >
> > I've done a bit of digging so far, and I think I've figured out a bit
> > about the rename mode.
> >
> > 1. This is only used by ARM to handle the difference in how registers are
> > renamed in aarch64 vs otherwise.
> > 2. This is handled in O3 by detecting a squash in the CPU and then
> > checking the aarch64 bit of the PCState.
> > 3. If this changes, then O3 potentially shuffles things around to make
> > register chunks contiguous, and starts renaming things differently.
> > 4. The only way to switch in or out of aarch64 is through a fault.
>
> Yes. Just to be more precise, it happens either by issuing a fault or by
> returning from a fault (this is just to make it clear that the switch can
> happen with a non-faulting instruction like an ERET).
>
> >
> > This leads me to a few conclusions.
> >
> > 1. Having the aarch64 bit in the PCState structure is probably not
> > necessary and may actually be harmful because it makes that structure
> > larger and slower to move around. This value does *not* change quickly or
> > frequently, and only changes as part of an already heavy mode switch. It
> > does not need to be predicted/predictable like a next PC, like something
> > like thumb mode might.
>
> You can figure out the execution mode (AArch64/AArch32) in a different way
> by inspecting the PSTATE, so I can see the redundancy. However, inspecting
> the PSTATE/CPSR from the TC is probably not going to be faster. We need to
> know the aarch64 state in the decoder, so I guess we could cache it in
> there.
>
> In any case, I don't think removing it from the PCState is going to affect
> simulation time in any measurable way.
>

Yeah, I think this is mostly orthogonal, I just wanted to bring it up since
I noticed it.


>
> > 2. The O3 CPU is checking renaming mode *way* more often than it really
> > needs to. Almost every single squash is *not* a switch to/from 64 bit
> > mode, but *every* switch involves that check, even in ISAs that don't
> > even *have* rename modes.
> > 3. The rename semantics switch can be handled right in the fault object
> > when it implements the faulting context switch. It can detect that a
> > switch is necessary and enact it without all the extra checks.
>
> Totally agree on point 2. About point 3: yes, you could handle it in the
> fault object and in the ERET instruction.
> That would mean leaking uarch code into the arch directory. In other words,
> having some *O3 specific* code in the arch directory. This is not ideal
> IMHO, as it binds the arch code to a single CPU model.
>

Please see my CL here:
https://gem5-review.googlesource.com/c/public/gem5/+/49147

I don't think it brings uarch code into the ISA implementation. What it
does is reestablish the invariant that registers are atomic blobs which
have no structure to the CPU, and then builds the different indexing views
into the ARM implementation instead. This is the way it used to work where
an ISA would compose composite registers by reading in their parts. I would
say this actually brings *less* uarch implementation into the ISA than
before, since now the ISA doesn't need to worry about rename modes, or that
there is even a rename step. It just has to maintain the invariant that
registers are atomic blobs as far as the CPU is concerned, and build
whatever other semantics it needs on top of that.
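
As a rough illustration of that split, here is a toy Python model. It is not
gem5 code and is not taken from the CL: every name in it (OpaqueRegFile,
read_elem, write_elem) and both sizes are assumptions made up for the sketch.
The CPU side only stores and returns whole blobs, and the ISA side builds the
per-element view by slicing them.

```python
# Toy model only: none of these names or sizes come from gem5.
import struct

VEC_BYTES = 16   # assumed width of one vector register
ELEM_BYTES = 4   # assumed width of one 32-bit element


class OpaqueRegFile:
    """CPU side: stores fixed-size blobs and never looks inside them."""

    def __init__(self, num_regs, reg_bytes):
        self.reg_bytes = reg_bytes
        self.regs = [bytes(reg_bytes) for _ in range(num_regs)]

    def read(self, idx):
        return self.regs[idx]

    def write(self, idx, blob):
        assert len(blob) == self.reg_bytes
        self.regs[idx] = bytes(blob)


def read_elem(rf, reg, elem):
    """ISA side: build an element view by slicing a whole-register read."""
    blob = rf.read(reg)
    return struct.unpack_from("<I", blob, elem * ELEM_BYTES)[0]


def write_elem(rf, reg, elem, value):
    """ISA side: read-modify-write the whole register to update one element."""
    blob = bytearray(rf.read(reg))
    struct.pack_into("<I", blob, elem * ELEM_BYTES, value)
    rf.write(reg, blob)


rf = OpaqueRegFile(num_regs=32, reg_bytes=VEC_BYTES)
write_elem(rf, 3, 2, 0xDEADBEEF)
print(hex(read_elem(rf, 3, 2)))   # prints 0xdeadbeef
```

The point of the sketch is only that OpaqueRegFile needs no notion of
elements or rename modes; the element indexing lives entirely in the two
helper functions.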


>
> > 4. ARM can implement SVE, etc, using two different register files, one
> > which is indexed by element for 32 bit mode, and one which is indexed by
> > vector for 64 bit mode. The mode switch can copy values between the
> > register files, and we can remove what I suspect is a lot of machinery
> > from O3 by just letting it manage two different register files simply,
> > instead of managing one with two different personalities. This also makes
> > the register files much more homogenous and easier to generalize. A
> > "real" CPU may not want to waste transistors, buses, etc, for two
> > separate register files, but in the end it doesn't matter if the behavior
> > is the same. This is all just in how O3 does its bookkeeping, and a
> > redundant register file is nearly free for gem5.
> >
> >
>
> I would love to see a cleaner implementation! But I am not entirely sure
> your solution is much different from what we have now.
> Sure, there is only one storage [1], but all the remaining data structures
> are duplicated (check veRegIds and vecElemIds as an example, or the
> vecElem/vecReg freeLists [2]).
> In fact, we are already copying values from one register file to the other
> when switching from Rename::Full to Rename::Elem [3].
> I honestly believe having two different regfiles is the source of all our
> problems, as it forces us to switch/copy values when a change in rename
> mode happens. What the implementation should have been is one single set
> of vector data structures with two different views.
> No synchronization needed; AArch32 uses the Elem view and AArch64 uses the
> Full view.
>

Please take a look at my CLs. "Hello world" still works for ARM on the
simple CPU and O3 so it's not completely broken, but I know that's not a
very good test. This change removes hundreds of lines of code net, and will
make it much easier/possible to treat the vector registers (and predicate
registers, and cc registers, etc) as just generic register files with a
small collection of basic properties. They can then be stored in an array,
manipulated with loops, etc, and we won't have to hard wire in each new
kind of register and share them between all the ISAs, which will also
significantly simplify the CPUs and the various *Context interfaces.
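
To make the "generic register files" idea concrete, here is a toy Python
sketch. Again, this is not gem5 code: the RegClass type, the class names and
all counts and sizes are invented for illustration. Each kind of register is
just an entry in a list with a couple of basic properties, and code that
touches "all registers" becomes a loop rather than one hand-written case per
register kind.

```python
# Toy model only: class names, counts and sizes are made up for illustration.
from dataclasses import dataclass, field


@dataclass
class RegClass:
    name: str
    num_regs: int
    reg_bytes: int
    regs: list = field(init=False)

    def __post_init__(self):
        # Every register starts out as an all-zero blob of the class's size.
        self.regs = [bytes(self.reg_bytes) for _ in range(self.num_regs)]


# Adding a new kind of register is one more entry here, not a new interface.
reg_classes = [
    RegClass("int", 32, 8),
    RegClass("vec", 32, 16),
    RegClass("vec_pred", 16, 2),
    RegClass("cc", 4, 8),
]


def total_state_bytes(classes):
    """Sum the storage needed by every register class."""
    return sum(rc.num_regs * rc.reg_bytes for rc in classes)


def clear_all(classes):
    """Zero every register of every class with one generic loop."""
    for rc in classes:
        for i in range(rc.num_regs):
            rc.regs[i] = bytes(rc.reg_bytes)


reg_classes[1].regs[0] = b"\xff" * 16
clear_all(reg_classes)
print(total_state_bytes(reg_classes))   # prints 832
```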


>
> > Please let me know if this is correct, and I'll start chopping away.
> > Some way to test my changes would be very helpful, since otherwise I'll
> > just be hoping for the best :-P.
>
> I would recommend cross-compiling an FP&SIMD application for AArch32
> and executing it on an AArch64 Linux kernel (with syscalls, to make sure
> we change rename mode without relying on the intervention of the
> scheduler). You could even cross-compile the same source for AArch64,
> execute it as a separate process, and of course multiplex the two on the
> same CPU.
>

Yes, I think that's a good plan. What application would you suggest though?
I think we got this far the last time too, where you suggested a small NEON
test program that I couldn't get to compile for some reason :-P. I think it
wouldn't compile with recent compiler versions, although I don't remember
the specifics.

It would probably be a good idea to have a program like this as part of
gem5's tests in general. Otherwise we have no coverage on those mechanisms,
and will have no idea when they break, even from less substantial and/or
unrelated changes.

Gabe
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org