Re: [gem5-dev] Review Request 2828: cpu: implements vector registers

Nilay Vaish Mon, 06 Jul 2015 09:54:35 -0700

On Mon, 6 Jul 2015, Nilay Vaish wrote:

On Mon, 6 Jul 2015, Giacomo Gabrielli wrote:


 -----------------------------------------------------------
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2828/#review6715
 -----------------------------------------------------------


 These are my current thoughts about this patch:

 1. My impression is that there is still not enough architectural support
 to understand whether the new vector register type, as it stands, can
 address all the different corner cases efficiently; I'd leave it to the
 wider gem5 community to decide where we want to draw that line...

Can you elaborate on what corner cases we might run into? I have reimplemented SSE instructions using the new type, and I did not find the new type to be limiting in any sense.

To add more, I am fairly confident that AVX-256 and AVX-512 instructions can be implemented without any changes to the current vector register implementation.
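As a rough illustration of why one register type could cover SSE, AVX-256 and AVX-512, here is a minimal C++ sketch of a width-parameterized register that is just a byte array with typed lane accessors. The class VecRegSketch and the Xmm/Ymm/Zmm aliases are hypothetical names for this example only; this is not gem5's actual vector register API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical width-agnostic vector register (not gem5's real class):
// a plain byte array whose size in bytes is a template parameter, with
// typed accessors so the same storage can be viewed as lanes of any type.
template <std::size_t SizeBytes>
class VecRegSketch
{
  public:
    // Number of lanes of type T that fit in this register.
    template <typename T>
    static constexpr std::size_t lanes() { return SizeBytes / sizeof(T); }

    // Read lane i as type T; memcpy avoids strict-aliasing problems.
    template <typename T>
    T get(std::size_t i) const
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "lane type must be trivially copyable");
        assert(i < lanes<T>());
        T v;
        std::memcpy(&v, bytes_.data() + i * sizeof(T), sizeof(T));
        return v;
    }

    // Write lane i as type T, leaving all other bytes untouched.
    template <typename T>
    void set(std::size_t i, T v)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "lane type must be trivially copyable");
        assert(i < lanes<T>());
        std::memcpy(bytes_.data() + i * sizeof(T), &v, sizeof(T));
    }

    // Clear the whole register.
    void zero() { bytes_.fill(0); }

  private:
    std::array<std::uint8_t, SizeBytes> bytes_{};
};

// Illustrative aliases: only the size parameter changes between ISAs.
using Xmm = VecRegSketch<16>;   // 128-bit SSE register
using Ymm = VecRegSketch<32>;   // 256-bit AVX register
using Zmm = VecRegSketch<64>;   // 512-bit AVX-512 register
```

In this sketch the lanes are only views over the same bytes, so a 512-bit register is simply a larger instance of the same type and nothing else in the class has to change.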


 2. Legacy SSE requires merging of upper lanes, while AVX does zeroing;
    ARMv8 AArch64 scalar FP and NEON instructions also perform zeroing.
    Assuming that destination vectors are always read is going to
    introduce unneeded serialization for those ISA extensions if they are
    ported to the new scheme, so I'd suggest avoiding implicit reads on
    write. Also, for cases where merging is required, maybe something
    smarter should be done to avoid unneeded serialization; without
    optimizations, any sequence of x86 FP scalar instructions could be
    significantly slower than on real hardware implementations.

Instructions for which the whole register would be written should be able to avoid reading the initial register. For scalar operations, I agree that we would be reading and writing many more bytes than required. Do you have any suggestions?
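A minimal C++ sketch of the two write semantics under discussion, assuming a hypothetical 128-bit register modelled as four float lanes (Vec128, scalarAddMerge and scalarAddZero are illustrative names, not gem5 code). The merging variant follows legacy SSE ADDSS, where the destination is also a source; the zeroing variant follows an AArch64 scalar FADD, which clears everything above the scalar result. Only the merging form needs the old destination value.

```cpp
#include <array>

// Hypothetical 128-bit register modelled as four 32-bit float lanes.
using Vec128 = std::array<float, 4>;

// Legacy-SSE style scalar add (like ADDSS xmm1, xmm2): lane 0 receives the
// sum and lanes 1..3 are merged from the OLD destination value, so the old
// destination is a true source operand that must be read.
Vec128 scalarAddMerge(const Vec128 &dest, const Vec128 &src)
{
    Vec128 result = dest;            // read of the old destination
    result[0] = dest[0] + src[0];
    return result;
}

// AArch64 style scalar add (like FADD Sd, Sn, Sm): lane 0 receives the sum
// and every other lane is zeroed, so the old destination is never read.
Vec128 scalarAddZero(const Vec128 &src1, const Vec128 &src2)
{
    Vec128 result{};                 // all lanes start at 0.0f
    result[0] = src1[0] + src2[0];
    return result;
}
```

In a timing model, the old destination in scalarAddMerge is a real input, so the instruction has to wait for the previous writer of that register; scalarAddZero carries no such dependency, which is why an implicit read on every write would add serialization that the zeroing ISAs do not need.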

More on the speed issue: I have been testing my implementation of the SSE instructions using an application from QEMU's source (as suggested by Gabe Black). Here are the timing results for the opt build:


Average without the patch: 3.502 seconds
Average with the patch:    3.539 seconds

I am willing to live with this slowdown (about 1%). Of course, my opinion is biased since I wrote the patch.



--
Nilay


Actual data without the patch (output of time, 8 runs):

run   real     user     sys
1     3.494s   3.276s   0.216s
2     3.514s   3.288s   0.223s
3     3.497s   3.266s   0.229s
4     3.489s   3.264s   0.224s
5     3.502s   3.277s   0.223s
6     3.508s   3.282s   0.225s
7     3.501s   3.267s   0.231s
8     3.517s   3.282s   0.232s


Actual data with the patch (output of time, 8 runs):

run   real     user     sys
1     3.531s   3.296s   0.231s
2     3.528s   3.303s   0.222s
3     3.526s   3.306s   0.216s
4     3.552s   3.312s   0.237s
5     3.566s   3.332s   0.231s
6     3.541s   3.307s   0.230s
7     3.521s   3.293s   0.226s
8     3.543s   3.308s   0.232s
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
