On Mon, 6 Jul 2015, Nilay Vaish wrote:
On Mon, 6 Jul 2015, Giacomo Gabrielli wrote:
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2828/#review6715
-----------------------------------------------------------
These are my current thoughts about this patch:
1. My impression is that there is still not enough architectural support
to understand whether the new vector register type, as it stands, can address
all the different corner cases efficiently; I'd leave it to the wider gem5
community to decide where we want to draw that line...
Can you elaborate on what corner cases we might run into? I have reimplemented
SSE instructions using the new type, and I did not find the new type to be
limiting in any sense.
To add to that, I am reasonably confident that AVX-256 and AVX-512 instructions
can be implemented without any changes to the current vector register
implementation.
2. Legacy SSE requires merging of upper lanes, while AVX does zeroing;
ARMv8 AArch64 scalar FP and NEON instructions also perform zeroing.
Assuming that destination vectors are always read is going to
introduce unneeded serialization for those ISA extensions if they are
ported to the new scheme, so I'd suggest avoiding an implicit
read on write. Also, for cases where merging is required,
maybe something smarter should be done to avoid unneeded
serialization; without optimizations, any sequence of x86 FP scalar
instructions could be significantly slower than on real hardware
implementations.
Instructions for which the whole register would be written should be able to
avoid reading the initial register. For scalar operations, I agree that we
would be reading and writing many more bytes than required. Do you have any
suggestions?
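To make the merging versus zeroing distinction in point 2 concrete, here is a
minimal sketch. It is not taken from the patch: the register is modelled as a
plain list of lanes, and the names NUM_LANES, write_scalar_merging and
write_scalar_zeroing are hypothetical. It only illustrates why a merging write
needs the old destination as an input while a zeroing write does not.

```python
# Hypothetical model: a vector register is a list of integer lanes.
# None of these names come from the gem5 patch under review.

NUM_LANES = 4  # assumption: a 128-bit register viewed as four 32-bit lanes


def write_scalar_merging(dest, value):
    """Merging write: lane 0 is replaced, the upper lanes are preserved.

    The result depends on the old contents of dest, so the destination
    register is effectively an extra source operand.
    """
    return [value] + list(dest[1:])


def write_scalar_zeroing(value, num_lanes=NUM_LANES):
    """Zeroing write: lane 0 is set, the upper lanes are cleared.

    The old destination contents are never read.
    """
    return [value] + [0] * (num_lanes - 1)


old = [10, 20, 30, 40]
merged = write_scalar_merging(old, 99)
zeroed = write_scalar_zeroing(99)
print("merging:", merged)
print("zeroing:", zeroed)
```

In a model like this, only the merging variant creates a dependency on the
previous writer of the destination register, which is the serialization
concern raised above.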
More on the speed issue: I have been testing my implementation of the SSE
instructions using an application from QEMU's source (as suggested by Gabe
Black). Here are the timing results for the opt build:
Average without the patch: 3.502 seconds
Average with the patch: 3.539 seconds
I am willing to live with this slowdown. Of course, my opinion is biased
since I wrote the patch.
--
Nilay
Actual Data without the patch:
real 0m3.494s
user 0m3.276s
sys 0m0.216s
real 0m3.514s
user 0m3.288s
sys 0m0.223s
real 0m3.497s
user 0m3.266s
sys 0m0.229s
real 0m3.489s
user 0m3.264s
sys 0m0.224s
real 0m3.502s
user 0m3.277s
sys 0m0.223s
real 0m3.508s
user 0m3.282s
sys 0m0.225s
real 0m3.501s
user 0m3.267s
sys 0m0.231s
real 0m3.517s
user 0m3.282s
sys 0m0.232s
Actual Data with the patch:
real 0m3.531s
user 0m3.296s
sys 0m0.231s
real 0m3.528s
user 0m3.303s
sys 0m0.222s
real 0m3.526s
user 0m3.306s
sys 0m0.216s
real 0m3.552s
user 0m3.312s
sys 0m0.237s
real 0m3.566s
user 0m3.332s
sys 0m0.231s
real 0m3.541s
user 0m3.307s
sys 0m0.230s
real 0m3.521s
user 0m3.293s
sys 0m0.226s
real 0m3.543s
user 0m3.308s
sys 0m0.232s
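For anyone who wants to re-check the averages, the following short sketch
recomputes them from the "real" times listed above. The variable names are
only illustrative, and the choice of the "real" column is an assumption about
which figure was averaged. With these numbers the relative slowdown works out
to roughly 1%.

```python
from statistics import mean

# "real" wall-clock times in seconds, copied from the eight runs above.
without_patch = [3.494, 3.514, 3.497, 3.489, 3.502, 3.508, 3.501, 3.517]
with_patch = [3.531, 3.528, 3.526, 3.552, 3.566, 3.541, 3.521, 3.543]

avg_without = mean(without_patch)
avg_with = mean(with_patch)

# Relative slowdown of the patched build, as a percentage of the baseline.
slowdown_pct = (avg_with - avg_without) / avg_without * 100.0

print(f"average without the patch: {avg_without:.4f} s")
print(f"average with the patch:    {avg_with:.4f} s")
print(f"relative slowdown:         {slowdown_pct:.2f} %")
```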
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev