-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/592/#review1096
-----------------------------------------------------------



src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
<http://reviews.m5sim.org/r/592/#comment1458>

    ext is no longer set to a raw bitvector to select per-instruction 
features like this because, as you can see, a bare number is pretty opaque 
when you just look at it. The maddf ext=1 becomes ext=Scalar. For msrli and 
mslli, ext=0 is the default and can be dropped, which leaves the ops as SIMD. 
Since they're already operating at the full width of the fp register type (a 
double), the value is especially redundant.
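
    A rough before/after sketch of that change, written the way the microcode 
sits inside the Python file. The operands, sizes, and shift amount are 
assumptions for illustration, not lines from the patch; only the ext spelling 
is the point.

```python
# Illustrative fragments only: operands, sizes and the shift amount are assumed,
# not copied from the patch under review.
before = '''
    msrli ufp1, xmml, 32, size=8, ext=0
    maddf ufp1, xmml, ufp1, size=4, ext=1
'''

# Same ops with the symbolic flag; ext=0 is the default, so it is simply omitted.
after = '''
    msrli ufp1, xmml, 32, size=8
    maddf ufp1, xmml, ufp1, size=4, ext=Scalar
'''
```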



src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
<http://reviews.m5sim.org/r/592/#comment1461>

    This implementation is a bit inefficient, although not terribly so. You 
have to be careful, since the two operands may be the same register and you 
don't want to overwrite something you still need. But, for instance, the maddf 
one line above, this shift of ufp4, and the maddf on line 60 could all update 
xmmh, since all "high" halves of xmm registers have been read by then and no 
faults can happen. The moves that read out xmmlm could be moved higher, and 
xmml could also be updated directly.
    
    I think it -may- also be possible to do something clever and cut down the 
number of microops that shift things around to pack and unpack the results. I 
may have suspected the same thing when I wrote the much simpler 64-bit-wide 
version of this instruction below this one, where the components are whole 
registers and can be indexed directly, but I didn't come up with anything then 
and punted for later.
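
    For reference while reading the above, here is a small Python model of 
what HADDPS produces. It is only a sketch of the architectural result, not of 
the microcode; the helper names are made up for this note. It shows that the 
low 64-bit half of the result depends only on the destination operand and the 
high half only on the source, and that reading every lane first keeps the 
same-register case safe.

```python
import struct

def f32(x):
    """Round a Python float to single precision, as a 32-bit lane would hold it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def haddps(dest, src):
    """Reference result of HADDPS dest, src; lanes are listed low to high."""
    # Read every input lane before building the result, so dest and src may alias.
    d0, d1, d2, d3 = dest
    s0, s1, s2, s3 = src
    return [f32(d0 + d1), f32(d2 + d3), f32(s0 + s1), f32(s2 + s3)]

def split_halves(lanes):
    """Split four lanes into the (low, high) 64-bit halves the microcode works on."""
    return lanes[:2], lanes[2:]

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

result = haddps(x, y)            # low half from x only, high half from y only
low, high = split_halves(result)
aliased = haddps(x, x)           # both operands are the same register
```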



src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
<http://reviews.m5sim.org/r/592/#comment1459>

    This microop changes architecturally visible state, effectively 
committing to completing the op before all the possibly faulting ops have 
executed, specifically the loads that follow. There are 8 microcode fp 
registers, so you can just use the others and leave ufp3 around until the end.
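
    A toy Python model of the ordering problem, in case it helps. The register 
names are borrowed from this review, but the operations are invented for 
illustration and are not the haddps microcode; Fault and load are hypothetical 
stand-ins for a faulting load microop.

```python
class Fault(Exception):
    """Stands in for a fault raised by a load microop."""

def load(memory, addr):
    if addr not in memory:
        raise Fault(addr)
    return memory[addr]

def run_early_write(regs, memory, addr):
    # Hazard: the architectural register xmml changes before the load runs.
    regs["xmml"] = regs["xmml"] + regs["xmmh"]
    regs["ufp1"] = load(memory, addr)   # may fault
    regs["xmmh"] = regs["ufp1"]

def run_late_write(regs, memory, addr):
    # The result is parked in a microcode temporary until nothing can fault.
    regs["ufp3"] = regs["xmml"] + regs["xmmh"]
    regs["ufp1"] = load(memory, addr)   # may fault; xmm state is still intact
    regs["xmml"] = regs["ufp3"]
    regs["xmmh"] = regs["ufp1"]

def attempt(run, addr):
    """Run one sequence and return the architectural state it leaves behind."""
    regs = {"xmml": 1.0, "xmmh": 2.0}
    try:
        run(regs, {0x1000: 5.0}, addr)
    except Fault:
        pass  # the instruction would be restarted after the fault is handled
    return regs["xmml"], regs["xmmh"]

early = attempt(run_early_write, 0x2000)  # faulting load, xmml already clobbered
late = attempt(run_late_write, 0x2000)    # faulting load, xmm state untouched
ok = attempt(run_late_write, 0x1000)      # no fault, both halves updated
```

    With the early write, restarting the instruction after the fault would 
start from a modified xmml; with the late write it starts from the original 
values.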



src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
<http://reviews.m5sim.org/r/592/#comment1460>

    As above, this can't happen before the loads.


- Gabe


On 2011-03-17 16:07:08, Lisa Hsu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/592/
> -----------------------------------------------------------
> 
> (Updated 2011-03-17 16:07:08)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> -------
> 
> X86:  haddps: Another patch from Vince Weaver
> 
> 
> Diffs
> -----
> 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa 2e269d6fb3e6 
>   src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py 2e269d6fb3e6 
> 
> Diff: http://reviews.m5sim.org/r/592/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Lisa
> 
>

