https://bugs.kde.org/show_bug.cgi?id=429354

Carl Love <c...@us.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #133461|0                           |1
        is obsolete|                            |

--- Comment #5 from Carl Love <c...@us.ibm.com> ---
Created attachment 136162
  --> https://bugs.kde.org/attachment.cgi?id=136162&action=edit
Functional support for ISA 3.1, VSX Mask manipulation operations

Updated the patch per comments.

I looked at the suggestions on how to make copy_MSB_bit_fields() more
efficient.  PPC64 can perform the basic algorithm that was outlined on a
single V128 value, except for the sum across lanes, which would need to be
done with a clean helper.  While studying the code, I realized that
copy_MSB_bit_fields() is very similar to the ISA 3.1 instruction vgnb.  The
difference is that the result is stored in the low-order bits rather than the
high-order bits.  I rewrote copy_MSB_bit_fields() using the vgnb
implementation as a base, leveraging the existing clean helpers.  The new
function is much more efficient.  Although I didn't use the suggested
algorithm, it was very helpful to study: I realized that a number of the
other instructions in the patch could be implemented much more efficiently
with an arithmetic right shift, without calling copy_MSB_bit_fields() or
using the "for(i = 0; i< max; i++)" loop that Julian had concerns about.
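
A minimal C sketch of the kind of operation described above, assuming byte-sized lanes and that the MSB of byte k (counted from the least-significant end) lands in result bit k. It is not the patch's vgnb-based code, nor necessarily the algorithm suggested in review: it uses the well-known mask-and-multiply trick, where one 64-bit multiply plays the role of the "sum across lanes" within a single doubleword. The names gather_byte_msbs64() and gather_byte_msbs128() and the hi/lo split of the 128-bit value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: gather the MSB of each byte of a 64-bit word into
   bits 0..7 (MSB of byte k -> bit k), with no per-lane loop. */
static uint8_t gather_byte_msbs64(uint64_t x)
{
    uint64_t m = x & 0x8080808080808080ULL;   /* keep bit 7 of each byte */
    /* The constant is the sum of 2^(7*j) for j = 0..7.  Every partial
       product lands on a distinct bit position, so there are no carries,
       and the MSB of byte k ends up at bit 56 + k. */
    return (uint8_t)((m * 0x0002040810204081ULL) >> 56);
}

/* Hypothetical 128-bit layout: two host-order 64-bit halves. */
static uint16_t gather_byte_msbs128(uint64_t hi, uint64_t lo)
{
    return (uint16_t)(((unsigned)gather_byte_msbs64(hi) << 8) |
                      gather_byte_msbs64(lo));
}
```

Other lane widths would need a different mask and multiplier; only the byte case is shown.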

Reimplemented the vexpandbm, vexpanddm, vexpandhm, and vexpandwm instructions,
eliminating the call to copy_MSB_bit_fields() and the need for the
for(i = 0; i< max; i++) loop.  The new implementation is much more efficient.
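
A minimal sketch of the arithmetic-right-shift idea: shifting each lane right arithmetically by (lane width - 1) replicates its MSB across the whole lane, giving all ones or all zeros with no per-element loop. It assumes a GCC- or Clang-compatible compiler for the vector extensions, and the expand_* names are hypothetical. In VEX IR the analogous operation would be a lane-wise shift such as Iop_SarN8x16, though that is only an assumption about how the patch expresses it.

```c
#include <stdint.h>

/* 128-bit vectors via GCC/Clang vector extensions (assumption). */
typedef int8_t  v16i8 __attribute__((vector_size(16)));
typedef int16_t v8i16 __attribute__((vector_size(16)));
typedef int32_t v4i32 __attribute__((vector_size(16)));
typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Each lane becomes all ones if its MSB is set, else all zeros:
   a single lane-wise arithmetic right shift by (lane width - 1). */
static v16i8 expand_bytes(v16i8 v)  { return v >> 7;  }  /* vexpandbm-like */
static v8i16 expand_halves(v8i16 v) { return v >> 15; }  /* vexpandhm-like */
static v4i32 expand_words(v4i32 v)  { return v >> 31; }  /* vexpandwm-like */
static v2i64 expand_dwords(v2i64 v) { return v >> 63; }  /* vexpanddm-like */
```

Lane order here follows the host's vector layout, not PowerPC's big-endian element numbering; since the operation is identical in every lane, the ordering does not change the result.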

Reimplemented the mtvsrbm, mtvsrhm, mtvsrwm, and mtvsrdm instructions to remove
the need for the for(i = 0; i< max; i++) loop.  The new implementation is much
more efficient.
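
One loop-free way to model mtvsrbm's semantics in portable C, offered as a sketch rather than the patch's implementation: each of the low 16 bits of rb selects a byte mask of 0xFF or 0x00. It assumes the ISA's big-endian bit numbering maps mask bit k (LSB = 0) to byte k counted from the least-significant end of the 128-bit result. The helper names and the hi/lo output layout are hypothetical. The halfword, word and doubleword forms (mtvsrhm, mtvsrwm, mtvsrdm) follow the same pattern with wider lanes and are not shown.

```c
#include <stdint.h>

/* Hypothetical helper: byte k of the result is 0xFF if bit k of `bits`
   is set, else 0x00.  Loop-free SWAR arithmetic. */
static uint64_t spread_bits_to_bytes(uint8_t bits)
{
    uint64_t t = (uint64_t)bits * 0x0101010101010101ULL; /* copy into every byte */
    t &= 0x8040201008040201ULL;  /* byte k keeps only bit k */
    t += 0x7F7F7F7F7F7F7F7FULL;  /* nonzero byte -> bit 7 set, no inter-byte carry */
    t &= 0x8080808080808080ULL;  /* keep only bit 7 of each byte */
    return (t >> 7) * 0xFFULL;   /* turn each 0x01 into 0xFF */
}

/* Hypothetical mtvsrbm model: the low 16 bits of rb become 16 byte masks;
   bits above bit 15 are ignored. */
static void mtvsrbm_model(uint64_t rb, uint64_t *hi, uint64_t *lo)
{
    *hi = spread_bits_to_bytes((uint8_t)(rb >> 8));
    *lo = spread_bits_to_bytes((uint8_t)rb);
}
```

The add step is safe because, after masking, each byte holds at most 0x80, so adding 0x7F sets bit 7 exactly when the byte is nonzero and never carries into the next byte.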

The for(i = 0; i< max; i++) loop has been removed.
