https://bugs.kde.org/show_bug.cgi?id=429354
Carl Love <c...@us.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #133461|0                           |1
        is obsolete|                            |

--- Comment #5 from Carl Love <c...@us.ibm.com> ---
Created attachment 136162
  --> https://bugs.kde.org/attachment.cgi?id=136162&action=edit
Functional support for ISA 3.1, VSX Mask manipulation operations

Updated the patch per comments.

I looked at the suggestions on how to make copy_MSB_bit_fields() more
efficient. PPC64 can perform the outlined algorithm on a single V128 value,
except for the sum across lanes, which would need to be done using a clean
helper.

In studying the code, I realized that copy_MSB_bit_fields() is very similar
to the ISA 3.1 instruction vgnb. The difference is that the result is stored
in the low-order bits, not the high-order bits. I rewrote
copy_MSB_bit_fields() using the vgnb implementation as a base, leveraging the
existing clean helpers. The new function is much more efficient. Although I
didn't use the suggested algorithm, it was very helpful to study.

I also realized that a number of the other instructions in the patch could be
implemented much more efficiently with an arithmetic right shift, without
copy_MSB_bit_fields() or the "for(i = 0; i < max; i++)" loop that Julian had
concerns about.

Reimplemented the vexpandbm, vexpanddm, vexpandhm and vexpandwm instructions,
eliminating the call to copy_MSB_bit_fields() and the need for the
for(i = 0; i < max; i++) loop. The new implementation is much more efficient.

Reimplemented the mtvsrbm, mtvsrhm, mtvsrwm and mtvsrdm instructions to remove
the need for the for(i = 0; i < max; i++) loop. The new implementation is much
more efficient.

The for(i = 0; i < max; i++) loop has been removed.
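For readers unfamiliar with the arithmetic-right-shift trick mentioned above, here is a minimal scalar sketch in plain C. It is not the code from the attachment: the helper names expand_msb_dword() and expand_msb_bytes() are hypothetical, and a 64-bit integer stands in for one vector lane. It only illustrates why replicating an element's most significant bit across the element needs no per-element loop.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: replicate the MSB of one
 * 64-bit element across the whole element (0 or all ones).
 * This equals an arithmetic right shift by 63; it is written with
 * unsigned arithmetic because right-shifting a negative signed value
 * is implementation-defined in older C standards. */
static uint64_t expand_msb_dword(uint64_t x)
{
    return (uint64_t)0 - (x >> 63);
}

/* Hypothetical helper, not from the patch: do the same for each of the
 * eight bytes packed in a 64-bit lane, without a per-byte loop.
 * Step 1 moves each byte's MSB down to that byte's bit 0.
 * Step 2 multiplies by 0xFF, turning every 0x01 byte into 0xFF;
 * no carries cross byte boundaries because 0x01 * 0xFF fits in a byte. */
static uint64_t expand_msb_bytes(uint64_t lane)
{
    uint64_t msb = (lane >> 7) & UINT64_C(0x0101010101010101);
    return msb * UINT64_C(0xFF);
}
```

With these definitions, expand_msb_bytes() maps each byte whose top bit is set to 0xFF and every other byte to 0x00, which is the per-element effect of an arithmetic right shift by 7 on signed bytes.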