Joel Schopp writes:

> As for the technical comments, I agree with all of them and will
> incorporate them into the next version.

Mark Nelson is working on new memcpy and __copy_tofrom_user routines
that look like they will be simpler than the old ones as well as
faster, particularly on Cell.

It turns out that doing unaligned 8-byte loads is faster than doing
aligned loads + shifts + ors on POWER5 and later machines. So I
suggest that you try a loop that does, say, 4 ld's and 4 std's rather
than bothering with all the complexity of the shifts and ors.

On POWER3, ld and std that are not 4-byte aligned will cause an
alignment interrupt, so there I suggest we fall back to just using lwz
and stw as at present (though maybe with the loop unrolled a bit
more). We'll be adding a feature bit to tell whether the cpu can do
unaligned 8-byte loads and stores without trapping.

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
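
For illustration only, the loop shape suggested above (4 loads followed by 4 stores per iteration, with a 4-byte fallback where unaligned 8-byte accesses trap) might look roughly like the C sketch below. It is not Mark Nelson's routine and not kernel code. The names copy_sketch and cpu_has_unaligned_ld_std are hypothetical; the flag merely stands in for the feature bit mentioned above. The fixed-size memcpy calls are a portable way to express possibly unaligned loads and stores; a real implementation would be written in assembly with ld/std or lwz/stw directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the proposed CPU feature bit:
 * nonzero if unaligned 8-byte loads/stores do not trap. */
static int cpu_has_unaligned_ld_std = 1;

/* Sketch of a forward copy for non-overlapping buffers. */
static void *copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (cpu_has_unaligned_ld_std) {
        /* 4 eight-byte loads, then 4 eight-byte stores (32 bytes),
         * with no alignment fix-up via shifts and ors. */
        while (n >= 32) {
            uint64_t a, b, c, e;
            memcpy(&a, s, 8);
            memcpy(&b, s + 8, 8);
            memcpy(&c, s + 16, 8);
            memcpy(&e, s + 24, 8);
            memcpy(d, &a, 8);
            memcpy(d + 8, &b, 8);
            memcpy(d + 16, &c, 8);
            memcpy(d + 24, &e, 8);
            s += 32; d += 32; n -= 32;
        }
    } else {
        /* Fallback: 4 four-byte loads and stores (16 bytes),
         * analogous to an unrolled lwz/stw loop. */
        while (n >= 16) {
            uint32_t a, b, c, e;
            memcpy(&a, s, 4);
            memcpy(&b, s + 4, 4);
            memcpy(&c, s + 8, 4);
            memcpy(&e, s + 12, 4);
            memcpy(d, &a, 4);
            memcpy(d + 4, &b, 4);
            memcpy(d + 8, &c, 4);
            memcpy(d + 12, &e, 4);
            s += 16; d += 16; n -= 16;
        }
    }

    /* Copy whatever is left one byte at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The point of the sketch is only the structure: one runtime check selects the wide or narrow loop, and neither path does any shift/or realignment.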