Re: [PATCH/RFC] 64 bit csum_partial_copy_generic

Paul Mackerras Wed, 15 Oct 2008 23:13:19 -0700

Joel Schopp writes:

> As for the technical comments, I agree with all of them and will 
> incorporate them into the next version.


Mark Nelson is working on new memcpy and __copy_tofrom_user routines
that look like they will be simpler than the old ones as well as being
faster, particularly on Cell.  It turns out that doing unaligned
8-byte loads is faster than doing aligned loads + shifts + ors on
POWER5 and later machines.  So I suggest that you try a loop that does
say 4 ld's and 4 std's rather than worrying about all the complexity of
the shifts and ors.  On POWER3, ld and std that are not 4-byte aligned
will cause an alignment interrupt, so there I suggest we fall back to
just using lwz and stw as at present (though maybe with the loop
unrolled a bit more).  We'll be adding a feature bit to tell whether
the cpu can do unaligned 8-byte loads and stores without trapping.
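A portable C sketch of the loop shape described above may help. It is only an illustration under stated assumptions, not the kernel code. memcpy() into a uint64_t stands in for an unaligned ld/std. The flag `cpu_has_unaligned_ld` is a hypothetical stand-in for the proposed feature bit. The function `csum_copy_sketch()` and its folded 16-bit return value are hypothetical and are not the interface of csum_partial_copy_generic. The main path does 4 unaligned 8-byte loads and then 4 stores per iteration. The fallback path uses 4-byte words, as lwz/stw would on POWER3, and is left un-unrolled for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the proposed CPU feature bit: nonzero if the
 * CPU can do unaligned 8-byte loads/stores without trapping. */
static int cpu_has_unaligned_ld = 1;

/* Ones' complement (end-around carry) addition on 64 bits. */
static uint64_t add64(uint64_t sum, uint64_t v)
{
    sum += v;
    return sum + (sum < v);      /* carry out wraps back into bit 0 */
}

/* Copy and sum the final len < 8 bytes, zero-padded in native order. */
static uint64_t copy_sum_tail(const unsigned char *src, unsigned char *dst,
                              size_t len, uint64_t sum)
{
    uint64_t t = 0;
    assert(len < 8);
    if (len) {
        memcpy(&t, src, len);
        memcpy(dst, src, len);
    }
    return add64(sum, t);
}

/* Fast path: 4 unaligned 8-byte loads, then 4 8-byte stores, per iteration. */
static uint64_t copy_sum_8(const unsigned char *src, unsigned char *dst,
                           size_t len, uint64_t sum)
{
    uint64_t a, b, c, d;
    while (len >= 32) {
        memcpy(&a, src, 8);      memcpy(&b, src + 8, 8);
        memcpy(&c, src + 16, 8); memcpy(&d, src + 24, 8);
        memcpy(dst, &a, 8);      memcpy(dst + 8, &b, 8);
        memcpy(dst + 16, &c, 8); memcpy(dst + 24, &d, 8);
        sum = add64(add64(sum, a), b);
        sum = add64(add64(sum, c), d);
        src += 32; dst += 32; len -= 32;
    }
    while (len >= 8) {           /* leftover whole doublewords */
        memcpy(&a, src, 8);
        memcpy(dst, &a, 8);
        sum = add64(sum, a);
        src += 8; dst += 8; len -= 8;
    }
    return copy_sum_tail(src, dst, len, sum);
}

/* Fallback path (POWER3-style): 4-byte words only. */
static uint64_t copy_sum_4(const unsigned char *src, unsigned char *dst,
                           size_t len, uint64_t sum)
{
    uint32_t w;
    while (len >= 4) {
        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
        sum = add64(sum, w);
        src += 4; dst += 4; len -= 4;
    }
    return copy_sum_tail(src, dst, len, sum);
}

/* Fold the 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical entry point: copy len bytes and return the folded checksum. */
uint16_t csum_copy_sketch(const void *src, void *dst, size_t len)
{
    uint64_t sum = cpu_has_unaligned_ld ? copy_sum_8(src, dst, len, 0)
                                        : copy_sum_4(src, dst, len, 0);
    return fold16(sum);
}
```

Both paths produce the same result because ones' complement addition is associative, and 2^16 - 1 divides 2^64 - 1. The choice between them can therefore be a pure performance decision keyed off the feature bit.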

Paul.
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev