> static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt) {
> +     int i;
> +     __le32 *psrc = (__le32 *)src;
> +
> +     /*
> +      * the buffer is already in big endian. For little endian machines that's
> +      * fine. For big endain machines we must swap since the chipset swaps again
> +      */
> +     for (i = 0; i < bytecnt / 4; ++i)
> +             psrc[i] = le32_to_cpu(psrc[i]);
> +
>       __iowrite64_copy(dst, src, bytecnt / 8);
> }

That code looks horrid...

1) I'm not sure the caller expects the source buffer to be corrupted
   by the in-place swap.
2) It costs a lot of memory cycles: every word is read, swapped and
   written back, and then read again for the copy.
3) It looked from the calls that this code is copying descriptors, so
   the transfer length is probably 1 or 2 words - so the loop is
   inefficient.
4) ppc doesn't have a fast byteswap instruction (very new gcc might
   use the byteswapping memory access for the le32_to_cpu() though),
   so it would be better to get the byteswap done inside
   __iowrite64_copy() - since that is probably requesting a byteswap
   anyway.

OTOH I'm not at all clear about the 64bit xfers....
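
To make points 1) and 4) concrete, here is a rough user-space sketch of a copy that swaps on the way out and leaves the source untouched. The names swab32_sketch() and bf_copy_swapped_sketch() are made up for illustration and are not kernel APIs. It swaps unconditionally, whereas a real driver would only want the swap on big endian hosts, and it uses plain 32bit stores, so it says nothing about the 64bit transfer question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
static inline uint32_t swab32_sketch(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Hypothetical non-destructive copy: write bytecnt bytes to dst,
 * byte-swapping each 32-bit word as it is stored. src is const and is
 * only read, so the caller's buffer is left intact and each word
 * is read from memory only once.
 */
static void bf_copy_swapped_sketch(volatile uint32_t *dst,
				   const uint32_t *src, size_t bytecnt)
{
	size_t i;

	for (i = 0; i < bytecnt / 4; ++i)
		dst[i] = swab32_sketch(src[i]);
}
```

With the swap folded into the store, a compiler targeting ppc is at least free to use a byte-reversed store for it, though whether it does depends on the compiler and flags.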