> I'm not sure what the chip's expectation is for the actual bus
> transfers in this area, but I think you are right to be concerned
> about atomicity, even when transferring based on longs.

The chip docs seem to suggest that we're OK as long as we do 4-byte
writes aligned to 4 bytes.

> It is worth looking at using SSE instructions to burst transfer the
> entire message in one atomic go.

I'm not aware of any SSE instructions that work on chunks bigger than
16 bytes at a time. In fact the latest mlx4 kernel driver maps the
blueflame page to userspace with write-combining enabled, and this
improves performance quite a bit. The HCA doesn't care what order the
CPU drains the WC buffer in (according to the docs at least).

 - R.
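
A minimal sketch of the "4-byte writes aligned to 4 bytes" copy discussed
above, for illustration only. The helper name copy_u32_aligned is
hypothetical and is not taken from the mlx4 driver or libmlx4; it assumes
the destination is a 4-byte-aligned mapping and the message length is a
multiple of 4. On a write-combining mapping the CPU may still merge these
stores and drain them in any order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not an mlx4 API): copy `bytes` bytes from src to
 * dst using only naturally aligned 32-bit stores.  The volatile qualifier
 * keeps the compiler from turning the loop into a memcpy() call, which
 * could use stores of a different width.
 */
static void copy_u32_aligned(volatile uint32_t *dst, const uint32_t *src,
                             size_t bytes)
{
    size_t i;

    /* Assumption: destination is 4-byte aligned, length is a multiple of 4. */
    assert(((uintptr_t)dst % 4) == 0);
    assert(bytes % 4 == 0);

    for (i = 0; i < bytes / 4; i++)
        dst[i] = src[i];        /* one aligned 4-byte store per iteration */
}
```

In real use dst would point into the mapped blueflame page rather than
ordinary memory; any barrier needed after the copy is outside the scope of
this sketch.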
