 > I'm not sure what the chip's expectation is for the actual bus
 > transfers in this area, but I think you are right to be concerned
 > about atomicity, even when transferring based on longs.

The chip docs seem to suggest that we're OK as long as we do 4-byte
writes aligned to 4 bytes.
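
For illustration, a minimal sketch of what "4-byte writes aligned to 4
bytes" could look like in C. The helper name copy_dwords is
hypothetical, and ordinary memory stands in for the device page here;
nothing below is taken from the actual driver or library code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any driver): copy `bytes` bytes from src
 * to dst using only 4-byte stores to 4-byte-aligned addresses.
 * `volatile` keeps the compiler from merging or eliding the stores.
 */
static void copy_dwords(volatile uint32_t *dst, const uint32_t *src,
                        size_t bytes)
{
    assert(((uintptr_t)dst & 3) == 0);  /* destination is 4-byte aligned */
    assert((bytes & 3) == 0);           /* whole number of 4-byte words  */

    for (size_t i = 0; i < bytes / 4; i++)
        dst[i] = src[i];                /* one aligned 32-bit store each */
}
```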

 > It is worth looking at using SSE instructions to burst transfer the
 > entire message in one atomic go.

I'm not aware of any SSE instructions that work on chunks bigger than 16
bytes at a time.
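
To make the 16-byte limit concrete, here is a rough sketch that copies a
message with SSE2 stores. The 64-byte message size and the helper name
copy_msg_16 are assumptions for the example only. The copy still takes
several separate 16-byte stores, so it is not one atomic transfer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_store_si128 */
#endif

/* Assumed sizes for the example; not taken from the chip docs. */
enum { MSG_BYTES = 64, CHUNK_BYTES = 16 };

/*
 * Hypothetical helper: copy a MSG_BYTES message in 16-byte pieces and
 * return how many separate stores were issued. dst must be 16-byte
 * aligned. Without SSE2 it falls back to memcpy per chunk.
 */
static int copy_msg_16(void *dst, const void *src)
{
    int stores = 0;

    assert(((uintptr_t)dst & 15) == 0);

    for (int off = 0; off < MSG_BYTES; off += CHUNK_BYTES) {
#if defined(__SSE2__)
        __m128i v = _mm_loadu_si128(
            (const __m128i *)((const char *)src + off));
        _mm_store_si128((__m128i *)((char *)dst + off), v);
#else
        memcpy((char *)dst + off, (const char *)src + off, CHUNK_BYTES);
#endif
        stores++;   /* each iteration is its own 16-byte store */
    }
    return stores;
}
```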

In fact the latest mlx4 kernel driver maps the blueflame page to
userspace with write-combining enabled, and this improves performance
quite a bit.  The HCA doesn't care what order the CPU drains the WC
buffer in (according to the docs, at least).
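
As a sketch of how a write to a write-combining mapping is commonly
finished off on x86: do the stores, then issue a store fence so the
buffered data is pushed out. The names wc_copy and store_fence are
hypothetical, the destination is plain memory rather than a real WC
mapping, and I am assuming the usual sfence idiom, not quoting what the
mlx4 code does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_sfence */
#else
#include <stdatomic.h>  /* portable fallback fence */
#endif

/* Hypothetical wrapper: store fence on x86, full fence elsewhere. */
static void store_fence(void)
{
#if defined(__SSE__)
    _mm_sfence();
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/*
 * Hypothetical helper: write `bytes` bytes as aligned 4-byte stores, then
 * fence. On a real WC mapping the CPU may combine and drain these stores
 * in any order; the fence only ensures they are all issued before any
 * later store.
 */
static void wc_copy(volatile uint32_t *dst, const uint32_t *src,
                    size_t bytes)
{
    assert(((uintptr_t)dst & 3) == 0);
    assert((bytes & 3) == 0);

    for (size_t i = 0; i < bytes / 4; i++)
        dst[i] = src[i];

    store_fence();
}
```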

 - R.