> We have thought that we could try CRC32C first and if that works then use
 > that, otherwise fall back to CRC32 which would allow rxe to use CRC32C when
 > talking to itself and get a speedup on new Intel CPUs.

How would you negotiate that?  Is this being worked on in the definition
of the IBoE spec?  It seems that in a final standard, having something
stronger than two CRC32s, and something that can handle jumbo frames,
would be worth it.

 > The crc32 code we have is based on some suggestions from Intel and seems to
 > be much faster than the 'standard' code in the kernel and we are/were
 > planning to merge the crc computation with the data copy. It is actually
 > less explicitly optimized than the code in the kernel but when we use the
 > loop unrolling directive in the compiler we get better overall results. That
 > core loop gives us about 5-6GB/sec for the copy/crc vs about 500MB/sec for
 > the 'standard' crc + copy.

Interesting ... can the upstream kernel code be improved similarly?

 > Our plan is to strip out all the trace and dump code when the driver is
 > stable. We will keep the warn and error messages.

If you develop upstream and use the ftrace tracepoints stuff then the
overhead of (runtime) disabled trace points is nearly zero (one 5-byte
NOP, so a cycle or two).  So you can keep them forever (which is
realistically how long you're going to be debugging this :)

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
