> Things like sizeof() and offsetof() are known at compile time, so the compiler
> will recognize when a condition is always true or false and optimize it out
> accordingly.  In cases where the value cannot be known at compile time,
> checking the length in the macro and dispatching to a different
> implementation may still be advantageous, especially when the different
> implementation doesn't involve function pointers.

OK, multiple issues are resolved and I have new numbers:

1) Implemented the new COMP_CRC32C macro, which compares the length and chooses
AVX-512 vs. SSE 4.2 at compile time for static structures.
2) You were right about the baseline numbers: it seems the binaries were
compiled with the direct-call version of the SSE 4.2 CRC implementation, thus
avoiding the function pointer. I rebuilt with
USE_SSE42_CRC32C_WITH_RUNTIME_CHECK for the numbers below.
3) Ran through all the tests again and ended up with no regression (meaning run
sets would fall between 0.5% below and 1.5% above the baseline), and the margin
of error was MUCH tighter this time at ~3%. :)
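
For anyone skimming the thread, here is a rough sketch of the length-dispatch
idea from item 1. It is not the code in the attached patches. The macro name
COMP_CRC32C_SKETCH, the two path functions, and the 256-byte cutoff are all
hypothetical, and both paths use a portable bitwise CRC-32C as a stand-in for
the real SSE 4.2 and AVX-512 routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff: an assumption for this sketch, not a value from the patch. */
#define CRC32C_WIDE_MIN_LEN 256

/* Portable bitwise CRC-32C update (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for a direct-call short-buffer routine (e.g. SSE 4.2). */
static uint32_t
crc32c_short_path(uint32_t crc, const void *data, size_t len)
{
    return crc32c_bitwise(crc, data, len);
}

/* Stand-in for a wide-vector routine (e.g. AVX-512). */
static uint32_t
crc32c_wide_path(uint32_t crc, const void *data, size_t len)
{
    return crc32c_bitwise(crc, data, len);
}

/*
 * When len is a compile-time constant such as sizeof(struct foo), the
 * comparison is constant too, so an optimizing compiler can keep only one
 * branch. Otherwise it becomes an ordinary run-time length check.
 */
#define COMP_CRC32C_SKETCH(crc, data, len) \
    ((crc) = ((len) < CRC32C_WIDE_MIN_LEN) \
        ? crc32c_short_path((crc), (data), (len)) \
        : crc32c_wide_path((crc), (data), (len)))
```

A caller would write something like COMP_CRC32C_SKETCH(crc, &hdr, sizeof(hdr)),
where hdr is any fixed-size struct; neither branch goes through a function
pointer.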

New Table of Rates (looks correct with fixed font width) below:

+------------------+-----------------+----------------+-----------------+-------+------+
| Rate in bytes/us |    SDP (SPR)    |      m6i       |       m7i       |       |      |
+------------------+--------+--------+-------+--------+--------+--------+ Multi-|      |
| higher is better | SSE42  | AVX512 | SSE42 | AVX512 | SSE42  | AVX512 | plier |  %   |
+==================+========+========+=======+========+========+========+=======+======+
| AVG Rate 64-8192 | 10,095 | 82,101 | 8,591 | 38,652 | 11,867 | 83,194 | 6.68  | 568% |
+------------------+--------+--------+-------+--------+--------+--------+-------+------+
| AVG Rate 64-255  |  9,034 |  9,136 | 7,619 |  7,437 |  9,030 |  9,293 | 1.01  |   1% |
+------------------+--------+--------+-------+--------+--------+--------+-------+------+

* With a data profile of 99% buffer sizes <256 bytes, the improvement is still
6% and will not regress (except within the margin of error)!
* There is no longer a regression (previously the numbers showed a 14% regression).

Thanks for the pointers!!!
Paul

Attachment: 0001-v4-Refactor-Move-all-HW-checks-to-common-file.patch

Attachment: 0002-v4-Feat-Add-support-for-the-SIMD-AVX-512-crc32c-algorit.patch

Attachment: 0003-v4-Feat-New-COMP_CRC32C-macro-for-AVX512-simplify-code-.patch
