[ https://issues.apache.org/jira/browse/HADOOP-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084519#comment-13084519 ]

Kihwal Lee commented on HADOOP-7446:
------------------------------------

I wrote a simple C test program that reads a file 64K at a time and calls the bulk verification function in a loop. In zlib case, it reads 512 bytes at a time and calls the zlib API. The page cache was warmed up by running multiple times and the best numbers were picked. The following is to show roughly how they perform differently. Do not take it as an accurate analysis.

*Environment & Setup*
data file 65536000 bytes
64 KB buffer, 512B chunk size
GCC optimization level 2 (-O2)
2400 MHz Intel Xeon E5530

*Results*
{noformat}
[Software slice-by-8]
64 bit software    137,944,134 cycles    1140 MB/s
32 bit software    183,254,871 cycles     858 MB/s

[Hardware non-pipelined]
64 bit hardware     53,547,822 cycles    2937 MB/s
32 bit hardware     73,407,888 cycles    2142 MB/s

[Hardware Pipelined]
64 bit hardware     42,188,922 cycles    3728 MB/s
32 bit hardware     49,568,661 cycles    3173 MB/s

[zlib]
32 bit             478,252,812 cycles     328 MB/s
64 bit             490,850,076 cycles     320 MB/s
{noformat}

I will soon post the code for the pipelined version after rebasing to your new patch. You can take a look and decide what to do.

> Implement CRC32C native code using SSE4.2 instructions
> ------------------------------------------------------
>
>                 Key: HADOOP-7446
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7446
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.0
>
>         Attachments: hadoop-7446.txt, hadoop-7446.txt, hadoop-7446.txt
>
>
> Once HADOOP-7445 is implemented, we can get further performance improvements
> by implementing CRC32C using the hardware support available in SSE4.2. This
> support should be dynamically enabled based on CPU feature flags, and of
> course should be ifdeffed properly so that it doesn't break the build on
> architectures/platforms where it's not available.
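
The issue description asks for hardware CRC32C that is enabled dynamically from CPU feature flags and is ifdeffed so the build does not break elsewhere. A minimal sketch of that shape follows. It is not the code from the attached patches or from the benchmark above: the names crc32c_sw, crc32c_hw, crc32c and the macro HAVE_HW_CRC32C are hypothetical, it assumes a GCC- or Clang-compatible compiler for the target attribute and __builtin_cpu_supports, and both paths are deliberately simple (bitwise software, byte-at-a-time hardware) rather than the slice-by-8 or pipelined variants measured in the comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile the hardware path only on x86 with a GCC-compatible compiler.
 * HAVE_HW_CRC32C is a hypothetical macro name used for this sketch. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>          /* SSE4.2 intrinsics, including _mm_crc32_u8 */
#define HAVE_HW_CRC32C 1
#endif

/* Portable fallback: bitwise CRC32C (Castagnoli) using the reflected
 * polynomial 0x82F63B78. Slow, but builds on any platform. */
uint32_t crc32c_sw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

#ifdef HAVE_HW_CRC32C
/* Hardware path: one CRC32 instruction per input byte. The target
 * attribute lets this one function use SSE4.2 without building the
 * whole file with -msse4.2. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = _mm_crc32_u8(crc, buf[i]);
    return ~crc;
}
#endif

/* Dispatcher: checks the CPU feature flag at run time and falls back
 * to software when SSE4.2 is missing or was compiled out. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
#ifdef HAVE_HW_CRC32C
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_hw(buf, len);
#endif
    return crc32c_sw(buf, len);
}
```

Calling crc32c() on the nine ASCII bytes "123456789" should return 0xE3069283, the standard CRC32C check value, whichever path is taken. A real implementation would likely cache the feature check once instead of repeating it on every call, and would process 4 or 8 bytes per instruction as in the 32 bit and 64 bit hardware rows above.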