Youwei Wang has posted comments on this change. Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2. ......................................................................
Patch Set 40: (11 comments) http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util-test.cc File be/src/util/bit-util-test.cc: Line 131: void TestByteswapSIMD(const int64_t CpuFlag, const int buf_size) { > Simd, not SIMD Done http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util.cc File be/src/util/bit-util.cc: Line 170: const uint8_t* src = reinterpret_cast<const uint8_t*>(source); > i'm hoping the microbenchmark would tell us. Greetings, all. I have conducted an experiment on my workstation, the CPU model is: Machine Info: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz. Corresponding data of experiment result is recorded at this link: https://docs.google.com/spreadsheets/d/1Wi3UvcIvflVrEIWf-q3maupUtqY9qIDvYmcjBwk2zyc/edit?usp=sharing As a quick summary, it seems using template parameter to pass the function pointer has a slight performance advantage here over using branch. Thank you.:) http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util.h File be/src/util/bit-util.h: Line 246: public: > formatting (see above for example) Done Line 247: /// Function ByteSwapScalar is used to byteswap an array in the scalar way with some > remove "Function Byte.. is used to " Done Line 248: /// builtin optimization for the array of length <= 16Byte. > 16Byte -> 16 bytes, etc. Done Line 252: static void ByteSwapScalar(void* dst, const void* src, int len); > output params go at the end Hi Marcel. According to the original form of the BitUtil::ByteSwap function found at this link: https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/util/bit-util.cc void BitUtil::ByteSwap(void* dest, const void* source, int len) It has the output params go at the head. In order to avoid any possible confusion, all functions I have added related to this BitUtil::ByteSwap have followed the same convention. Besides, other standard functions similar to this also have the same convention. If you are interested, you can take a quick look at this link: http://en.cppreference.com/w/cpp/string/byte/memcpy Thank you. :) Line 262: typedef void (*SimdFuncPointer)(uint8_t* dst, const uint8_t* src); > drop "pointer" (we don't tag type names in that way) Done Line 263: /// Template function ByteSwapSIMD is the entry point function to byteswap an array > precede w/ blank line Done Line 266: /// int TEMPLATE_DATA_WIDTH: only 16 or 32 is availble now; > what does "is available now" mean? "are valid"? Done Line 268: /// if TEMPLATE_DATA_WIDTH is 16, SIMD_FUNC should be ByteSwap128; > then why not just call that directly from within the function (and drop the Hi Marcel. Please take a quick look at this link: https://docs.google.com/spreadsheets/d/1Wi3UvcIvflVrEIWf-q3maupUtqY9qIDvYmcjBwk2zyc/edit?usp=sharing I believe we should still use the template parameter since it has some performance advantage here. Line 275: static void ByteSwapSIMD(void* dst, const void* src, const int len); > name formatting: Simd, not SIMD Done -- To view, visit http://gerrit.cloudera.org:8080/3081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 Gerrit-PatchSet: 40 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang <youwei.a.w...@intel.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Youwei Wang <youwei.a.w...@intel.com> Gerrit-HasComments: Yes