Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or 
AVX2.
......................................................................


Patch Set 40:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util.cc
File be/src/util/bit-util.cc:

Line 170:   const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
> 1. I find this doc inscrutable without more labeling. Are the four differen
Hi Jim.
1. I apologize for the rough first version of the table; I was distracted by 
the odd push issue mentioned on the mailing list. Please read the table in two 
parts. The first part, colored blue, holds the benchmark results for the 
version that uses a template parameter and no branch. The second part, colored 
red, holds the results for the version that uses a branch and no template 
parameter.

Each part consists of five runs. Each run yields three measurements for each 
of FastScalar, SSSE3, AVX2 and SIMD, which I average into a single per-run 
figure for each implementation. Averaging those per-run figures over the five 
runs gives the FINAL average for each implementation.

After both parts are done, I copy the final averages from each part and show 
them side by side for easier comparison. The final table should therefore be 
enough to draw a quick conclusion.
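
To make the averaging concrete, here is a minimal sketch of how one 
implementation's final figure is derived, assuming each run is stored as a list 
of its samples. The names Mean and FinalAverage are hypothetical and are not 
part of the benchmark code or the spreadsheet.

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of samples.
double Mean(const std::vector<double>& samples) {
  return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Final average for one implementation (e.g. SSSE3): average the samples
// within each run, then average those per-run means across all runs.
double FinalAverage(const std::vector<std::vector<double>>& runs) {
  std::vector<double> per_run;
  for (const auto& run : runs) per_run.push_back(Mean(run));
  return Mean(per_run);
}
```

Because every run has the same number of samples, this mean of means equals 
the plain mean over all samples.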

I have colored some table columns to make the sheet easier to read. Could you 
please revisit the sheet link? Please let me know if anything in the table is 
still unclear. Thank you.

2. I used the objdump tool to inspect the assembly code in the libUtil.a 
binary. I have copied the assembly code for the different implementations of 
the template function (with and without the function pointer in the template 
parameter list) into an online document:
https://docs.google.com/document/d/1bCCjKPg7ytpbRTeC6UrnxoSDHCp0IOAVsQOdcQTrM9M/edit?usp=sharing

As you can see there, the two codebases produce the same libUtil.a binary (the 
md5sum values are identical). Based on this, I believe compiler optimization 
makes the two variants equivalent.
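
For readers who have not followed the earlier patch sets, here is a rough 
sketch of the two shapes being compared. It is not the code from the patch: 
SwapForward, SwapBackward, Kernel, ByteSwapTemplated and ByteSwapBranching are 
hypothetical stand-ins, and the simple loops take the place of the real 
scalar/SSSE3/AVX2 kernels.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernels: both write the bytes of src into dst in reverse order.
inline void SwapForward(const uint8_t* src, uint8_t* dst, size_t len) {
  for (size_t i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}
inline void SwapBackward(const uint8_t* src, uint8_t* dst, size_t len) {
  for (size_t i = len; i > 0; --i) dst[len - i] = src[i - 1];
}

using SwapFn = void (*)(const uint8_t*, uint8_t*, size_t);

// Variant A: the kernel is a template parameter, so the call target is fixed
// at compile time and the function body contains no branch.
template <SwapFn kFn>
void ByteSwapTemplated(const void* source, void* dest, size_t len) {
  kFn(reinterpret_cast<const uint8_t*>(source),
      reinterpret_cast<uint8_t*>(dest), len);
}

// Variant B: no template parameter; the kernel is chosen by a runtime branch.
enum class Kernel { kForward, kBackward };
void ByteSwapBranching(Kernel kernel, const void* source, void* dest,
                       size_t len) {
  const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
  uint8_t* dst = reinterpret_cast<uint8_t*>(dest);
  if (kernel == Kernel::kForward) {
    SwapForward(src, dst, len);
  } else {
    SwapBackward(src, dst, len);
  }
}
```

When the branch condition is a constant at every call site, an optimizing 
compiler can inline the call and drop the untaken branch, which is one way 
the two variants could end up with identical machine code.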

Thank you for sharing any of your ideas. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 40
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <youwei.a.w...@intel.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Youwei Wang <youwei.a.w...@intel.com>
Gerrit-HasComments: Yes
