Hi Gopal,

Thank you very much for your reply! Do you mean that if I restrict VectorizedLogicBench.IfExprLongColumnLongColumnBench to a single CPU, the variation will be small? If so, I can confirm that the variation became smaller than before after running taskset -cp 1 $pid.

However, I am still confused: if all the tests in VectorizedLogicBench are also well pipelined and vectorized, why do the other tests in VectorizedLogicBench not show a large variation? My guess is that the more complex expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench actually uses more CPU than the other expressions.
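As a rough illustration of that guess, here is a minimal sketch of the two kinds of per-row loop being compared. It is not the Hive source: the class IfExprSketch, its method names, and the choice of a bitwise AND as the "simpler" kernel are assumptions made only for illustration. The if-expression reads three input columns and selects per row, while the simpler kernel reads two.

```java
import java.util.Arrays;

// Hypothetical stand-in, NOT the Hive implementation.
public class IfExprSketch {

    // For each row, take thenCol[i] when cond[i] == 1, otherwise elseCol[i].
    static void ifExpr(long[] cond, long[] thenCol, long[] elseCol, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = (cond[i] == 1 ? thenCol[i] : elseCol[i]);
        }
    }

    // Assumed simpler comparison kernel: one bitwise AND over two input columns.
    static void and(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] & b[i];
        }
    }

    public static void main(String[] args) {
        long[] cond = {1, 0, 1, 0};
        long[] thenCol = {10, 20, 30, 40};
        long[] elseCol = {100, 200, 300, 400};
        long[] out = new long[4];

        ifExpr(cond, thenCol, elseCol, out, 4);
        System.out.println(Arrays.toString(out)); // prints [10, 200, 30, 400]
    }
}
```

Whether the extra input column and the per-row select are enough to explain the difference in variation is exactly the open question; the sketch only shows what differs between the loops.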
The expression used in VectorizedLogicBench.IfExprLongColumnLongColumnBench:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprLongColumnLongColumn.java#L90

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gop...@apache.org]
Sent: Thursday, November 16, 2017 5:40 AM
To: dev@hive.apache.org
Cc: Zhang, Liyun <liyun.zh...@intel.com>; Teddy Choi <tc...@hortonworks.com>
Subject: Re: Anyone knows the problem I found in VectorizedLogicBench.IfExprLongColumnLongColumnBench?

Hi,

> You see that there is a great float for
> IfExprLongColumnLongColumnBench.bench, the float is 583775 and the average
> value is 1621602.

In my tests, the single core tests tended to have huge variations on Intel with Turbo boost.

CPU operations which are fast when stressing CPU in single threaded mode tended to get really slow when the other cores spin up and hit thermal limits.

For most memory bound operations this is not easily visible, but the better pipelined and vectorized the loops get the worse the impact of dynamic CPU frequency scaling.

Can you collect active CPU frequency when running this benchmark and do "taskset -c 1" to force the run to stick to a single CPU?

Cheers,
Gopal
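
One possible way to collect the active CPU frequency requested above is to sample the Linux cpufreq sysfs file while the benchmark runs. The sketch below assumes a Linux machine that exposes /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq (a value in kHz); that file may be absent on some kernels or virtual machines. The class CpuFreqSampler, the one-second interval, and the ten-sample count are illustrative choices, not part of the Hive benchmark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical helper for sampling the current frequency of one CPU.
public class CpuFreqSampler {

    // Assumed Linux cpufreq sysfs location; not available on every system.
    static Path freqFile(int cpu) {
        return Paths.get("/sys/devices/system/cpu/cpu" + cpu + "/cpufreq/scaling_cur_freq");
    }

    // The file holds a single integer in kHz followed by a newline.
    static long parseKHz(String raw) {
        return Long.parseLong(raw.trim());
    }

    // Returns the frequency in kHz, or empty if the file is missing or unreadable.
    static OptionalLong readKHz(Path file) {
        try {
            return OptionalLong.of(parseKHz(new String(Files.readAllBytes(file))));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Default to CPU 1, the CPU the benchmark is pinned to in this thread.
        int cpu = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        for (int i = 0; i < 10; i++) {
            OptionalLong khz = readKHz(freqFile(cpu));
            System.out.println(khz.isPresent()
                    ? "cpu" + cpu + " " + khz.getAsLong() / 1000 + " MHz"
                    : "cpu" + cpu + " frequency unavailable");
            Thread.sleep(1000);
        }
    }
}
```

Running this in a second terminal while the benchmark is pinned with taskset gives one frequency reading per second for that CPU, which can then be compared between the stable and the unstable runs.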