[ https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254829#comment-16254829 ]
liyunzhang edited comment on HIVE-18080 at 11/16/17 6:41 AM:
-------------------------------------------------------------

[~teddy.choi]: I retested {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} and {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}} with AVX1 and AVX2. The results are:

AVX1
{code}
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 20 1595748.343 ± 16887.073 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnRepeatingLongColumnBench.bench avgt 20 1735827.809 ± 18129.173 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprRepeatingLongColumnLongColumnBench.bench avgt 20 1768004.314 ± 14489.511 us/op
{code}
AVX2
{code}
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 20 1691559.843 ± 118986.372 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnRepeatingLongColumnBench.bench avgt 20 1837327.456 ± 76084.038 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprRepeatingLongColumnLongColumnBench.bench avgt 20 1760544.684 ± 93512.838 us/op
{code}
The test script I used:
{code}
export JAVA_HOME=/home/zly/sr601/jmh/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jmh/jdk-9.0.1/mylib
# Three AVX1 runs; each JVM is pinned to CPU 1 after it starts.
for i in 0 1 2; do
  java -server -XX:UseAVX=1 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
  taskset -cp 1 $pid
  wait $pid
done
# Three AVX2 runs with the same settings.
for i in 0 1 2; do
  java -server -XX:UseAVX=2 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
  taskset -cp 1 $pid
  wait $pid
done
{code}
There does not seem to be much improvement from AVX1 to AVX2. Could you spend some time helping to find the root cause? Thanks!
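
One way to read the AVX1 and AVX2 numbers above is to check whether the score intervals (score ± error) of the two runs overlap. The sketch below is only an illustration: it assumes the ± value in the JMH output is the half-width of a confidence interval around the average, and {{intervals_overlap}} is a hypothetical helper name, not part of JMH or Hive.

```shell
#!/bin/sh
# intervals_overlap SCORE_A ERR_A SCORE_B ERR_B
# Prints "overlap" when [A-errA, A+errA] and [B-errB, B+errB] intersect,
# otherwise "distinct". Hypothetical helper, not part of JMH or Hive.
intervals_overlap() {
  awk -v a="$1" -v ea="$2" -v b="$3" -v eb="$4" 'BEGIN {
    lo = (a - ea > b - eb) ? a - ea : b - eb   # larger of the two lower bounds
    hi = (a + ea < b + eb) ? a + ea : b + eb   # smaller of the two upper bounds
    result = (lo <= hi) ? "overlap" : "distinct"
    print result
  }'
}

# Scores and errors (us/op) copied from the AVX1 and AVX2 results above.
intervals_overlap 1595748.343 16887.073 1691559.843 118986.372  # IfExprLongColumnLongColumnBench
intervals_overlap 1735827.809 18129.173 1837327.456 76084.038   # IfExprLongColumnRepeatingLongColumnBench
intervals_overlap 1768004.314 14489.511 1760544.684 93512.838   # IfExprRepeatingLongColumnLongColumnBench
```

With these inputs the three calls print {{overlap}}, {{distinct}} and {{overlap}}. Under the stated assumption, only IfExprLongColumnRepeatingLongColumnBench differs between AVX1 and AVX2 by more than the reported error, and there AVX2 is the slower one.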

was (Author: kellyzly):
[~teddy.choi]: I retested {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} and {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}} with AVX1 and AVX2. The results are:

AVX1
{code}
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 20 1595748.343 ± 16887.073 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnRepeatingLongColumnBench.bench avgt 20 1735827.809 ± 18129.173 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprRepeatingLongColumnLongColumnBench.bench avgt 20 1768004.314 ± 14489.511 us/op
{code}
AVX2
{code}
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnLongColumnBench.bench avgt 20 1691559.843 ± 118986.372 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprLongColumnRepeatingLongColumnBench.bench avgt 20 1837327.456 ± 76084.038 us/op
o.a.h.b.v.VectorizedLogicBench.IfExprRepeatingLongColumnLongColumnBench.bench avgt 20 1760544.684 ± 93512.838 us/op
{code}
There does not seem to be much improvement from AVX1 to AVX2. Could you spend some time helping to find the root cause? Thanks!

> Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18080
>                 URL: https://issues.apache.org/jira/browse/HIVE-18080
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>
> I used a Xeon(R) Platinum 8180 CPU to test the performance of [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK 9, since JDK 9 enables AVX512.
> I used the Hive microbenchmarks (HIVE-10189) to evaluate the performance improvement.
> Performance seems to improve (20%+) for the cases in {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, except {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} and {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
> When I used the Skylake CPU to evaluate the performance improvement of AVX512, I found the following results for VectorizedLogicBench:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is degradation in IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and IfExprRepeatingLongColumnLongColumnBench. I am confused about why the IfExprLongColumnLongColumnBench cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core, to avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
>   java -server -XX:UseAVX=3 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> for i in 0 1 2; do
>   java -server -XX:UseAVX=2 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> {code}