Fei Wang created SPARK-20184:
--------------------------------

             Summary: performance regression for complex sql when enable codegen
                 Key: SPARK-20184
                 URL: https://issues.apache.org/jira/browse/SPARK-20184
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0, 1.6.0
            Reporter: Fei Wang

When the following SQL is executed on Spark 2.x with codegen enabled, performance is much worse than with codegen turned off.

SELECT
 sum(COUNTER_57)
,sum(COUNTER_71)
,sum(COUNTER_3)
,sum(COUNTER_70)
,sum(COUNTER_66)
,sum(COUNTER_75)
,sum(COUNTER_69)
,sum(COUNTER_55)
,sum(COUNTER_63)
,sum(COUNTER_68)
,sum(COUNTER_56)
,sum(COUNTER_37)
,sum(COUNTER_51)
,sum(COUNTER_42)
,sum(COUNTER_43)
,sum(COUNTER_1)
,sum(COUNTER_76)
,sum(COUNTER_54)
,sum(COUNTER_44)
,sum(COUNTER_46)
,DIM_1
,DIM_2
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;

codegen on: 40s
codegen off: 6s

After some analysis, I think this is related to the huge Java method that is generated when codegen is on. If I configure -XX:-DontCompileHugeMethods, the performance with codegen on gets much better.
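
For anyone trying to reproduce the three runs compared above (codegen on, codegen off, codegen on with -XX:-DontCompileHugeMethods), here is a rough sketch of a helper that builds the spark-sql command lines. It rests on some assumptions that are not stated in the report: that "codegen" refers to the Spark 2.x setting spark.sql.codegen.wholeStage, that the JVM flag is passed to the driver via --driver-java-options and to the executors via spark.executor.extraJavaOptions, and that the query is run through the spark-sql CLI. The function name and the shortened query are made up for illustration. The script only prints the commands and does not run Spark.

```python
import shlex

# Shortened stand-in for the query in the report (hypothetical, for illustration).
QUERY = (
    "SELECT sum(COUNTER_57), sum(COUNTER_71), DIM_1, DIM_2, DIM_3 "
    "FROM aggtable GROUP BY DIM_1, DIM_2, DIM_3 LIMIT 100"
)

# HotSpot flag named in the report; it lets the JIT compile very large methods.
HUGE_METHODS_FLAG = "-XX:-DontCompileHugeMethods"


def spark_sql_command(query, codegen=True, compile_huge_methods=False):
    """Build the argv list for one benchmark run (hypothetical helper)."""
    cmd = [
        "spark-sql",
        # Assumption: the reporter toggled whole-stage codegen (Spark 2.x key).
        "--conf",
        "spark.sql.codegen.wholeStage=" + ("true" if codegen else "false"),
    ]
    if compile_huge_methods:
        # Pass the flag to both the driver JVM and the executor JVMs.
        cmd += [
            "--driver-java-options", HUGE_METHODS_FLAG,
            "--conf", "spark.executor.extraJavaOptions=" + HUGE_METHODS_FLAG,
        ]
    cmd += ["-e", query]
    return cmd


if __name__ == "__main__":
    runs = [
        ("codegen on", dict(codegen=True)),
        ("codegen off", dict(codegen=False)),
        ("codegen on, huge methods compiled",
         dict(codegen=True, compile_huge_methods=True)),
    ]
    for label, kwargs in runs:
        argv = spark_sql_command(QUERY, **kwargs)
        # Quote each argument so the line can be pasted into a shell.
        print(label + ": " + " ".join(shlex.quote(a) for a in argv))
```

Timing each printed command against the same aggtable should show whether the gap follows the JIT flag rather than codegen itself. By default HotSpot does not JIT-compile methods whose bytecode exceeds its huge-method limit (8000 bytes), so such a method stays in the interpreter, which would be consistent with the slowdown described above.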