zhangbutao commented on code in PR #4744:
URL: https://github.com/apache/hive/pull/4744#discussion_r1421703321


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/impl/FindColumnsWithStatsHandler.java:
##########
@@ -37,11 +37,16 @@ public class FindColumnsWithStatsHandler implements QueryHandler<List<String>> {
 
   //language=SQL
  private static final String TABLE_SELECT = "SELECT \"COLUMN_NAME\" FROM \"TAB_COL_STATS\" " +
-      "WHERE \"DB_NAME\" = :dbName AND \"TABLE_NAME\" = :tableName";
+      "INNER JOIN \"TBLS\" ON \"TAB_COL_STATS\".\"TBL_ID\" = \"TBLS\".\"TBL_ID\" " +

Review Comment:
   Chiming in with some thoughts :)
   I'm not sure whether the index can be used to avoid performance degradation with the multi-table join.
   Hive issues many read (SELECT) operations related to statistics, especially in the CBO stage. In MySQL, a multi-table join sometimes performs worse than a single-table query, and the join also puts extra load on MySQL.
   
   https://github.com/apache/hive/pull/4744/files#diff-bcca13f6cc251df321e8fe80568ef0334a1d44f7e5e7ff2fcaa06ab4f05bbdf9
   `MetaStoreDirectSql.java` also changes its stats-related queries from single-table lookups to multi-table joins.
   
   Could we run some performance stress tests to verify that performance won't decline? (This may not be easy to test.)
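As a rough starting point, a stress test could compare the two query shapes with a small latency harness like the sketch here. `LatencyHarness`, `measure`, `percentile` and `busyWork` are hypothetical names, not part of Hive or this PR, and the placeholder lambdas only simulate work. A real comparison would execute the old and new `TABLE_SELECT` statements via JDBC against a MySQL-backed metastore populated with a realistic number of tables and column stats.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/**
 * Hypothetical latency harness (not part of Hive): runs a query action many
 * times and reports nearest-rank latency percentiles, so the old single-table
 * lookup and the new multi-table join can be compared under the same load.
 */
public final class LatencyHarness {

  /** Summary of one run; all times are in nanoseconds. */
  public static final class Stats {
    public final int samples;
    public final long p50Nanos;
    public final long p99Nanos;
    public final long maxNanos;

    Stats(int samples, long p50Nanos, long p99Nanos, long maxNanos) {
      this.samples = samples;
      this.p50Nanos = p50Nanos;
      this.p99Nanos = p99Nanos;
      this.maxNanos = maxNanos;
    }

    @Override
    public String toString() {
      return String.format("n=%d p50=%dus p99=%dus max=%dus",
          samples, p50Nanos / 1_000, p99Nanos / 1_000, maxNanos / 1_000);
    }
  }

  // Results are written here so the JIT cannot discard the measured work.
  private static volatile Object sink;

  private LatencyHarness() {}

  /** Runs {@code query} for {@code warmup} untimed calls, then times {@code iterations} calls. */
  public static Stats measure(Supplier<?> query, int warmup, int iterations) {
    if (iterations <= 0) {
      throw new IllegalArgumentException("iterations must be positive");
    }
    for (int i = 0; i < warmup; i++) {
      sink = query.get(); // warm up JIT, caches and connection pool
    }
    long[] times = new long[iterations];
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      sink = query.get();
      times[i] = System.nanoTime() - start;
    }
    Arrays.sort(times);
    return new Stats(iterations, percentile(times, 50), percentile(times, 99), times[iterations - 1]);
  }

  /** Nearest-rank percentile of an ascending-sorted, non-empty array. */
  public static long percentile(long[] sorted, int pct) {
    long rank = ((long) pct * sorted.length + 99) / 100; // ceil(pct * n / 100)
    return sorted[(int) Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    // Placeholders only: a real stress test would run the old and new
    // TABLE_SELECT statements over JDBC against a populated MySQL metastore,
    // ideally from several threads at once.
    Supplier<Long> singleTable = () -> busyWork(20_000);
    Supplier<Long> multiJoin = () -> busyWork(40_000);
    System.out.println("single-table: " + measure(singleTable, 50, 500));
    System.out.println("multi-join:   " + measure(multiJoin, 50, 500));
  }

  // Stand-in for query execution: a deterministic CPU-bound loop.
  private static long busyWork(int n) {
    long acc = 0;
    for (int i = 0; i < n; i++) {
      acc += (long) i * 31;
    }
    return acc;
  }
}
```

Looking at p99 as well as p50 seems worthwhile here, since extra join cost under concurrent CBO-time load may show up in tail latency before it moves the median. For rigorous numbers, a dedicated benchmarking tool such as JMH would likely be more reliable than a hand-rolled timing loop.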
   Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
