zhangbutao commented on code in PR #4744:
URL: https://github.com/apache/hive/pull/4744#discussion_r1421703321
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/impl/FindColumnsWithStatsHandler.java:
##########
@@ -37,11 +37,16 @@ public class FindColumnsWithStatsHandler implements QueryHandler<List<String>> {
//language=SQL
private static final String TABLE_SELECT = "SELECT \"COLUMN_NAME\" FROM \"TAB_COL_STATS\" " +
- "WHERE \"DB_NAME\" = :dbName AND \"TABLE_NAME\" = :tableName";
+ "INNER JOIN \"TBLS\" ON \"TAB_COL_STATS\".\"TBL_ID\" = \"TBLS\".\"TBL_ID\" " +
Review Comment:
Chiming in with some thoughts :)
I'm not sure whether an index can still be used to avoid a performance
regression once this query becomes a multi-table join.
Hive issues many read (SELECT) operations on statistics, especially in the
CBO stage. In MySQL, a multi-table join is sometimes slower than the
equivalent single-table query, and the joins also put extra load on the
MySQL server.
https://github.com/apache/hive/pull/4744/files#diff-bcca13f6cc251df321e8fe80568ef0334a1d44f7e5e7ff2fcaa06ab4f05bbdf9
`MetaStoreDirectSql.java` also changes its stats-related operations from
single-table queries to multi-table joins.
Can we run some performance stress tests to verify that performance won't
decline? (This may not be easy to test.)
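As a starting point, here is a rough sketch of how the two query variants could be timed against a populated metastore database. This is not code from the PR: `QueryLatencyProbe`, `medianNanos`, and `jdbcQuery` are hypothetical names, it assumes plain JDBC with `?` placeholders rather than the named parameters used in the handler, and a median over repeated runs is only a coarse signal, not a proper stress test.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: times a repeated action and reports the median latency.
final class QueryLatencyProbe {

    // Runs the action `warmup` times untimed, then `iterations` times timed,
    // and returns the median wall-clock latency in nanoseconds.
    static long medianNanos(Runnable action, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            action.run();
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[iterations / 2];
    }

    // Wraps a parameterized SELECT as a Runnable that executes it and drains the result set.
    // Assumes `sql` uses positional `?` placeholders.
    static Runnable jdbcQuery(Connection conn, String sql, Object... params) {
        return () -> {
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < params.length; i++) {
                    ps.setObject(i + 1, params[i]);
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rs.getString(1); // force the row to be fetched
                    }
                }
            } catch (SQLException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: QueryLatencyProbe <jdbc-url> <sql>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            long median = medianNanos(jdbcQuery(conn, args[1]), 20, 200);
            System.out.println("median latency (ns): " + median);
        }
    }
}
```

Running it once with the old single-table statement and once with the new join, on a database with a realistic number of rows in `TAB_COL_STATS`, and checking the `EXPLAIN` output for the join, would at least show whether the difference is noticeable.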
Thanks.