Neer393 commented on code in PR #6039:
URL: https://github.com/apache/hive/pull/6039#discussion_r2293531746
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergTableOptimizer.java:
##########
@@ -85,52 +85,53 @@ public IcebergTableOptimizer(HiveConf conf, TxnStore txnHandler, MetadataCache m
    */
   @Override
   public Set<CompactionInfo> findPotentialCompactions(long lastChecked, ShowCompactResponse currentCompactions,
-      Set<String> skipDBs, Set<String> skipTables) {
+      Set<String> skipDBs, Set<String> skipTables) {
     Set<CompactionInfo> compactionTargets = Sets.newHashSet();
-    getTableNames().stream()
-        .filter(table -> !skipDBs.contains(table.getDb()))
-        .filter(table -> !skipTables.contains(table.getNotEmptyDbTable()))
-        .map(table -> {
-          org.apache.hadoop.hive.ql.metadata.Table hiveTable = getHiveTable(table.getDb(), table.getTable());
-          org.apache.iceberg.Table icebergTable = IcebergTableUtil.getTable(conf, hiveTable.getTTable());
-          return Pair.of(hiveTable, icebergTable);
-        })
-        .filter(t -> hasNewCommits(t.getRight(),
-            snapshotTimeMilCache.get(t.getLeft().getFullyQualifiedName())))
-        .forEach(t -> {
-          String qualifiedTableName = t.getLeft().getFullyQualifiedName();
-          org.apache.hadoop.hive.ql.metadata.Table hiveTable = t.getLeft();
-          org.apache.iceberg.Table icebergTable = t.getRight();
-
-          if (icebergTable.spec().isPartitioned()) {
-            List<org.apache.hadoop.hive.ql.metadata.Partition> partitions = findModifiedPartitions(hiveTable,
-                icebergTable, snapshotTimeMilCache.get(qualifiedTableName), true);
-
-            partitions.forEach(partition -> addCompactionTargetIfEligible(hiveTable.getTTable(), icebergTable,
-                partition.getName(), compactionTargets, currentCompactions, skipDBs, skipTables));
-
-            if (IcebergTableUtil.hasUndergonePartitionEvolution(icebergTable) && !findModifiedPartitions(hiveTable,
-                icebergTable, snapshotTimeMilCache.get(qualifiedTableName), false).isEmpty()) {
-              addCompactionTargetIfEligible(hiveTable.getTTable(), icebergTable,
-                  null, compactionTargets, currentCompactions, skipDBs, skipTables);
-            }
-          } else {
-            addCompactionTargetIfEligible(hiveTable.getTTable(), icebergTable, null, compactionTargets,
-                currentCompactions, skipDBs, skipTables);
-          }
-
-          snapshotTimeMilCache.put(qualifiedTableName, icebergTable.currentSnapshot().timestampMillis());
-        });
+    Iterable<Table> tables = getTables();
Review Comment:
> IcebergTableUtil.getTable would be making 1 HMS call per table. Would be nice if we can optimize by caching the data (i.e. use `HiveMetaStoreClientWithLocalCache` in both `HiveTableOperations` and `TableFetcher`).
>
> Catalogs.loadTable -> HiveTableOperations.doRefresh -> metaClients.run(client -> client.getTable(database, tableName));
>
> ```
> if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.MSC_CACHE_ENABLED)) {
>   HiveMetaStoreClientWithLocalCache.init(conf);
> }
>
> MSC_CACHE_ENABLED("hive.metastore.client.cache.v2.enabled", true,
>     "This property enables a Caffeine Cache for Metastore client")
> ```
I did not follow all of it, but my understanding is that you are asking me to make HiveTableOperations.java and TableFetcher.java use HiveMetaStoreClientWithLocalCache, and that for this I just have to add the following piece of code to both files. Am I right?
```
if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.MSC_CACHE_ENABLED)) {
  HiveMetaStoreClientWithLocalCache.init(conf);
}
```
Now for the questions:
1. How does Catalogs.loadTable invoke HiveTableOperations.doRefresh? When I look at HiveTableOperations.doRefresh, no usages show up. Am I missing something here?
2. Even if we do add this to the code, say I add it to HiveTableOperations' constructor, adding it to TableFetcher is not possible because TableFetcher does not have access to `conf`. One option is to add it to IcebergHouseKeeperService.java instead, but I am not sure whether that would add any value (a rough sketch of what I mean is below).
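
For question 2, this is roughly what I have in mind for the IcebergHouseKeeperService option. It is only a sketch: the class and method names below are made up for illustration, and only the guarded init itself is taken from your snippet (import paths are my assumption):
```
// Rough sketch only: do the guarded one-time init wherever a HiveConf is
// available (e.g. driven from IcebergHouseKeeperService), so that the later
// per-table HMS getTable calls can go through the Caffeine-backed client cache.
// Class and method names here are illustrative, not existing Hive code.
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache;

public final class MscLocalCacheBootstrap {

  private MscLocalCacheBootstrap() {
  }

  /** Initializes the local metastore client cache if hive.metastore.client.cache.v2.enabled is true. */
  public static void maybeInit(HiveConf conf) {
    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.MSC_CACHE_ENABLED)) {
      HiveMetaStoreClientWithLocalCache.init(conf);
    }
  }
}
```
Does putting the guard at that level, rather than inside HiveTableOperations / TableFetcher, sound like what you meant?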