Jibing-Li commented on code in PR #25175:
URL: https://github.com/apache/doris/pull/25175#discussion_r1353830896
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalTable.java:
##########
@@ -635,6 +636,30 @@ public void gsonPostProcess() throws IOException {
super.gsonPostProcess();
estimatedRowCount = -1;
}
+
+ @Override
+ public List<Long> getChunkSizes() {
+ HiveMetaStoreCache.HivePartitionValues partitionValues = StatisticsUtil.getPartitionValuesForTable(this);
+ List<HiveMetaStoreCache.FileCacheValue> filesByPartitions
+ = StatisticsUtil.getFilesForPartitions(this, partitionValues, 0);
+ List<Long> result = Lists.newArrayList();
+ for (HiveMetaStoreCache.FileCacheValue files : filesByPartitions) {
+ for (HiveMetaStoreCache.HiveFileStatus file : files.getFiles()) {
+ result.add(file.getLength());
+ }
+ }
+ return result;
+ }
+
+ @Override
+ public long getDataSize(boolean singleReplica) {
+ List<Long> chunkSizes = getChunkSizes();
Review Comment:
This is a heavy operation. As we discussed earlier, it causes a redundant
fetch of all the files.
Usually we can get the total size of a Hive table from HMS, but this call
needs more than the total size: it also needs the size of each file. We use
the per-file sizes to calculate an accurate sample ratio.
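To illustrate why per-file sizes matter, here is a minimal sketch, not the code in this PR. It assumes sampling happens at file granularity: whole files are picked in order until a target byte count is reached. `SampleRatioSketch`, `sampleRatio`, and `targetBytes` are hypothetical names introduced only for this example.

```java
import java.util.List;

public class SampleRatioSketch {
    /**
     * Hypothetical helper (not Doris code): returns the fraction of the
     * table's bytes actually covered when whole files are picked in order
     * until at least targetBytes have been collected.
     */
    static double sampleRatio(List<Long> fileSizes, long targetBytes) {
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        if (total == 0) {
            return 1.0; // empty table: treat the sample as covering everything
        }
        long sampled = 0;
        for (long size : fileSizes) {
            if (sampled >= targetBytes) {
                break; // target reached, stop picking files
            }
            sampled += size; // files are taken whole, so we may overshoot
        }
        return (double) sampled / total;
    }

    public static void main(String[] args) {
        // Assumed example sizes in bytes; the table total is 1000.
        List<Long> sizes = List.of(100L, 300L, 600L);
        // A total-only estimate would be 200 / 1000 = 0.2, but the first two
        // files are read in full, so the real ratio is 400 / 1000.
        System.out.println(sampleRatio(sizes, 200L)); // prints 0.4
    }
}
```

With only the total size from HMS, the ratio would be estimated as 0.2 here, half of what is actually read. That gap is the reason the per-file sizes are fetched despite the cost.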