dengzhhu653 commented on code in PR #5950:
URL: https://github.com/apache/hive/pull/5950#discussion_r2371007808


##########
ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java:
##########
@@ -182,6 +184,48 @@ public Object process(StatsAggregator statsAggregator) throws HiveException, Met
         parameters.putAll(providedBasicStats);
       }
 
+      try {
+        long totalSize = 0L, numFiles = 0L;
+        // the stats are in the parameters already
+        String ts = parameters.get(StatsSetupConst.TOTAL_SIZE);
+        String nf = parameters.get(StatsSetupConst.NUM_FILES);
+        if (ts != null && nf != null) {
+          try {
+            totalSize = Long.parseLong(ts);
+            numFiles  = Long.parseLong(nf);
+          } catch (NumberFormatException ignore) {
+          }
+        }
+        if (numFiles > 1 && totalSize > 0) {
+          long threshold = (conf != null)
+                  ? conf.getLongVar(HiveConf.ConfVars.HIVE_MERGE_MAP_FILES_AVG_SIZE)
+                  : HiveConf.ConfVars.HIVE_MERGE_MAP_FILES_AVG_SIZE.defaultLongVal;
+
+          long avg = totalSize / numFiles;
+          if (avg <= threshold) {
+            // both work for non-partitioned and partitioned tables
+            String who = (p.getPartition() == null)
+                    ? ("table " + p.getTable().getFullyQualifiedName())
+                    : ("partition " + p.getPartition().getName());
+
+            // add the small files warnings in the log
+            LOG.info("[ANALYZE] Small files detected: {} avgBytes={}, files={}, totalBytes={}",

Review Comment:
   This message will also be logged when ss != null && ss.getConsole() != null,
   which may lead to duplicate entries in the log file.
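
   One possible way to avoid the duplicate is to emit the warning through a single channel. The sketch below is only an illustration under stated assumptions: `Console`, `Log`, and `warnOnce` are hypothetical stand-ins, not Hive classes or methods, and it assumes the console path already writes the message to the log file.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Console, Log and warnOnce are hypothetical names,
// not Hive classes or methods.
class SmallFilesWarningSketch {

  /** Hypothetical session console, assumed to forward messages to the log file. */
  interface Console {
    void printInfo(String message);
  }

  /** Hypothetical stand-in for the class logger. */
  interface Log {
    void info(String message);
  }

  /**
   * Emits the message through exactly one channel: the console when it is
   * available, otherwise the logger.
   */
  static void warnOnce(Console console, Log log, String message) {
    if (console != null) {
      console.printInfo(message);
    } else {
      log.info(message);
    }
  }

  public static void main(String[] args) {
    // Both channels write to the same in-memory "log file" to mimic the
    // situation described in the comment above.
    List<String> logFile = new ArrayList<>();
    Log log = logFile::add;
    Console console = logFile::add;

    warnOnce(console, log, "small files detected (console available)");
    warnOnce(null, log, "small files detected (no console)");

    // Each call contributed a single entry.
    System.out.println("entries written: " + logFile.size());
  }
}
```

   Whether the console or the logger should be the preferred channel depends on how the session console is wired in the actual task, so the branch order here is a design choice, not a recommendation.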



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]