Re: [PR] HIVE-27751 Log Query Compilation summary in an accumulated way [hive]

via GitHub Wed, 03 Jan 2024 06:38:10 -0800


zabetak commented on code in PR #4755:
URL: https://github.com/apache/hive/pull/4755#discussion_r1440459341



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -92,6 +94,8 @@ public class PerfLogger {
 
   protected final Map<String, Long> startTimes = new ConcurrentHashMap<>();
   protected final Map<String, Long> endTimes = new ConcurrentHashMap<>();
+  protected final Map<String, Long> durations = new ConcurrentHashMap<>();

Review Comment:
   Why do we need a separate data structure for duration? If we have startTime 
and endTime for every distinct entry that should be enough.



##########
ql/src/java/org/apache/hadoop/hive/ql/Compiler.java:
##########
@@ -504,6 +506,15 @@ private void cleanUp(Throwable compileException, boolean 
parsed, boolean deferCl
       driverState.compilationFinishedWithLocking(compileException != null);
       LOG.info("Completed compiling command(queryId={}); Time taken: {} 
seconds", driverContext.getQueryId(),
           duration);
+      if (HiveConf.getBoolVar(driverContext.getConf(), 
HiveConf.ConfVars.HIVE_COMPILE_SUMMARY)) {
+        LogHelper console = SessionState.getConsole();
+        console.printInfo("Query Compilation Summary");
+        console.printInfo(SEPARATOR);
+        for (String qs : perfLogger.getQuerySteps()) {
+          console.printInfo(String.format("%-99s %9s", qs, 
perfLogger.getDurations().get(qs) + " ms"));
+        }
+        console.printInfo(SEPARATOR);
+      }

Review Comment:
   The `Query Execution Summary` is handled in a subclass of `PrintSummary` 
shouldn't we encapsulate this logic in a similar fashion?



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -152,10 +178,18 @@ public long perfLogEnd(String callerName, String method) {
    * @return long duration  the difference between now and startTime, or -1 if 
startTime is null
    */
   public long perfLogEnd(String callerName, String method, String 
additionalInfo) {
-    long startTime = startTimes.getOrDefault(method, -1L);
+    String key = method;
+    if (additionalInfo != null) {
+      key = key + " - " + additionalInfo;
+    }
+    long startTime = startTimes.getOrDefault(key, -1L);
     long endTime = System.currentTimeMillis();
-    long duration = startTime < 0 ? -1 : endTime - startTime;
-    endTimes.put(method, Long.valueOf(endTime));
+    long duration = startTime < 0 ? -1 : (endTime - startTime + 
durations.getOrDefault(key, 0L));
+    endTimes.put(key, Long.valueOf(endTime));
+    durations.put(key, duration);
+    synchronized(querySteps) {
+      querySteps.add(key);

Review Comment:
   The `querySteps` data structure and this code seems redundant since the same 
`key` seems to be present in `startTimes`, `endTimes`, and `durations`.



##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -76,8 +76,8 @@
  * Handles transaction related duties of the Driver.
  */
 class DriverTxnHandler {
-  private static final String CLASS_NAME = Driver.class.getName();
-  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName();

Review Comment:
   Why did we switch from full name to simple name? I suppose it is for 
readability purposes but this may have undesirable side effects so maybe we 
should defer this part to another PR.



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -129,9 +133,31 @@ public static void setPerfLogger(PerfLogger 
resetPerfLogger) {
    * @param method method or ID that identifies this perf log element.
    */
   public void perfLogBegin(String callerName, String method) {
+    perfLogBegin(callerName, method, null);
+  }
+
+  /**
+   * Call this function when you start to measure time spent by a piece of 
code.
+   * @param callerName the logging object to be used.
+   * @param method method or ID that identifies this perf log element.
+   * @param additionalInfo particular step within the method that is being 
measured
+   */
+  public void perfLogBegin(String callerName, String method, String 
additionalInfo) {
     long startTime = System.currentTimeMillis();
-    startTimes.put(method, Long.valueOf(startTime));
-    LOG.debug("<PERFLOG method={} from={}>", method, callerName);
+    String key = method;
+    if (additionalInfo != null) {
+      key = key + " - " + additionalInfo;

Review Comment:
   It doesn't seem that `additionalInfo` was ever meant to be used as a key. 
Instead of concatenating the key with the info in order to obtain a unique id 
maybe it would be better that the caller passes the appropriate measurement 
event in the first place. This class has many static constants for the various 
events that we want to track but as far as I understand we want to have more 
fine grained measurements. 
   
   For instance, it seems that we want to track `PerfLogger.OPTIMIZER` in more 
detail. If that's the case then possibly it makes sense to introduce more 
entries for the various sub-steps.



##########
ql/src/java/org/apache/hadoop/hive/ql/Driver.java:
##########
@@ -71,8 +71,8 @@
  */
 public class Driver implements IDriver {
 
-  private static final String CLASS_NAME = Driver.class.getName();
-  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  private static final String SIMPLE_CLASS_NAME = Driver.class.getName();

Review Comment:
   The rename is confusing since we still call `Class#getName` method.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -1995,7 +1998,8 @@ protected RelNode 
applyMaterializedViewRewriting(RelOptPlanner planner, RelNode
         return calcitePreMVRewritingPlan;
       }
 
-      perfLogger.perfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
+      perfLogger.perfLogBegin(this.getClass().getSimpleName(), 
PerfLogger.OPTIMIZER
+          , "Calcite: View-based rewriting");

Review Comment:
   Instead of passing additional info in begin maybe we should decompose the 
`PerfLogger.OPTIMIZER` entry into multiple ones.



##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -76,8 +76,8 @@
  * Handles transaction related duties of the Driver.
  */
 class DriverTxnHandler {
-  private static final String CLASS_NAME = Driver.class.getName();
-  private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+  private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName();
+  private static final Logger LOG = LoggerFactory.getLogger(SIMPLE_CLASS_NAME);

Review Comment:
   If we change the logger name then we may have to change the log4j 
configuration files. Is this a breaking change? Do we need to do it?



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -207,6 +241,14 @@ public Map<String, Long> getStartTimes() {
     return ImmutableMap.copyOf(startTimes);
   }
 
+  public Set<String> getQuerySteps() {
+    return querySteps;
+  }
+
+  public Map<String, Long> getDurations() {
+    return durations;
+  }
+

Review Comment:
   Returning modifiable data structures especially when we are dealing with 
thread safe classes is not a good idea.



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -1932,7 +1932,7 @@ public List<String> getTablesForDb(String database, 
String tablePattern)
   public List<String> getTablesByType(String dbName, String pattern, TableType 
type)
       throws HiveException {
     PerfLogger perfLogger = SessionState.getPerfLogger();
-    perfLogger.perfLogBegin(CLASS_NAME, PerfLogger.HIVE_GET_TABLE);
+    perfLogger.perfLogBegin(SIMPLE_CLASS_NAME, PerfLogger.HIVE_GET_TABLE, 
"HS2-cache");

Review Comment:
   The `PerfLogger.HIVE_GET_TABLE` entry is used only here so I am not sure if 
we really need to pass the additional info (`HS2-cache`). The same comment 
applies to other places where we pass something in `perfLogBegin`.



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -92,6 +94,8 @@ public class PerfLogger {
 
   protected final Map<String, Long> startTimes = new ConcurrentHashMap<>();

Review Comment:
   Should we add some tests in TestPerfLogger, TestQueryDisplay, or somewhere 
else to ensure that the data structures are populated correctly when the query 
compiles?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-27751 Log Query Compilation summary in an accumulated way [hive]

Reply via email to