zabetak commented on code in PR #4755:
URL: https://github.com/apache/hive/pull/4755#discussion_r1440459341
##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -92,6 +94,8 @@ public class PerfLogger {
protected final Map<String, Long> startTimes = new ConcurrentHashMap<>();
protected final Map<String, Long> endTimes = new ConcurrentHashMap<>();
+ protected final Map<String, Long> durations = new ConcurrentHashMap<>();
Review Comment:
Why do we need a separate data structure for duration? If we have startTime
and endTime for every distinct entry that should be enough.
##########
ql/src/java/org/apache/hadoop/hive/ql/Compiler.java:
##########
@@ -504,6 +506,15 @@ private void cleanUp(Throwable compileException, boolean
parsed, boolean deferCl
driverState.compilationFinishedWithLocking(compileException != null);
LOG.info("Completed compiling command(queryId={}); Time taken: {}
seconds", driverContext.getQueryId(),
duration);
+ if (HiveConf.getBoolVar(driverContext.getConf(),
HiveConf.ConfVars.HIVE_COMPILE_SUMMARY)) {
+ LogHelper console = SessionState.getConsole();
+ console.printInfo("Query Compilation Summary");
+ console.printInfo(SEPARATOR);
+ for (String qs : perfLogger.getQuerySteps()) {
+ console.printInfo(String.format("%-99s %9s", qs,
perfLogger.getDurations().get(qs) + " ms"));
+ }
+ console.printInfo(SEPARATOR);
+ }
Review Comment:
The `Query Execution Summary` is handled in a subclass of `PrintSummary`
shouldn't we encapsulate this logic in a similar fashion?
##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -152,10 +178,18 @@ public long perfLogEnd(String callerName, String method) {
* @return long duration the difference between now and startTime, or -1 if
startTime is null
*/
public long perfLogEnd(String callerName, String method, String
additionalInfo) {
- long startTime = startTimes.getOrDefault(method, -1L);
+ String key = method;
+ if (additionalInfo != null) {
+ key = key + " - " + additionalInfo;
+ }
+ long startTime = startTimes.getOrDefault(key, -1L);
long endTime = System.currentTimeMillis();
- long duration = startTime < 0 ? -1 : endTime - startTime;
- endTimes.put(method, Long.valueOf(endTime));
+ long duration = startTime < 0 ? -1 : (endTime - startTime +
durations.getOrDefault(key, 0L));
+ endTimes.put(key, Long.valueOf(endTime));
+ durations.put(key, duration);
+ synchronized(querySteps) {
+ querySteps.add(key);
Review Comment:
The `querySteps` data structure and this code seems redundant since the same
`key` seems to be present in `startTimes`, `endTimes`, and `durations`.
##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -76,8 +76,8 @@
* Handles transaction related duties of the Driver.
*/
class DriverTxnHandler {
- private static final String CLASS_NAME = Driver.class.getName();
- private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+ private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName();
Review Comment:
Why did we switch from full name to simple name? I suppose it is for
readability purposes but this may have undesirable side effects so maybe we
should defer this part to another PR.
##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -129,9 +133,31 @@ public static void setPerfLogger(PerfLogger
resetPerfLogger) {
* @param method method or ID that identifies this perf log element.
*/
public void perfLogBegin(String callerName, String method) {
+ perfLogBegin(callerName, method, null);
+ }
+
+ /**
+ * Call this function when you start to measure time spent by a piece of
code.
+ * @param callerName the logging object to be used.
+ * @param method method or ID that identifies this perf log element.
+ * @param additionalInfo particular step within the method that is being
measured
+ */
+ public void perfLogBegin(String callerName, String method, String
additionalInfo) {
long startTime = System.currentTimeMillis();
- startTimes.put(method, Long.valueOf(startTime));
- LOG.debug("<PERFLOG method={} from={}>", method, callerName);
+ String key = method;
+ if (additionalInfo != null) {
+ key = key + " - " + additionalInfo;
Review Comment:
It doesn't seem that `additionalInfo` was ever meant to be used as a key.
Instead of concatenating the key with the info in order to obtain a unique id
maybe it would be better that the caller passes the appropriate measurement
event in the first place. This class has many static constants for the various
events that we want to track but as far as I understand we want to have more
fine grained measurements.
For instance, it seems that we want to track `PerfLogger.OPTIMIZER` in more
detail. If that's the case then possibly it makes sense to introduce more
entries for the various sub-steps.
##########
ql/src/java/org/apache/hadoop/hive/ql/Driver.java:
##########
@@ -71,8 +71,8 @@
*/
public class Driver implements IDriver {
- private static final String CLASS_NAME = Driver.class.getName();
- private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+ private static final String SIMPLE_CLASS_NAME = Driver.class.getName();
Review Comment:
The rename is confusing since we still call `Class#getName` method.
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -1995,7 +1998,8 @@ protected RelNode
applyMaterializedViewRewriting(RelOptPlanner planner, RelNode
return calcitePreMVRewritingPlan;
}
- perfLogger.perfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
+ perfLogger.perfLogBegin(this.getClass().getSimpleName(),
PerfLogger.OPTIMIZER
+ , "Calcite: View-based rewriting");
Review Comment:
Instead of passing additional info in begin maybe we should decompose the
`PerfLogger.OPTIMIZER` entry into multiple ones.
##########
ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java:
##########
@@ -76,8 +76,8 @@
* Handles transaction related duties of the Driver.
*/
class DriverTxnHandler {
- private static final String CLASS_NAME = Driver.class.getName();
- private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME);
+ private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName();
+ private static final Logger LOG = LoggerFactory.getLogger(SIMPLE_CLASS_NAME);
Review Comment:
If we change the logger name then we may have to change the log4j
configuration files. Is this a breaking change? Do we need to do it?
##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -207,6 +241,14 @@ public Map<String, Long> getStartTimes() {
return ImmutableMap.copyOf(startTimes);
}
+ public Set<String> getQuerySteps() {
+ return querySteps;
+ }
+
+ public Map<String, Long> getDurations() {
+ return durations;
+ }
+
Review Comment:
Returning modifiable data structures especially when we are dealing with
thread safe classes is not a good idea.
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -1932,7 +1932,7 @@ public List<String> getTablesForDb(String database,
String tablePattern)
public List<String> getTablesByType(String dbName, String pattern, TableType
type)
throws HiveException {
PerfLogger perfLogger = SessionState.getPerfLogger();
- perfLogger.perfLogBegin(CLASS_NAME, PerfLogger.HIVE_GET_TABLE);
+ perfLogger.perfLogBegin(SIMPLE_CLASS_NAME, PerfLogger.HIVE_GET_TABLE,
"HS2-cache");
Review Comment:
The `PerfLogger.HIVE_GET_TABLE` entry is used only here so I am not sure if
we really need to pass the additional info (`HS2-cache`). The same comment
applies to other places where we pass something in `perfLogBegin`.
##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -92,6 +94,8 @@ public class PerfLogger {
protected final Map<String, Long> startTimes = new ConcurrentHashMap<>();
Review Comment:
Should we add some tests in TestPerfLogger, TestQueryDisplay, or somewhere
else to ensure that the data structures are populated correctly when the query
compiles?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]