-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71766/
-----------------------------------------------------------

Review request for hive, Peter Vary and Slim Bouguerra.


Repository: hive-git


Description
-------

Recently I wanted to add some additional capability, and add more, performance 
logging to support my troubleshooting efforts. I started looking at PerfLogger 
and started to examine its usage. I discovered a few things:

Since 'loggers' must be open and closed manually, I found a couple of places 
where loggers were opened, but not closed, rendering them useless
Since 'loggers' must be closed manually, I found a few places where an 
early-return or Exception thrown would cause a logger to not be closed, thereby 
rendering it useless
Session information is not logged, so it can be difficult to precisely pinpoint 
which session is taking lots of time
PerfLogger overloaded. Most of the time, it's being used as a simple timer 
mechanism with automatic logging in SLF4J debug. However, it is also a facade 
over the Hive Metrics subsystem and timing results are automatically published 
to Metrics and then there becomes this dependency on a 'logger' to be able to 
access metric data as well.
The last bullet is the most challenging part and why I propose to deprecate the 
Hive PerfLogger and not simply remove it. I am proposing a new system... a 
PerfTimer that is allows for Java 8's try-with-resources feature to protect 
against the developer having to care about manually close measurements and not 
having to carefully consider all early-exits. The base implementation logs to 
SLF4J. An extended version automatically publishes to the Hive Metric subsystem 
as well.

The current Hive PerfLogger has a bit of a clunky system for allowing plugable 
implementations. However, the current default implementation has a side-effect 
of also publishing timing information to the Hive Metrics subsystem. There are 
code sections that look up various timers in the Metrics Subsytem and publish 
the results back to the client. Since, in theory, the implementation is 
plugable, any other implementation that does not also have this side-effect of 
also publishing to the Metrics Subsystem will break these non-optional code 
paths.  Also, these code paths create and interact with PerfLoggers in a static 
way, and then the publishing code pulls the data from the {{PerfLogger}} (as a 
facade to the Metrics subsystem) in a static way. Therefore, when I tried to 
replace the entire PerfLogger code, I came across an issue because there is not 
(and should not) be a way to just statically pull this information down from 
any point in the code. Information that is required for publish
 ing should be passed around within some sort of context object, separate from 
the Metrics subsystem. There was no obvious way to string a new PerfTimer to 
all the required locations. I propose marking the PerfLogger as deprecated and 
leaving these complex section alone. Instead, replace only the simple "I want a 
timer" use cases.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 282f4cdb0b 
  common/src/java/org/apache/hadoop/hive/ql/log/CachedPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/LoggingPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/MetricsPerfTimerLogger.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 2707987f0b 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimedAction.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimer.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerFactory.java 
PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerLogger.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 0643a54753 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
e205c08d84 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java 
10144a1352 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java a7770b4e53 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
 b9285accbd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
530131f207 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 8244dcb1a9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
806deb5f31 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
f29a9f807c 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
07cb5cb936 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 92775107bc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordProcessor.java 
8c9d53f521 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java 
13f5f12989 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 
6697f62d13 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 
03edbf7bdb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
72446afeda 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java fa6160fe3c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java aecd1084e6 
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
1f72477666 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java c97c961481 
  ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java dd25f622c7 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 6143e85664 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java
 4592f5ec34 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 25e9cd0482 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Transform.java 6c57797177 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
673d8580d5 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 91ec00b9c6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 7d5807720b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
24429b4a1f 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java e224f2c348 


Diff: https://reviews.apache.org/r/71766/diff/1/


Testing
-------


Thanks,

David Mollitor

Reply via email to