[ https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729623#comment-16729623 ]
BELUGA BEHR commented on HIVE-21071:
------------------------------------

{{./ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:2442: @VisibleForTesting:3: warning: Method length is 161 lines (max allowed is 150).}}

That warning is not going to be corrected with this patch. Fixing it would require a larger overhaul. Please accept the provided patch for inclusion into the project.

> Improve getInputSummary
> -----------------------
>
>                 Key: HIVE-21071
>                 URL: https://issues.apache.org/jira/browse/HIVE-21071
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HIVE-21071.1.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important
> that it be fast. The current implementation has quite a bit of overhead that
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to
> ContentSummary object. This map is populated by several threads
> concurrently. At the end, the method loops through the map in a single
> thread to add up all of the ContentSummary objects, ignoring the paths.
> The code can be re-engineered to not use a map, or any collection at all, to
> store the results and instead just keep a running tally. By keeping a tally,
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved. The method returns an object
> which is never used anywhere, so the method can be changed to a void return
> type.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
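
As a rough illustration of the running-tally idea in the issue description, here is a minimal sketch. It is not the code from HIVE-21071.1.patch. The class {{RunningTallySketch}}, the {{Summary}} stand-in for ContentSummary, and the {{tally}} method are all hypothetical names, and the per-path summaries are passed in directly rather than fetched from a file system. The sketch assumes {{java.util.concurrent.atomic.LongAdder}} is an acceptable way to keep the totals: worker threads add to shared counters as they go, so there is no map to populate and no single-threaded loop at the end.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of a running tally; not the code from the patch. */
final class RunningTallySketch {

  /** Hypothetical stand-in for Hadoop's ContentSummary. */
  static final class Summary {
    final long length;
    final long fileCount;
    final long directoryCount;

    Summary(long length, long fileCount, long directoryCount) {
      this.length = length;
      this.fileCount = fileCount;
      this.directoryCount = directoryCount;
    }
  }

  /**
   * Adds up per-path summaries on a thread pool without storing them.
   * Returns {totalLength, totalFileCount, totalDirectoryCount}.
   */
  static long[] tally(List<Summary> perPathSummaries, int threads) {
    LongAdder length = new LongAdder();
    LongAdder files = new LongAdder();
    LongAdder dirs = new LongAdder();

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (Summary s : perPathSummaries) {
        // In a real implementation each task would look up the summary for
        // one path; here the summary is already available.
        pool.execute(() -> {
          length.add(s.length);
          files.add(s.fileCount);
          dirs.add(s.directoryCount);
        });
      }
    } finally {
      pool.shutdown();
    }

    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("tally tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tally", e);
    }

    // The totals are read directly from the counters; no per-path results
    // were kept, so there is nothing to iterate over.
    return new long[] {length.sum(), files.sum(), dirs.sum()};
  }

  public static void main(String[] args) {
    long[] totals = tally(
        Arrays.asList(
            new Summary(100, 2, 1),
            new Summary(250, 3, 0),
            new Summary(50, 1, 1)),
        4);
    System.out.println(Arrays.toString(totals)); // prints [400, 6, 2]
  }
}
```

Whether LongAdder, AtomicLong, or some other accumulator fits best would depend on how contended the counters are in practice; the point of the sketch is only that the per-path results never need to be stored.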