[ https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
BELUGA BEHR updated HIVE-21071:
-------------------------------
    Status: Open  (was: Patch Available)

> Improve getInputSummary
> -----------------------
>
>                 Key: HIVE-21071
>                 URL: https://issues.apache.org/jira/browse/HIVE-21071
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.1.1, 3.0.0, 4.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HIVE-21071.1.patch, HIVE-21071.2.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important
> that it be fast. The current implementation has quite a bit of overhead that
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to
> ContentSummary object. This map is populated by several threads
> concurrently. At the end, the method loops through the map in a single
> thread to add up all of the ContentSummary objects, ignoring the paths.
> The code can be re-engineered to not use a map, or a collection at all, to
> store the results and instead just keep a running tally. By keeping a tally,
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved. The method returns an object
> that is never used anywhere, so its return type can be changed to void.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
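
The running-tally idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: {{RunningTallySketch}}, {{PathSummary}}, {{Tally}} and {{summarize}} are hypothetical names, {{PathSummary}} stands in for Hadoop's ContentSummary so the example is self-contained, and the per-path summaries are passed in precomputed rather than fetched from a file system.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: worker threads add into shared counters instead of
// filling a path -> summary map that is summed in one thread afterwards.
final class RunningTallySketch {

    /** Hypothetical stand-in for a per-path ContentSummary. */
    static final class PathSummary {
        final long length;
        final long fileCount;
        final long directoryCount;

        PathSummary(long length, long fileCount, long directoryCount) {
            this.length = length;
            this.fileCount = fileCount;
            this.directoryCount = directoryCount;
        }
    }

    /** Thread-safe running totals; no per-path entries are retained. */
    static final class Tally {
        private final LongAdder length = new LongAdder();
        private final LongAdder fileCount = new LongAdder();
        private final LongAdder directoryCount = new LongAdder();

        void add(PathSummary s) {
            length.add(s.length);
            fileCount.add(s.fileCount);
            directoryCount.add(s.directoryCount);
        }

        long length() { return length.sum(); }
        long fileCount() { return fileCount.sum(); }
        long directoryCount() { return directoryCount.sum(); }
    }

    /** Adds every summary to the tally from a pool of worker threads. */
    static Tally summarize(List<PathSummary> perPath, int threads) {
        Tally tally = new Tally();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (PathSummary s : perPath) {
            // In a real implementation the worker would also compute s.
            pool.execute(() -> tally.add(s));
        }
        pool.shutdown();
        try {
            // Wait for all workers so the totals are complete when read.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("summary tasks timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return tally;
    }

    public static void main(String[] args) {
        List<PathSummary> perPath = Arrays.asList(
                new PathSummary(10L, 1L, 0L),
                new PathSummary(20L, 1L, 1L),
                new PathSummary(30L, 1L, 0L));
        Tally t = summarize(perPath, 4);
        System.out.println("length=" + t.length()
                + " files=" + t.fileCount()
                + " dirs=" + t.directoryCount());
    }
}
```

Each worker adds its result directly to the shared counters, so once the pool has drained the totals are already available and no pass over a collection of per-path results is needed. {{LongAdder}} is used here because it tolerates contention from many threads; a synchronized counter or {{AtomicLong}} would be an equally valid choice for a sketch like this.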