[ https://issues.apache.org/jira/browse/HIVE-21040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726254#comment-16726254 ]
Hive QA commented on HIVE-21040:
--------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 5s{color} | {color:blue} standalone-metastore/metastore-server in master has 188 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 42s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15410/dev-support/hive-personality.sh |
| git revision | master / e103abc |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: standalone-metastore/metastore-server ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15410/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> msck does unnecessary file listing at last level of directory tree
> ------------------------------------------------------------------
>
>                 Key: HIVE-21040
>                 URL: https://issues.apache.org/jira/browse/HIVE-21040
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-21040.01.patch, HIVE-21040.02.patch,
>                      HIVE-21040.03.patch, HIVE-21040.04.patch
>
>
> Here is the code snippet which is run by {{msck}} to list directories:
> {noformat}
> final Path currentPath = pd.p;
> final int currentDepth = pd.depth;
> FileStatus[] fileStatuses = fs.listStatus(currentPath, FileUtils.HIDDEN_FILES_PATH_FILTER);
> // found no files under a sub-directory under table base path; it is possible that the table
> // is empty and hence there are no partition sub-directories created under base path
> if (fileStatuses.length == 0 && currentDepth > 0 && currentDepth < partColNames.size()) {
>   // since maxDepth is not yet reached, we are missing partition
>   // columns in currentPath
>   logOrThrowExceptionWithMsg(
>       "MSCK is missing partition columns under " + currentPath.toString());
> } else {
>   // found files under currentPath add them to the queue if it is a directory
>   for (FileStatus fileStatus : fileStatuses) {
>     if (!fileStatus.isDirectory() && currentDepth < partColNames.size()) {
>       // found a file at depth which is less than number of partition keys
>       logOrThrowExceptionWithMsg(
>           "MSCK finds a file rather than a directory when it searches for "
>               + fileStatus.getPath().toString());
>     } else if (fileStatus.isDirectory() && currentDepth < partColNames.size()) {
>       // found a sub-directory at a depth less than number of partition keys
>       // validate if the partition directory name matches with the corresponding
>       // partition colName at currentDepth
>       Path nextPath = fileStatus.getPath();
>       String[] parts = nextPath.getName().split("=");
>       if (parts.length != 2) {
>         logOrThrowExceptionWithMsg("Invalid partition name " + nextPath);
>       } else if (!parts[0].equalsIgnoreCase(partColNames.get(currentDepth))) {
>         logOrThrowExceptionWithMsg(
>             "Unexpected partition key " + parts[0] + " found at " + nextPath);
>       } else {
>         // add sub-directory to the work queue if maxDepth is not yet reached
>         pendingPaths.add(new PathDepthInfo(nextPath, currentDepth + 1));
>       }
>     }
>   }
>   if (currentDepth == partColNames.size()) {
>     return currentPath;
>   }
> }
> {noformat}
> You can see that when {{currentDepth}} is at the {{maxDepth}}, it still does
> an unnecessary listing of the files. We can improve this call by checking
> {{currentDepth}} and bailing out early.
> This can improve the performance of the msck command significantly, especially
> when there are a lot of files in each partition on remote filesystems like S3
> or ADLS.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
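
The early bail-out proposed in the description can be sketched as follows. This is a minimal illustration, not the attached patch. It assumes a hypothetical in-memory `CountingLister` in place of `fs.listStatus`, a hypothetical `PathDepth` holder in place of `PathDepthInfo`, and it leaves out the partition-name validation and error reporting of the original snippet. The only point shown is that a path already at `maxDepth` is returned without being listed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class MsckEarlyExitSketch {

    /** Hypothetical stand-in for fs.listStatus: looks up children in a map and counts calls. */
    static final class CountingLister {
        private final Map<String, List<String>> tree;
        int calls = 0;

        CountingLister(Map<String, List<String>> tree) {
            this.tree = tree;
        }

        List<String> list(String dir) {
            calls++;
            List<String> children = tree.get(dir);
            return children == null ? Collections.<String>emptyList() : children;
        }
    }

    /** Hypothetical stand-in for PathDepthInfo: a path plus its depth below the table root. */
    static final class PathDepth {
        final String path;
        final int depth;

        PathDepth(String path, int depth) {
            this.path = path;
            this.depth = depth;
        }
    }

    /**
     * Walks the tree breadth-first from basePath and returns every path found at maxDepth.
     * Assumes every entry above maxDepth is a directory (validation is omitted here).
     */
    static List<String> findPartitionDirs(String basePath, int maxDepth, CountingLister lister) {
        List<String> partitions = new ArrayList<>();
        Deque<PathDepth> pending = new ArrayDeque<>();
        pending.add(new PathDepth(basePath, 0));

        while (!pending.isEmpty()) {
            PathDepth pd = pending.poll();
            if (pd.depth == maxDepth) {
                // Early bail-out: this is already a partition directory, so its
                // contents (possibly many data files) are never listed.
                partitions.add(pd.path);
                continue;
            }
            for (String child : lister.list(pd.path)) {
                pending.add(new PathDepth(child, pd.depth + 1));
            }
        }
        return partitions;
    }
}
```

With this shape, the number of listing calls equals the number of directories above the last partition level, so the data files inside each leaf partition never cost a remote call.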