[ https://issues.apache.org/jira/browse/HIVE-21040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726254#comment-16726254 ]

Hive QA commented on HIVE-21040:
--------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  5s{color} | {color:blue} standalone-metastore/metastore-server in master has 188 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 42s{color} | {color:blue} ql in master has 2310 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15410/dev-support/hive-personality.sh |
| git revision | master / e103abc |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: standalone-metastore/metastore-server ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15410/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> msck does unnecessary file listing at last level of directory tree
> ------------------------------------------------------------------
>
>                 Key: HIVE-21040
>                 URL: https://issues.apache.org/jira/browse/HIVE-21040
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>         Attachments: HIVE-21040.01.patch, HIVE-21040.02.patch, 
> HIVE-21040.03.patch, HIVE-21040.04.patch
>
>
> Here is the code snippet that {{msck}} runs to list directories:
> {noformat}
>       final Path currentPath = pd.p;
>       final int currentDepth = pd.depth;
>       FileStatus[] fileStatuses = fs.listStatus(currentPath, FileUtils.HIDDEN_FILES_PATH_FILTER);
>       // found no files under a sub-directory under table base path; it is possible that the table
>       // is empty and hence there are no partition sub-directories created under base path
>       if (fileStatuses.length == 0 && currentDepth > 0 && currentDepth < partColNames.size()) {
>         // since maxDepth is not yet reached, we are missing partition
>         // columns in currentPath
>         logOrThrowExceptionWithMsg(
>             "MSCK is missing partition columns under " + currentPath.toString());
>       } else {
>         // found files under currentPath add them to the queue if it is a directory
>         for (FileStatus fileStatus : fileStatuses) {
>           if (!fileStatus.isDirectory() && currentDepth < partColNames.size()) {
>             // found a file at depth which is less than number of partition keys
>             logOrThrowExceptionWithMsg(
>                 "MSCK finds a file rather than a directory when it searches for "
>                     + fileStatus.getPath().toString());
>           } else if (fileStatus.isDirectory() && currentDepth < partColNames.size()) {
>             // found a sub-directory at a depth less than number of partition keys
>             // validate if the partition directory name matches with the corresponding
>             // partition colName at currentDepth
>             Path nextPath = fileStatus.getPath();
>             String[] parts = nextPath.getName().split("=");
>             if (parts.length != 2) {
>               logOrThrowExceptionWithMsg("Invalid partition name " + nextPath);
>             } else if (!parts[0].equalsIgnoreCase(partColNames.get(currentDepth))) {
>               logOrThrowExceptionWithMsg(
>                   "Unexpected partition key " + parts[0] + " found at " + nextPath);
>             } else {
>               // add sub-directory to the work queue if maxDepth is not yet reached
>               pendingPaths.add(new PathDepthInfo(nextPath, currentDepth + 1));
>             }
>           }
>         }
>         if (currentDepth == partColNames.size()) {
>           return currentPath;
>         }
>       }
> {noformat}
> You can see that when {{currentDepth}} is at {{maxDepth}} (that is, 
> {{partColNames.size()}}), the code still does an unnecessary listing of the 
> files. We can improve this by checking {{currentDepth}} and bailing out 
> early, before calling {{fs.listStatus}}.
> This can improve the performance of the msck command significantly, 
> especially when there are a lot of files in each partition on remote 
> filesystems like S3 or ADLS.
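
The early bail-out described above can be sketched roughly as follows. This is not the attached patch: it is a minimal, self-contained illustration that replaces the Hadoop FileSystem with an in-memory map and leaves out the partition-name validation and file-versus-directory checks from the quoted snippet. The class MsckDepthSketch, the helper list(), the listCalls counter, and the method findPartitions() are all hypothetical names introduced for this example; only PathDepthInfo mirrors a name from the snippet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MsckDepthSketch {
  /** Simplified stand-in for the path/depth pair used in the quoted snippet. */
  static final class PathDepthInfo {
    final String path;
    final int depth;

    PathDepthInfo(String path, int depth) {
      this.path = path;
      this.depth = depth;
    }
  }

  /** Hypothetical counter so the number of listing calls is observable. */
  public static int listCalls = 0;

  /** Hypothetical stand-in for fs.listStatus(): looks up children in an in-memory tree. */
  public static List<String> list(Map<String, List<String>> tree, String path) {
    listCalls++;
    List<String> children = tree.get(path);
    return children != null ? children : Collections.<String>emptyList();
  }

  /** Walks the tree breadth-first and returns every directory found at maxDepth. */
  public static List<String> findPartitions(
      Map<String, List<String>> tree, String basePath, int maxDepth) {
    List<String> partitions = new ArrayList<>();
    Deque<PathDepthInfo> pendingPaths = new ArrayDeque<>();
    pendingPaths.add(new PathDepthInfo(basePath, 0));
    while (!pendingPaths.isEmpty()) {
      PathDepthInfo pd = pendingPaths.poll();
      if (pd.depth == maxDepth) {
        // Early bail-out: this is already a partition directory, so its
        // contents never need to be listed.
        partitions.add(pd.path);
        continue;
      }
      // Only directories above maxDepth are listed.
      for (String child : list(tree, pd.path)) {
        pendingPaths.add(new PathDepthInfo(child, pd.depth + 1));
      }
    }
    return partitions;
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("/t", Arrays.asList("/t/a=1", "/t/a=2"));
    tree.put("/t/a=1", Arrays.asList("/t/a=1/b=x"));
    tree.put("/t/a=2", Arrays.asList("/t/a=2/b=y"));
    List<String> partitions = findPartitions(tree, "/t", 2);
    System.out.println(partitions + " with " + listCalls + " listings");
  }
}
```

With two partition columns and the small tree in main(), the sketch makes three listing calls (the base path and the two first-level directories) and none for the two leaf partition directories. Listing at the last level as well would take five calls, and on a remote filesystem each of the two extra calls would enumerate every data file in that partition.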



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
