[ https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895818#comment-15895818 ]
Steve Loughran commented on HIVE-14864:
---------------------------------------

{{FileSystem.getContentSummary()}} does a recursive treewalk, so it is pathologically bad on a blobstore, which has to mock directories through many, many HTTP requests.

If you need to use it, could you actually supply a patch (+ FS contract tests) for the method so that it uses listFiles(path, recursive=true)? That does the same treewalk against HDFS, but blobstores can do it as an O(1) listing call instead. If you can get that patch in, then enumerating the size of a blobstore tree will be fast.

> Distcp is not called from MoveTask when src is a directory
> ----------------------------------------------------------
>
>                 Key: HIVE-14864
>                 URL: https://issues.apache.org/jira/browse/HIVE-14864
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-14864.1.patch, HIVE-14864.2.patch, HIVE-14864.3.patch, HIVE-14864.patch
>
>
> In FileUtils.java the following code does not get executed even when src
> directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE because
> srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We
> should use srcFS.getContentSummary(src).getLength() instead.
> {noformat}
>     /* Run distcp if source file/dir is too big */
>     if (srcFS.getUri().getScheme().equals("hdfs") &&
>         srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
>       LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
>       LOG.info("Launch distributed copy (distcp) job.");
>       HiveConfUtil.updateJobCredentialProviders(conf);
>       copied = shims.runDistCp(src, dst, conf);
>       if (copied && deleteSource) {
>         srcFS.delete(src, true);
>       }
>     }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
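
For illustration only, the idea in the comment above (get the size of a tree by summing file lengths from one flat, recursive listing rather than walking directories) might be sketched as follows. This is not the proposed patch. To stay self-contained it does not use Hadoop: {{FileEntry}}, {{totalLength}} and the sample paths are hypothetical stand-ins. In Hadoop the listing would instead come from {{FileSystem.listFiles(path, true)}}, which returns a {{RemoteIterator<LocatedFileStatus>}} of files only, and each length from {{getLen()}}.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ListingSizeSketch {

    /** Hypothetical stand-in for one file in a flat recursive listing (no directory entries). */
    static final class FileEntry {
        final String path;
        final long len;

        FileEntry(String path, long len) {
            this.path = path;
            this.len = len;
        }
    }

    /** Sums the length of every file yielded by the listing; never descends into directories itself. */
    static long totalLength(Iterator<FileEntry> listing) {
        long total = 0L;
        while (listing.hasNext()) {
            total += listing.next().len;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed sample data: two files, one of them in a subdirectory.
        List<FileEntry> listing = Arrays.asList(
                new FileEntry("/warehouse/t/part-00000", 100L),
                new FileEntry("/warehouse/t/sub/part-00001", 250L));
        System.out.println(totalLength(listing.iterator())); // prints 350
    }
}
```

Because the caller only consumes an iterator of files, how the listing is produced is left to the filesystem, which is what lets a blobstore serve it without emulating a directory walk.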