Vihang Karajgaonkar created HIVE-14864:
------------------------------------------

             Summary: Distcp is not called from MoveTask when src is a directory
                 Key: HIVE-14864
                 URL: https://issues.apache.org/jira/browse/HIVE-14864
             Project: Hive
          Issue Type: Bug
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar


In FileUtils.java, the following distcp branch is not executed even when the 
total size of the src directory exceeds HIVE_EXEC_COPYFILE_MAXSIZE, because 
srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We should 
use srcFS.getContentSummary(src).getLength() instead, which returns the total 
size of the files under src.

{noformat}
    /* Run distcp if source file/dir is too big */
    if (srcFS.getUri().getScheme().equals("hdfs") &&
        srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
      LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
      LOG.info("Launch distributed copy (distcp) job.");
      HiveConfUtil.updateJobCredentialProviders(conf);
      copied = shims.runDistCp(src, dst, conf);
      if (copied && deleteSource) {
        srcFS.delete(src, true);
      }
    }
{noformat}
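
To illustrate why the check never fires for a directory, here is a minimal, 
self-contained sketch. It does not use Hadoop: the Node class and the two 
exceeds... helpers are hypothetical stand-ins, where statusLen() mimics 
FileStatus.getLen() as described above (0 for a directory) and 
contentLength() mimics ContentSummary.getLength() (sum of all file sizes 
underneath). The threshold value is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class DistCpSizeCheckSketch {

    /** Hypothetical in-memory stand-in for an HDFS path; not a Hadoop class. */
    static final class Node {
        final boolean isDirectory;
        final long fileBytes; // only meaningful for files
        final List<Node> children = new ArrayList<>();

        private Node(boolean isDirectory, long fileBytes) {
            this.isDirectory = isDirectory;
            this.fileBytes = fileBytes;
        }

        static Node file(long bytes) {
            return new Node(false, bytes);
        }

        static Node dir(Node... kids) {
            Node d = new Node(true, 0L);
            for (Node k : kids) {
                d.children.add(k);
            }
            return d;
        }

        /** Mimics FileStatus.getLen() as reported: 0 when the path is a directory. */
        long statusLen() {
            return isDirectory ? 0L : fileBytes;
        }

        /** Mimics ContentSummary.getLength(): total bytes of all files underneath. */
        long contentLength() {
            if (!isDirectory) {
                return fileBytes;
            }
            long total = 0L;
            for (Node c : children) {
                total += c.contentLength();
            }
            return total;
        }
    }

    /** Size check as in the quoted code: compares the status length to the limit. */
    static boolean exceedsWithStatusLen(Node src, long maxSize) {
        return src.statusLen() > maxSize;
    }

    /** Size check as proposed in this issue: compares the recursive total to the limit. */
    static boolean exceedsWithContentSummary(Node src, long maxSize) {
        return src.contentLength() > maxSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long maxSize = 32L * mb; // illustrative threshold

        // A directory holding 40 MB in total, spread over nested files.
        Node src = Node.dir(Node.file(20L * mb), Node.dir(Node.file(20L * mb)));

        System.out.println(exceedsWithStatusLen(src, maxSize));      // prints "false"
        System.out.println(exceedsWithContentSummary(src, maxSize)); // prints "true"
    }
}
```

In the real code the corresponding change would be to replace the 
getFileStatus(src).getLen() calls in the condition and in the log message 
with getContentSummary(src).getLength(); note that computing a content 
summary walks the whole tree, so it is more expensive than a single status 
lookup.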



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
