Optimize CombineHiveFileInputFormat execution speed
---------------------------------------------------

                 Key: HIVE-1149
                 URL: https://issues.apache.org/jira/browse/HIVE-1149
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao


When there are a lot of files and a lot of pools, CombineHiveFileInputFormat is 
pretty slow.
One of the culprits is the "new URI" call in the following function, which 
re-parses a map key for every entry it examines on every lookup. We should 
try to get rid of it.

{code}
  protected static PartitionDesc getPartitionDescFromPath(
      Map<String, PartitionDesc> pathToPartitionInfo, Path dir)
      throws IOException {
    // The format of the keys in pathToPartitionInfo sometimes contains a port
    // and sometimes doesn't, so we just compare paths.
    for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
        .entrySet()) {
      try {
        if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
          return entry.getValue();
        }
      } catch (URISyntaxException e2) {
        // Keys that are not valid URIs are skipped; keep scanning.
      }
    }
    throw new IOException("cannot find dir = " + dir.toString()
        + " in partToPartitionInfo!");
  }
{code}
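
One possible direction, given only as a rough sketch and not as the actual fix: 
parse each key once, build a map keyed by the bare URI path, and answer later 
lookups from that map. The class name PathOnlyIndex and its lookup method are 
hypothetical, and a generic value type and java.net.URI stand in for 
PartitionDesc and Hadoop's Path so the example runs without Hive on the 
classpath.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: indexes entries by URI path only,
// so keys with and without a port resolve to the same entry.
final class PathOnlyIndex<V> {
  private final Map<String, V> byPath = new HashMap<>();

  PathOnlyIndex(Map<String, V> pathToInfo) {
    for (Map.Entry<String, V> entry : pathToInfo.entrySet()) {
      try {
        // Each key is parsed exactly once, here, instead of on every lookup.
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // Keep the first entry seen for a path, as the original loop does.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Keys that are not valid URIs are skipped, as in the original loop.
      }
    }
  }

  // Returns the entry whose key has the same path as dir, or null if none.
  V lookup(URI dir) {
    return byPath.get(dir.getPath());
  }
}
```

With this layout the per-lookup cost is a single hash probe. The index has to 
be rebuilt whenever pathToPartitionInfo changes, and the caller would still 
need to throw the IOException when lookup returns null.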


