[ https://issues.apache.org/jira/browse/HIVE-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1149:
-----------------------------

      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> Optimize CombineHiveFileInputFormat execution speed
> ---------------------------------------------------
>
>                 Key: HIVE-1149
>                 URL: https://issues.apache.org/jira/browse/HIVE-1149
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Priority: Minor
>
> When there are a lot of files and a lot of pools, CombineHiveFileInputFormat is pretty slow.
> One of the culprits is the "new URI" call in the following function, which runs once per map entry on every lookup. We should try to get rid of it.
> {code}
>   protected static PartitionDesc getPartitionDescFromPath(
>       Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
>     // The format of the keys in pathToPartitionInfo sometimes contains a port
>     // and sometimes doesn't, so we just compare paths.
>     for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
>         .entrySet()) {
>       try {
>         if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
>           return entry.getValue();
>         }
>       } catch (URISyntaxException e2) {
>       }
>     }
>     throw new IOException("cannot find dir = " + dir.toString()
>         + " in partToPartitionInfo!");
>   }
> {code}
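
One possible way to get rid of the repeated "new URI" call, as a minimal sketch and not the actual Hive change: parse every key once, index the entries by their path component, and answer each lookup from that index. PathOnlyIndex and lookup are hypothetical names. To keep the sketch self-contained, it uses a generic value type V in place of PartitionDesc and java.net.URI in place of Hadoop's Path.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: indexes partition entries by the
// path component of their key, so each key is parsed exactly once.
final class PathOnlyIndex<V> {
  private final Map<String, V> byPath = new HashMap<>();

  PathOnlyIndex(Map<String, V> pathToPartitionInfo) {
    for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
      try {
        // Keys may or may not carry a scheme and port; keep only the path.
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // Keep the first entry seen for a path, like the original loop.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Skip keys that are not valid URIs, as the original code does.
      }
    }
  }

  // Stand-in for getPartitionDescFromPath: one hash lookup per call.
  V lookup(URI dir) throws IOException {
    V value = byPath.get(dir.getPath());
    if (value == null) {
      throw new IOException("cannot find dir = " + dir
          + " in pathToPartitionInfo!");
    }
    return value;
  }
}
```

With N keys and M lookups this parses N URIs instead of up to N * M. The index has to be rebuilt whenever pathToPartitionInfo changes, and it assumes that matching on the path alone is still the intended comparison.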