[ https://issues.apache.org/jira/browse/HIVE-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1149:
-----------------------------

      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> Optimize CombineHiveFileInputFormat execution speed
> ---------------------------------------------------
>
>                 Key: HIVE-1149
>                 URL: https://issues.apache.org/jira/browse/HIVE-1149
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Priority: Minor
>
> When there are a lot of files and a lot of pools, CombineHiveFileInputFormat
> is pretty slow.
> One of the culprits is the "new URI" call in the following function, which
> parses every key of pathToPartitionInfo again on each call. We should try
> to get rid of it.
> {code}
>   protected static PartitionDesc getPartitionDescFromPath(
>       Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
>     // The format of the keys in pathToPartitionInfo sometimes contains a port
>     // and sometimes doesn't, so we just compare paths.
>     for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
>         .entrySet()) {
>       try {
>         if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
>           return entry.getValue();
>         }
>       } catch (URISyntaxException e2) {
>         // Keys that are not valid URIs can never match; skip them.
>       }
>     }
>     throw new IOException("cannot find dir = " + dir.toString()
>         + " in pathToPartitionInfo!");
>   }
> {code}
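The issue does not say how the "new URI" call should be removed. One possible approach, shown here only as a sketch under stated assumptions, is to parse every key once and index the partitions by URI path, so each lookup becomes a single hash lookup instead of a linear scan with one URI parse per entry. `PartitionPathIndex` is a hypothetical class (not part of Hive), the type parameter `V` stands in for `PartitionDesc`, and callers are assumed to pass `dir.toUri().getPath()` as a plain string so the example needs no Hadoop classes.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hive): parses each pathToPartitionInfo key
 * once and indexes its value by the URI path component. V stands in for
 * Hive's PartitionDesc.
 */
final class PartitionPathIndex<V> {
    private final Map<String, V> byPath = new HashMap<>();

    PartitionPathIndex(Map<String, V> pathToPartitionInfo) {
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            try {
                // getPath() drops scheme, host and port, so "hdfs://nn:8020/t/p"
                // and "hdfs://nn/t/p" map to the same key "/t/p".
                String path = new URI(entry.getKey()).getPath();
                if (path != null) {
                    // Keep the first entry seen for a path, mirroring the original
                    // loop, which returns the first match in iteration order.
                    byPath.putIfAbsent(path, entry.getValue());
                }
            } catch (URISyntaxException e) {
                // A key that is not a valid URI can never match; skip it, as the original does.
            }
        }
    }

    /**
     * dirPath is assumed to be dir.toUri().getPath(). Returns null when no key
     * has that path; the caller can then throw the same IOException as before.
     */
    V get(String dirPath) {
        return byPath.get(dirPath);
    }
}
```

Building the index still costs one URI parse per key, so this only helps if the index is built once and reused across many lookups; that reuse pattern is an assumption, not something the issue states.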

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
