[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

Siying Dong (JIRA) Mon, 14 Mar 2011 15:43:55 -0700

     [ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Siying Dong updated HIVE-2051:
------------------------------

    Attachment: HIVE-2051.2.patch

Updates:
1. Use ConcurrentHashMap.
2. Wait for the Future objects too.
3. Share the jobConf among threads.
4. If the user sets mapred.dfsclient.parallelism.max to 0 or 1, don't start a 
new thread to execute the calls.
5. Use Map.Entry<K,V> when iterating.

> getInputSummary() to call FileSystem.getContentSummary() in parallel
> --------------------------------------------------------------------
>
>                 Key: HIVE-2051
>                 URL: https://issues.apache.org/jira/browse/HIVE-2051
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>            Priority: Minor
>         Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch
>
>
> getInputSummary() currently calls FileSystem.getContentSummary() on each 
> input path one by one, which can be extremely slow when the number of input 
> paths is huge. By calling those functions in parallel, we can cut latency 
> in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
