[ 
https://issues.apache.org/jira/browse/IMPALA-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7320.
-----------------------------------
    Resolution: Fixed

> Loading HDFS tables calls getFileStatus on each partition serially
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7320
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7320
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>
> The catalog caches the access level (permissions) of each partition in an
> HDFS table. These are all loaded when the table is first loaded, by making
> serial calls to getFileStatus() on each partition. In most cases, all of the
> partitions are in a single directory, so we could get all of the information
> through a single call to listFileStatus() on the parent. In my testing, a
> typical getFileStatus() call took 1-2 milliseconds, so on a large table with
> tens of thousands of partitions this can shave many seconds off the table
> load time as well as reduce load on the NameNode (NN).
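
To illustrate the change described above, here is a minimal sketch of the two loading strategies. It is not Impala's catalog code and does not use the Hadoop client: FakeFs, loadSerially, and loadViaParentListing are hypothetical names, and FakeFs is an in-memory stand-in that only counts round trips. In Hadoop's FileSystem API the directory listing call is listStatus(); the sketch borrows that name. For scale, at 1-2 ms per call, 20,000 serial lookups would cost roughly 20 to 40 seconds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Impala's or Hadoop's API.
class PartitionStatusSketch {
    /** In-memory stand-in for a NameNode client that counts round trips. */
    static class FakeFs {
        private final Map<String, String> permsByPath = new HashMap<>();
        int rpcCount = 0;

        void addDir(String path, String perms) {
            permsByPath.put(path, perms);
        }

        /** One round trip per path, analogous to getFileStatus(). */
        String getFileStatus(String path) {
            rpcCount++;
            return permsByPath.get(path);
        }

        /** One round trip for all direct children of the parent directory. */
        Map<String, String> listStatus(String parent) {
            rpcCount++;
            String prefix = parent.endsWith("/") ? parent : parent + "/";
            Map<String, String> children = new HashMap<>();
            for (Map.Entry<String, String> e : permsByPath.entrySet()) {
                String p = e.getKey();
                // Keep only direct children: no further '/' after the prefix.
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    children.put(p, e.getValue());
                }
            }
            return children;
        }
    }

    /** Old behavior: one lookup per partition. */
    static Map<String, String> loadSerially(FakeFs fs, List<String> partitionPaths) {
        Map<String, String> result = new HashMap<>();
        for (String p : partitionPaths) {
            result.put(p, fs.getFileStatus(p));
        }
        return result;
    }

    /** Proposed behavior: list the parent once, look up only the stragglers. */
    static Map<String, String> loadViaParentListing(
            FakeFs fs, String tableDir, List<String> partitionPaths) {
        Map<String, String> listed = fs.listStatus(tableDir);
        Map<String, String> result = new HashMap<>();
        for (String p : partitionPaths) {
            String perms = listed.get(p);
            if (perms == null) {
                // Partition lives outside the table directory: fall back.
                perms = fs.getFileStatus(p);
            }
            result.put(p, perms);
        }
        return result;
    }
}
```

The fallback branch covers the case the description hedges on ("in most cases"): a partition whose location is not a direct child of the table directory still gets an individual lookup, so both strategies return the same map.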



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

