[ https://issues.apache.org/jira/browse/IMPALA-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha resolved IMPALA-10314.
---------------------------------
    Resolution: Fixed

> Planning time for simple SELECT with LIMIT could be improved
> ------------------------------------------------------------
>
>                 Key: IMPALA-10314
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10314
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> Consider a table t1 with the following characteristics:
> {noformat}
> HDFS, Parquet format, external table
> number of partitions in t1 : 39000 (2 level partitioning)
> number of columns : 72
> number of files : 350000
> {noformat}
> The planning time for the following query, which has a LIMIT but no
> ORDER BY, is fairly long:
> {noformat}
> select * from t1 limit 10;
> Query Compilation: 4s411ms
>    - Single node plan created: 3s812ms (3s259ms)
> {noformat}
> The bulk of the time is spent in HdfsScanNode.computeScanRangeLocations(),
> which iterates over all the partitions and the file descriptors within
> each partition to assign scan ranges based on data affinity. For trivial
> LIMIT queries, especially those with small LIMIT values, we should look at
> ways to improve the planning time.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
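
To illustrate the kind of shortcut the description asks for, here is a minimal sketch of a limit-aware file walk that stops once the files collected so far are estimated to cover the LIMIT, rather than visiting all 39000 partitions and 350000 files. Everything in it is hypothetical: `FileDesc`, `Partition`, `selectFilesForLimit` and the per-file row estimate are stand-ins invented for this example, not Impala classes, and this is not a description of the change that actually resolved the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these types are Impala classes. */
public class SimpleLimitScanSketch {

  /** Stand-in for a file descriptor: a path plus an assumed row-count estimate. */
  static final class FileDesc {
    final String path;
    final long estimatedRows;

    FileDesc(String path, long estimatedRows) {
      this.path = path;
      this.estimatedRows = estimatedRows;
    }
  }

  /** Stand-in for a partition: just the list of files it contains. */
  static final class Partition {
    final List<FileDesc> files;

    Partition(List<FileDesc> files) {
      this.files = files;
    }
  }

  /**
   * Walks partitions in order and collects files until their estimated rows
   * cover the limit, then returns without visiting the remaining files.
   * A negative limit means "no limit", in which case every file is returned.
   */
  static List<FileDesc> selectFilesForLimit(List<Partition> partitions, long limit) {
    List<FileDesc> selected = new ArrayList<>();
    long rowsSoFar = 0;
    for (Partition p : partitions) {
      for (FileDesc f : p.files) {
        // Early exit: the files already chosen are estimated to satisfy the limit.
        if (limit >= 0 && rowsSoFar >= limit) {
          return selected;
        }
        selected.add(f);
        rowsSoFar += f.estimatedRows;
      }
    }
    return selected;
  }
}
```

With a small limit the loop touches only a handful of file descriptors, so the cost no longer grows with the total number of partitions. The sketch depends entirely on the row estimates being trustworthy. A real planner would also need to handle predicates, empty or overestimated files, and a fallback to the full scan, otherwise the query could return fewer rows than requested.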