[ https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124349#comment-15124349 ]
Jason Altekruse commented on DRILL-4308:
----------------------------------------

To match your query more closely, I changed my current schema and fully qualified the table name; the results are still the same.

{code}
0: jdbc:drill:zk=local> select dir0 from dfs.mxd.mock_data where dir0 = maxdir('dfs.mxd','mock_data') limit 1;
+-------+
| dir0  |
+-------+
| 1997  |
+-------+
1 row selected (0.125 seconds)
0: jdbc:drill:zk=local> select dir0 from dfs.mxd.mock_data where dir0 = mindir('dfs.mxd','mock_data') limit 1;
+-------+
| dir0  |
+-------+
| 1994  |
+-------+
1 row selected (0.116 seconds)
{code}

> Aggregate operations on dir<N> columns can be more efficient for certain use cases
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-4308
>                 URL: https://issues.apache.org/jira/browse/DRILL-4308
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Aman Sinha
>
> For queries that perform plain aggregates or DISTINCT operations on the
> directory partition columns (dir0, dir1 etc.) and reference no other
> columns, performance could be substantially improved by not having to
> scan the entire dataset.
> Consider the following types of queries:
> {noformat}
> select min(dir0) from largetable;
> select distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir<N> columns is typically quite small and
> there's no reason to scan the large table. This has also come up as
> feedback from some Drill users. Of course, if any other column is
> referenced in the query (WHERE, ORDER-BY etc.), then we cannot apply this
> optimization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)