[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207311#comment-15207311 ]
ASF GitHub Bot commented on DRILL-3623:
---------------------------------------

Github user sudheeshkatkam commented on the pull request:

    https://github.com/apache/drill/pull/405#issuecomment-200026538

Thank you for the reviews. All regression tests passed; I am running unit tests right now. Note that the `planner.enable_limit0_optimization` option is disabled by default.

To summarize (and document) the limitations:

If, during validation, the planner is able to resolve the types of the columns (i.e. the types are not late binding), the shorter execution path is taken. Some types are excluded:
+ DECIMAL type is not fully supported in general.
+ VARBINARY is not fully tested.
+ MAP, ARRAY are currently not exposed to the planner.
+ TINYINT, SMALLINT are defined in the Drill type system but have been turned off for now.
+ SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST are Calcite types currently not supported by Drill, nor defined in the Drill type list.

There are three scenarios in which the planner can do type resolution during validation:
+ Queries on Hive tables
+ Queries with explicit casts on table columns, example: `SELECT CAST(col1 AS BIGINT), ABS(CAST(col2 AS INTEGER)) FROM table;`
+ Queries on views with casts on table columns

In the latter two cases, the schema of a query with a LIMIT 0 clause has relaxed nullability compared to the same query without the LIMIT 0 clause.

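Because the option is off by default, trying the shorter execution path means enabling it first. A minimal sketch, assuming a hypothetical table `mytable` in the `dfs.tmp` workspace with columns `col1` and `col2` (the table name is for illustration only and is not taken from the pull request):

```sql
-- Opt in for the current session only; the option is disabled by default.
ALTER SESSION SET `planner.enable_limit0_optimization` = true;

-- Explicit casts give the planner known column types during validation.
-- dfs.tmp.`mytable` is a hypothetical table used only for illustration.
SELECT CAST(col1 AS BIGINT), ABS(CAST(col2 AS INTEGER))
FROM dfs.tmp.`mytable`
LIMIT 0;
```

Setting the option at session scope keeps the change local to the connection being tested. Whether the shorter path is actually taken still depends on the excluded types listed above.
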
Example:

Say the schema definition of the Parquet file (`numbers.parquet`) is:
```
message Numbers {
  required int col1;
  optional int col2;
}
```

The view definition below does not specify the nullability of columns, and the schema of a Parquet file is not yet leveraged by Drill's planner:
```
CREATE VIEW dfs.tmp.mynumbers AS
  SELECT CAST(col1 AS INTEGER) AS col1, CAST(col2 AS INTEGER) AS col2
  FROM dfs.tmp.`numbers.parquet`;
```

(1) For a query with a LIMIT 0 clause, since the file/metadata is not read, Drill assumes the nullability of both columns is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
```

(2) For a query without a LIMIT 0 clause, since the file is read, Drill knows the nullability of `col1` is [`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls), and that of `col2` is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
```
SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
```


> Limit 0 should avoid execution when querying a known schema
> -----------------------------------------------------------
>
>                 Key: DRILL-3623
>                 URL: https://issues.apache.org/jira/browse/DRILL-3623
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Storage - Hive
>    Affects Versions: 1.1.0
>         Environment: MapR cluster
>            Reporter: Andries Engelbrecht
>            Assignee: Sudheesh Katkam
>              Labels: doc-impacting
>             Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)