[ https://issues.apache.org/jira/browse/DRILL-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461943#comment-16461943 ]
Paul Rogers commented on DRILL-6383: ------------------------------------ All the above would seem to be academic, but it affects user-visible behavior. {noformat} DESCRIBE myView5; +--------------+------------+--------------+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +--------------+------------+--------------+ | custId | INTEGER | YES | +--------------+------------+--------------+ SELECT modeOf(custId) FROM myView5 LIMIT 1; +-----------+ | EXPR$0 | +-----------+ | NOT NULL | +-----------+ {noformat} And: {noformat} DESCRIBE myView4; +--------------+------------+--------------+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +--------------+------------+--------------+ | columns | ANY | YES | +--------------+------------+--------------+ SELECT typeOf(columns), modeOf(columns) FROM myView4 LIMIT 1; +----------+---------+ | EXPR$0 | EXPR$1 | +----------+---------+ | VARCHAR | ARRAY | +----------+---------+ {noformat} Since the types are wrong, the `DESCRIBE` is useless for an application to understand the types of data types it will need when running the query. > View column types, modes are plan-time guesses, not actual types > ---------------------------------------------------------------- > > Key: DRILL-6383 > URL: https://issues.apache.org/jira/browse/DRILL-6383 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.13.0 > Reporter: Paul Rogers > Priority: Minor > > Create a view views and look at the list of columns within the view. You'll > see that they are often wrong in name, type and mode. > Consider a very simple CSV file with headers: > {noformat} > custId,name,balance,status > 123,Fred,456.78 > 125,Betty,98.76,VIP > 128,Barney,1.23,PAST DUE,30 > {noformat} > Define the simplest possible view: > {noformat} > CREATE VIEW myView2 AS SELECT * FROM `csvh/cust.csvh`; > {noformat} > Then look at the view file: > {noformat} > { > "name" : "myView2", > "sql" : "SELECT *\nFROM `csvh/cust.csvh`", > "fields" : [ { > "name" : "**", > "type" : "DYNAMIC_STAR", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "local", "data" ] > } > {noformat} > It is clear that the view simply captured the plan-time list of the new > double-star for the wildcard. Since this is not a true type, it should not > have an `isNullable` attribute. > OK, we have to spell out the columns: > {noformat} > CREATE VIEW myView3 AS SELECT custId FROM `csvh/cust.csvh`; > {noformat} > Let's look at the view file: > {noformat} > { > "name" : "myView3", > "sql" : "SELECT `custId`\nFROM `csvh/cust.csvh`", > "fields" : [ { > "name" : "custId", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "local", "data" ] > } > {noformat} > The name is correct. The type is `ANY`, which is wrong. Since this is a CSV > file, the column type is `VARCHAR`. Further, because this is a CSV file which > headers, the mode is REQUIRED, but is listed as nullable. To verify: > {noformat} > SELECT sqlTypeOf(custId), modeOf(custId) FROM myView3 LIMIT 1; > +--------------------+-----------+ > | EXPR$0 | EXPR$1 | > +--------------------+-----------+ > | CHARACTER VARYING | NOT NULL | > +--------------------+-----------+ > {noformat} > Now, let's try a CSV file without headers: > {noformat} > 123,Fred,456.78 > 125,Betty,98.76,VIP > {noformat} > {noformat} > CREATE VIEW myView4 AS SELECT columns FROM `csv/cust.csv`; > SELECT * FROM myView4; > +--------------------------------+ > | columns | > +--------------------------------+ > | ["123","Fred","456.78"] | > | ["125","Betty","98.76","VIP"] | > +--------------------------------+ > {noformat} > Let's look at the view file: > {noformat} > { > "name" : "myView4", > "sql" : "SELECT `columns`\nFROM `csv/cust.csv`", > "fields" : [ { > "name" : "columns", > "type" : "ANY", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "local", "data" ] > } > {noformat} > This is almost non-sensical. `columns` is reported as type `ANY` and > nullable. But, `columns` is Repeated `VARCHAR` and repeated types cannot be > nullable. > The conclusion is that the type information is virtually worthless and the > `isNullable` information is worse than worthless: it is plain wrong. > The type information is valid only if the planner can inver types: > {noformat} > CREATE VIEW myView5 AS > SELECT CAST(custId AS INTEGER) AS custId FROM `csvh/cust.csvh`; > {noformat} > The view file: > {noformat} > { > "name" : "myView5", > "sql" : "SELECT CAST(`custId` AS INTEGER) AS `custId`\nFROM > `csvh/cust.csvh`", > "fields" : [ { > "name" : "custId", > "type" : "INTEGER", > "isNullable" : true > } ], > "workspaceSchemaPath" : [ "local", "data" ] > } > {noformat} > Note that the `type` is inferred from the cast, but `isNullable` is wrong > because the underlying column is non-nullable: > {noformat} > SELECT modeOf(custId) FROM myView5 LIMIT 1; > +-----------+ > | EXPR$0 | > +-----------+ > | NOT NULL | > +-----------+ > {noformat} > Expected that Drill would run the underlying query as a `LIMIT 0` query to > extract the actual column types, and use that in the view. > Or, expected that Drill would simply omit the column list from the view if > the data is meaningless. -- This message was sent by Atlassian JIRA (v7.6.3#76005)