Peter McTaggart created DRILL-4648:
--------------------------------------

             Summary: select count(*) on csv file fails with UNSUPPORTED_OPERATION
                 Key: DRILL-4648
                 URL: https://issues.apache.org/jira/browse/DRILL-4648
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types, Functions - Drill
    Affects Versions: 1.6.0
            Reporter: Peter McTaggart

When trying to perform a select count(*) on a CSV file the following error is encountered:

0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `views/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: b38a1e44-c2f5-44a3-9960-6062debc6b50 on xxxxxx.compute.internal:31010] (state=,code=0)

If we refer to a column in the file by name it works, eg:

0: jdbc:drill:drillbit=10.1.101.10> select count(COLUMN_ONE) from `views/db/test.csv`;
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+
1 row selected (0.144 seconds)
0: jdbc:drill:drillbit=10.1.101.10>

The test.csv file contents:

~/D❯❯❯ cat test.csv
"COLUMN_ONE","COLUMN_TWO"
"Hello","World"
~/D❯❯❯

Drill is talking to a file mounted on Alluxio.

More info:

Mounting s3 directly gives the following results:

With extractHeaders NOT turned on:

0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
+---------+
| EXPR$0  |
+---------+
| 2       |
+---------+
1 row selected (0.951 seconds)
0: jdbc:drill:drillbit=10.1.101.10>

With extractHeaders = true:

0: jdbc:drill:drillbit=10.1.101.10> select count(*) from `src/db/test.csv`;
Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header names are supported

column name columns
column index
Fragment 0:0

[Error Id: 5609cf0d-7553-44b5-bd90-40bce1c020a9 on ixxxxxx.compute.internal:31010] (state=,code=0)
0: jdbc:drill:drillbit=10.1.101.10>

Workspace file:

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://<my-bucket>",
  "config": {
    "fs.s3a.access.key": "xxx",
    "fs.s3a.secret.key": "xxx"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)