GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/1242
DRILL-6361: Revised typeOf() function versions
Drill provides the `typeof()` function to return the type of a column.
However, this function has two key limitations:
1. It returns NULL if any column value is NULL. But, Drill has no NULL
type, so this masks the underlying type. This is especially annoying for
columns which are all NULL, such as "missing" columns.
2. It does not return the cardinality (AKA "mode") of the column.
This PR introduces two new functions that solve these issues.
### New Functions
`sqlTypeOf()` returns the data type (using the SQL names) whether the
column is NULL or not. The SQL name is the one that can be used in a CAST
statement. Thus,
```
sqlTypeOf( CAST(x AS <type> ))
```
returns <type> as the type name.
`modeOf()` returns the cardinality (mode) of the column as "NOT NULL",
"NULLABLE" or "ARRAY". (Suggestions for better terms are welcome.) The Drill
terms are not used because they are more Parquet-like than SQL-like.
Finally, the `drillTypeOf()` function that works just like `sqlTypeOf()`,
but returns the internal Drill names.
### Example
Here is an example usage that highlights our old friend, "nullable int" for
a missing column:
```
SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode FROM
`json/all-null.json`;
+----------+-----------+
| a_type | a_mode |
+----------+-----------+
| INTEGER | NULLABLE |
+----------+-----------+
```
For arrays (repeated) types:
```
SELECT sqlTypeOf(columns) as col_type, modeOf(columns) as col_mode
FROM `csv/cust.csv`;
+--------------------+-----------+
| col_type | col_mode |
+--------------------+-----------+
| CHARACTER VARYING | ARRAY |
+--------------------+-----------+
```
For non-null types:
```
SELECT sqlTypeOf(`name`) AS name_type,
modeOf(`name`) AS name_mode FROM `csvh/cust.csvh`;
+--------------------+------------+
| name_type | name_mode |
+--------------------+------------+
| CHARACTER VARYING | NOT NULL |
+--------------------+------------+
```
The result is that the internal Drill type is made very plain to the user
of `sqlline`.
### UDF Utility Methods
To save some typing, this PR also includes a few helper functions to make
it easier to write UDFs. These functions were first described in the blog post
[UDF Background
Information](https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information),
on the
[Troublshooting](https://github.com/paul-rogers/drill/wiki/UDF-Troubleshooting)
page.
In particular, to return a string, the old `typeof()` implementation uses:
```
byte[] type = typeName.getBytes();
buf = buf.reallocIfNeeded(type.length);
buf.setBytes(0, type);
out.buffer = buf;
out.start = 0;
out.end = type.length;
```
While the new functions use:
```
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharOutput(
typeName, buf, out);
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-6361
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1242.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1242
----
commit 7acf6cc77581c15981cf5cc7ac1a2b3780324f40
Author: Paul Rogers <progers@...>
Date: 2018-04-29T06:04:26Z
DRILL-6361: Revised typeOf() function versions
----
---