GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1242

    DRILL-6361: Revised typeOf() function versions

    Drill provides the `typeof()` function to return the type of a column. 
However, this function has two key limitations:
    
    1. It returns NULL if any column value is NULL. But, Drill has no NULL 
type, so this masks the underlying type. This is especially annoying for 
columns which are all NULL, such as "missing" columns.
    2. It does not return the cardinality (AKA "mode") of the column.
    
    This PR introduces two new functions that solve these issues.
    
    ### New Functions
    
    `sqlTypeOf()` returns the data type (using the SQL names) whether the 
column is NULL or not. The SQL name is the one that can be used in a CAST 
statement. Thus,
    
    ```
    sqlTypeOf( CAST(x AS <type> ))
    ```
    
    returns &lt;type> as the type name.
    
    `modeOf()` returns the cardinality (mode) of the column as "NOT NULL", 
"NULLABLE" or "ARRAY". (Suggestions for better terms are welcome.) The Drill 
terms are not used because they are more Parquet-like than SQL-like.
    
    Finally, the `drillTypeOf()` function that works just like `sqlTypeOf()`, 
but returns the internal Drill names.
    
    ### Example
    
    Here is an example usage that highlights our old friend, "nullable int" for 
a missing column:
    
    ```
    SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode FROM 
`json/all-null.json`;
    
    +----------+-----------+
    |  a_type  |  a_mode   |
    +----------+-----------+
    | INTEGER  | NULLABLE  |
    +----------+-----------+
    ```
    
    For arrays (repeated) types:
    
    ```
    SELECT sqlTypeOf(columns) as col_type, modeOf(columns) as col_mode
    FROM `csv/cust.csv`;
    
    +--------------------+-----------+
    |      col_type      | col_mode  |
    +--------------------+-----------+
    | CHARACTER VARYING  | ARRAY     |
    +--------------------+-----------+
    ```
    
    For non-null types:
    
    ```
    SELECT sqlTypeOf(`name`) AS name_type, 
    modeOf(`name`) AS name_mode FROM `csvh/cust.csvh`;
    
    +--------------------+------------+
    |     name_type      | name_mode  |
    +--------------------+------------+
    | CHARACTER VARYING  | NOT NULL   |
    +--------------------+------------+
    ```
    
    The result is that the internal Drill type is made very plain to the user 
of `sqlline`.
    
    ### UDF Utility Methods
    
    To save some typing, this PR also includes a few helper functions to make 
it easier to write UDFs. These functions were first described in the blog post 
[UDF Background 
Information](https://github.com/paul-rogers/drill/wiki/UDFs-Background-Information),
 on the 
[Troublshooting](https://github.com/paul-rogers/drill/wiki/UDF-Troubleshooting) 
page.
    
    In particular, to return a string, the old `typeof()` implementation uses:
    
    ```
          byte[] type = typeName.getBytes();
          buf = buf.reallocIfNeeded(type.length);
          buf.setBytes(0, type);
          out.buffer = buf;
          out.start = 0;
          out.end = type.length;
    ```
    
    While the new functions use:
    
    ```
          
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharOutput(
            typeName, buf, out);
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6361

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1242
    
----
commit 7acf6cc77581c15981cf5cc7ac1a2b3780324f40
Author: Paul Rogers <progers@...>
Date:   2018-04-29T06:04:26Z

    DRILL-6361: Revised typeOf() function versions

----


---

Reply via email to