[ 
https://issues.apache.org/jira/browse/DRILL-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7499:
-------------------------------
    Description: 
The {{sqltypeof()}} function was introduced in Drill 1.14 to work around 
limitations of the original {{typeof()}} function. The function is mentioned in 
_Learning Apache Drill_, Chapter 8, page 152:


{noformat}
ELECT sqlTypeOf(columns) AS cols_type,
       modeOf(columns) AS cols_mode
FROM `csv/cust.csv` LIMIT 1;

+--------------------+------------+
|     cols_type      | cols_mode  |
+--------------------+------------+
| CHARACTER VARYING  | ARRAY      |
+--------------------+------------+
{noformat}

When the same query is run against the just-released Drill 1.17, we get the 
*wrong* results:

{noformat}
+-----------+-----------+
| cols_type | cols_mode |
+-----------+-----------+
| ARRAY     | ARRAY     |
+-----------+-----------+
{noformat}

The definition of {{sqlTypeOf()}} is that it should return the type portion of 
the columns (type, mode) major type. Clearly, it is no longer doing so for 
arrays. As a result, there is no function to obtain the data type for arrays.

The problem also shows up in the query from page 158:

{code:sql}
SELECT a, b,
       sqlTypeOf(b) AS b_type, modeof(b) AS b_mode
FROM `gen/70kmissing.json`
WHERE mod(a, 70000) = 1;
{code}

Expected (table from the book with Drill 1.14 results):

{noformat}
+--------+-------+----------+-----------+
|   a    |   b   |  b_type  |  b_mode   |
+--------+-------+----------+-----------+
| 1      | null  | INTEGER  | NULLABLE  |
+--------+-------+----------+-----------+
{noformat}

Actual Drill 1.17 results:

{noformat}
+-------+-----------+-------------------+----------+
|   a   |     b     |      b_type       |  b_mode  |
+-------+-----------+-------------------+----------+
| 1     | null      | NULL              | NULLABLE |
+-------+-----------+-------------------+----------+
{noformat}

(Second line of table is omitted because something else changed, not relevant 
to this ticket.)

The above might not actually be a bug, however if someone has changed the type 
of missing columns from the old {{INT}} to a newer (untyped) {{NULL}}. But, an 
indirect test suggests that the column is still `INT` and the function is wrong:

{code:sql}
SELECT a, b
FROM `gen/70kdouble.json`
WHERE b IS NOT NULL ORDER BY a;
{code}

Data:

{noformat}
{a: 1}
...
{a: 69999}
{a: 70001, b: 10.5}
{noformat}

Error:

{noformat}
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
(INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
(FLOAT8:OPTIONAL)]], selectionVector=NONE]

{noformat}

Oddly, however, the query on page 160 works as expected:

{code:sql}
SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode 
FROM `json/all-null.json` LIMIT 1;
{code}

{noformat}
+---------+----------+
| a_type  |  a_mode  |
+---------+----------+
| INTEGER | NULLABLE |
+---------+----------+
{noformat}

 Someone will have to do some investigating to understand the current behaviour.

  was:
The {{sqltypeof()}} function was introduced in Drill 1.14 to work around 
limitations of the original {{typeof()}} function. The function is mentioned in 
_Learning Apache Drill_, Chapter 8, page 152:


{noformat}
ELECT sqlTypeOf(columns) AS cols_type,
       modeOf(columns) AS cols_mode
FROM `csv/cust.csv` LIMIT 1;

+--------------------+------------+
|     cols_type      | cols_mode  |
+--------------------+------------+
| CHARACTER VARYING  | ARRAY      |
+--------------------+------------+
{noformat}

When the same query is run against the just-released Drill 1.17, we get the 
*wrong* results:

{noformat}
+-----------+-----------+
| cols_type | cols_mode |
+-----------+-----------+
| ARRAY     | ARRAY     |
+-----------+-----------+
{noformat}

The definition of {{sqlTypeOf()}} is that it should return the type portion of 
the columns (type, mode) major type. Clearly, it is no longer doing so for 
arrays. As a result, there is no function to obtain the data type for arrays.

The problem also shows up in the query from page 158:

{code:sql}
SELECT a, b,
       sqlTypeOf(b) AS b_type, modeof(b) AS b_mode
FROM `gen/70kmissing.json`
WHERE mod(a, 70000) = 1;
{code}

Expected (table from the book with Drill 1.14 results):

{noformat}
+--------+-------+----------+-----------+
|   a    |   b   |  b_type  |  b_mode   |
+--------+-------+----------+-----------+
| 1      | null  | INTEGER  | NULLABLE  |
+--------+-------+----------+-----------+
{noformat}

Actual Drill 1.17 results:

{noformat}
+-------+-----------+-------------------+----------+
|   a   |     b     |      b_type       |  b_mode  |
+-------+-----------+-------------------+----------+
| 1     | null      | NULL              | NULLABLE |
+-------+-----------+-------------------+----------+
{noformat}

(Second line of table is omitted because something else changed, not relevant 
to this ticket.)

The above might not actually be a bug, however if someone has changed the type 
of missing columns from the old {{INT}} to a newer (untyped) {{NULL}}. But, an 
indirect test suggests that the column is still `INT` and the function is wrong:

{code:sql}
SELECT a, b
FROM `gen/70kdouble.json`
WHERE b IS NOT NULL ORDER BY a;
{code}

Data:

{noformat}
{a: 1}
...
{a: 69999}
{a: 70001, b: 10.5}
{noformat}

Error:

{noformat}
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
(INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
(FLOAT8:OPTIONAL)]], selectionVector=NONE]

{noformat}

 


> sqltypeof() function with an array returns "ARRAY", not type
> ------------------------------------------------------------
>
>                 Key: DRILL-7499
>                 URL: https://issues.apache.org/jira/browse/DRILL-7499
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Priority: Minor
>              Labels: regresion
>
> The {{sqltypeof()}} function was introduced in Drill 1.14 to work around 
> limitations of the original {{typeof()}} function. The function is mentioned 
> in _Learning Apache Drill_, Chapter 8, page 152:
> {noformat}
> ELECT sqlTypeOf(columns) AS cols_type,
>        modeOf(columns) AS cols_mode
> FROM `csv/cust.csv` LIMIT 1;
> +--------------------+------------+
> |     cols_type      | cols_mode  |
> +--------------------+------------+
> | CHARACTER VARYING  | ARRAY      |
> +--------------------+------------+
> {noformat}
> When the same query is run against the just-released Drill 1.17, we get the 
> *wrong* results:
> {noformat}
> +-----------+-----------+
> | cols_type | cols_mode |
> +-----------+-----------+
> | ARRAY     | ARRAY     |
> +-----------+-----------+
> {noformat}
> The definition of {{sqlTypeOf()}} is that it should return the type portion 
> of the columns (type, mode) major type. Clearly, it is no longer doing so for 
> arrays. As a result, there is no function to obtain the data type for arrays.
> The problem also shows up in the query from page 158:
> {code:sql}
> SELECT a, b,
>        sqlTypeOf(b) AS b_type, modeof(b) AS b_mode
> FROM `gen/70kmissing.json`
> WHERE mod(a, 70000) = 1;
> {code}
> Expected (table from the book with Drill 1.14 results):
> {noformat}
> +--------+-------+----------+-----------+
> |   a    |   b   |  b_type  |  b_mode   |
> +--------+-------+----------+-----------+
> | 1      | null  | INTEGER  | NULLABLE  |
> +--------+-------+----------+-----------+
> {noformat}
> Actual Drill 1.17 results:
> {noformat}
> +-------+-----------+-------------------+----------+
> |   a   |     b     |      b_type       |  b_mode  |
> +-------+-----------+-------------------+----------+
> | 1     | null      | NULL              | NULLABLE |
> +-------+-----------+-------------------+----------+
> {noformat}
> (Second line of table is omitted because something else changed, not relevant 
> to this ticket.)
> The above might not actually be a bug, however if someone has changed the 
> type of missing columns from the old {{INT}} to a newer (untyped) {{NULL}}. 
> But, an indirect test suggests that the column is still `INT` and the 
> function is wrong:
> {code:sql}
> SELECT a, b
> FROM `gen/70kdouble.json`
> WHERE b IS NOT NULL ORDER BY a;
> {code}
> Data:
> {noformat}
> {a: 1}
> ...
> {a: 69999}
> {a: 70001, b: 10.5}
> {noformat}
> Error:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` 
> (FLOAT8:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> Oddly, however, the query on page 160 works as expected:
> {code:sql}
> SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode 
> FROM `json/all-null.json` LIMIT 1;
> {code}
> {noformat}
> +---------+----------+
> | a_type  |  a_mode  |
> +---------+----------+
> | INTEGER | NULLABLE |
> +---------+----------+
> {noformat}
>  Someone will have to do some investigating to understand the current 
> behaviour.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to