[jira] [Commented] (DRILL-7515) ORDER BY clause produce error on GROUP BY with array field manager with any_value

Vova Vysotskyi (Jira) Fri, 10 Jan 2020 05:17:25 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012854#comment-17012854
 ]


Vova Vysotskyi commented on DRILL-7515:
---------------------------------------

It looks like the issue is in {{StreamingAggBatch}} and the way it handles 
complex aggregate functions. It adds null vectors for complex results to the 
container and, once actual data is obtained, creates writers that replace 
these null vectors. Perhaps an empty batch with OK_NEW_SCHEMA status was 
returned between these two stages; the sort operator handled it and then 
failed when a batch with the data arrived.
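
To make the suspected sequence concrete, here is a minimal toy model in Python. 
It is not Drill code: {{Field}}, {{ToySort}} and {{toy_agg_batches}} are 
hypothetical names invented for illustration, and the two schemas are copied 
from the "Previous schema" / "Incoming schema" lines of the reported error. It 
only sketches the hypothesis above (a placeholder null vector is seen first, 
the real repeated vector second) and assumes the sort simply rejects any 
schema that differs from the first one it saw.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for Drill concepts (not Drill's real classes).
@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "VARCHAR" or "NULL"
    mode: str  # e.g. "OPTIONAL" or "REPEATED"


class SchemaChangeError(Exception):
    """Raised by the toy sort when an incoming schema differs from the first one."""


class ToySort:
    """Remembers the first schema it receives and rejects any different one."""

    def __init__(self):
        self.schema = None

    def accept(self, schema):
        if self.schema is None:
            self.schema = schema
        elif schema != self.schema:
            raise SchemaChangeError(
                f"previous={self.schema} incoming={schema}"
            )


def toy_agg_batches():
    """Yield the schemas of the two batches the hypothesis assumes."""
    keys = (
        Field("a", "VARCHAR", "OPTIONAL"),
        Field("b", "VARCHAR", "OPTIONAL"),
    )
    # Stage 1: empty batch; the complex result is still a placeholder null vector.
    yield keys + (Field("EXPR$2", "NULL", "OPTIONAL"),)
    # Stage 2: data arrived; a writer replaced the placeholder with a repeated vector.
    yield keys + (Field("EXPR$2", "VARCHAR", "REPEATED"),)


def run():
    """Feed both batch schemas to the toy sort and report what happens."""
    sort = ToySort()
    try:
        for schema in toy_agg_batches():
            sort.accept(schema)
    except SchemaChangeError:
        return "schema change rejected"
    return "ok"
```

In this model {{run()}} reports a rejected schema change, because the second 
schema differs from the first only in the type and mode of {{EXPR$2}}. Whether 
the real operators behave this way still has to be confirmed in the code.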

> ORDER BY clause produce error on GROUP BY with array field manager with 
> any_value
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-7515
>                 URL: https://issues.apache.org/jira/browse/DRILL-7515
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>
> With a Parquet file containing an array field, for example:
> {code:sql}
> apache drill 1.17> CREATE TABLE dfs.TEST.`example_any_pqt` AS (SELECT 'foo' 
> AS a, 'bar' b, split('foo,bar',',') as c);
> apache drill 1.17> SELECT *, typeof(c) AS type, sqltypeof(c) AS sql_type FROM 
> dfs.TEST.`example_any_pqt`;
> +-----+-----+---------------+---------+----------+
> |  a  |  b  |       c       |  type   | sql_type |
> +-----+-----+---------------+---------+----------+
> | foo | bar | ["foo","bar"] | VARCHAR | ARRAY    |
> +-----+-----+---------------+---------+----------+
> {code}
> The next query works well:
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a)
> ORDER BY a;
> +-----+---------------+
> |  a  |    EXPR$1     |
> +-----+---------------+
> | foo | ["foo","bar"] |
> +-----+---------------+
> {code}
> But the next query (with the same structure as the previous one) fails:
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b)
> ORDER BY a;
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (VARCHAR:OPTIONAL)], [`EXPR$2` (NULL:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (VARCHAR:OPTIONAL)], [`EXPR$2` (VARCHAR:REPEATED), children=([`$data$` 
> (VARCHAR:REQUIRED)])]], selectionVector=NONE]
> Fragment 0:0
> {code}
> Note that the same query +without the ORDER BY+ works well. It's also 
> possible to use an intermediate table and apply the ORDER BY in a second step.
> {code:sql}
> apache drill 1.17> SELECT * FROM 
> (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b);
> +-----+-----+---------------+
> |  a  |  b  |    EXPR$2     |
> +-----+-----+---------------+
> | foo | bar | ["foo","bar"] |
> +-----+-----+---------------+
> apache drill 1.17> CREATE TABLE dfs.TEST.`ok_pqt` AS (SELECT * FROM (SELECT 
> a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b));
> +----------+---------------------------+
> | Fragment | Number of records written |
> +----------+---------------------------+
> | 0_0      | 1                         |
> +----------+---------------------------+
> apache drill 1.17> SELECT * FROM dfs.TEST.`ok_pqt` ORDER BY a;
> +-----+-----+---------------+
> |  a  |  b  |    EXPR$2     |
> +-----+-----+---------------+
> | foo | bar | ["foo","bar"] |
> +-----+-----+---------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
