Victoria Markman created DRILL-3652:
---------------------------------------
Summary: Need to document order of operations with window
functions and flatten
Key: DRILL-3652
URL: https://issues.apache.org/jira/browse/DRILL-3652
Project: Apache Drill
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.2.0
Reporter: Victoria Markman
Assignee: Bridget Bevens
In standard SQL, window functions are the last set of operations performed in a
query except for the final order by clause.
Using window function with flatten is a bit confusing, because it appears as an
operator in the query plan and I expected flatten to run first followed by a
window function.
This is not what is happening:
{code}
0: jdbc:drill:schema=dfs> select * from `complex.json`;
+----+-----------+----------+
| x | y | z |
+----+-----------+----------+
| 5 | a string | [1,2,3] |
+----+-----------+----------+
1 row selected (0.128 seconds)
0: jdbc:drill:schema=dfs> select sum(x) over(), x , y, flatten(z) from
`complex.json`;
+---------+----+-----------+---------+
| EXPR$0 | x | y | EXPR$3 |
+---------+----+-----------+---------+
| 5 | 5 | a string | 1 |
| 5 | 5 | a string | 2 |
| 5 | 5 | a string | 3 |
+---------+----+-----------+---------+
3 rows selected (0.152 seconds)
0: jdbc:drill:schema=dfs> explain plan for select sum(x) over(), x , y,
flatten(z) from `complex.json`;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 ProjectAllowDup(EXPR$0=[$0], x=[$1], y=[$2], EXPR$3=[$3])
00-02 Project(w0$o0=[$3], x=[$0], y=[$1], EXPR$3=[$4])
00-03 Flatten(flattenField=[$4])
00-04 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3],
EXPR$5=[$2])
00-05 Project(x=[$1], y=[$2], z=[$3], w0$o0=[$4])
00-06 Window(window#0=[window(partition {} order by [] range
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
00-07 Project(T38¦¦*=[$0], x=[$1], y=[$2], z=[$3])
00-08 Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/drill/testdata/subqueries/complex.json, numFiles=1,
columns=[`*`], files=[maprfs:///drill/testdata/subqueries/complex.json]]]
{code}
We should suggest to users to put flatten in a subquery if they want to run
window function on top of the result set returned by flatten.
{code}
0: jdbc:drill:schema=dfs> select x, y, a, sum(x) over() from ( select x , y,
flatten(z) as a from `complex.json`);
+----+-----------+----+---------+
| x | y | a | EXPR$3 |
+----+-----------+----+---------+
| 5 | a string | 1 | 15 |
| 5 | a string | 2 | 15 |
| 5 | a string | 3 | 15 |
+----+-----------+----+---------+
3 rows selected (0.145 seconds)
{code}
I suggest we document this issue in the window function section, perhaps in
"Usage notes".
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)