Paul Rogers created DRILL-5371:
----------------------------------
Summary: Large run-time overhead for nested SELECT queries
Key: DRILL-5371
URL: https://issues.apache.org/jira/browse/DRILL-5371
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers
See DRILL-5370 - a test in which Drill was stress-tested with nested SELECT
queries of ever-increasing size.
Semantically, the query does nothing other than:
SELECT a AS b AS c AS ... AS z FROM foo;
The above is not valid SQL, of course, but it shows that the nested SELECTs do
nothing other than create static aliases for columns, and do so many times via
layers of nested SELECTs.
{code}
SELECT y AS z FROM
(SELECT x AS y FROM
(SELECT w AS x FROM ...
(SELECT a FROM someTable))))...))
{code}
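For anyone who wants to reproduce the query shape, the following is a minimal sketch of a generator for such queries. It is not the test from DRILL-5370; the function name, the `_N` alias suffix scheme, and the table name are hypothetical and chosen only for illustration.

```python
def nested_alias_query(columns, table, depth):
    """Build a query that renames every column once per nesting level.

    Hypothetical helper: the alias scheme (col, col_1, col_2, ...) is an
    assumption made for this sketch, not something taken from the issue.
    """
    def name(col, level):
        # Level 0 is the base column name; each outer level gets a new alias.
        return col if level == 0 else f"{col}_{level}"

    # Innermost query: a plain SELECT of the base columns.
    query = f"SELECT {', '.join(columns)} FROM {table}"

    # Wrap the query once per level, aliasing each column from the level below.
    for level in range(1, depth + 1):
        select_list = ", ".join(
            f"{name(c, level - 1)} AS {name(c, level)}" for c in columns
        )
        query = f"SELECT {select_list} FROM ({query})"
    return query


if __name__ == "__main__":
    print(nested_alias_query(["a"], "someTable", 2))
```

Increasing `depth` and the number (and length) of column names gives progressively larger queries of the same form.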
Because the nested SELECTs do no actual processing and only impose aliases, the
optimizer should be able to optimize away the aliasing. That is, there should
be no need for any run-time work to simply change the name of a column.
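To illustrate why no run-time work should be needed, here is a rough sketch of how a stack of alias-only projections can be composed into a single rename at plan time. This is not Drill's or Calcite's planner code; the function name and the dict-based representation of a projection layer are assumptions made purely for illustration.

```python
def collapse_alias_layers(layers):
    """Compose alias-only projection layers into one mapping.

    `layers` is ordered innermost to outermost; each layer is a dict that maps
    an input column name to the alias that layer gives it. The result maps
    each base column directly to its outermost alias.
    (Hypothetical representation, assumed for this sketch.)
    """
    combined = None
    for layer in layers:
        if combined is None:
            # The innermost layer renames the base columns directly.
            combined = dict(layer)
        else:
            # Chain the rename; columns the outer layer does not select drop out.
            combined = {
                base: layer[alias]
                for base, alias in combined.items()
                if alias in layer
            }
    return combined or {}


if __name__ == "__main__":
    # a -> b -> c -> d collapses to a single rename a -> d.
    print(collapse_alias_layers([{"a": "b"}, {"b": "c"}, {"c": "d"}]))
```

With the layers collapsed this way, the whole stack is equivalent to one projection such as `SELECT a AS d FROM someTable`, regardless of nesting depth.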
However, when run (with 200 columns, each with a 500-character name, but only 10
rows), the overhead in a debug build is somewhere between 1/2 and 1 second per
nesting level.
That is, for just 10 rows, each layer of nested SELECT adds up to about 1 second
to the execution time.
Queries of this form may be pathological if written by humans, but they are
typical of queries generated by BI tools. Hence, Drill performance for such
tools can be improved simply by avoiding unnecessary work.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)