Jinfeng Ni created DRILL-5586:
---------------------------------

             Summary: UnionAll operator does more than necessary value vector 
allocation and copy
                 Key: DRILL-5586
                 URL: https://issues.apache.org/jira/browse/DRILL-5586
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni


When the inputs to a UnionAll operator are simple field references, rather than 
expressions involving a function that requires evaluation, the operator should 
use the value vector's transfer API.  A transfer avoids allocating a buffer for 
the value vector in the outgoing batch, as well as the overhead of copying the 
data from the incoming batch to the outgoing batch. 

For example, in the following query:
{code}
select l_orderkey from cp.`tpch/lineitem.parquet` l union all select 
n_nationkey from cp.`tpch/nation.parquet`
{code}

Both the left and right sides of the UnionAll operator are simple field 
references, so Drill should call the transfer API. However, the current code 
allocates and copies buffers for both sides. This processing significantly 
slows the UnionAll operator and, in turn, query evaluation.
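
The difference between the two paths can be illustrated with a small toy 
model. This is only a sketch: {{SketchVector}}, {{copy}} and {{transfer}} are 
hypothetical names made up for illustration, and a plain {{long[]}} stands in 
for the vector's buffer. It does not use Drill's actual value vector classes.

```java
// Toy model only: SketchVector is a hypothetical stand-in for a value vector,
// not Drill's ValueVector. A long[] plays the role of the underlying buffer.
final class TransferSketch {
    private static final long[] EMPTY = new long[0];

    static final class SketchVector {
        long[] buffer = EMPTY;
        int valueCount = 0;

        static SketchVector of(long... values) {
            SketchVector v = new SketchVector();
            v.buffer = values.clone();
            v.valueCount = values.length;
            return v;
        }
    }

    // Copy path: allocate a new buffer for the outgoing vector,
    // then copy every value from the incoming vector into it.
    static void copy(SketchVector from, SketchVector to) {
        to.buffer = new long[from.valueCount];
        System.arraycopy(from.buffer, 0, to.buffer, 0, from.valueCount);
        to.valueCount = from.valueCount;
    }

    // Transfer path: hand ownership of the existing buffer to the outgoing
    // vector. No new buffer is allocated and no values are copied.
    static void transfer(SketchVector from, SketchVector to) {
        to.buffer = from.buffer;
        to.valueCount = from.valueCount;
        from.buffer = EMPTY;
        from.valueCount = 0;
    }
}
```

In this model the cost of {{copy}} grows with the number of values in the 
batch, while {{transfer}} only reassigns a reference, which is the saving the 
issue is asking for.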

DRILL-5521 reverts a change, made in DRILL-5419, to the logic that decides 
whether to apply transfer based on a SchemaPath equality comparison.  Even if 
we fix that problem, SchemaPath equality is not a sufficient criterion for 
deciding whether transfer should be used. Ideally, even when the output field 
and incoming field have different names, the UnionAll operator should do 
{{transfer}} instead of {{copy}}, as long as the expression is a simple field 
reference. 

{code}
select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select 
n_nationkey as Key2 from cp.`tpch/nation.parquet`
{code}
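
The proposed criterion can be sketched as follows. This is a hypothetical 
model, not Drill's code: {{Expr}}, {{FieldRef}}, {{FunctionCall}} and both 
{{canTransfer...}} methods are invented for illustration, and the name-based 
check is only an assumed approximation of a SchemaPath equality comparison.

```java
// Hypothetical model of the decision; none of these types are Drill classes.
final class UnionAllTransferDecision {
    interface Expr {}

    // A plain reference to an incoming column, e.g. l_orderkey.
    static final class FieldRef implements Expr {
        final String name;
        FieldRef(String name) { this.name = name; }
    }

    // Any expression that must be evaluated, e.g. a function over a column.
    static final class FunctionCall implements Expr {
        final String functionName;
        final Expr argument;
        FunctionCall(String functionName, Expr argument) {
            this.functionName = functionName;
            this.argument = argument;
        }
    }

    // Assumed stand-in for the name-equality criterion: transfer only when
    // the incoming field has the same name as the output field.
    static boolean canTransferByNameEquality(Expr expr, String outputName) {
        return expr instanceof FieldRef && ((FieldRef) expr).name.equals(outputName);
    }

    // Criterion proposed in this issue: transfer whenever the expression is
    // a simple field reference, regardless of the output field's name.
    static boolean canTransfer(Expr expr) {
        return expr instanceof FieldRef;
    }
}
```

For the aliased query above, {{canTransfer(new FieldRef("l_orderkey"))}} is 
true in this model, whereas the name-based check against the output name 
{{Key1}} is false and would fall back to copy.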




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
