Hyunsik Choi created TAJO-889:
---------------------------------
Summary: Separate a data flow into a logical flow and physical
flows
Key: TAJO-889
URL: https://issues.apache.org/jira/browse/TAJO-889
Project: Tajo
Issue Type: Sub-task
Reporter: Hyunsik Choi
Currently, DataChannel represents a data flow between execution blocks (query
stages). In the current DAG framework, a data flow always denotes a physical
data flow. This model should be improved so that users can deal with complex
data flows more easily.
For example, consider the following query:
{code}
select * from A join (select * from B union select * from C) D;
{code}
The above query produces the data flows shown in the attached figure
(physical.png).
The main problem is that each ScanNode can have only one data source. In the
above query, however, one ScanNode has to read from two data sources, B and C.
Currently, we work around this with a hack that replaces B and C with a fake
data source id. It works, but it results in messy code.
A potential solution is to separate the current data flow model into a logical
data flow and a physical data flow. For example, in the figure (logical.png),
the dotted line represents one logical data flow.
If users can work with a logical data flow instead of handling the physical
data flows directly, distributed plan generation becomes much easier. It will
also simplify the global plan code.
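As a rough illustration of the idea (not existing Tajo code), the separation
could be sketched in Java as follows. All names here (LogicalFlowSketch,
LogicalFlow, PhysicalFlow, addSource, and the block ids) are hypothetical and
chosen only for this example; the actual design may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in Tajo under these names.
final class LogicalFlowSketch {

  /** One physical flow: a single producer block feeding a single consumer block. */
  static final class PhysicalFlow {
    final String srcBlockId;
    final String dstBlockId;

    PhysicalFlow(String srcBlockId, String dstBlockId) {
      this.srcBlockId = srcBlockId;
      this.dstBlockId = dstBlockId;
    }
  }

  /** One logical flow: a single named input backed by one or more physical flows. */
  static final class LogicalFlow {
    final String name;
    final String dstBlockId;
    private final List<PhysicalFlow> physicalFlows = new ArrayList<>();

    LogicalFlow(String name, String dstBlockId) {
      this.name = name;
      this.dstBlockId = dstBlockId;
    }

    /** Registers another producer block behind this logical input. */
    LogicalFlow addSource(String srcBlockId) {
      physicalFlows.add(new PhysicalFlow(srcBlockId, dstBlockId));
      return this;
    }

    /** Read-only view of the physical flows grouped under this logical flow. */
    List<PhysicalFlow> physicalFlows() {
      return Collections.unmodifiableList(physicalFlows);
    }
  }

  public static void main(String[] args) {
    // The union of B and C is exposed to the join block as one logical input "D".
    LogicalFlow d = new LogicalFlow("D", "join_block")
        .addSource("scan_B")
        .addSource("scan_C");

    for (PhysicalFlow p : d.physicalFlows()) {
      System.out.println(d.name + ": " + p.srcBlockId + " -> " + p.dstBlockId);
    }
  }
}
```

With a structure like this, a ScanNode would refer to the single logical input
D, and no fake data source id would be needed for B and C.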