Hyunsik Choi created TAJO-889:
---------------------------------

             Summary: Separate a data flow into a logical flow and physical flows
                 Key: TAJO-889
                 URL: https://issues.apache.org/jira/browse/TAJO-889
             Project: Tajo
          Issue Type: Sub-task
            Reporter: Hyunsik Choi


Currently, DataChannel represents a data flow between execution blocks (query 
stages). In the current DAG framework, a data flow indicates only a physical 
data flow. It should be improved in order to enable users to easily deal with 
complex data flows.

For example, consider the following query:

{code}
select * from A join (select * from B union select * from C) D;
{code}

The above query produces the data flows shown in the attached figure 
(physical.png).

The main problem is that each ScanNode can have only one data source. In the 
above query, however, one ScanNode has to read from two data sources, B and C. 
Currently, we work around this with a hack that replaces B and C with a fake 
data source id. It works, but it results in messy code.

A potential solution is to separate the current data flow model into a logical 
data flow and a physical data flow. For example, in the figure (logical.png), 
the dotted line represents one logical data flow. 
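
As a rough illustration of this separation, the following minimal Java sketch 
groups several physical flows under one logical flow, so the consumer sees a 
single source. All class names, method names and block ids here (PhysicalFlow, 
LogicalFlow, LogicalFlowDemo, "scan_B", "scan_C", "join_block") are 
hypothetical and are not part of Tajo's actual API; it only shows the shape of 
the idea, not a proposed implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Tajo.

/** One physical data flow: a single producer block feeding a single consumer block. */
class PhysicalFlow {
    final String sourceBlockId;
    final String targetBlockId;

    PhysicalFlow(String sourceBlockId, String targetBlockId) {
        this.sourceBlockId = sourceBlockId;
        this.targetBlockId = targetBlockId;
    }
}

/** One logical data flow: every physical flow that feeds the same logical source. */
class LogicalFlow {
    final String logicalSourceId;
    final String targetBlockId;
    private final List<PhysicalFlow> physicalFlows = new ArrayList<>();

    LogicalFlow(String logicalSourceId, String targetBlockId) {
        this.logicalSourceId = logicalSourceId;
        this.targetBlockId = targetBlockId;
    }

    /** Registers one more producer block behind this logical source. */
    void addSource(String sourceBlockId) {
        physicalFlows.add(new PhysicalFlow(sourceBlockId, targetBlockId));
    }

    /** Read-only view of the physical flows, for the physical planning step. */
    List<PhysicalFlow> physicalFlows() {
        return Collections.unmodifiableList(physicalFlows);
    }
}

class LogicalFlowDemo {
    /** Models D = (B union C) as one logical flow into the join block. */
    static LogicalFlow unionInput() {
        LogicalFlow d = new LogicalFlow("D", "join_block");
        d.addSource("scan_B"); // physical flow B -> join
        d.addSource("scan_C"); // physical flow C -> join
        return d;
    }
}
```

In this sketch, a scan on the join side would refer only to the logical source 
"D", while the two physical flows from B and C remain available when the 
physical plan is generated, so no fake data source id would be needed.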

If users could work with a logical data flow where needed, instead of handling 
physical data flows directly, distributed plan generation would be much easier. 
It would also simplify the global plan code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
