[ https://issues.apache.org/jira/browse/TAJO-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyunsik Choi updated TAJO-889:
------------------------------

    Attachment: physical.png
                logical.png

> Separate a data flow into a logical flow and physical flows
> -----------------------------------------------------------
>
>                 Key: TAJO-889
>                 URL: https://issues.apache.org/jira/browse/TAJO-889
>             Project: Tajo
>          Issue Type: Sub-task
>            Reporter: Hyunsik Choi
>         Attachments: logical.png, physical.png
>
>
> Currently, DataChannel represents a data flow between execution blocks (query 
> stages). In the current DAG framework, a data flow can only represent a 
> physical data flow. This should be improved so that users can easily deal 
> with complex data flows.
> For example, consider the following query:
> {code}
> select * from A join (select * from B union select * from C) D;
> {code}
> The above query produces the data flows shown in the attached figure 
> (physical.png).
> The main problem is that each ScanNode can have only one data source, but in 
> the above case one ScanNode has to read from two data sources, B and C. 
> Currently, we work around this with a hack that maps B and C to a fake data 
> source id. It works, but it results in messy code.
> A potential solution is to separate the current data flow model into logical 
> data flows and physical data flows. For example, in the attached figure 
> (logical.png), the dotted line represents one logical data flow. 
> Letting users work with a logical data flow, instead of handling the 
> underlying physical data flows directly, would greatly help distributed plan 
> generation. It would also simplify the global plan code.
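
A minimal Java sketch of the proposed split, for illustration only. The class names `LogicalChannel` and `PhysicalChannel`, the simplified `ScanNode`, and the block ids `eb_scan_B`, `eb_scan_C`, and `eb_join` are all hypothetical; they are not Tajo's actual `DataChannel` or `ScanNode` APIs, and execution blocks are modeled as plain string ids. The sketch shows one logical flow for the union subquery D that expands into two physical flows (from the blocks scanning B and C), while the ScanNode keeps a single source instead of a fake data source id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DataFlowSketch {

  /** Hypothetical physical flow: one producer block ships its output to one consumer block. */
  static final class PhysicalChannel {
    final String srcBlock;
    final String dstBlock;

    PhysicalChannel(String srcBlock, String dstBlock) {
      this.srcBlock = srcBlock;
      this.dstBlock = dstBlock;
    }

    @Override
    public String toString() {
      return srcBlock + " -> " + dstBlock;
    }
  }

  /**
   * Hypothetical logical flow: a single named source (e.g. the union subquery D)
   * read by one consumer block, backed by one or more producer blocks.
   */
  static final class LogicalChannel {
    final String logicalSource; // the one source a ScanNode refers to
    final String dstBlock;
    final List<String> srcBlocks = new ArrayList<>();

    LogicalChannel(String logicalSource, String dstBlock) {
      this.logicalSource = logicalSource;
      this.dstBlock = dstBlock;
    }

    LogicalChannel addSource(String srcBlock) {
      srcBlocks.add(srcBlock);
      return this;
    }

    /** Expands this logical flow into one physical flow per producer block. */
    List<PhysicalChannel> toPhysical() {
      List<PhysicalChannel> result = new ArrayList<>();
      for (String src : srcBlocks) {
        result.add(new PhysicalChannel(src, dstBlock));
      }
      return Collections.unmodifiableList(result);
    }
  }

  /** Simplified stand-in for a ScanNode: exactly one input, the logical channel. */
  static final class ScanNode {
    final LogicalChannel input;

    ScanNode(LogicalChannel input) {
      this.input = input;
    }
  }

  public static void main(String[] args) {
    // select * from A join (select * from B union select * from C) D;
    LogicalChannel d = new LogicalChannel("D", "eb_join")
        .addSource("eb_scan_B")
        .addSource("eb_scan_C");
    ScanNode scanD = new ScanNode(d);

    System.out.println("ScanNode source: " + scanD.input.logicalSource);
    for (PhysicalChannel ch : d.toPhysical()) {
      System.out.println("physical: " + ch);
    }
  }
}
```

In this sketch, running `main` prints the ScanNode's single logical source `D`, followed by the two physical flows `eb_scan_B -> eb_join` and `eb_scan_C -> eb_join`. The global planner would reason only over logical channels, and the expansion into physical channels would happen in one place rather than through per-case fake ids.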



--
This message was sent by Atlassian JIRA
(v6.2#6252)
