[
https://issues.apache.org/jira/browse/TAJO-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi updated TAJO-889:
------------------------------
Attachment: physical.png
logical.png
> Separate a data flow into a logical flow and physical flows
> -----------------------------------------------------------
>
> Key: TAJO-889
> URL: https://issues.apache.org/jira/browse/TAJO-889
> Project: Tajo
> Issue Type: Sub-task
> Reporter: Hyunsik Choi
> Attachments: logical.png, physical.png
>
>
> Currently, DataChannel represents a data flow between execution blocks (query
> stages). In the current DAG framework, a data flow indicates only a physical
> data flow. The model should be improved so that users can easily deal with
> complex data flows.
> For example, consider the following query:
> {code}
> select * from A join (select * from B union select * from C) D;
> {code}
> The above query produces the data flows shown in the attached figure
> (physical.png).
> The main problem is that each ScanNode can have only one data source. In the
> above query, however, one ScanNode has to read from two data sources, B and C.
> Currently, we work around this with a hack that replaces B and C with a fake
> data source id. It works, but it results in messy code.
> A potential solution is to separate the current data flow model into a
> logical data flow and a physical data flow. For example, in the figure
> (logical.png), the dotted line represents one logical data flow.
> If users could work with a logical data flow, where needed, instead of
> directly handling physical data flows, distributed plan generation would
> become much easier. It would also simplify the global plan code.
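
To make the proposal above concrete, here is a minimal sketch of how one logical data flow could group several physical data flows, using the query from the description. Every name in it (DataFlowSketch, PhysicalFlow, LogicalFlow, addProducer, buildUnionInput, and the block names) is hypothetical and chosen only for illustration; none of it is taken from Tajo's code base or from the DataChannel API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not Tajo's actual classes.
class DataFlowSketch {

    /** One physical channel from a producer block to a consumer block. */
    static final class PhysicalFlow {
        final String srcBlock;
        final String dstBlock;

        PhysicalFlow(String srcBlock, String dstBlock) {
            this.srcBlock = srcBlock;
            this.dstBlock = dstBlock;
        }
    }

    /** The single named input a scan sees, backed by one or more physical flows. */
    static final class LogicalFlow {
        final String sourceName;
        final String dstBlock;
        private final List<PhysicalFlow> physicalFlows = new ArrayList<>();

        LogicalFlow(String sourceName, String dstBlock) {
            this.sourceName = sourceName;
            this.dstBlock = dstBlock;
        }

        /** Registers another producer block feeding this logical input. */
        void addProducer(String srcBlock) {
            physicalFlows.add(new PhysicalFlow(srcBlock, dstBlock));
        }

        /** Read-only view of the physical flows behind this logical flow. */
        List<PhysicalFlow> physicalFlows() {
            return Collections.unmodifiableList(physicalFlows);
        }
    }

    /** Models the union input D of the example query: B and C both feed the join block. */
    static LogicalFlow buildUnionInput() {
        LogicalFlow d = new LogicalFlow("D", "join_block");
        d.addProducer("scan_B");
        d.addProducer("scan_C");
        return d;
    }
}
```

In this sketch, the scan on the join side would refer only to the logical source D, so it keeps exactly one data source, while the two physical flows from B and C stay available to whatever component schedules the actual data transfer. No fake data source id is needed under this assumption.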
--
This message was sent by Atlassian JIRA
(v6.2#6252)