Fabian Hueske created FLINK-1443:
------------------------------------

             Summary: Add replicated data source
                 Key: FLINK-1443
                 URL: https://issues.apache.org/jira/browse/FLINK-1443
             Project: Flink
          Issue Type: New Feature
          Components: Java API, JobManager, Optimizer
    Affects Versions: 0.9
            Reporter: Fabian Hueske
            Priority: Minor


This issue proposes to add support for data sources that read the same data in 
all parallel instances. This feature can be useful if the data is replicated 
to all machines in a cluster and can be read locally. 
For example, a replicated input format can be used for a broadcast join without 
sending any data over the network.
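A minimal sketch of why replication removes the network transfer, assuming a hash-join strategy: each parallel instance reads the complete replicated input locally, builds a hash table from it, and probes that table with only its own partition of the large input. The class and method names (LocalBroadcastJoin, joinPartition) are hypothetical and are not part of Flink's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Flink code: one parallel instance's share of
// a broadcast hash join whose small side comes from a local, replicated source.
class LocalBroadcastJoin {

    static List<String> joinPartition(List<Map.Entry<Integer, String>> replicated,
                                      List<Map.Entry<Integer, String>> partition) {
        // Build phase: every instance hashes the FULL replicated input,
        // which it read from local disk instead of receiving it over the network.
        Map<Integer, List<String>> table = new HashMap<>();
        for (Map.Entry<Integer, String> e : replicated) {
            table.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Probe phase: stream only this instance's partition of the large input.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : partition) {
            for (String match : table.getOrDefault(e.getKey(), Collections.emptyList())) {
                out.add(e.getValue() + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same replicated data is visible to both instances.
        List<Map.Entry<Integer, String>> replicated =
                List.of(Map.entry(1, "A"), Map.entry(2, "B"));
        // Each instance owns a different partition of the large input.
        System.out.println(joinPartition(replicated,
                List.of(Map.entry(1, "x"), Map.entry(3, "y"))));
        System.out.println(joinPartition(replicated,
                List.of(Map.entry(2, "z"), Map.entry(1, "w"))));
    }
}
```

Run with a Java 9+ JDK (for List.of and Map.entry), this prints `[x,A]` for the first instance and `[z,B, w,A]` for the second. The union of both outputs is the full join result, and no record of the small input was shipped between instances.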

The following changes are necessary to achieve this:
1) Add a replicating InputSplitAssigner which assigns all splits to all 
parallel instances. This also requires extending the InputSplitAssigner 
interface to identify the exact parallel instance that requests an InputSplit 
(currently only the hostname is provided).
2) Make sure that the degree of parallelism (DOP) of the replicated data source 
is identical to the DOP of its successor.
3) Let the optimizer know that the data is replicated and ensure that plan 
enumeration works correctly.
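Changes 1) and 2) can be sketched together as follows. This is a minimal, self-contained illustration under stated assumptions, not Flink's actual classes: Split, SplitAssigner, and ReplicatingSplitAssigner are hypothetical stand-ins, and the extra taskId parameter models the proposed extension that identifies the requesting parallel instance. The assigner keeps one cursor per parallel instance, so the number of instances (the DOP) must be known up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input split; only carries a split number.
final class Split {
    final int number;
    Split(int number) { this.number = number; }
    @Override public String toString() { return "Split-" + number; }
}

// Hypothetical extended assigner interface: the caller identifies itself by
// hostname AND parallel instance index, as proposed in change 1).
interface SplitAssigner {
    Split getNextInputSplit(String host, int taskId);
}

// Replicating assigner: every parallel instance receives every split exactly once.
final class ReplicatingSplitAssigner implements SplitAssigner {
    private final Split[] splits;
    private final int[] nextIndex; // one cursor per parallel instance

    ReplicatingSplitAssigner(List<Split> splits, int parallelism) {
        this.splits = splits.toArray(new Split[0]);
        this.nextIndex = new int[parallelism];
    }

    @Override
    public synchronized Split getNextInputSplit(String host, int taskId) {
        if (taskId < 0 || taskId >= nextIndex.length) {
            throw new IllegalArgumentException("unknown parallel instance " + taskId);
        }
        int i = nextIndex[taskId];
        if (i >= splits.length) {
            return null; // this instance has already received all splits
        }
        nextIndex[taskId] = i + 1;
        return splits[i];
    }
}

class ReplicatingSplitDemo {
    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(new Split(0), new Split(1), new Split(2));
        // Parallelism 2, assumed equal to the DOP of the successor (change 2).
        SplitAssigner assigner = new ReplicatingSplitAssigner(splits, 2);
        for (int task = 0; task < 2; task++) {
            List<Split> received = new ArrayList<>();
            Split s;
            while ((s = assigner.getNextInputSplit("localhost", task)) != null) {
                received.add(s);
            }
            System.out.println("instance " + task + " -> " + received);
        }
    }
}
```

Both instances print `[Split-0, Split-1, Split-2]`. Note that both run on the same host here: with the current hostname-only interface the assigner could not tell the two requests apart, which is why the interface change in 1) is needed.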



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
