Venki Korukanti created DRILL-4446:
--------------------------------------

             Summary: Improve current fragment parallelization module
                 Key: DRILL-4446
                 URL: https://issues.apache.org/jira/browse/DRILL-4446
             Project: Apache Drill
          Issue Type: New Feature
    Affects Versions: 1.5.0
            Reporter: Venki Korukanti
            Assignee: Venki Korukanti
             Fix For: 1.6.0


Current fragment parallelizer {{SimpleParallelizer.java}} can’t handle 
correctly the case where an operator has mandatory scheduling requirement for a 
set of DrillbitEndpoints and affinity for each DrillbitEndpoint (i.e how much 
portion of the total tasks to be scheduled on each DrillbitEndpoint). It 
assumes that scheduling requirements are soft (except one case where Mux and 
DeMux case where mandatory parallelization requirement of 1 unit). 

An example is:
Cluster has 3 nodes running Drillbits and storage service on each. Data for a 
table is only present at storage services in two nodes. So a GroupScan needs to 
be scheduled on these two nodes in order to read the data. Storage service 
doesn't support (or costly) reading data from remote node.

Inserting the mandatory scheduling requirements within existing 
SimpleParallelizer is not sufficient as you may end up with a plan that has a 
fragment with two GroupScans each having its own hard parallelization 
requirements.

Proposal is:
Add a property to each operator which tells what parallelization implementation 
to use. Most operators don't have any particular strategy (such as Project or 
Filter), they depend on incoming operator. Current existing operators which 
have requirements (all existing GroupScans) default to current parallelizer 
{{SimpleParallelizer}}. {{Screen}} defaults to new mandatory assignment 
parallelizer. It is possible that PhysicalPlan generated can have a fragment 
with operators having different parallelization strategies. In that case an 
exchange is inserted in between operators where a change in parallelization 
strategy is required.

Will send a detailed design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to