[ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venki Korukanti resolved DRILL-4446. ------------------------------------ Resolution: Fixed > Improve current fragment parallelization module > ----------------------------------------------- > > Key: DRILL-4446 > URL: https://issues.apache.org/jira/browse/DRILL-4446 > Project: Apache Drill > Issue Type: New Feature > Affects Versions: 1.5.0 > Reporter: Venki Korukanti > Assignee: Venki Korukanti > Fix For: 1.7.0 > > > Current fragment parallelizer {{SimpleParallelizer.java}} can’t handle > correctly the case where an operator has mandatory scheduling requirement for > a set of DrillbitEndpoints and affinity for each DrillbitEndpoint (i.e how > much portion of the total tasks to be scheduled on each DrillbitEndpoint). It > assumes that scheduling requirements are soft (except one case where Mux and > DeMux case where mandatory parallelization requirement of 1 unit). > An example is: > Cluster has 3 nodes running Drillbits and storage service on each. Data for a > table is only present at storage services in two nodes. So a GroupScan needs > to be scheduled on these two nodes in order to read the data. Storage service > doesn't support (or costly) reading data from remote node. > Inserting the mandatory scheduling requirements within existing > SimpleParallelizer is not sufficient as you may end up with a plan that has a > fragment with two GroupScans each having its own hard parallelization > requirements. > Proposal is: > Add a property to each operator which tells what parallelization > implementation to use. Most operators don't have any particular strategy > (such as Project or Filter), they depend on incoming operator. Current > existing operators which have requirements (all existing GroupScans) default > to current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to new > mandatory assignment parallelizer. It is possible that PhysicalPlan generated > can have a fragment with operators having different parallelization > strategies. In that case an exchange is inserted in between operators where a > change in parallelization strategy is required. > Will send a detailed design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)