[ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177027#comment-15177027 ]
ASF GitHub Bot commented on DRILL-4446:
---------------------------------------

Github user laurentgo commented on a diff in the pull request:

    https://github.com/apache/drill/pull/403#discussion_r54828997

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/DistributionAffinity.java ---
    @@ -0,0 +1,61 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + * <p/>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p/>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.fragment;
    +
    +/**
    + * Describes an operator's endpoint assignment requirements.
    + */
    +public enum DistributionAffinity {
    +  /**
    +   * No affinity to any endpoints. Operator can run on any endpoint.
    +   */
    +  NONE(0, SoftAffinityFragmentParallelizer.INSTANCE),
    +
    +  /**
    +   * Operator has soft distribution affinity to one or more endpoints. Operator performs better when fragments are
    +   * assigned to the endpoints with affinity, but not a mandatory requirement.
    +   */
    +  SOFT(1, SoftAffinityFragmentParallelizer.INSTANCE),
    +
    +  /**
    +   * Hard distribution affinity to one or more endpoints. Fragments having the operator must be scheduled on the nodes
    +   * with affinity.
    +   */
    +  HARD(2, HardAffinityFragmentParallelizer.INSTANCE);
    +
    +  private int level;
    +  private FragmentParallelizer fragmentParallelizer;
    +
    +  DistributionAffinity(final int level, final FragmentParallelizer fragmentParallelizer) {
    +    this.level = level;
    +    this.fragmentParallelizer = fragmentParallelizer;
    +  }
    +
    +  public FragmentParallelizer getFragmentParallelizer() {
    +    return fragmentParallelizer;
    +  }
    +
    +  /**
    +   * Is the current DistributionAffinity less or equal restrictive than the given DistributionAffinity?
    +   * @param distributionAffinity
    +   * @return
    +   */
    +  public boolean isLessOrEqualRestrictive(final DistributionAffinity distributionAffinity) {
    --- End diff --

    Name change suggestion: lessThanOrEqualTo?
    
    At the same time, this enum implements the Comparable<DistributionAffinity> interface, so as long as the level is the same as the index, that might be enough...

> Improve current fragment parallelization module
> -----------------------------------------------
>
>                 Key: DRILL-4446
>                 URL: https://issues.apache.org/jira/browse/DRILL-4446
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.5.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>             Fix For: 1.6.0
>
>
> The current fragment parallelizer, {{SimpleParallelizer.java}}, cannot
> correctly handle the case where an operator has a mandatory scheduling
> requirement for a set of DrillbitEndpoints and an affinity for each
> DrillbitEndpoint (i.e. what portion of the total tasks should be scheduled
> on each DrillbitEndpoint). It assumes that scheduling requirements are soft
> (except for the Mux and DeMux case, which has a mandatory parallelization
> requirement of 1 unit).
> An example:
> The cluster has 3 nodes, each running a Drillbit and a storage service.
> Data for a table is present only in the storage services on two of the
> nodes, so a GroupScan must be scheduled on those two nodes in order to read
> the data. The storage service does not support reading data from a remote
> node (or doing so is costly).
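The three-node scenario in the issue can be made concrete with a small, hypothetical sketch of hard-affinity assignment. It is not Drill's HardAffinityFragmentParallelizer: plain node names stand in for DrillbitEndpoint, the affinity values are assumed to be each node's share of the table's data, and the class and method names are made up for illustration. Every node with affinity gets at least one fragment, the rest are split in proportion to affinity, and nodes without affinity (the third node) never receive work.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of hard-affinity assignment (not Drill's implementation).
 * Node names stand in for DrillbitEndpoint; affinity values are assumed to be
 * the share of the table's data stored on each node.
 */
public class HardAffinitySketch {

  /** Assigns `width` fragments only to endpoints that have affinity. */
  public static Map<String, Integer> assign(Map<String, Double> affinity, int width) {
    if (affinity.isEmpty()) {
      throw new IllegalArgumentException("hard affinity needs at least one endpoint");
    }
    if (width < affinity.size()) {
      // Every endpoint holding data must get at least one fragment.
      throw new IllegalArgumentException(
          "width " + width + " cannot cover " + affinity.size() + " mandatory endpoints");
    }
    double total = 0;
    for (double a : affinity.values()) {
      if (a <= 0) {
        throw new IllegalArgumentException("affinity values must be positive");
      }
      total += a;
    }

    // Step 1: one fragment per mandatory endpoint.
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String endpoint : affinity.keySet()) {
      result.put(endpoint, 1);
    }

    // Step 2: split the remaining fragments in proportion to affinity, rounding down.
    int remaining = width - affinity.size();
    int distributed = 0;
    for (Map.Entry<String, Double> e : affinity.entrySet()) {
      int extra = (int) Math.floor(remaining * e.getValue() / total);
      result.merge(e.getKey(), extra, Integer::sum);
      distributed += extra;
    }

    // Step 3: hand out fragments lost to rounding, one per endpoint in insertion order.
    List<String> endpoints = new ArrayList<>(result.keySet());
    int leftover = remaining - distributed;
    for (int i = 0; leftover > 0; i = (i + 1) % endpoints.size(), leftover--) {
      result.merge(endpoints.get(i), 1, Integer::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    // Three-node cluster; the table's data lives only on node1 and node2.
    Map<String, Double> affinity = new LinkedHashMap<>();
    affinity.put("node1", 0.75);
    affinity.put("node2", 0.25);
    System.out.println(assign(affinity, 4)); // prints {node1=3, node2=1}
  }
}
```

Giving each mandatory node a fragment before applying the proportional split is one plausible policy; a real parallelizer would also have to respect per-node width limits, which this sketch ignores.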
> Inserting the mandatory scheduling requirements into the existing
> SimpleParallelizer is not sufficient, as you may end up with a plan that has
> a fragment with two GroupScans, each having its own hard parallelization
> requirements.
> Proposal:
> Add a property to each operator that tells which parallelization
> implementation to use. Most operators (such as Project or Filter) have no
> particular strategy; they depend on the incoming operator. Existing
> operators that have requirements (all existing GroupScans) default to the
> current parallelizer, {{SimpleParallelizer}}. {{Screen}} defaults to the new
> mandatory-assignment parallelizer. The generated PhysicalPlan may have a
> fragment with operators that use different parallelization strategies. In
> that case, an exchange is inserted between the operators wherever a change
> in parallelization strategy is required.
> Will send a detailed design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
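The reviewer's point about Comparable can be checked with a short sketch. Java enums compare by declaration order, and in the diff the levels (NONE=0, SOFT=1, HARD=2) match that order, so compareTo answers the same question as isLessOrEqualRestrictive. The enum below is a stripped-down stand-in for DistributionAffinity (no parallelizer field), and mostRestrictive is a hypothetical helper, assuming a fragment adopts the strictest affinity among its operators, since operators such as Project or Filter have no strategy of their own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the review suggestion: with constants declared from least to most
 * restrictive, Enum.compareTo can replace a hand-written level comparison.
 */
public class AffinityOrderSketch {

  /** Stripped-down stand-in for DistributionAffinity (no parallelizer field). */
  public enum Affinity { NONE, SOFT, HARD }

  /** Same question as isLessOrEqualRestrictive, answered via Comparable. */
  public static boolean lessThanOrEqualTo(Affinity current, Affinity other) {
    return current.compareTo(other) <= 0;
  }

  /**
   * Hypothetical helper: the most restrictive affinity among a fragment's
   * operators, or NONE if the fragment has no operators.
   */
  public static Affinity mostRestrictive(List<Affinity> operatorAffinities) {
    return operatorAffinities.stream()
        .max(Comparator.naturalOrder())
        .orElse(Affinity.NONE);
  }

  public static void main(String[] args) {
    System.out.println(lessThanOrEqualTo(Affinity.SOFT, Affinity.HARD)); // prints true
    System.out.println(lessThanOrEqualTo(Affinity.HARD, Affinity.SOFT)); // prints false
    // e.g. Project (NONE) over Filter (NONE) over a GroupScan (HARD)
    System.out.println(mostRestrictive(
        Arrays.asList(Affinity.NONE, Affinity.NONE, Affinity.HARD))); // prints HARD
  }
}
```

The caveat from the review still applies: relying on compareTo ties restrictiveness to declaration order, so reordering the constants would silently change the result. Note also that this helper does not detect the conflict described in the issue (two GroupScans with different hard requirements in one fragment); per the proposal, that case is resolved by inserting an exchange.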