keith-turner commented on code in PR #3262:
URL: https://github.com/apache/accumulo/pull/3262#discussion_r1406979175


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -613,8 +641,15 @@ public void run() {
       final long waitTimeBetweenScans = manager.getConfiguration()
           .getTimeInMillis(Property.MANAGER_TABLET_GROUP_WATCHER_INTERVAL);
 
+      // Override the set of table ranges that this manager will manage
+      Optional<List<Range>> ranges = Optional.empty();
+      if (store.getLevel() == Ample.DataLevel.USER) {
+        ranges = calculateRangesForMultipleManagers(

Review Comment:
   Not sure about this, was getting a bit lost.  I think this may return 
Optional.empty() with the expectation that the last computation will be used 
when empty is returned, but I am not sure that is actually being done.



##########
server/manager/src/main/java/org/apache/accumulo/manager/MultipleManagerUtil.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.manager;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import org.apache.accumulo.core.data.TableId;
+import org.apache.accumulo.core.util.Pair;
+import org.apache.accumulo.server.ServerContext;
+
+public class MultipleManagerUtil {
+
+  /**
+   * Each Manager will be responsible for a range(s) of metadata tablets, but 
we don't want to split
+   * up a table's metadata tablets between managers as it will throw off the 
tablet balancing. If
+   * there are three managers, then we want to split up the metadata tablets 
roughly into thirds and
+   * have each manager responsible for one third, for example.
+   *
+   * @param context server context
+   * @param tables set of table ids
+   * @param numManagers number of managers
+   * @return list of num manager size, each element containing a set of tables 
for the manager to
+   *         manage
+   */
+  public static List<Set<TableId>> getTablesForManagers(ServerContext context, 
Set<TableId> tables,
+      int numManagers) {
+
+    if (numManagers == 0) {
+      throw new IllegalStateException("No managers, one or more expected");
+    }
+
+    if (numManagers == 1) {
+      return List.of(tables);
+    }
+
+    SortedSet<Pair<TableId,Long>> tableTabletCounts = new TreeSet<>(new 
Comparator<>() {
+      @Override
+      public int compare(Pair<TableId,Long> table1, Pair<TableId,Long> table2) 
{
+        // sort descending by number of tablets
+        int result = table1.getSecond().compareTo(table2.getSecond());
+        if (result == 0) {
+          return table1.getFirst().compareTo(table2.getFirst());
+        }
+        return -1 * result;
+      }
+    });
+    tables.forEach(tid -> {

Review Comment:
   When the set of managers and tables are steady for a bit, all manager 
processes need to arrive at the same decisions for partitioning tables into 
buckets.  With the algorithm in this method different manager processes may see 
different counts for the same tables at different times and end up partitioning 
tables into different buckets. This could lead to overlap in the partitions or 
in the worst case a table that no manager processes.  We could start with a 
deterministic hash partitioning of tables and open a follow in issue to 
improve.  One possible way to improve would be to have a single manager process 
run this algorithm and publish the partitioning information, with all other 
manager just using it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to