keith-turner commented on code in PR #3262:
URL: https://github.com/apache/accumulo/pull/3262#discussion_r1406979175
##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -613,8 +641,15 @@ public void run() {
final long waitTimeBetweenScans = manager.getConfiguration()
.getTimeInMillis(Property.MANAGER_TABLET_GROUP_WATCHER_INTERVAL);
+ // Override the set of table ranges that this manager will manage
+ Optional<List<Range>> ranges = Optional.empty();
+ if (store.getLevel() == Ample.DataLevel.USER) {
+ ranges = calculateRangesForMultipleManagers(
Review Comment:
Not sure about this, was getting a bit lost. I think this may return
Optional.empty() with the expectation that the last computation will be used
when empty is returned, but I am not sure that is actually being done.
##########
server/manager/src/main/java/org/apache/accumulo/manager/MultipleManagerUtil.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.manager;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import org.apache.accumulo.core.data.TableId;
+import org.apache.accumulo.core.util.Pair;
+import org.apache.accumulo.server.ServerContext;
+
+public class MultipleManagerUtil {
+
+ /**
+ * Each Manager will be responsible for a range(s) of metadata tablets, but
we don't want to split
+ * up a table's metadata tablets between managers as it will throw off the
tablet balancing. If
+ * there are three managers, then we want to split up the metadata tablets
roughly into thirds and
+ * have each manager responsible for one third, for example.
+ *
+ * @param context server context
+ * @param tables set of table ids
+ * @param numManagers number of managers
+ * @return list of num manager size, each element containing a set of tables
for the manager to
+ * manage
+ */
+ public static List<Set<TableId>> getTablesForManagers(ServerContext context,
Set<TableId> tables,
+ int numManagers) {
+
+ if (numManagers == 0) {
+ throw new IllegalStateException("No managers, one or more expected");
+ }
+
+ if (numManagers == 1) {
+ return List.of(tables);
+ }
+
+ SortedSet<Pair<TableId,Long>> tableTabletCounts = new TreeSet<>(new
Comparator<>() {
+ @Override
+ public int compare(Pair<TableId,Long> table1, Pair<TableId,Long> table2)
{
+ // sort descending by number of tablets
+ int result = table1.getSecond().compareTo(table2.getSecond());
+ if (result == 0) {
+ return table1.getFirst().compareTo(table2.getFirst());
+ }
+ return -1 * result;
+ }
+ });
+ tables.forEach(tid -> {
Review Comment:
When the set of managers and tables are steady for a bit, all manager
processes need to arrive at the same decisions for partitioning tables into
buckets. With the algorithm in this method different manager processes may see
different counts for the same tables at different times and end up partitioning
tables into different buckets. This could lead to overlap in the partitions or
in the worst case a table that no manager processes. We could start with a
deterministic hash partitioning of tables and open a follow in issue to
improve. One possible way to improve would be to have a single manager process
run this algorithm and publish the partitioning information, with all other
manager just using it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]