[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-16 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r358276649
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelineDatanodesIntersection.java
 ##
 @@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.pipeline;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.hdds.scm.container.MockNodeManager;
+import org.apache.hadoop.hdds.scm.exceptions.SCMException;
+import org.apache.hadoop.hdds.scm.node.NodeManager;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import static 
org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_DATANODE_MAX_PIPELINE_ENGAGEMENT;
+import static 
org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_PIPELINE_AUTO_CREATE_FACTOR_ONE;
+
+/**
+ * Test for pipeline datanodes intersection.
+ */
+@RunWith(Parameterized.class)
+public class TestPipelineDatanodesIntersection {
+  private int nodeCount;
+  private int nodeHeaviness;
+  private OzoneConfiguration conf;
+  private boolean end;
+
+  @Before
+  public void initialize() {
+conf = new OzoneConfiguration();
+end = false;
+  }
+
+  public TestPipelineDatanodesIntersection(int nodeCount, int nodeHeaviness) {
+this.nodeCount = nodeCount;
+this.nodeHeaviness = nodeHeaviness;
+  }
+
+  @Parameterized.Parameters
+  public static Collection inputParams() {
+return Arrays.asList(new Object[][] {
+{4, 5},
+{10, 5},
+{20, 5},
+{50, 5},
+{100, 5},
+{100, 10}
+});
+  }
+
+  @Test
+  public void testPipelineDatanodesIntersection() {
+NodeManager nodeManager= new MockNodeManager(true, nodeCount);
+conf.setInt(OZONE_DATANODE_MAX_PIPELINE_ENGAGEMENT, nodeHeaviness);
+conf.setBoolean(OZONE_SCM_PIPELINE_AUTO_CREATE_FACTOR_ONE, false);
+PipelineStateManager stateManager = new PipelineStateManager(conf);
+PipelineProvider provider = new MockRatisPipelineProvider(nodeManager,
+stateManager, conf);
+
+int healthyNodeCount = nodeManager
+.getNodeCount(HddsProtos.NodeState.HEALTHY);
+int intersectionCount = 0;
+int createdPipelineCount = 0;
+while (!end && createdPipelineCount <= healthyNodeCount * nodeHeaviness) {
+  try {
+Pipeline pipeline = 
provider.create(HddsProtos.ReplicationFactor.THREE);
+stateManager.addPipeline(pipeline);
+nodeManager.addPipeline(pipeline);
+Pipeline overlapPipeline = RatisPipelineUtils
+.checkPipelineContainSameDatanodes(stateManager, pipeline);
+if (overlapPipeline != null){
+  intersectionCount++;
+  System.out.println("This pipeline: " + pipeline.getId().toString() +
+  " overlaps with previous pipeline: " + overlapPipeline.getId() +
+  ". They share same set of datanodes as: " +
+  pipeline.getNodesInOrder().get(0).getUuid() + "/" +
+  pipeline.getNodesInOrder().get(1).getUuid() + "/" +
+  pipeline.getNodesInOrder().get(2).getUuid() + " and " +
+  overlapPipeline.getNodesInOrder().get(0).getUuid() + "/" +
+  overlapPipeline.getNodesInOrder().get(1).getUuid() + "/" +
+  overlapPipeline.getNodesInOrder().get(2).getUuid() +
+  " is the same.");
+}
+createdPipelineCount++;
+  } catch(SCMException e) {
+end = true;
+  } catch (IOException e) {
+end = true;
+// Should not throw regular IOException.
+Assert.fail();
+  }
+}
+
+end = false;
+
+System.out.println("Among total " +
 
 Review comment:
   can we change to a log4j output instead of console output?


This is an automated message from the Apache Git Servic

[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-12 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r357511301
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java
 ##
 @@ -163,6 +177,18 @@ public synchronized Pipeline 
createPipeline(ReplicationType type,
 metrics.incNumPipelineCreated();
 metrics.createPerPipelineMetrics(pipeline);
   }
+  Pipeline overlapPipeline = RatisPipelineUtils
+  .checkPipelineContainSameDatanodes(stateManager, pipeline);
+  if (overlapPipeline != null) {
+metrics.incNumPipelineContainSameDatanodes();
+//TODO remove until pipeline allocation is proved equally distributed.
+LOG.info("Pipeline: " + pipeline.getId().toString() +
 
 Review comment:
   Talked offline. OK to have this at INFO level for now. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-12 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r357509565
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java
 ##
 @@ -100,4 +105,45 @@ static void destroyPipeline(DatanodeDetails dn, 
PipelineID pipelineID,
   true, p.getId());
 }
   }
+
+  static int encodeNodeIdsOfFactorThreePipeline(List nodes) {
+if (nodes.size() != HddsProtos.ReplicationFactor.THREE.getNumber()) {
+  return 0;
+}
+List nodeIds = nodes.stream()
+.map(DatanodeDetails::getUuid).distinct()
+.collect(Collectors.toList());
+nodeIds.sort(new ComparableComparator());
+// Only for Factor THREE pipeline.
+return new HashCodeBuilder()
+.append(nodeIds.get(0).toString())
+.append(nodeIds.get(1).toString())
+.append(nodeIds.get(2).toString())
+.toHashCode();
+  }
+
+  /**
+   * Return first existed pipeline which share the same set of datanodes
+   * with the input pipeline.
+   * @param stateManager PipelineStateManager
+   * @param pipeline input pipeline
+   * @return first matched pipeline
+   */
+  static Pipeline checkPipelineContainSameDatanodes(
 
 Review comment:
   Both should work just non-static seems cleaner. But if there are too much 
additional work to fix the unit tests. I'm OK with current approach. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356248292
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java
 ##
 @@ -163,6 +177,18 @@ public synchronized Pipeline 
createPipeline(ReplicationType type,
 metrics.incNumPipelineCreated();
 metrics.createPerPipelineMetrics(pipeline);
   }
+  Pipeline overlapPipeline = RatisPipelineUtils
+  .checkPipelineContainSameDatanodes(stateManager, pipeline);
+  if (overlapPipeline != null) {
+metrics.incNumPipelineContainSameDatanodes();
+//TODO remove until pipeline allocation is proved equally distributed.
+LOG.info("Pipeline: " + pipeline.getId().toString() +
 
 Review comment:
   can we 
   1) put this with if (LOG.isDebugEnabled()) 
   2) change the log level to DEBUG 
   3) change to use parameterized log formatting
   to avoid performance penalty from logging as this is could affect write?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356248292
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java
 ##
 @@ -163,6 +177,18 @@ public synchronized Pipeline 
createPipeline(ReplicationType type,
 metrics.incNumPipelineCreated();
 metrics.createPerPipelineMetrics(pipeline);
   }
+  Pipeline overlapPipeline = RatisPipelineUtils
+  .checkPipelineContainSameDatanodes(stateManager, pipeline);
+  if (overlapPipeline != null) {
+metrics.incNumPipelineContainSameDatanodes();
+//TODO remove until pipeline allocation is proved equally distributed.
+LOG.info("Pipeline: " + pipeline.getId().toString() +
 
 Review comment:
   can we put this with if (LOG.isDebugEnabled()) and change the log level to 
DEBUG as this is could affect write performance?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356245738
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java
 ##
 @@ -128,6 +128,19 @@ public void setPipelineProvider(ReplicationType 
replicationType,
 pipelineFactory.setProvider(replicationType, provider);
   }
 
+  private int computeNodeIdHash(Pipeline pipeline) {
+if (pipeline.getType() != ReplicationType.RATIS) {
+  return 0;
+}
+
+if (pipeline.getFactor() != ReplicationFactor.THREE) {
+  return 0;
+}
+
+return RatisPipelineUtils.
+encodeNodeIdsOfFactorThreePipeline(pipeline.getNodes());
 
 Review comment:
   unordered node list can produce different hash for pipelines with the same 
datanodes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356244699
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java
 ##
 @@ -100,4 +105,45 @@ static void destroyPipeline(DatanodeDetails dn, 
PipelineID pipelineID,
   true, p.getId());
 }
   }
+
+  static int encodeNodeIdsOfFactorThreePipeline(List nodes) {
+if (nodes.size() != HddsProtos.ReplicationFactor.THREE.getNumber()) {
+  return 0;
+}
+List nodeIds = nodes.stream()
+.map(DatanodeDetails::getUuid).distinct()
+.collect(Collectors.toList());
+nodeIds.sort(new ComparableComparator());
+// Only for Factor THREE pipeline.
+return new HashCodeBuilder()
+.append(nodeIds.get(0).toString())
+.append(nodeIds.get(1).toString())
+.append(nodeIds.get(2).toString())
+.toHashCode();
+  }
+
+  /**
+   * Return first existed pipeline which share the same set of datanodes
+   * with the input pipeline.
+   * @param stateManager PipelineStateManager
+   * @param pipeline input pipeline
+   * @return first matched pipeline
+   */
+  static Pipeline checkPipelineContainSameDatanodes(
+  PipelineStateManager stateManager, Pipeline pipeline) {
+List matchedPipelines = stateManager.getPipelines(
+HddsProtos.ReplicationType.RATIS,
+HddsProtos.ReplicationFactor.THREE)
+.stream().filter(p -> !p.equals(pipeline) &&
 
 Review comment:
   Need to check the override of equals() in Pipeline object, the list of nodes 
may not in the same order. This requires us to use getNodesInOrder instead of 
getNodes().


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356241293
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java
 ##
 @@ -100,4 +105,45 @@ static void destroyPipeline(DatanodeDetails dn, 
PipelineID pipelineID,
   true, p.getId());
 }
   }
+
+  static int encodeNodeIdsOfFactorThreePipeline(List nodes) {
+if (nodes.size() != HddsProtos.ReplicationFactor.THREE.getNumber()) {
+  return 0;
+}
+List nodeIds = nodes.stream()
+.map(DatanodeDetails::getUuid).distinct()
+.collect(Collectors.toList());
+nodeIds.sort(new ComparableComparator());
+// Only for Factor THREE pipeline.
+return new HashCodeBuilder()
+.append(nodeIds.get(0).toString())
+.append(nodeIds.get(1).toString())
+.append(nodeIds.get(2).toString())
+.toHashCode();
+  }
+
+  /**
+   * Return first existed pipeline which share the same set of datanodes
+   * with the input pipeline.
+   * @param stateManager PipelineStateManager
+   * @param pipeline input pipeline
+   * @return first matched pipeline
+   */
+  static Pipeline checkPipelineContainSameDatanodes(
 
 Review comment:
   Should we move this to a non-static PipelineStateManager method?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356232676
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineUtils.java
 ##
 @@ -100,4 +105,45 @@ static void destroyPipeline(DatanodeDetails dn, 
PipelineID pipelineID,
   true, p.getId());
 }
   }
+
+  static int encodeNodeIdsOfFactorThreePipeline(List nodes) {
+if (nodes.size() != HddsProtos.ReplicationFactor.THREE.getNumber()) {
+  return 0;
+}
+List nodeIds = nodes.stream()
+.map(DatanodeDetails::getUuid).distinct()
+.collect(Collectors.toList());
+nodeIds.sort(new ComparableComparator());
 
 Review comment:
   Can we use orderless hash here to avoid sorting? E.g., xor the hash of all 
the nodeId hash?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out pipeline allocation on datanodes and add metrcs/test

2019-12-10 Thread GitBox
xiaoyuyao commented on a change in pull request #291: HDDS-1574 Average out 
pipeline allocation on datanodes and add metrcs/test
URL: https://github.com/apache/hadoop-ozone/pull/291#discussion_r356221673
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/RatisPipelineProvider.java
 ##
 @@ -170,12 +169,18 @@ public Pipeline create(ReplicationFactor factor) throws 
IOException {
   throw new IllegalStateException("Unknown factor: " + factor.name());
 }
 
+int nodeIdHash = 0;
+if (factor == ReplicationFactor.THREE) {
 
 Review comment:
   NIT: this condition check can be combined into the switch(factor) after line 
165. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org