[GitHub] [hadoop] supratimdeka commented on a change in pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode.

2019-09-03 Thread GitBox
supratimdeka commented on a change in pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#discussion_r320362456
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
 ##
 @@ -258,8 +257,9 @@ private long loadSnapshot(SingleFileSnapshotInfo snapshot)
* @throws IOException
*/
   public void persistContainerSet(OutputStream out) throws IOException {
 
 Review comment:
   should we rename this function as well?
   given that other instances of containerSet have been renamed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] supratimdeka commented on a change in pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode.

2019-09-03 Thread GitBox
supratimdeka commented on a change in pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#discussion_r320359663
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/BlockManagerImpl.java
 ##
 @@ -282,5 +282,4 @@ public void deleteBlock(Container container, BlockID 
blockID) throws
   public void shutdown() {
 BlockUtils.shutdownCache(ContainerCache.getInstance(config));
   }
-
 
 Review comment:
   unintended whitespace only change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] supratimdeka commented on a change in pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode.

2019-09-03 Thread GitBox
supratimdeka commented on a change in pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#discussion_r320313492
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
 ##
 @@ -228,11 +231,14 @@ private ContainerCommandResponseProto dispatchRequest(
   audit(action, eventType, params, AuditEventStatus.FAILURE, sce);
   return ContainerUtils.logAndReturnError(LOG, sce, msg);
 }
-Preconditions.checkArgument(isWriteStage && containerIdSet != null
+Preconditions.checkArgument(isWriteStage && container2BCSIDMap != null
 || dispatcherContext == null);
-if (containerIdSet != null) {
+if (container2BCSIDMap != null) {
   // adds this container to list of containers created in the pipeline
-  containerIdSet.add(containerID);
+  // with initial BCSID recorded as 0.
+  Preconditions
+  .checkArgument(!container2BCSIDMap.containsKey(containerID));
 
 Review comment:
   is this assert safe? asking the question because there was no equivalent 
assert before.
   
   trying to imagine a situation where this might be a false alarm - is this 
possible?
   1. container create is successful, 
   2. container2BCSIDMap.putIfAbsent is successful
   3. datanode restarts - container2BCSIDMap is persisted, but the container 
itself is not persisted (create container involves a rename operation which may 
not be sync'ed inline)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] supratimdeka commented on a change in pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode.

2019-09-03 Thread GitBox
supratimdeka commented on a change in pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#discussion_r320287270
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
 ##
 @@ -240,14 +240,37 @@ public ContainerReportsProto getContainerReport() throws 
IOException {
   }
 
   /**
-   * Builds the missing container set by taking a diff total no containers
-   * actually found and number of containers which actually got created.
+   * Builds the missing container set by taking a diff between total no
+   * containers actually found and number of containers which actually
+   * got created. It also validates the BCSID stored in the snapshot file
+   * for each container as against what is reported in containerScan.
* This will only be called during the initialization of Datanode Service
* when  it still not a part of any write Pipeline.
-   * @param createdContainerSet ContainerId set persisted in the Ratis snapshot
+   * @param container2BCSIDMap Map of containerId to BCSID persisted in the
+   *   Ratis snapshot
*/
-  public void buildMissingContainerSet(Set createdContainerSet) {
-missingContainerSet.addAll(createdContainerSet);
-missingContainerSet.removeAll(containerMap.keySet());
+  public void buildMissingContainerSetAndValidate(
+  Map container2BCSIDMap) throws IOException {
+for (Map.Entry mapEntry : container2BCSIDMap.entrySet()) {
+  long id = mapEntry.getKey();
+  if (!containerMap.containsKey(id)) {
+LOG.warn("Adding container {} to missing container set.", id);
+missingContainerSet.add(id);
+  } else {
+Container container = containerMap.get(id);
+long containerBCSID = container.getBlockCommitSequenceId();
+long snapshotBCSID = mapEntry.getValue();
+if (containerBCSID < snapshotBCSID) {
+  LOG.warn(
+  "Marking container {} unhealthy as reported BCSID {} is smaller"
 
 Review comment:
   argument appears to be missing. container not passed as the first argument.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] supratimdeka commented on a change in pull request #1364: HDDS-1843. Undetectable corruption after restart of a datanode.

2019-09-03 Thread GitBox
supratimdeka commented on a change in pull request #1364: HDDS-1843. 
Undetectable corruption after restart of a datanode.
URL: https://github.com/apache/hadoop/pull/1364#discussion_r320368137
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
 ##
 @@ -329,6 +336,21 @@ private ContainerCommandResponseProto dispatchRequest(
 }
   }
 
+  private void updateBCSID(Container container,
+  DispatcherContext dispatcherContext, ContainerProtos.Type cmdType) {
+long bcsID = container.getBlockCommitSequenceId();
+long containerId = container.getContainerData().getContainerID();
+Map container2BCSIDMap;
+if (dispatcherContext != null && (cmdType == ContainerProtos.Type.PutBlock
 
 Review comment:
   an alternative implementation would be to ignore the cmdType.
   For all requests, read the BCS ID from the map and compare it to the bcsid 
from the container. if the map value is lower, then update it to the value from 
the container. 
   
   Advantage:
   updateBCSID function becomes de-coupled from the knowledge of which cmdType 
changes the bcsid.
   Disadvantage:
   possibly more CPU cost, because every request will pay the cost of reading 
the bcsid map. but it might be premature to assume that this additional cost 
will be significant.

   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org