[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917434#comment-16917434 ]

Hudson commented on HDDS-1753:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17192 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17192/])
HDDS-1753. Datanode unable to find chunk while replication data using (ljain: rev 5d31a4eff785ba4da22bf0b30c9b995495c98844)
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/CommandDispatcher.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/ChunkManagerImpl.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/TestBlockDeletingService.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java
* (edit) hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/testutils/BlockDeletingServiceTestImpl.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ContainerTestHelper.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/statemachine/background/BlockDeletingService.java
* (add) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestDeleteWithSlowFollower.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerDeletionChoosingPolicy.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/CSMMetrics.java

> Datanode unable to find chunk while replication data using ratis.
> -----------------------------------------------------------------
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1753.000.patch
>
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> Leader datanode is unable to read chunk from the datanode while replicating data from leader to follower.
> Please note that deletion of keys is also happening while the data is being replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the chunk file. chunk info ChunkInfo{chunkNa
[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911296#comment-16911296 ]

Hadoop QA commented on HDDS-1753:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 55s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 4m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 7m 12s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-hdds: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-ozone: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 11s{color} | {color:green} hadoop-hdds in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 30s{color} | {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 16s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.container.common.impl.TestContainerDeletionChoosingPolicy |
| | hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory |
| | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
| | hadoop.ozone.container.common.Tes
[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908895#comment-16908895 ]

Shashikant Banerjee commented on HDDS-1753:
-------------------------------------------

Uploaded patch v0 to address the issue. The patch is rebased on top of HDDS-1610.

> Datanode unable to find chunk while replication data using ratis.
> -----------------------------------------------------------------
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster
> Attachments: HDDS-1753.000.patch
>
>
> Leader datanode is unable to read chunk from the datanode while replicating data from leader to follower.
> Please note that deletion of keys is also happening while the data is being replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) ~[guava-11.0.2.jar:?]
> at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) ~[guava-11.0.2.jar:?]
> at com.google.common.cache.LocalCache$Segment.loadSy
[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883727#comment-16883727 ]

Shashikant Banerjee commented on HDDS-1753:
-------------------------------------------

The problem arises because data may still be waiting to be replicated from the leader to the followers when a key delete removes a block in a closed container on the leader. When the follower then asks the leader for the chunk data, the read fails because the chunk file no longer exists on the leader.

The proposed solution is as follows:

Whenever a datanode receives a delete command from SCM, it should first check the minimum replicated index across all the servers in the pipeline. ContainerStateMachine will also track the close container log index for each container. If the min replicated index >= close container index on the leader, the delete operation is queued over Ratis on the leader and the SCM command is ignored on the followers, so the delete is applied through Ratis. If the close container index has not yet been replicated, the delete transaction is not enqueued over Ratis and the command is ignored; SCM already has a retry policy in place that will retry the same delete. If the Ratis pipeline does not exist, delete works as it does today.

> Datanode unable to find chunk while replication data using ratis.
> -----------------------------------------------------------------
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster
>
> Leader datanode is unable to read chunk from the datanode while replicating data from leader to follower.
> Please note that deletion of keys is also happening while the data is being replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already has the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at org.apache.hadoo
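
The delete-handling rule proposed in Shashikant Banerjee's comment above can be read as a small decision function. The following is only a rough sketch of that reading, not code from the patch: the class DeleteGate, the Action enum, and every parameter name are hypothetical, and treating a container whose close index is not yet tracked as "not replicated" is an assumption.

```java
import java.util.OptionalLong;

/** Illustrative sketch only: hypothetical names, not the Ozone or Ratis API. */
final class DeleteGate {

    /** Possible outcomes for a delete command that a datanode receives from SCM. */
    enum Action {
        DELETE_DIRECTLY,          // no Ratis pipeline: delete works as before
        QUEUE_OVER_RATIS,         // leader enqueues the delete as a Ratis transaction
        IGNORE_ON_FOLLOWER,       // follower waits for the delete to arrive via Ratis
        SKIP_AND_AWAIT_SCM_RETRY  // close index not replicated yet: drop, SCM retries
    }

    /**
     * @param pipelineExists      whether the container still has a Ratis pipeline
     * @param isLeader            whether this datanode is the pipeline leader
     * @param minReplicatedIndex  minimum replicated log index across all servers
     * @param closeContainerIndex log index of the close-container entry, if tracked
     */
    static Action decide(boolean pipelineExists,
                         boolean isLeader,
                         long minReplicatedIndex,
                         OptionalLong closeContainerIndex) {
        if (!pipelineExists) {
            return Action.DELETE_DIRECTLY;
        }
        if (!isLeader) {
            return Action.IGNORE_ON_FOLLOWER;
        }
        // Assumption: an untracked close index counts as "not yet replicated".
        boolean closeReplicated = closeContainerIndex.isPresent()
                && minReplicatedIndex >= closeContainerIndex.getAsLong();
        return closeReplicated
                ? Action.QUEUE_OVER_RATIS
                : Action.SKIP_AND_AWAIT_SCM_RETRY;
    }

    public static void main(String[] args) {
        // Every server has replicated past the close entry: safe to queue the delete.
        System.out.println(decide(true, true, 120L, OptionalLong.of(100L)));
        // A slow follower is still behind the close entry: skip and let SCM retry.
        System.out.println(decide(true, true, 90L, OptionalLong.of(100L)));
    }
}
```

Keeping the check as a pure function of the two indices makes the slow-follower case easy to exercise in isolation, which is the situation the log above shows.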