[jira] [Updated] (HDFS-14636) SBN : If you configure the default proxy provider still read Request going to Observer namenode only.
[ https://issues.apache.org/jira/browse/HDFS-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshakiran Reddy updated HDFS-14636: - Labels: SBN (was: ) > SBN : If you configure the default proxy provider still read Request going to > Observer namenode only. > - > > Key: HDFS-14636 > URL: https://issues.apache.org/jira/browse/HDFS-14636 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.1 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > Labels: SBN > > {noformat} > In an Observer cluster, even when the default proxy provider is configured instead of > "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider", read > requests still go to the Observer namenode only.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
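For context, a minimal hdfs-site.xml sketch of the two client proxy-provider configurations being compared in this report; the nameservice name `mycluster` is a placeholder. With the default-style provider, reads should go through the active NN only; the bug report says reads still land on the Observer:

```
<!-- Default-style provider: clients fail over between NNs, no observer reads -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Observer-read provider: read requests may be served by Observer NNs -->
<!--
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider</value>
</property>
-->
```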
[jira] [Updated] (HDDS-1728) Add metrics for leader's latency in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1728: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review [~ljain]. I have committed this to trunk. > Add metrics for leader's latency in ContainerStateMachine > - > > Key: HDDS-1728 > URL: https://issues.apache.org/jira/browse/HDDS-1728 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This jira proposes to add metrics around the leader's round-trip reply to the Ratis > client. This will be done via the startTransaction API.
[jira] [Assigned] (HDFS-14636) SBN : If you configure the default proxy provider still read Request going to Observer namenode only.
[ https://issues.apache.org/jira/browse/HDFS-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ranith Sardar reassigned HDFS-14636: Assignee: Ranith Sardar > SBN : If you configure the default proxy provider still read Request going to > Observer namenode only. > - > > Key: HDFS-14636 > URL: https://issues.apache.org/jira/browse/HDFS-14636 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.1 >Reporter: Harshakiran Reddy >Assignee: Ranith Sardar >Priority: Major > > {noformat} > In an Observer cluster, even when the default proxy provider is configured instead of > "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider", read > requests still go to the Observer namenode only.{noformat}
[jira] [Created] (HDFS-14636) SBN : If you configure the default proxy provider still read Request going to Observer namenode only.
Harshakiran Reddy created HDFS-14636: Summary: SBN : If you configure the default proxy provider still read Request going to Observer namenode only. Key: HDFS-14636 URL: https://issues.apache.org/jira/browse/HDFS-14636 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.1.1 Reporter: Harshakiran Reddy {noformat} In an Observer cluster, even when the default proxy provider is configured instead of "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider", read requests still go to the Observer namenode only.{noformat}
[jira] [Work logged] (HDDS-1728) Add metrics for leader's latency in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1728?focusedWorklogId=273092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273092 ] ASF GitHub Bot logged work on HDDS-1728: Author: ASF GitHub Bot Created on: 08/Jul/19 06:49 Start Date: 08/Jul/19 06:49 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1022: HDDS-1728. Add metrics for leader's latency in ContainerStateMachine. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1022 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273092) Time Spent: 1h 10m (was: 1h) > Add metrics for leader's latency in ContainerStateMachine > - > > Key: HDDS-1728 > URL: https://issues.apache.org/jira/browse/HDDS-1728 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This jira proposes to add metrics around the leader's round-trip reply to the Ratis > client. This will be done via the startTransaction API.
[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880062#comment-16880062 ] Wei-Chiu Chuang commented on HDFS-14034: Barring the trivial checkstyle warnings, the patch looks good to me. Of course, we will now need to implement the corresponding HttpFs handlers too. We should file a new jira for that. Unrelated: ContentSummary has a field erasureCodingPolicy which was added in HDFS-11647, but webhdfs GETCONTENTSUMMARY doesn't include it. > Support getQuotaUsage API in WebHDFS > > > Key: HDFS-14034 > URL: https://issues.apache.org/jira/browse/HDFS-14034 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs, webhdfs >Reporter: Erik Krogen >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14034.000.patch > > > HDFS-8898 added support for a new API, {{getQuotaUsage}}, which can fetch > quota usage on a directory with significantly lower impact than the similar > {{getContentSummary}}. This JIRA is to track adding support for this API to > WebHDFS.
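As a rough illustration of what the WebHDFS call might look like once this lands: the `GETQUOTAUSAGE` op name and the response field names (mirroring the QuotaUsage class) are assumptions inferred from the existing `GETCONTENTSUMMARY` convention, not the committed API:

```
# Hypothetical request shape, following the WebHDFS REST convention:
GET http://<namenode>:9870/webhdfs/v1/<path>?op=GETQUOTAUSAGE

# Hypothetical JSON payload (field names assumed from the QuotaUsage class):
# {"QuotaUsage": {"fileAndDirectoryCount": ..., "quota": ...,
#                 "spaceConsumed": ..., "spaceQuota": ...}}
```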
[jira] [Work logged] (HDDS-1728) Add metrics for leader's latency in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1728?focusedWorklogId=273077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273077 ] ASF GitHub Bot logged work on HDDS-1728: Author: ASF GitHub Bot Created on: 08/Jul/19 06:31 Start Date: 08/Jul/19 06:31 Worklog Time Spent: 10m Work Description: lokeshj1703 commented on issue #1022: HDDS-1728. Add metrics for leader's latency in ContainerStateMachine. Contributed by Mukul Kumar Singh. URL: https://github.com/apache/hadoop/pull/1022#issuecomment-509096885 @mukul1987 Thanks for updating the PR. The changes look good to me. +1. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273077) Time Spent: 1h (was: 50m) > Add metrics for leader's latency in ContainerStateMachine > - > > Key: HDDS-1728 > URL: https://issues.apache.org/jira/browse/HDDS-1728 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This jira proposes to add metrics around the leader's round-trip reply to the Ratis > client. This will be done via the startTransaction API.
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880060#comment-16880060 ] Hadoop QA commented on HDFS-14483: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} branch-2.9 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 31s{color} | {color:green} branch-2.9 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 51s{color} | {color:green} branch-2.9 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 10s{color} | {color:green} branch-2.9 passed with JDK v1.8.0_212 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s{color} | {color:green} branch-2.9 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s{color} | {color:green} branch-2.9 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s{color} | 
{color:red} hadoop-common-project/hadoop-common in branch-2.9 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 21s{color} | {color:green} branch-2.9 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s{color} | {color:green} branch-2.9 passed with JDK v1.8.0_212 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 47s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 59s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-hdfs-project/hadoop-hdfs-native-client {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 27s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 44s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 33s{color} | {color:green} hadoop-hdfs-cli
[jira] [Created] (HDFS-14635) Support to refresh the rack awareness dynamically
liying created HDFS-14635: - Summary: Support to refresh the rack awareness dynamically Key: HDFS-14635 URL: https://issues.apache.org/jira/browse/HDFS-14635 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 2.7.2 Reporter: liying At present, there are two ways to load the rack script in the Hadoop code: the class ScriptBasedMapping caches the results, while ScriptBasedMapping#RawScriptBasedMapping runs the script every time (on every request). Caching is the better way to implement this feature, because loading the script on every request costs CPU. But the problem is that we can't refresh the cache, so it is important to support refreshing the rack awareness dynamically.
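For reference, a minimal core-site.xml sketch of the script-based rack mapping being discussed; the script path is a placeholder. `net.topology.node.switch.mapping.impl` defaults to ScriptBasedMapping, whose results are cached, which is exactly the cache this issue proposes to make refreshable:

```
<property>
  <name>net.topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
</property>
<property>
  <!-- Script invoked to map a host to a rack; results are cached -->
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```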
[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880052#comment-16880052 ] Wei-Chiu Chuang commented on HDFS-14313: Thank you [~leosun08]. I was out for a few days. I think overall the patch is almost ready. Please take care of a few nits that I spotted. ReplicaCachingGetSpaceUsed#run() would throw an NPE if ExternalDatasetImpl is used, since ExternalDatasetImpl#deepCopyReplica() returns null. IMO, it should throw an exception to indicate it is not supported, or return an empty Collection. For the new configuration keys, please update them with a prefix. For example, deep.copy.replica.threshold.ms --> fs.getspaceused.deep.copy.replica.threshold.ms Please add descriptions of the new configurations to hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml > Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory > instead of df/du > > > Key: HDFS-14313 > URL: https://issues.apache.org/jira/browse/HDFS-14313 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, performance >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, > HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, > HDFS-14313.005.patch > > > The two existing ways of getting used space, DU and DF, are insufficient. > # Running DU across lots of disks is very expensive, and running all of the > processes at the same time creates a noticeable IO spike. > # Running DF is inaccurate when the disk is shared by multiple datanodes or > other servers. > Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory > is very cheap and accurate.
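The NPE nit above can be sketched as follows. This is a hedged illustration of the "return an empty Collection" option, not the actual HDFS code: `deepCopyReplica()` here is a stand-in for ExternalDatasetImpl#deepCopyReplica(), and the caller mimics what ReplicaCachingGetSpaceUsed#run() would do when iterating the result.

```java
import java.util.Collection;
import java.util.Collections;

// Sketch: normalize a null deepCopyReplica() result to an empty
// collection so callers cannot hit a NullPointerException.
public class ReplicaCopySketch {
    // Stand-in for ExternalDatasetImpl#deepCopyReplica(), which returns null.
    static Collection<String> deepCopyReplica() {
        return null;
    }

    // Normalize null to an empty, immutable collection.
    static Collection<String> safeDeepCopyReplica() {
        Collection<String> replicas = deepCopyReplica();
        return replicas == null ? Collections.emptyList() : replicas;
    }

    public static void main(String[] args) {
        long used = 0;
        // Iterating the normalized result is safe even though the
        // underlying implementation returned null.
        for (String replica : safeDeepCopyReplica()) {
            used += replica.length();
        }
        System.out.println("used=" + used); // prints used=0
    }
}
```

The alternative the review mentions, throwing an explicit UnsupportedOperationException, trades silent zero usage for a loud failure; either is preferable to the NPE.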
[jira] [Work logged] (HDDS-1603) Handle Ratis Append Failure in Container State Machine
[ https://issues.apache.org/jira/browse/HDDS-1603?focusedWorklogId=273067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273067 ] ASF GitHub Bot logged work on HDDS-1603: Author: ASF GitHub Bot Created on: 08/Jul/19 06:04 Start Date: 08/Jul/19 06:04 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1019: HDDS-1603. Handle Ratis Append Failure in Container State Machine. Contributed by Supratim Deka URL: https://github.com/apache/hadoop/pull/1019#discussion_r300932058 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/pipeline/TestPipelineClose.java ## @@ -180,4 +188,77 @@ public void testPipelineCloseWithPipelineAction() throws Exception { } catch (PipelineNotFoundException e) { } } + + @Test + public void testPipelineCloseWithLogFailure() throws IOException { + +EventQueue eventQ = (EventQueue) scm.getEventQueue(); +PipelineActionHandler pipelineActionTest = +Mockito.mock(PipelineActionHandler.class); +eventQ.addHandler(SCMEvents.PIPELINE_ACTIONS, pipelineActionTest); +ArgumentCaptor actionCaptor = +ArgumentCaptor.forClass(PipelineActionsFromDatanode.class); + +ContainerInfo containerInfo = containerManager +.allocateContainer(RATIS, THREE, "testOwner"); +ContainerWithPipeline containerWithPipeline = +new ContainerWithPipeline(containerInfo, +pipelineManager.getPipeline(containerInfo.getPipelineID())); +Pipeline openPipeline = containerWithPipeline.getPipeline(); +RaftGroupId groupId = RaftGroupId.valueOf(openPipeline.getId().getId()); + +try { + pipelineManager.getPipeline(openPipeline.getId()); +} catch (PipelineNotFoundException e) { + Assert.assertTrue("pipeline should exist", false); Review comment: In Junit, the test will exit if an uncaught exception is thrown, so this might not be needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273067) Time Spent: 1h (was: 50m) > Handle Ratis Append Failure in Container State Machine > -- > > Key: HDDS-1603 > URL: https://issues.apache.org/jira/browse/HDDS-1603 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > RATIS-573 would add notification to the State Machine on encountering failure > during Log append. > The scope of this jira is to build on RATIS-573 and define the handling for > log append failure in Container State Machine. > 1. Enqueue pipeline unhealthy action to SCM, add a reason code to the message. > 2. Trigger heartbeat to SCM > 3. Notify Ratis volume unhealthy to the Datanode, so that DN can trigger > async volume checker > Changes in the SCM to leverage the additional failure reason code, is outside > the scope of this jira.
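The review point about `Assert.assertTrue("pipeline should exist", false)` can be sketched as follows. In JUnit, an exception that escapes a @Test method already fails the test, so the try/catch wrapper is redundant. The names below (`getPipeline`, `PipelineNotFoundException`) are stand-ins for the HDDS classes, not the real API:

```java
// Sketch: two equivalent styles of asserting that a lookup succeeds.
public class JUnitStyleSketch {
    static class PipelineNotFoundException extends RuntimeException {
        PipelineNotFoundException(String id) { super(id); }
    }

    // Stand-in for pipelineManager.getPipeline(id).
    static String getPipeline(String id) {
        if (!"open".equals(id)) {
            throw new PipelineNotFoundException(id);
        }
        return id;
    }

    // Before: catch the exception and fail explicitly with a message.
    static String verboseLookup(String id) {
        try {
            return getPipeline(id);
        } catch (PipelineNotFoundException e) {
            throw new AssertionError("pipeline should exist", e);
        }
    }

    // After: just call it; an escaping exception fails the test on its own.
    static String conciseLookup(String id) {
        return getPipeline(id);
    }

    public static void main(String[] args) {
        // Both styles behave identically on success.
        System.out.println(verboseLookup("open"));
        System.out.println(conciseLookup("open"));
    }
}
```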
[jira] [Work logged] (HDDS-1603) Handle Ratis Append Failure in Container State Machine
[ https://issues.apache.org/jira/browse/HDDS-1603?focusedWorklogId=273066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273066 ] ASF GitHub Bot logged work on HDDS-1603: Author: ASF GitHub Bot Created on: 08/Jul/19 06:04 Start Date: 08/Jul/19 06:04 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1019: HDDS-1603. Handle Ratis Append Failure in Container State Machine. Contributed by Supratim Deka URL: https://github.com/apache/hadoop/pull/1019#discussion_r300931037 ## File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java ## @@ -545,18 +545,28 @@ private void handlePipelineFailure(RaftGroupId groupId, + roleInfoProto.getRole()); } +triggerPipelineClose(groupId, msg, Review comment: Lets have 2 Reasons, a) candidate failed, b) leader failed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273066) Time Spent: 50m (was: 40m) > Handle Ratis Append Failure in Container State Machine > -- > > Key: HDDS-1603 > URL: https://issues.apache.org/jira/browse/HDDS-1603 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > RATIS-573 would add notification to the State Machine on encountering failure > during Log append. > The scope of this jira is to build on RATIS-573 and define the handling for > log append failure in Container State Machine. > 1. Enqueue pipeline unhealthy action to SCM, add a reason code to the message. > 2. Trigger heartbeat to SCM > 3. 
Notify Ratis volume unhealthy to the Datanode, so that DN can trigger > async volume checker > Changes in the SCM to leverage the additional failure reason code, is outside > the scope of this jira.
[jira] [Updated] (HDDS-1603) Handle Ratis Append Failure in Container State Machine
[ https://issues.apache.org/jira/browse/HDDS-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1603: Status: Patch Available (was: Open) > Handle Ratis Append Failure in Container State Machine > -- > > Key: HDDS-1603 > URL: https://issues.apache.org/jira/browse/HDDS-1603 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > RATIS-573 would add notification to the State Machine on encountering failure > during Log append. > The scope of this jira is to build on RATIS-573 and define the handling for > log append failure in Container State Machine. > 1. Enqueue pipeline unhealthy action to SCM, add a reason code to the message. > 2. Trigger heartbeat to SCM > 3. Notify Ratis volume unhealthy to the Datanode, so that DN can trigger > async volume checker > Changes in the SCM to leverage the additional failure reason code, is outside > the scope of this jira.
[jira] [Work logged] (HDDS-1748) Error message for 3 way commit failure is not verbose
[ https://issues.apache.org/jira/browse/HDDS-1748?focusedWorklogId=273057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273057 ] ASF GitHub Bot logged work on HDDS-1748: Author: ASF GitHub Bot Created on: 08/Jul/19 05:52 Start Date: 08/Jul/19 05:52 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1051: HDDS-1748. Error message for 3 way commit failure is not verbose. Contributed by Supratim Deka URL: https://github.com/apache/hadoop/pull/1051#discussion_r300929755 ## File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientRatis.java ## @@ -258,8 +258,15 @@ public XceiverClientReply watchForCommit(long index, long timeout) .sendWatchAsync(index, RaftProtos.ReplicationLevel.ALL_COMMITTED); replyFuture.get(timeout, TimeUnit.MILLISECONDS); } catch (Exception e) { + String nodes = " with Datanodes : "; Throwable t = HddsClientUtils.checkForException(e); - LOG.warn("3 way commit failed ", e); + for (DatanodeDetails datanodeDetails : pipeline.getNodes()) { +nodes += datanodeDetails.getHostName() + "[" Review comment: This line will thrown findbugs, lets use StringBuilder here in place of concatenation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273057) Time Spent: 0.5h (was: 20m) > Error message for 3 way commit failure is not verbose > - > > Key: HDDS-1748 > URL: https://issues.apache.org/jira/browse/HDDS-1748 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The error message for 3 way client commit is not verbose, it should include > blockID and pipeline ID along with node details for debugging. > {code} > 2019-07-02 09:58:12,025 WARN scm.XceiverClientRatis > (XceiverClientRatis.java:watchForCommit(262)) - 3 way commit failed > java.util.concurrent.ExecutionException: > org.apache.ratis.protocol.NotReplicatedException: Request with call Id 39482 > and log index 11562 is not yet replicated to ALL_COMMITTED > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at > org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:259) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:194) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnFirstIndex(CommitWatcher.java:135) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:355) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:332) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:259) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > at > 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at java.io.OutputStream.write(OutputStream.java:75) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:103) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.ratis.protocol.NotReplicatedException: Request with > call Id 39482 and log index 11562 is not yet replicated to ALL_COMMITTED > at > org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:245) > at > org
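The StringBuilder suggestion in the review above can be sketched as follows. The message prefix comes from the patch; the bracketed content after each hostname is elided in the quoted diff, so `[...]` is kept as a placeholder rather than guessing what the patch puts there:

```java
// Sketch: build the datanode list with StringBuilder instead of
// += String concatenation inside the loop, which findbugs flags.
public class NodeListSketch {
    static String describe(String[] hostNames) {
        StringBuilder nodes = new StringBuilder(" with Datanodes : ");
        for (String host : hostNames) {
            // Placeholder for the per-node detail elided in the quoted diff.
            nodes.append(host).append("[...]");
        }
        return nodes.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new String[] {"dn1", "dn2"}));
        // prints: " with Datanodes : dn1[...]dn2[...]"
    }
}
```

String concatenation in a loop allocates a fresh String per iteration; StringBuilder appends into one buffer, which is why the reviewer asks for it.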
[jira] [Updated] (HDDS-1748) Error message for 3 way commit failure is not verbose
[ https://issues.apache.org/jira/browse/HDDS-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1748: Status: Patch Available (was: Open) > Error message for 3 way commit failure is not verbose > - > > Key: HDDS-1748 > URL: https://issues.apache.org/jira/browse/HDDS-1748 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The error message for 3 way client commit is not verbose, it should include > blockID and pipeline ID along with node details for debugging. > {code} > 2019-07-02 09:58:12,025 WARN scm.XceiverClientRatis > (XceiverClientRatis.java:watchForCommit(262)) - 3 way commit failed > java.util.concurrent.ExecutionException: > org.apache.ratis.protocol.NotReplicatedException: Request with call Id 39482 > and log index 11562 is not yet replicated to ALL_COMMITTED > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at > org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:259) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:194) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnFirstIndex(CommitWatcher.java:135) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:355) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:332) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:259) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > 
at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at java.io.OutputStream.write(OutputStream.java:75) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:103) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.ratis.protocol.NotReplicatedException: Request with > call Id 39482 and log index 11562 is not yet replicated to ALL_COMMITTED > at > org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:245) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:254) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:249) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > ... 
3 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
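The richer diagnostic HDDS-1748 asks for (blockID, pipeline ID, node details) could look like the sketch below. The helper class, method, and parameter names are illustrative assumptions for this digest, not the actual patch.

```java
import java.util.List;

class CommitFailureMessage {
    // Hypothetical helper: formats a 3-way-commit failure with the block,
    // pipeline, and datanode context the issue says is missing today.
    static String build(long blockId, String pipelineId, List<String> nodes,
                        Throwable cause) {
        return String.format(
            "3 way commit failed for blockID %d on pipeline %s (nodes: %s): %s",
            blockId, pipelineId, String.join(", ", nodes), cause.getMessage());
    }
}
```

With the extra fields, a log line like the one in the stack trace above would immediately identify which block and pipeline hit the NotReplicatedException.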
[jira] [Work logged] (HDDS-1750) Add block allocation metric for pipelines in SCM
[ https://issues.apache.org/jira/browse/HDDS-1750?focusedWorklogId=273054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273054 ] ASF GitHub Bot logged work on HDDS-1750: Author: ASF GitHub Bot Created on: 08/Jul/19 05:46 Start Date: 08/Jul/19 05:46 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1047: HDDS-1750. Add block allocation metrics for pipelines in SCM URL: https://github.com/apache/hadoop/pull/1047#discussion_r300928815 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java ## @@ -152,6 +152,7 @@ public synchronized Pipeline createPipeline( stateManager.addPipeline(pipeline); nodeManager.addPipeline(pipeline); metrics.incNumPipelineCreated(); + metrics.createNumBlocksAllocatedMetric(pipeline); Review comment: This should be named as createPerPipelineMetrics or something like that :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273054) Time Spent: 0.5h (was: 20m) > Add block allocation metric for pipelines in SCM > > > Key: HDDS-1750 > URL: https://issues.apache.org/jira/browse/HDDS-1750 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This Jira aims to add block allocation metrics for pipelines in SCM. This > would help in determining the distribution of block allocations among various > pipelines in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1750) Add block allocation metric for pipelines in SCM
[ https://issues.apache.org/jira/browse/HDDS-1750?focusedWorklogId=273055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273055 ] ASF GitHub Bot logged work on HDDS-1750: Author: ASF GitHub Bot Created on: 08/Jul/19 05:46 Start Date: 08/Jul/19 05:46 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1047: HDDS-1750. Add block allocation metrics for pipelines in SCM URL: https://github.com/apache/hadoop/pull/1047#discussion_r300928623 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java ## @@ -152,6 +152,7 @@ public synchronized Pipeline createPipeline( stateManager.addPipeline(pipeline); nodeManager.addPipeline(pipeline); metrics.incNumPipelineCreated(); + metrics.createNumBlocksAllocatedMetric(pipeline); Review comment: Can lines 154 and 155 be done in one statement to pipelineMetrics? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273055) Time Spent: 0.5h (was: 20m) > Add block allocation metric for pipelines in SCM > > > Key: HDDS-1750 > URL: https://issues.apache.org/jira/browse/HDDS-1750 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This Jira aims to add block allocation metrics for pipelines in SCM. This > would help in determining the distribution of block allocations among various > pipelines in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1750) Add block allocation metric for pipelines in SCM
[ https://issues.apache.org/jira/browse/HDDS-1750?focusedWorklogId=273056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273056 ] ASF GitHub Bot logged work on HDDS-1750: Author: ASF GitHub Bot Created on: 08/Jul/19 05:46 Start Date: 08/Jul/19 05:46 Worklog Time Spent: 10m Work Description: mukul1987 commented on pull request #1047: HDDS-1750. Add block allocation metrics for pipelines in SCM URL: https://github.com/apache/hadoop/pull/1047#discussion_r300928722 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineManager.java ## @@ -362,6 +364,7 @@ private void finalizePipeline(PipelineID pipelineId) throws IOException { for (ContainerID containerID : containerIDs) { eventPublisher.fireEvent(SCMEvents.CLOSE_CONTAINER, containerID); } + metrics.clearMetrics(pipelineId); Review comment: Lets rename this to removePipelineMetrics This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273056) Time Spent: 0.5h (was: 20m) > Add block allocation metric for pipelines in SCM > > > Key: HDDS-1750 > URL: https://issues.apache.org/jira/browse/HDDS-1750 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This Jira aims to add block allocation metrics for pipelines in SCM. This > would help in determining the distribution of block allocations among various > pipelines in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
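The HDDS-1750 review comments above ask for per-pipeline counters that are created on createPipeline and removed on finalizePipeline. A minimal sketch, using the names the reviewer suggests (createPerPipelineMetrics, removePipelineMetrics) as assumptions rather than the committed code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class PipelineMetrics {
    // One block-allocation counter per live pipeline, keyed by pipeline id.
    private final Map<String, LongAdder> blocksAllocated = new ConcurrentHashMap<>();

    // Called when a pipeline is created (SCMPipelineManager#createPipeline).
    void createPerPipelineMetrics(String pipelineId) {
        blocksAllocated.putIfAbsent(pipelineId, new LongAdder());
    }

    // Called on each block allocation routed to this pipeline.
    void incNumBlocksAllocated(String pipelineId) {
        LongAdder counter = blocksAllocated.get(pipelineId);
        if (counter != null) {
            counter.increment();
        }
    }

    // Called when a pipeline is finalized, so stale gauges are not reported.
    void removePipelineMetrics(String pipelineId) {
        blocksAllocated.remove(pipelineId);
    }

    long getNumBlocksAllocated(String pipelineId) {
        LongAdder counter = blocksAllocated.get(pipelineId);
        return counter == null ? 0 : counter.sum();
    }
}
```

Comparing these counters across pipelines gives the allocation distribution the issue description mentions.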
[jira] [Work logged] (HDDS-1735) Create separate unit and integration test executor dev-support script
[ https://issues.apache.org/jira/browse/HDDS-1735?focusedWorklogId=273048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273048 ] ASF GitHub Bot logged work on HDDS-1735: Author: ASF GitHub Bot Created on: 08/Jul/19 05:25 Start Date: 08/Jul/19 05:25 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1035: HDDS-1735. Create separate unit and integration test executor dev-support script URL: https://github.com/apache/hadoop/pull/1035#issuecomment-509083277 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 32 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 0 | No case conflicting files found. | | 0 | shelldocs | 0 | Shelldocs was not available. | | 0 | @author | 0 | Skipping @author checks as author.sh has been patched. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 32 | Maven dependency ordering for branch | | +1 | mvninstall | 474 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | -1 | pylint | 1 | Error running pylint. Please check pylint stderr files. | | +1 | shadedclient | 774 | branch has no errors when building and testing our client artifacts. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 30 | Maven dependency ordering for patch | | +1 | mvninstall | 453 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | -1 | pylint | 2 | Error running pylint. Please check pylint stderr files. | | +1 | pylint | 2 | There were no new pylint issues. | | +1 | shellcheck | 2 | The patch generated 0 new + 0 unchanged - 7 fixed = 0 total (was 7) | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 702 | patch has no errors when building and testing our client artifacts. | ||| _ Other Tests _ | | +1 | unit | 104 | hadoop-hdds in the patch passed. | | +1 | unit | 179 | hadoop-ozone in the patch passed. 
| | +1 | asflicense | 49 | The patch does not generate ASF License warnings. | | | | 3029 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1035 | | Optional Tests | dupname asflicense mvnsite unit shellcheck shelldocs pylint | | uname | Linux cc2fdb9998fb 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9c90729 | | pylint | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/4/artifact/out/branch-pylint-stderr.txt | | pylint | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/4/artifact/out/patch-pylint-stderr.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/4/testReport/ | | Max. process+thread count | 411 (vs. ulimit of 5500) | | modules | C: hadoop-ozone hadoop-ozone/fault-injection-test/network-tests U: hadoop-ozone | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/4/console | | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 pylint=1.9.2 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 273048) Time Spent: 1h 40m (was: 1.5h) > Create separate unit and integration test executor dev-support script > - > > Key: HDDS-1735 > URL: https://issues.apache.org/jira/browse/HDDS-1735 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2019-07-02 at 3.25.33 PM.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains multiple helper script to > execute different type of testing (findbugs, rat, unit, build). > They easily define how tests should be executed, with the following contract: > * The problems should be pri
[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880028#comment-16880028 ] He Xiaoqiao commented on HDFS-14313: [~leosun08] Thanks for your report and patch for this issue. And sorry for missing this information. I would like to review it this week. Thanks again. > Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory > instead of df/du > > > Key: HDFS-14313 > URL: https://issues.apache.org/jira/browse/HDFS-14313 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, performance >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, > HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, > HDFS-14313.005.patch > > > The two existing ways of getting used space, DU and DF, are insufficient. > # Running DU across lots of disks is very expensive and running all of the > processes at the same time creates a noticeable IO spike. > # Running DF is inaccurate when the disk is shared by multiple datanodes or > other servers. > Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory > has very low overhead and is accurate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
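The HDFS-14313 idea above amounts to summing the replica lengths already held in memory instead of forking df/du. A sketch of that computation; the Replica class below is a simplified stand-in for HDFS's ReplicaInfo, not the attached patch:

```java
import java.util.Collection;

class InMemoryUsedSpace {
    // Stand-in for org.apache.hadoop.hdfs ReplicaInfo; field names here are
    // illustrative assumptions.
    static class Replica {
        final long blockBytes;  // on-disk length of the block file
        final long metaBytes;   // on-disk length of the checksum/meta file
        Replica(long blockBytes, long metaBytes) {
            this.blockBytes = blockBytes;
            this.metaBytes = metaBytes;
        }
    }

    // Used space is a sum over the volume's in-memory replica map: no forked
    // du processes, no IO spike, and unaffected by other tenants on the disk.
    static long usedSpace(Collection<Replica> volumeMapReplicas) {
        long used = 0;
        for (Replica r : volumeMapReplicas) {
            used += r.blockBytes + r.metaBytes;
        }
        return used;
    }
}
```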
[jira] [Commented] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880022#comment-16880022 ] He Xiaoqiao commented on HDFS-12703: [~elgoiri], thanks for your reviews. [^HDFS-12703.006.patch] is updated with the above review comments. {quote}Should we be extra careful and catch also in the run() just in case? {quote} Correct, we should also catch the exception in run() since {{Preconditions.checkState}} is at the last part of Monitor#check. Thanks [~elgoiri] again. Please let me know if there are other corrections. > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch, > HDFS-12703.006.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
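The {{scheduleAtFixedRate}} behavior quoted in the HDFS-12703 description above is why the fix catches inside the task itself: one uncaught exception silently suppresses every subsequent execution. A minimal sketch of the defensive catch, with illustrative names rather than the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ResilientMonitor implements Runnable {
    final AtomicInteger ticks = new AtomicInteger();

    // Stand-in for DecommissionManager.Monitor#check, which can throw (e.g.
    // a failed Preconditions.checkState at its end).
    private void check() {
        ticks.incrementAndGet();
        throw new IllegalStateException("simulated check() failure");
    }

    @Override
    public void run() {
        try {
            check();
        } catch (Exception e) {
            // Log and swallow: if this escaped, scheduleAtFixedRate would
            // suppress all later executions and decommissioning would stall
            // until a namenode restart.
        }
    }
}
```

Scheduled via {{executor.scheduleAtFixedRate(monitor, ...)}}, this task keeps firing even when an individual check() fails.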
[jira] [Updated] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-12703: --- Attachment: HDFS-12703.006.patch > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch, > HDFS-12703.006.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012 ] stack commented on HDFS-14483: -- [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012 ] stack edited comment on HDFS-14483 at 7/8/19 3:52 AM: -- [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. Will do tomorrow in case someone else wants to comment in the meantime. was (Author: stack): [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880009#comment-16880009 ] Lisheng Sun commented on HDFS-14483: Hi [~stack] I confirm again that the UT TestJournalNodeRespectsBindHostKeys passes with this patch on my local machine. Thank you. {code:java} TestJournalNodeRespectsBindHostKeys [INFO] Running org.apache.hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.781 s - in org.apache.hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys {code} > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880008#comment-16880008 ] stack commented on HDFS-14483: -- ...and +1 on patch. Lets just figure the story on this last flakey...and then I'll commit (unless objection). Thanks [~leosun08] > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14585.branch-2.9.v3.patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880004#comment-16880004 ] stack commented on HDFS-14483: -- Thanks for fixing the short circuit unit test [~leosun08]. Seems to have worked. As said above.. hadoop.hdfs.web.TestWebHdfsTimeouts hadoop.hdfs.server.datanode.TestDirectoryScanner ... are for sure flakey. TestJournalNodeRespectsBindHostKeys I'm not so sure. Will do a survey of recent test history... Meantime let me get another run in. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14634) the original active namenode should have priority to participate in the election when the zookeeper recovery
[ https://issues.apache.org/jira/browse/HDFS-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1687#comment-1687 ] Hadoop QA commented on HDFS-14634: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-14634 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-14634 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973869/HDFS-14634-v1.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27161/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > the original active namenode should have priority to participate in the > election when the zookeeper recovery > - > > Key: HDFS-14634 > URL: https://issues.apache.org/jira/browse/HDFS-14634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 2.7.2 >Reporter: liying >Priority: Major > Attachments: HDFS-14634-v1.patch > > > Dynamically generates the namenode's election priorities in the zkfc module. > For example, when the zookeeper crashes, all of the namenodes remain in their > original state. Then, when the zookeeper service recovers, the original active > namenode should have priority to participate in the election. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14634) the original active namenode should have priority to participate in the election when the zookeeper recovery
[ https://issues.apache.org/jira/browse/HDFS-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liying updated HDFS-14634: -- Status: Open (was: Patch Available) > the original active namenode should have priority to participate in the > election when the zookeeper recovery > - > > Key: HDFS-14634 > URL: https://issues.apache.org/jira/browse/HDFS-14634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 2.7.2 >Reporter: liying >Priority: Major > Attachments: HDFS-14634-v1.patch > > > Dynamically generates the namenode's election priorities in the zkfc module. > For example, when the zookeeper crashes, all of the namenodes remain in their > original state. Then, when the zookeeper service recovers, the original active > namenode should have priority to participate in the election. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14634) the original active namenode should have priority to participate in the election when the zookeeper recovery
[ https://issues.apache.org/jira/browse/HDFS-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liying updated HDFS-14634: -- Attachment: HDFS-14634-v1.patch Status: Patch Available (was: Open) Wait for some time when the standby namenode joins the elector, so that the original active namenode has priority to participate in the election when the zookeeper service recovers. > the original active namenode should have priority to participate in the > election when the zookeeper recovery > - > > Key: HDFS-14634 > URL: https://issues.apache.org/jira/browse/HDFS-14634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 2.7.2 >Reporter: liying >Priority: Major > Attachments: HDFS-14634-v1.patch > > > Dynamically generates the namenode's election priorities in the zkfc module. > For example, when the zookeeper crashes, all of the namenodes remain in their > original state. Then, when the zookeeper service recovers, the original active > namenode should have priority to participate in the election. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14634) the original active namenode should have priority to participate in the election when the zookeeper recovery
[ https://issues.apache.org/jira/browse/HDFS-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liying updated HDFS-14634: -- Release Note: Wait for some time when the standby namenode joins the elector, so that the original active namenode has priority to participate in the election when the zookeeper service recovers. Status: Patch Available (was: Open) > the original active namenode should have priority to participate in the > election when the zookeeper recovery > - > > Key: HDFS-14634 > URL: https://issues.apache.org/jira/browse/HDFS-14634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 2.7.2 >Reporter: liying >Priority: Major > > Dynamically generates the namenode's election priorities in the zkfc module. > For example, when the zookeeper crashes, all of the namenodes remain in their > original state. Then, when the zookeeper service recovers, the original active > namenode should have priority to participate in the election. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
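The HDFS-14634 approach described above (a standby waits before joining the election so the previously active namenode can win first) can be sketched as below. The class, method, and grace-period parameter are assumptions for illustration only, not the attached patch.

```java
class ElectionJoinDelay {
    // How long this ZKFC should wait before calling joinElection() after the
    // ZooKeeper session is re-established: the previously active node joins
    // immediately, while standbys yield a grace period so the old active
    // retakes the lock if it is still healthy.
    static long joinDelayMillis(boolean wasActive, long standbyGraceMillis) {
        return wasActive ? 0L : standbyGraceMillis;
    }
}
```

If the old active is actually down, the standby still joins after the grace period, so failover is delayed but not prevented.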
[jira] [Updated] (HDFS-13694) Making md5 computing being in parallel with image loading
[ https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13694: --- Fix Version/s: 3.1.3 2.9.3 3.2.1 3.0.4 2.10.0 > Making md5 computing being in parallel with image loading > - > > Key: HDFS-13694 > URL: https://issues.apache.org/jira/browse/HDFS-13694 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-13694-001.patch, HDFS-13694-002.patch, > HDFS-13694-003.patch, HDFS-13694-004.patch, HDFS-13694-005.patch, > HDFS-13694-006.patch, HDFS-13694-007.patch > > > During namenode image loading, it first computes the md5 and then loads the > image. Actually, these two steps can run in parallel. > Testing this patch against an fsimage of a 70PB 2.4 cluster (200 million files > and 300 million blocks), the image loading time was reduced from 1210 seconds > to 1105 seconds, so it can save up to about 10% of the time. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13694) Making md5 computing being in parallel with image loading
[ https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879994#comment-16879994 ] Íñigo Goiri commented on HDFS-13694: Cherry-picked to branch-3.2, branch-3.1, branch-3.0, branch-2, and branch-2.9. > Making md5 computing being in parallel with image loading > - > > Key: HDFS-13694 > URL: https://issues.apache.org/jira/browse/HDFS-13694 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13694-001.patch, HDFS-13694-002.patch, > HDFS-13694-003.patch, HDFS-13694-004.patch, HDFS-13694-005.patch, > HDFS-13694-006.patch, HDFS-13694-007.patch > > > During namenode image loading, it first computes the md5 and then loads the > image. Actually, these two steps can run in parallel. > Testing this patch against an fsimage of a 70PB 2.4 cluster (200 million files > and 300 million blocks), the image loading time was reduced from 1210 seconds > to 1105 seconds, so it can save up to about 10% of the time. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
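The HDFS-13694 idea of overlapping md5 computation with image loading can be sketched with a background digest task. Names and structure here are illustrative, not the committed patch (which pipelines a single read rather than reading the file twice):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.concurrent.CompletableFuture;

class ParallelMd5Load {
    // Stream the file through an MD5 digest.
    static byte[] md5(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // Kick off the digest asynchronously, load the image on the caller's
    // thread so the two overlap, then join to get the digest for verification.
    static byte[] loadWithDigest(Path file, Runnable loadImage) throws Exception {
        CompletableFuture<byte[]> digest = CompletableFuture.supplyAsync(() -> {
            try {
                return md5(file);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        loadImage.run();     // runs concurrently with the digest computation
        return digest.get(); // the NN compares this against the stored MD5
    }
}
```

Because the digest and the parse are IO/CPU work over the same bytes, the savings are bounded by the cheaper of the two passes, which matches the roughly 10% reported above.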
[jira] [Commented] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879992#comment-16879992 ] Íñigo Goiri commented on HDFS-12703: Thanks [~hexiaoqiao] for checking. It also looked to me like some multi-threading issue with the state of the datanodes. Based on that, I think it is OK to catch it in the {{check()}} and not in the {{run()}}. Should we be extra careful and catch also in the {{run()}} just in case? Comments on [^HDFS-12703.005.patch]: * In the log, I would report the DN that is failing too, as we are playing with it in the catch. * In the unit test, I would explain what we are trying to catch in the main javadoc and mention that the executor swallows exceptions by default. * I think we should extend the unit test and make sure this is not happening. Should we check the value before triggering? Checking that the thread is alive (and that it dies without the fix)? Checking for the exception message? > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. 
> Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
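The failure mode quoted in the description, and the catch-all defense discussed in the review, can be illustrated with a minimal sketch (illustrative names only, not HDFS code):

```java
// Once a task submitted with ScheduledExecutorService#scheduleAtFixedRate
// throws, all subsequent executions are suppressed. Catching everything
// inside the task body, as the patch does in check(), keeps the monitor
// ticking on the next scheduled run.
class MonitorSketch implements Runnable {
    int runs = 0;
    final boolean catchAll;  // true = defensive (patched) behavior
    MonitorSketch(boolean catchAll) { this.catchAll = catchAll; }

    @Override
    public void run() {
        runs++;
        try {
            // Simulate the multi-threading race on DN state described above.
            throw new IllegalStateException("simulated DN state race");
        } catch (RuntimeException e) {
            if (!catchAll) {
                throw e;  // original behavior: this kills the whole schedule
            }
            // patched behavior: log and retry on the next tick
        }
    }
}
```

With `catchAll` set, the task survives the exception and the scheduler keeps invoking it; without it, the first throw silently ends all future runs.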
[jira] [Commented] (HDFS-14361) SNN will always upload fsimage
[ https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879989#comment-16879989 ] hunshenshi commented on HDFS-14361: --- [~starphin] Thanks. So it should be moved out of the if block. [~brahmareddy] could you help review the patch again? Thanks > SNN will always upload fsimage > -- > > Key: HDFS-14361 > URL: https://issues.apache.org/jira/browse/HDFS-14361 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > Fix For: 3.2.0 > > > Related to -HDFS-12248.- > {code:java} > boolean sendRequest = isPrimaryCheckPointer > || secsSinceLastUpload >= checkpointConf.getQuietPeriod(); > doCheckpoint(sendRequest); > {code} > If sendRequest is true, SNN will upload the fsimage. But isPrimaryCheckPointer > is always true: > {code:java} > if (ie == null && ioe == null) { > //Update only when response from remote about success or > lastUploadTime = monotonicNow(); > // we are primary if we successfully updated the ANN > this.isPrimaryCheckPointer = success; > } > {code} > The isPrimaryCheckPointer assignment should be outside the if condition. > If the ANN update was not successful, then isPrimaryCheckPointer should be > set to false. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14361) SNN will always upload fsimage
[ https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879988#comment-16879988 ] star commented on HDFS-14361: - Right. isPrimaryCheckPointer will not be changed when any error/exception occurs. > SNN will always upload fsimage > -- > > Key: HDFS-14361 > URL: https://issues.apache.org/jira/browse/HDFS-14361 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > Fix For: 3.2.0 > > > Related to -HDFS-12248.- > {code:java} > boolean sendRequest = isPrimaryCheckPointer > || secsSinceLastUpload >= checkpointConf.getQuietPeriod(); > doCheckpoint(sendRequest); > {code} > If sendRequest is true, SNN will upload the fsimage. But isPrimaryCheckPointer > is always true: > {code:java} > if (ie == null && ioe == null) { > //Update only when response from remote about success or > lastUploadTime = monotonicNow(); > // we are primary if we successfully updated the ANN > this.isPrimaryCheckPointer = success; > } > {code} > The isPrimaryCheckPointer assignment should be outside the if condition. > If the ANN update was not successful, then isPrimaryCheckPointer should be > set to false. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
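The fix discussed above, moving the flag assignment out of the if-block, can be sketched as follows (simplified, hypothetical names; not the actual HDFS-14361 patch):

```java
// Hypothetical simplification of the Standby NameNode checkpoint flag logic.
class CheckpointFlagSketch {
    boolean isPrimaryCheckPointer = true;
    long lastUploadTime;

    // success = whether the ANN acknowledged the fsimage upload;
    // ie/ioe  = exceptions raised during the upload attempt, if any.
    void afterUpload(boolean success, Exception ie, Exception ioe, long now) {
        if (ie == null && ioe == null) {
            // Only record an upload time when the attempt completed cleanly.
            lastUploadTime = now;
        }
        // Moved outside the if-block: a failed upload must demote this SNN,
        // otherwise sendRequest stays true forever and the SNN re-uploads
        // the fsimage on every checkpoint.
        this.isPrimaryCheckPointer = success;
    }
}
```

With the assignment outside the conditional, an unsuccessful ANN update clears the flag, so the next `sendRequest` decision falls back to the quiet-period check.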
[jira] [Updated] (HDFS-14634) the original active namenode should have priority to participate in the election when the zookeeper recovery
[ https://issues.apache.org/jira/browse/HDFS-14634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liying updated HDFS-14634: -- Summary: the original active namenode should have priority to participate in the election when the zookeeper recovery (was: Dynamically generates the namenode's election priorities) > the original active namenode should have priority to participate in the > election when the zookeeper recovery > - > > Key: HDFS-14634 > URL: https://issues.apache.org/jira/browse/HDFS-14634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 2.7.2 >Reporter: liying >Priority: Major > > Dynamically generate the namenode's election priorities in the ZKFC module. > For example, when ZooKeeper crashes, all of the namenodes remain in their > original state. Then, when the ZooKeeper service recovers, the original active > namenode should have priority to participate in the election. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14634) Dynamically generates the namenode's election priorities
liying created HDFS-14634: - Summary: Dynamically generates the namenode's election priorities Key: HDFS-14634 URL: https://issues.apache.org/jira/browse/HDFS-14634 Project: Hadoop HDFS Issue Type: Improvement Components: auto-failover Affects Versions: 2.7.2 Reporter: liying Dynamically generate the namenode's election priorities in the ZKFC module. For example, when ZooKeeper crashes, all of the namenodes remain in their original state. Then, when the ZooKeeper service recovers, the original active namenode should have priority to participate in the election. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1735) Create separate unit and integration test executor dev-support script
[ https://issues.apache.org/jira/browse/HDDS-1735?focusedWorklogId=272986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272986 ] ASF GitHub Bot logged work on HDDS-1735: Author: ASF GitHub Bot Created on: 07/Jul/19 22:56 Start Date: 07/Jul/19 22:56 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #1035: HDDS-1735. Create separate unit and integration test executor dev-support script URL: https://github.com/apache/hadoop/pull/1035#issuecomment-509037232 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 32 | Docker mode activated. | ||| _ Prechecks _ | | +1 | dupname | 1 | No case conflicting files found. | | 0 | shelldocs | 1 | Shelldocs was not available. | | 0 | @author | 0 | Skipping @author checks as author.sh has been patched. | | +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. | ||| _ trunk Compile Tests _ | | 0 | mvndep | 44 | Maven dependency ordering for branch | | +1 | mvninstall | 504 | trunk passed | | +1 | mvnsite | 0 | trunk passed | | -1 | pylint | 1 | Error running pylint. Please check pylint stderr files. | | +1 | shadedclient | 772 | branch has no errors when building and testing our client artifacts. | ||| _ Patch Compile Tests _ | | 0 | mvndep | 32 | Maven dependency ordering for patch | | +1 | mvninstall | 454 | the patch passed | | +1 | mvnsite | 0 | the patch passed | | -1 | pylint | 2 | Error running pylint. Please check pylint stderr files. | | +1 | pylint | 2 | There were no new pylint issues. | | +1 | shellcheck | 1 | The patch generated 0 new + 0 unchanged - 7 fixed = 0 total (was 7) | | +1 | whitespace | 0 | The patch has no whitespace issues. | | +1 | shadedclient | 708 | patch has no errors when building and testing our client artifacts. | ||| _ Other Tests _ | | +1 | unit | 103 | hadoop-hdds in the patch passed. | | +1 | unit | 179 | hadoop-ozone in the patch passed. 
| | +1 | asflicense | 48 | The patch does not generate ASF License warnings. | | | | 3076 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1035 | | Optional Tests | dupname asflicense mvnsite unit shellcheck shelldocs pylint | | uname | Linux f16dcd073b52 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 9c90729 | | pylint | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/3/artifact/out/branch-pylint-stderr.txt | | pylint | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/3/artifact/out/patch-pylint-stderr.txt | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/3/testReport/ | | Max. process+thread count | 446 (vs. ulimit of 5500) | | modules | C: hadoop-ozone hadoop-ozone/fault-injection-test/network-tests U: hadoop-ozone | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1035/3/console | | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 pylint=1.9.2 | | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org | This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 272986) Time Spent: 1.5h (was: 1h 20m) > Create separate unit and integration test executor dev-support script > - > > Key: HDDS-1735 > URL: https://issues.apache.org/jira/browse/HDDS-1735 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Attachments: Screen Shot 2019-07-02 at 3.25.33 PM.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > hadoop-ozone/dev-support/checks directory contains multiple helper script to > execute different type of testing (findbugs, rat, unit, build). > They easily define how tests should be executed, with the following contract: > * The problems should be print
[jira] [Commented] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879944#comment-16879944 ] Hadoop QA commented on HDFS-12703: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 52s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}138m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-12703 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973858/HDFS-12703.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 26b692590bd0 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9c90729 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27160/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27160/testReport/ | | Max. process+thread count | 3921 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apa
[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879932#comment-16879932 ] Hadoop QA commented on HDFS-14034: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-hdfs-project: The patch generated 5 new + 261 unchanged - 0 fixed = 266 total (was 261) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14034 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973855/HDFS-14034.000.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4a2fca299251 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9c90729 | | maven | version: Apache Ma
[jira] [Comment Edited] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879917#comment-16879917 ] He Xiaoqiao edited comment on HDFS-12703 at 7/7/19 6:10 PM: Uploaded patch [^HDFS-12703.005.patch] with a unit test to try to fix this issue. After digging into the decommission logic, I think the root cause is that the DatanodeDescriptor interface is not thread-safe. Consider that {{DatanodeAdminManager#monitor}} is running while another thread sets the {{adminState}} of the corresponding DataNode to {{Decommissioned}}; then this issue will reproduce. [^HDFS-12703.005.patch] just catches the exception, removes the datanode from {{outOfServiceNodeBlocks}}, and pushes it back to {{pendingNodes}} so that it is processed in the next loop. {quote}Does it need a restart or another refreshNodes to take it out of the invalid state? {quote} Since the check is postponed and the node will reach the proper state in the next loop, we do not need to operate on the DataNode or run refreshNodes again. To [~xuel1], I have just assigned the JIRA to myself; please feel free to assign it back to you if you would like to continue working on this issue before we resolve it. was (Author: hexiaoqiao): Uploaded patch [^HDFS-12703.005.patch] with a unit test to try to fix this issue. After digging into the decommission logic, I think the root cause is that the DatanodeDescriptor interface is not thread-safe. Consider that {{DatanodeAdminManager#monitor}} is running while another thread sets the {{adminState}} of the corresponding DataNode to {{Decommissioned}}; then this issue will reproduce. [^HDFS-12703.005.patch] just catches the exception, removes the datanode from {{outOfServiceNodeBlocks}}, and pushes it back to {{pendingNodes}} so that it is processed in the next loop. {code:java} Does it need a restart or another refreshNodes to take it out of the invalid state? {code} Since the check is postponed and the node will reach the proper state in the next loop, we do not need to operate on the DataNode or run refreshNodes again. 
To [~xuel1], I have just assigned the JIRA to myself; please feel free to assign it back to you if you would like to continue working on this issue before we resolve it. > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879917#comment-16879917 ] He Xiaoqiao commented on HDFS-12703: Uploaded patch [^HDFS-12703.005.patch] with a unit test to try to fix this issue. After digging into the decommission logic, I think the root cause is that the DatanodeDescriptor interface is not thread-safe. Consider that {{DatanodeAdminManager#monitor}} is running while another thread sets the {{adminState}} of the corresponding DataNode to {{Decommissioned}}; then this issue will reproduce. [^HDFS-12703.005.patch] just catches the exception, removes the datanode from {{outOfServiceNodeBlocks}}, and pushes it back to {{pendingNodes}} so that it is processed in the next loop. {quote}Does it need a restart or another refreshNodes to take it out of the invalid state? {quote} Since the check is postponed and the node will reach the proper state in the next loop, we do not need to operate on the DataNode or run refreshNodes again. To [~xuel1], I have just assigned the JIRA to myself; please feel free to assign it back to you if you would like to continue working on this issue before we resolve it. > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. 
The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
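The recovery path described in the comment (catch the exception, remove the node from {{outOfServiceNodeBlocks}}, and requeue it into {{pendingNodes}} for the next loop) can be sketched as follows; this is a hypothetical simplification, not the actual HDFS-12703 patch:

```java
import java.util.*;

// Sketch: when processing a node throws (e.g. because its admin state was
// changed concurrently), drop it from the in-progress map and requeue it so
// the next monitor tick retries it from a consistent state.
class RequeueSketch {
    final Map<String, List<String>> outOfServiceNodeBlocks = new HashMap<>();
    final Deque<String> pendingNodes = new ArrayDeque<>();

    void check() {
        Iterator<Map.Entry<String, List<String>>> it =
            outOfServiceNodeBlocks.entrySet().iterator();
        while (it.hasNext()) {
            String dn = it.next().getKey();
            try {
                process(dn);
            } catch (RuntimeException e) {
                // Non-thread-safe DN state raced with an admin-state change:
                // requeue instead of letting the exception kill the monitor.
                it.remove();
                pendingNodes.add(dn);
            }
        }
    }

    // Stand-in for the per-node decommission bookkeeping; throws to
    // simulate the inconsistent-state failure.
    void process(String dn) {
        if (dn.startsWith("bad")) {
            throw new IllegalStateException("inconsistent admin state: " + dn);
        }
    }
}
```

Because the node goes back onto the pending queue, no restart or extra refreshNodes is needed; the retry happens naturally on the next scheduled check.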
[jira] [Updated] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-12703: --- Attachment: HDFS-12703.005.patch > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: Xue Liu >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12703) Exceptions are fatal to decommissioning monitor
[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao reassigned HDFS-12703: -- Assignee: He Xiaoqiao (was: Xue Liu) > Exceptions are fatal to decommissioning monitor > --- > > Key: HDFS-12703 > URL: https://issues.apache.org/jira/browse/HDFS-12703 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: He Xiaoqiao >Priority: Critical > Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch, > HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch > > > The {{DecommissionManager.Monitor}} runs as an executor scheduled task. If > an exception occurs, all decommissioning ceases until the NN is restarted. > Per javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the > task encounters an exception, subsequent executions are suppressed*. The > monitor thread is alive but blocked waiting for an executor task that will > never come. The code currently disposes of the future so the actual > exception that aborted the task is gone. > Failover is insufficient since the task is also likely dead on the standby. > Replication queue init after the transition to active will fix the under > replication of blocks on currently decommissioning nodes but future nodes > never decommission. The standby must be bounced prior to failover – and > hopefully the error condition does not reoccur. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-14034: Status: Patch Available (was: Open) > Support getQuotaUsage API in WebHDFS > > > Key: HDFS-14034 > URL: https://issues.apache.org/jira/browse/HDFS-14034 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs, webhdfs >Reporter: Erik Krogen >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14034.000.patch > > > HDFS-8898 added support for a new API, {{getQuotaUsage}} which can fetch > quota usage on a directory with significantly lower impact than the similar > {{getContentSummary}}. This JIRA is to track adding support for this API to > WebHDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14034) Support getQuotaUsage API in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879892#comment-16879892 ] Chao Sun commented on HDFS-14034:
---------------------------------

Sorry for the delay. Submitted patch v0.

> Support getQuotaUsage API in WebHDFS
[jira] [Updated] (HDFS-14034) Support getQuotaUsage API in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-14034:
----------------------------

    Attachment: HDFS-14034.000.patch

> Support getQuotaUsage API in WebHDFS
[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879882#comment-16879882 ] Hadoop QA commented on HDFS-14313:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
|  0 | mvndep | 1m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 1s | trunk passed |
| +1 | compile | 16m 2s | trunk passed |
| +1 | checkstyle | 2m 13s | trunk passed |
| +1 | mvnsite | 2m 24s | trunk passed |
| +1 | shadedclient | 16m 15s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 32s | trunk passed |
| +1 | javadoc | 1m 57s | trunk passed |
|| || || || Patch Compile Tests ||
|  0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 43s | the patch passed |
| +1 | compile | 15m 28s | the patch passed |
| +1 | javac | 15m 28s | the patch passed |
| -0 | checkstyle | 2m 16s | root: The patch generated 2 new + 245 unchanged - 1 fixed = 247 total (was 246) |
| +1 | mvnsite | 2m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 14s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 12s | the patch passed |
| +1 | javadoc | 2m 8s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 22s | hadoop-common in the patch passed. |
| -1 | unit | 81m 59s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 44s | The patch does not generate ASF License warnings. |
| | | 192m 35s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestMultipleNNPortQOP |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14313 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973852/HDFS-14313.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux cd4c7e158ee8 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven
[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14313:
-------------------------------

    Attachment: HDFS-14313.005.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, HDFS-14313.005.patch
>
> The two existing ways of getting used space, DU and DF, are insufficient:
> # Running DU across lots of disks is very expensive, and running all of the processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk is shared by multiple datanodes or other servers.
> Getting the HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory is very cheap and accurate.
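The idea above can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the actual {{FsDatasetImpl}} code: since the datanode already tracks every replica's on-disk length in its volume map, used space can be computed by summing those lengths in memory rather than forking {{du}} over each volume:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    // Sum replica lengths already held in memory: an O(#replicas) walk
    // with zero disk I/O, unlike periodically forking `du` per volume.
    static long usedSpace(Map<Long, Long> replicaMap) {
        long used = 0;
        for (long len : replicaMap.values()) {
            used += len;
        }
        return used;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for FsDatasetImpl#volumeMap:
        // blockId -> on-disk length of the replica.
        Map<Long, Long> replicaMap = new ConcurrentHashMap<>();
        replicaMap.put(1L, 134_217_728L); // a full 128 MB block
        replicaMap.put(2L, 1_048_576L);   // a 1 MB block
        System.out.println(usedSpace(replicaMap)); // prints 135266304
    }
}
```

Unlike {{df}}, this counts only this datanode's replicas even on a shared disk, and unlike {{du}} it costs no I/O; the trade-off is that it misses non-replica files (tmp data, metadata) that an on-disk scan would see.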